CN105869117A - GPU acceleration method for deep learning super-resolution technology - Google Patents

GPU acceleration method for deep learning super-resolution technology

Info

Publication number
CN105869117A
Authority
CN
China
Prior art keywords
gpu
convolution
super-resolution
deep learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610184129.1A
Other languages
Chinese (zh)
Other versions
CN105869117B (en)
Inventor
宋利
赵章宗
解蓉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CN201610184129.1A
Publication of CN105869117A
Application granted
Publication of CN105869117B
Expired - Fee Related
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • G06T3/4053Super resolution, i.e. output image resolution higher than sensor resolution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00General purpose image data processing
    • G06T1/20Processor architectures; Processor configuration, e.g. pipelining

Abstract

The invention discloses a GPU acceleration method for deep learning super-resolution technology. The method parallelizes every step of a super-resolution technique based on deep learning and convolutional neural networks and runs it on a GPU. Parallelization here means partitioning the convolutions of the technique into millions of mutually independent micro-tasks that can be executed concurrently in any order, so that the massive computing power of the GPU is fully exploited. Further, the method uses the characteristics of the GPU memory hierarchy to cache the convolution-kernel data and the input image data in shared memory and registers, which greatly speeds up the convolutions. The method also fuses the convolution and nonlinear layers, and selects the best-performing implementation for each convolution size. The invention thereby accelerates a high-quality super-resolution method to the speed required for video processing without any loss of image quality.

Description

A GPU acceleration method for deep learning super-resolution technology
Technical field
The present invention relates to the field of image super-resolution and to GPU acceleration methods, and more specifically to a GPU acceleration method for deep-learning-based super-resolution.
Background art
Image super-resolution converts a low-resolution image into a high-resolution one, and is widely used in image post-processing and video non-linear editing. Early super-resolution methods (such as bicubic) are usually based on simple interpolation; they work fast and reliably and are easy to integrate into chips, but the high-resolution images they produce are of poor quality, with noticeable artifacts such as ringing, aliasing and blur. Such low-quality methods can hardly meet today's requirements for high-quality video. The best-performing current super-resolution methods can generate high-quality images, but at an enormous computational cost that makes them impractical. Some GPU-accelerated super-resolution methods also exist; they reach sufficiently fast running speeds, but sacrifice output quality.
Deep learning has made enormous progress in recent years, with clear gains in computer-vision recognition accuracy, and super-resolution techniques based on deep learning and convolutional neural networks have emerged accordingly. The convolutional-neural-network super-resolution method published at the European Conference on Computer Vision in 2014 (Chao Dong, Chen Change Loy, Kaiming He, Xiaoou Tang. Learning a Deep Convolutional Network for Image Super-Resolution, in Proceedings of European Conference on Computer Vision (ECCV), 2014, pp. 184-199; hereinafter SRCNN) is one of the best-performing methods. With a carefully designed network of 3 convolutional layers and 2 ReLU (nonlinear) layers, massive training data and meticulous fine-tuning of the parameters, SRCNN once achieved the best super-resolution performance. However, the method carries a huge computational cost: executing it on a CPU takes 300 seconds per frame (1920*1080 to 3840*2160, single channel; all tests below use this resolution), and even with GEMM-based GPU convolution acceleration each frame still needs close to 1 second, which cannot meet the needs of practical applications.
Summary of the invention
To meet the needs of practical applications, the present invention provides a GPU acceleration method for deep learning super-resolution technology, built on super-resolution techniques based on deep learning and convolutional neural networks.
To achieve the above object, in the GPU acceleration method for deep learning super-resolution of the present invention, every step of the super-resolution technique based on deep learning and convolutional neural networks is parallelized and run on a GPU. Parallelization here means dividing the convolutions of the technique into parallel tasks: each convolution operation is split into millions of mutually independent micro-tasks that can be executed concurrently in any order, so that the massive computing power of the GPU is brought into play.
Further, in the method, tasks are divided according to the convolution output pixels: the computation of each output pixel is assigned to one micro-task, so the convolution can be executed in parallel on a large scale. Moreover, the data on which the micro-tasks of neighboring pixels depend are themselves adjacent, so memory accesses coalesce perfectly and the full memory bus width and bandwidth of the GPU are exploited.
Further, in the method, shared memory is used as a cache for the convolution-kernel parameters, reducing global-memory I/O and accelerating the convolution. Specifically, a concurrent thread block first reads the kernel parameters into its shared memory, and each thread then fetches the kernel parameters it needs from that shared memory. This reduces the global-memory throughput the GPU needs to read the kernel parameters, and thus greatly optimizes and accelerates the execution of the convolution.
Further, in the method, shared memory or registers are used as a cache for the input image, reducing global-memory I/O and accelerating the convolution. Specifically, the input-image region on which a concurrent thread block depends is identified first; the thread block then reads this region into its shared memory, and each thread fetches the input data it needs from that shared memory. Alternatively, when the input data required by each thread is small enough, all of it is read at once into the thread's registers before computing. This reduces the global-memory throughput the GPU needs to read the input image, and thus greatly optimizes and accelerates the execution of the convolution.
Further, in the method, a deep-neural-network GPU acceleration technique fuses the convolution operation with the nonlinear operation; this reduces the global-memory throughput required by the convolution and nonlinear layers, speeding up the whole pipeline. Specifically, the deep-neural-network GPU acceleration technique merges the processing of the nonlinear layer into the convolution computation: the nonlinear operation is applied in registers immediately after the convolution completes, eliminating one round of I/O to global memory.
Further, in the method, a deep-convolutional-network GPU acceleration technique selects the best optimization for each convolution size. This technique benchmarks every optimization on convolutional layers of each size, and then selects the fastest one so as to obtain the fastest overall running speed.
Compared with the prior art, the present invention has the following significant advantages:
The present invention parallelizes and optimizes the convolution, accelerating a high-quality super-resolution method to the speed required for video processing without any loss of image quality.
Further, tasks are divided according to output pixels, realizing the parallelization of the convolution; every step of the super-resolution technique based on deep learning and convolutional neural networks is parallelized, so that the massive computing power of the GPU can be exploited;
Further, the characteristics of the GPU memory hierarchy are used to cache the convolution-kernel data and the input image data in shared memory and registers, greatly optimizing the computing speed of the convolution;
Further, the convolution and the nonlinear layer are fused, and the best optimization is chosen for each convolution size.
The present invention makes full use of the hardware and storage characteristics of the GPU and significantly accelerates the convolution computation, so that a high-quality super-resolution method can run at the speed required by practical work.
Brief description of the drawings
Other features, objects and advantages of the present invention will become apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:
Fig. 1 is the flow chart of SRCNN;
Fig. 2 is a schematic diagram of convolution parallelization in a preferred embodiment of the present invention;
Fig. 3 is a schematic diagram of the improvement obtained by caching the convolution-kernel parameters in shared memory, in a preferred embodiment of the present invention;
Fig. 4 is a schematic diagram of the improvement obtained by caching input-image blocks in shared memory, in a preferred embodiment of the present invention;
Fig. 5 is a schematic diagram of the improvement obtained by fusing the convolution and the nonlinear computation, in a preferred embodiment of the present invention.
Detailed description of the invention
The present invention is described in detail below in conjunction with specific embodiments. The following embodiments will help those skilled in the art to further understand the present invention, but do not limit it in any form. It should be noted that those of ordinary skill in the art may make several variations and improvements without departing from the concept of the invention; these all fall within the scope of protection of the present invention.
Fig. 1 shows the flow chart of SRCNN. As one embodiment of the present invention, the super-resolution GPU acceleration technique of the invention is applied to SRCNN, whose flow, shown in Fig. 1, comprises bicubic preprocessing (not shown), three convolutional layers and two ReLU layers (one after the first convolution and one after the second). The sizes of the three convolutional layers (as output channels * kernel width * kernel height * input channels) are: 64*9*9*1, 32*1*1*64 and 1*5*5*32. Super-resolving one 1080p image to 4K requires 66.6G floating-point multiply-add operations and 800 GBytes of memory I/O. Such a computational load obviously cannot be handled by a CPU within the time demanded by real production work. For this situation the present invention therefore uses the GPU: every step of the SRCNN pipeline is parallelized and implemented on the GPU, and the GPU hardware characteristics are fully exploited for optimization and acceleration.
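The 66.6G figure follows directly from the layer sizes listed above; as a worked check:

```latex
% Multiply-adds for one 3840*2160 single-channel output frame
\begin{align*}
\text{MACs per pixel} &= \underbrace{64\cdot9\cdot9\cdot1}_{\text{conv1}}
  + \underbrace{32\cdot1\cdot1\cdot64}_{\text{conv2}}
  + \underbrace{1\cdot5\cdot5\cdot32}_{\text{conv3}}
  = 5184 + 2048 + 800 = 8032,\\
\text{total} &= 8032 \times (3840\times2160) \approx 66.6\times10^{9}.
\end{align*}
```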
The present invention focuses its parallelization and optimization effort on the convolutions, because the bicubic preprocessing is computationally cheap and easy to implement on the GPU, the parallelization of the nonlinear ReLU layers is straightforward, and more than 95% of the running time is spent in the convolutions.
To understand how the SRCNN method is adapted to the GPU, and how the GPU parallel program is optimized, the GPU architecture is introduced first. Owing to physical limits, processor clock frequencies have barely improved over the past few years, and the computer industry has instead raised computing capability by increasing the number of processor cores; typical products are multi-core central processing units (CPUs) and graphics processing units (GPUs) with very many cores. A GPU has thousands of computing units and ultra-high-bandwidth device memory; for example, an Nvidia GTX 980 Ti has 2816 CUDA cores and 336 GB/s of global-memory bandwidth. If a large computing task is divided into tens of thousands or even millions of micro-tasks and handed to the GPU, the GPU schedules these micro-tasks onto its CUDA cores, which process them concurrently and efficiently, so that the GPU can execute hundreds of times faster than a CPU.
The GPU has a hierarchical storage mechanism; this invention uses the GPU's global memory, shared memory and registers. These three kinds of storage differ greatly in access bandwidth, latency, capacity and addressable scope. Global memory is accessible to all threads and has very large capacity (several GB), but its access bandwidth is the lowest, and it often becomes the bottleneck of the whole pipeline. Shared memory is a programmer-controlled cache: the computing units of the GPU are grouped into thread blocks, each containing a number of threads and an independent shared memory that all threads in that block can access, with very high bandwidth and low latency. Registers live inside each thread and have the highest bandwidth and the lowest latency, but very small capacity; keeping frequently reused data in registers greatly reduces memory-access overhead.
In the GPU-accelerated SRCNN technique illustrated by the present invention, the input image data is first transferred from host memory to device memory and bicubic preprocessing is performed; then the first convolutional layer (conv1), ReLU, the second convolutional layer (conv2), ReLU and the third convolutional layer (conv3) are executed in turn, and the data is finally transferred back from device memory to host memory. Every convolutional layer is parallelized by dividing tasks according to output pixels, so that the massive computing power of the GPU can be used. To further accelerate the convolutions, the invention caches the convolution-kernel data in shared memory, caches input-image blocks in shared memory or registers, and fuses the convolution with the nonlinear operation. In addition, for convolutions of different sizes the invention benchmarks the execution speed of the different convolution methods and chooses the fastest combination, so that the whole pipeline runs as fast as possible. A sketch of the per-frame flow appears below; the key technical details then follow.
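This is an illustrative host-side CUDA sketch only; the stage launchers (bicubicUpscale, conv1Relu, conv2Relu, conv3) are hypothetical stand-ins for the kernels discussed below, not function names taken from the patent.

```cuda
#include <cuda_runtime.h>

// Hypothetical per-stage launchers (stand-ins for the kernels sketched below).
void bicubicUpscale(const float*, float*, int, int);
void conv1Relu(const float*, float*, int, int);
void conv2Relu(const float*, float*, int, int);
void conv3(const float*, float*, int, int);

// One frame: upload once, run every stage on the GPU, download once,
// so no intermediate data ever crosses the CPU/GPU boundary.
void srcnnFrame(const float* hostIn, float* hostOut,
                float* dIn, float* dUp, float* dF1, float* dF2, float* dOut,
                size_t inBytes, size_t outBytes, int w, int h)
{
    cudaMemcpy(dIn, hostIn, inBytes, cudaMemcpyHostToDevice);
    bicubicUpscale(dIn, dUp, w, h);   // cheap preprocessing, kept on the GPU
    conv1Relu(dUp, dF1, w, h);        // 64*9*9*1 convolution, ReLU fused
    conv2Relu(dF1, dF2, w, h);        // 32*1*1*64 convolution, ReLU fused
    conv3(dF2, dOut, w, h);           // 1*5*5*32 convolution
    cudaMemcpy(hostOut, dOut, outBytes, cudaMemcpyDeviceToHost);
}
```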
In a preferred embodiment, to parallelize the convolution, the invention divides the convolution task into millions of micro-tasks according to output pixels, referred to as the direct GPU implementation of convolution, as shown in Fig. 2. A convolution simply computes the value of every output pixel, so the computation of each output pixel can be assigned as an independent micro-task to one GPU thread; these micro-tasks are mutually independent and can run concurrently without communicating. An additional advantage of this division is that concurrently executing adjacent threads access adjacent input data: for example, while thread (x, y) accesses I(a, b), thread (x+1, y) accesses I(a+1, b), so the GPU hardware automatically merges these requests into a single coalesced access, making full use of the GPU's memory bus width and bandwidth. The remaining part of SRCNN (ReLU) is parallelized as well, so the entire SRCNN pipeline executes on the GPU, avoiding repeated data transfers between CPU and GPU. Through this parallelization of the convolution, the execution speed of SRCNN improves from 300 seconds per frame (on CPU) to 1 second per frame.
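A minimal CUDA sketch of this direct implementation is given below, assuming a single input and output channel and clamp-to-edge border handling for brevity; the kernel name and signature are illustrative, not the patent's actual code.

```cuda
// Direct convolution: one thread computes one output pixel. Adjacent
// threads along x read adjacent input pixels, so the hardware coalesces
// their global-memory accesses into wide transactions.
__global__ void convDirect(const float* in, const float* kern, float* out,
                           int width, int height, int kw, int kh)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    float acc = 0.0f;
    for (int j = 0; j < kh; ++j)                 // convolution window
        for (int i = 0; i < kw; ++i) {
            int ix = min(max(x + i - kw / 2, 0), width  - 1);  // clamp border
            int iy = min(max(y + j - kh / 2, 0), height - 1);
            acc += in[iy * width + ix] * kern[j * kw + i];
        }
    out[y * width + x] = acc;                    // one independent micro-task
}
```

Launched over a grid covering the whole output image (for example, 16*16 thread blocks), every thread is exactly one of the micro-tasks described above.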
Using the GPU's hierarchical storage mechanism, the present invention caches the convolution-kernel data and the input image data in shared memory or registers, accelerating the convolution by a further factor of 2 to 10.
In a preferred embodiment, the invention prefetches the convolution-kernel data into shared memory, saving the global-memory I/O spent on redundant kernel reads; this is referred to as the shared-kernel method, as shown in Fig. 3. In the direct implementation above, every thread reads the same kernel data, and this redundant reading wastes a large amount of global-memory I/O. In the shared-kernel method, a thread block first prefetches the kernel data into shared memory, and all threads in the block then fetch the kernel data they need from that shared memory. The shared memory thus acts as a cache of the kernel data, saving a large amount of the global-memory I/O spent reading it.
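A sketch of this shared-kernel variant, under the same single-channel assumption and with an illustrative fixed 5*5 kernel (any thread block of at least 25 threads can do the preload):

```cuda
#define KW 5
#define KH 5

// Shared-kernel convolution: the block stages the kernel weights in shared
// memory once, so each weight is fetched from global memory once per block
// instead of once per thread.
__global__ void convSharedKernel(const float* in, const float* kern,
                                 float* out, int width, int height)
{
    __shared__ float sk[KH * KW];
    int tid = threadIdx.y * blockDim.x + threadIdx.x;
    if (tid < KH * KW) sk[tid] = kern[tid];   // cooperative preload
    __syncthreads();                          // weights now cached on-chip

    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    float acc = 0.0f;
    for (int j = 0; j < KH; ++j)
        for (int i = 0; i < KW; ++i) {
            int ix = min(max(x + i - KW / 2, 0), width  - 1);
            int iy = min(max(y + j - KH / 2, 0), height - 1);
            acc += in[iy * width + ix] * sk[j * KW + i];  // shared-memory read
        }
    out[y * width + x] = acc;
}
```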
In a preferred embodiment, the invention prefetches input-image blocks into shared memory or registers, saving redundant global-memory reads of the input image; this is referred to as the shared-patch method or the registered-pixel method, as shown in Fig. 4. When the convolution width or height is greater than 1, adjacent output pixels depend on overlapping input-image blocks. The direct implementation does not exploit this overlap, so each thread reads input data redundantly, again wasting global-memory I/O; the larger the convolution width and height, the more serious the waste. In the shared-patch and registered-pixel methods of the invention, the input-image region on which a thread block depends is identified first, and this region is read into shared memory or into registers (registers are feasible only when the region is small enough to fit); each thread then fetches the input data it needs from the shared memory. The shared memory or registers thus act as a cache of the input image data, saving a large amount of the global-memory I/O spent reading it.
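A sketch of the shared-patch variant, assuming a 16*16 thread block, the same illustrative 5*5 kernel, and clamp-to-edge borders; the block cooperatively loads its output tile plus the surrounding halo before any thread computes:

```cuda
#define TILE 16
#define KW   5
#define KH   5

// Shared-patch convolution: the block stages the input region it depends on
// (its output tile plus a halo of KW/2 and KH/2 pixels) in shared memory,
// so the overlapping reads of neighbouring threads hit shared memory.
__global__ void convSharedPatch(const float* in, const float* kern,
                                float* out, int width, int height)
{
    __shared__ float patch[TILE + KH - 1][TILE + KW - 1];
    int x0 = blockIdx.x * TILE, y0 = blockIdx.y * TILE;

    // Cooperative, strided load of the tile-plus-halo region with clamping.
    for (int j = threadIdx.y; j < TILE + KH - 1; j += blockDim.y)
        for (int i = threadIdx.x; i < TILE + KW - 1; i += blockDim.x) {
            int ix = min(max(x0 + i - KW / 2, 0), width  - 1);
            int iy = min(max(y0 + j - KH / 2, 0), height - 1);
            patch[j][i] = in[iy * width + ix];
        }
    __syncthreads();

    int x = x0 + threadIdx.x, y = y0 + threadIdx.y;
    if (x >= width || y >= height) return;

    float acc = 0.0f;
    for (int j = 0; j < KH; ++j)
        for (int i = 0; i < KW; ++i)
            acc += patch[threadIdx.y + j][threadIdx.x + i] * kern[j * KW + i];
    out[y * width + x] = acc;
}
```

The registered-pixel variant replaces the shared array with per-thread local variables, which the compiler keeps in registers; this is only feasible when the per-thread input window is small.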
In a preferred embodiment, the invention fuses the convolution and the nonlinear layer, eliminating the I/O overhead of the nonlinear layer, as shown in Fig. 5. Traditional acceleration of convolutional neural networks concentrates on the convolution, because the convolution is the bottleneck of the computation and because the nonlinear layer is considered hard to accelerate. However, once the convolution has been accelerated far enough, the time spent in the nonlinear layer can no longer be ignored. In a convolutional neural network, the nonlinear layer always follows a convolutional layer, and each of its output pixels depends on exactly one corresponding input pixel. The invention therefore merges the convolution and the nonlinear layer into a single pass: immediately after computing the convolution, each thread applies the nonlinear operation to the output value and only then writes it back to global memory. This removes the overhead of the convolutional layer writing to global memory and the nonlinear layer reading it back, which is equivalent to almost completely eliminating the computation time of the nonlinear layer.
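Fusing the ReLU costs a single extra instruction on the accumulator while it is still in a register; a sketch under the same single-channel assumptions as the direct kernel above:

```cuda
// Fused convolution + ReLU: the nonlinearity is applied to the register-
// resident accumulator before the single write-back, so the intermediate
// feature map is never written to and re-read from global memory.
__global__ void convRelu(const float* in, const float* kern, float* out,
                         int width, int height, int kw, int kh)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    float acc = 0.0f;
    for (int j = 0; j < kh; ++j)
        for (int i = 0; i < kw; ++i) {
            int ix = min(max(x + i - kw / 2, 0), width  - 1);
            int iy = min(max(y + j - kh / 2, 0), height - 1);
            acc += in[iy * width + ix] * kern[j * kw + i];
        }
    out[y * width + x] = fmaxf(acc, 0.0f);   // ReLU fused in-register
}
```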
In a preferred embodiment, the invention measures the running time of each optimization technique on each convolutional-layer size and then selects the fastest technique, so as to obtain the fastest overall running speed; the measured running times are shown in Table 1. The cuDNN entries use the convolution library provided by Nvidia. The results show that when the first convolutional layer uses cuDNN, the second uses the shared-kernel method combined with registered-pixel input caching, and the third uses the shared-kernel method combined with the shared-patch method, the whole pipeline runs fastest, finally reaching 0.15 seconds per frame, which is 2000 times the CPU speed.
Table 1: running time of each convolutional layer under each optimization method
Setup for the table above: an Nvidia GTX 980 Ti and dual Intel E5-2697 v2 @ 2.7 GHz 12-core processors, testing 1920*1080 to 3840*2160 single-channel super-resolution.
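The per-layer selection itself can be done once at start-up by timing each candidate implementation with CUDA events and keeping the fastest; a host-side sketch, in which the candidate launchers are hypothetical stand-ins for the variants above and the cuDNN path:

```cuda
#include <cuda_runtime.h>

typedef void (*ConvLauncher)(const float*, const float*, float*, int, int);

// Time each candidate convolution launcher once and return the fastest.
ConvLauncher pickFastest(ConvLauncher cands[], int n,
                         const float* dIn, const float* dKern, float* dOut,
                         int w, int h)
{
    cudaEvent_t t0, t1;
    cudaEventCreate(&t0);
    cudaEventCreate(&t1);
    ConvLauncher best = cands[0];
    float bestMs = 1e30f;
    for (int i = 0; i < n; ++i) {
        cudaEventRecord(t0);
        cands[i](dIn, dKern, dOut, w, h);   // one timed trial run
        cudaEventRecord(t1);
        cudaEventSynchronize(t1);
        float ms;
        cudaEventElapsedTime(&ms, t0, t1);
        if (ms < bestMs) { bestMs = ms; best = cands[i]; }
    }
    cudaEventDestroy(t0);
    cudaEventDestroy(t1);
    return best;
}
```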
In summary, the present invention accelerates a high-quality super-resolution method to the speed required for video processing, without any loss of image quality.
Specific embodiments of the present invention have been described above. It should be understood that the invention is not limited to these particular embodiments; those skilled in the art may make various variations or modifications within the scope of the claims, and these do not affect the substance of the invention.

Claims (10)

1. A GPU acceleration method for deep learning super-resolution technology, characterized in that: every step of a super-resolution technique based on deep learning and convolutional neural networks is parallelized and run on a GPU; said parallelization divides the convolutions of the super-resolution technique based on deep learning and convolutional neural networks into parallel tasks, splitting each convolution operation into millions of mutually independent micro-tasks that can be executed concurrently in any order, so that the massive computing power of the GPU is brought into play.
2. The GPU acceleration method for deep learning super-resolution technology according to claim 1, characterized in that: in the method, tasks are divided according to the convolution output pixels, the computation of each output pixel being assigned to one micro-task, so that the convolution is executed in parallel on a large scale; the data on which the micro-tasks of neighboring pixels depend are themselves adjacent, so memory accesses coalesce perfectly, making full use of the memory bus width and bandwidth of the GPU.
3. The GPU acceleration method for deep learning super-resolution technology according to claim 1, characterized in that: in the method, shared memory is used as a cache for the convolution-kernel parameters, thereby reducing global-memory I/O and accelerating the convolution.
4. The GPU acceleration method for deep learning super-resolution technology according to claim 3, characterized in that: using shared memory as a cache for the convolution-kernel parameters means that a concurrent thread block first reads the kernel parameters into the thread block's shared memory, and each thread then obtains the kernel parameters it needs from that shared memory.
5. The GPU acceleration method for deep learning super-resolution technology according to claim 1, characterized in that: in the method, shared memory or registers are used as a cache for the input image, thereby reducing global-memory I/O and accelerating the convolution.
6. The GPU acceleration method for deep learning super-resolution technology according to claim 5, characterized in that: using shared memory or registers as a cache for the input image means that the input-image region on which a concurrent thread block depends is identified first, the thread block then reads this region into its shared memory, and each thread obtains the input data it needs from that shared memory; or alternatively, when the input data required by each thread is small enough, all of it is read at once into the thread's registers before computing.
7. The GPU acceleration method for deep learning super-resolution technology according to claim 1, characterized in that: in the method, a deep-neural-network GPU acceleration technique fuses the convolution operation and the nonlinear operation, reducing the global-memory throughput required by the convolution and nonlinear layers and thereby speeding up the execution of the whole pipeline.
8. The GPU acceleration method for deep learning super-resolution technology according to claim 7, characterized in that: said deep-neural-network GPU acceleration technique merges the processing of the nonlinear layer into the convolution computation, applying the nonlinear operation in registers immediately after the convolution completes, thereby eliminating one round of global-memory I/O.
9. The GPU acceleration method for deep learning super-resolution technology according to any one of claims 1-8, characterized in that: in the method, a deep-convolutional-network GPU acceleration technique selects the best optimization method for each convolution size.
10. The GPU acceleration method for deep learning super-resolution technology according to claim 9, characterized in that: said deep-convolutional-network GPU acceleration technique benchmarks each optimization technique on convolutional layers of different sizes and then selects the fastest technique, so as to obtain the fastest overall running speed.
CN201610184129.1A 2016-03-28 2016-03-28 GPU acceleration method for deep learning super-resolution technology Expired - Fee Related CN105869117B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610184129.1A CN105869117B (en) 2016-03-28 2016-03-28 GPU acceleration method for deep learning super-resolution technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610184129.1A CN105869117B (en) 2016-03-28 2016-03-28 GPU acceleration method for deep learning super-resolution technology

Publications (2)

Publication Number Publication Date
CN105869117A 2016-08-17
CN105869117B CN105869117B (en) 2021-04-02

Family

ID=56626131

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610184129.1A Expired - Fee Related CN105869117B (en) 2016-03-28 2016-03-28 GPU acceleration method for deep learning super-resolution technology

Country Status (1)

Country Link
CN (1) CN105869117B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140085318A1 (en) * 2012-09-26 2014-03-27 Siemens Corporation Multi-GPU FISTA Implementation for MR Reconstruction with Non-Uniform K-Space Sampling
CN104778659A (en) * 2015-04-15 2015-07-15 杭州电子科技大学 Single-frame image super-resolution reconstruction method on basis of deep learning
CN105279741A (en) * 2015-11-17 2016-01-27 集美大学 Image super-resolution reconstruction method and system based on graph-cut algorithm

Non-Patent Citations (9)

* Cited by examiner, † Cited by third party
Title
LINGQI ZHANG et al., "High accuracy digital image correlation powered by GPU-based parallel computing", Optics and Lasers in Engineering *
LIUBOV A. FLORES et al., "Parallel CT image reconstruction based on GPUs", Radiation Physics and Chemistry *
LIU Jinfeng et al., "A concise and efficient method for accelerating convolutional neural networks", Science Technology and Engineering *
LI Jiajun et al., "GPU-based parallel optimization of HOTPANTS", Astronomical Research & Technology *
LI Daxia, "A speed optimization of the cuda-convnet deep convolutional neural network algorithm", China Master's Theses Full-text Database, Information Science and Technology (monthly) *
HU Chuanping et al., "Research on image super-resolution algorithms based on deep learning", Journal of Railway Police College *
JIN Lu, "Research on a GPU-based surface topography measurement system", China Master's Theses Full-text Database, Information Science and Technology (monthly) *
CHEN Xiangji et al., "Real-time video super-resolution reconstruction based on GPU acceleration", Journal of Computer Applications *
MA Yongjun et al., "A parallel template-matching object-recognition algorithm for CPU+GPU heterogeneous platforms", Journal of Tianjin University of Science & Technology *

Cited By (56)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106447609A (en) * 2016-08-30 2017-02-22 上海交通大学 Image super-resolution method based on depth convolutional neural network
WO2018068623A1 (en) * 2016-10-14 2018-04-19 腾讯科技(深圳)有限公司 Machine learning method and system
CN106779057A (en) * 2016-11-11 2017-05-31 北京旷视科技有限公司 The method and device of the calculating binary neural network convolution based on GPU
CN106779057B (en) * 2016-11-11 2020-04-17 北京旷视科技有限公司 Method and device for calculating binary neural network convolution based on GPU
CN108073548B (en) * 2016-11-14 2021-09-10 耐能股份有限公司 Convolution operation device and convolution operation method
CN108073548A (en) * 2016-11-14 2018-05-25 耐能股份有限公司 Convolution algorithm device and convolution algorithm method
US10936937B2 (en) 2016-11-14 2021-03-02 Kneron, Inc. Convolution operation device and convolution operation method
TWI634490B (en) * 2016-11-14 2018-09-01 美商耐能股份有限公司 Convolution operation device and convolution operation method
CN108268931A (en) * 2016-12-30 2018-07-10 华为技术有限公司 The methods, devices and systems of data processing
CN108268944A (en) * 2016-12-31 2018-07-10 上海兆芯集成电路有限公司 Neural network unit with the memory that can be remolded
CN108268944B (en) * 2016-12-31 2020-09-11 上海兆芯集成电路有限公司 Neural network unit with remodelable memory
CN107085827A (en) * 2017-04-27 2017-08-22 中国电子科技集团公司第二十八研究所 The super-resolution image recovery method realized based on hardware platform
CN107085827B (en) * 2017-04-27 2020-06-16 中国电子科技集团公司第二十八研究所 Super-resolution image restoration method based on hardware platform
WO2018196863A1 (en) * 2017-04-28 2018-11-01 北京市商汤科技开发有限公司 Convolution acceleration and calculation processing methods and apparatuses, electronic device and storage medium
US11429852B2 (en) 2017-04-28 2022-08-30 Beijing Sensetime Technology Development Co., Ltd. Convolution acceleration and computing processing method and apparatus, electronic device, and storage medium
CN107515736B (en) * 2017-07-01 2021-01-15 广州深域信息科技有限公司 Method for accelerating computation speed of deep convolutional network on embedded equipment
CN107515736A (en) * 2017-07-01 2017-12-26 广州深域信息科技有限公司 A kind of method for accelerating depth convolutional network calculating speed on embedded device
CN107341127A (en) * 2017-07-05 2017-11-10 西安电子科技大学 Convolutional neural networks accelerated method based on OpenCL standards
CN107341127B (en) * 2017-07-05 2020-04-14 西安电子科技大学 Convolutional neural network acceleration method based on OpenCL standard
CN110084361A (en) * 2017-10-30 2019-08-02 上海寒武纪信息科技有限公司 A kind of arithmetic unit and method
CN110084361B (en) * 2017-10-30 2021-03-23 上海寒武纪信息科技有限公司 Arithmetic device and method
US11922132B2 (en) 2017-10-30 2024-03-05 Shanghai Cambricon Information Technology Co., Ltd. Information processing method and terminal device
CN108012156A (en) * 2017-11-17 2018-05-08 深圳市华尊科技股份有限公司 A kind of method for processing video frequency and control platform
CN108052891A (en) * 2017-12-08 2018-05-18 触景无限科技(北京)有限公司 Facial contour parallel calculating method and device
CN108062532A (en) * 2017-12-28 2018-05-22 北京智慧眼科技股份有限公司 Deep learning recognition of face network optimized approach, device and storage medium
CN110321998A (en) * 2018-03-31 2019-10-11 北京深鉴智能科技有限公司 Convolutional neural networks implementation method, device, acceleration equipment, storage medium
CN110321998B (en) * 2018-03-31 2022-06-14 赛灵思公司 Convolutional neural network implementation method and device, acceleration equipment and storage medium
CN108564524A (en) * 2018-04-24 2018-09-21 开放智能机器(上海)有限公司 A kind of convolutional calculation optimization method of visual pattern
CN110633785B (en) * 2018-06-21 2021-01-05 清华大学 Method and system for calculating convolutional neural network
CN110633785A (en) * 2018-06-21 2019-12-31 清华大学 Method and system for calculating convolutional neural network
CN109165723B (en) * 2018-08-03 2021-03-19 北京字节跳动网络技术有限公司 Method and apparatus for processing data
CN109165723A (en) * 2018-08-03 2019-01-08 北京字节跳动网络技术有限公司 Method and apparatus for handling data
US10497258B1 (en) 2018-09-10 2019-12-03 Sony Corporation Vehicle tracking and license plate recognition based on group of pictures (GOP) structure
CN109409513A (en) * 2018-10-10 2019-03-01 广州市百果园信息技术有限公司 A kind of task processing method neural network based and relevant device
CN109740731A (en) * 2018-12-15 2019-05-10 华南理工大学 A kind of adaptive convolutional layer hardware accelerator design method
WO2020119318A1 (en) * 2018-12-15 2020-06-18 华南理工大学 Self-adaptive selection and design method for convolutional-layer hardware accelerator
CN109754073A (en) * 2018-12-29 2019-05-14 北京中科寒武纪科技有限公司 Data processing method, device, electronic equipment and readable storage medium storing program for executing
CN109886407A (en) * 2019-02-27 2019-06-14 上海商汤智能科技有限公司 Data processing method, device, electronic equipment and computer readable storage medium
CN109886407B (en) * 2019-02-27 2021-10-22 上海商汤智能科技有限公司 Data processing method and device, electronic equipment and computer readable storage medium
WO2020177250A1 (en) * 2019-03-06 2020-09-10 上海熠知电子科技有限公司 Data reading system and method
CN110009644A (en) * 2019-03-26 2019-07-12 深兰科技(上海)有限公司 A kind of method and apparatus of characteristic pattern row pixel segmentation
CN110188863A (en) * 2019-04-30 2019-08-30 杭州电子科技大学 A kind of convolution kernel and its compression algorithm of convolutional neural networks
CN110188863B (en) * 2019-04-30 2021-04-09 杭州电子科技大学 Convolution kernel compression method of convolution neural network suitable for resource-limited equipment
CN111914985B (en) * 2019-05-10 2023-07-04 杭州海康威视数字技术股份有限公司 Configuration method, device and storage medium of deep learning network model
CN111914985A (en) * 2019-05-10 2020-11-10 杭州海康威视数字技术股份有限公司 Configuration method and device of deep learning network model and storage medium
CN110399883A (en) * 2019-06-28 2019-11-01 苏州浪潮智能科技有限公司 Image characteristic extracting method, device, equipment and computer readable storage medium
US11614964B2 (en) 2019-09-12 2023-03-28 Inspur Electronic Information Industry Co., Ltd. Deep-learning-based image processing method and system
WO2021047118A1 (en) * 2019-09-12 2021-03-18 浪潮电子信息产业股份有限公司 Image processing method, device and system
US20210118095A1 (en) * 2019-10-17 2021-04-22 Samsung Electronics Co., Ltd. Image processing apparatus and method
US11854159B2 (en) * 2019-10-17 2023-12-26 Samsung Electronics Co., Ltd. Image processing apparatus and method
WO2022121474A1 (en) * 2020-12-11 2022-06-16 苏州浪潮智能科技有限公司 Method and system for optimizing convolutional residual structure of neural network, device, and medium
CN113286174A (en) * 2021-05-21 2021-08-20 浙江商汤科技开发有限公司 Video frame extraction method and device, electronic equipment and computer readable storage medium
CN113806044B (en) * 2021-08-31 2023-11-07 天津大学 Heterogeneous platform task bottleneck eliminating method for computer vision application
CN113806044A (en) * 2021-08-31 2021-12-17 天津大学 Heterogeneous platform task bottleneck elimination method for computer vision application
CN114445687A (en) * 2021-12-31 2022-05-06 苏州浪潮智能科技有限公司 Image identification reasoning method, system, storage medium and equipment
CN114445687B (en) * 2021-12-31 2024-01-19 苏州浪潮智能科技有限公司 Image recognition reasoning method, system, storage medium and device

Also Published As

Publication number Publication date
CN105869117B (en) 2021-04-02

Similar Documents

Publication Publication Date Title
CN105869117A GPU acceleration method for deep learning super-resolution technology
DE102018117813A1 (en) Timely data reconstruction with an external recurrent neural network
CN109389556A Multi-scale dilated convolutional neural network super-resolution reconstruction method and device
CN106991011A Parallel and cooperative optimization method for big-data task processing based on CPU multithreading and multi-granularity GPU
DE112020004237T5 (en) VIDEO UPSAMPLING USING ONE OR MORE NEURAL NETWORKS
DE102020104637A1 (en) TECHNIQUES FOR EFFICIENT PARTITIONING OF MEMORY
Wu et al. How many labeled license plates are needed?
Zhou et al. RSANet: towards real-time object detection with residual semantic-guided attention feature pyramid network
CN105931256A (en) CUDA (compute unified device architecture)-based large-format remote sensing image fast segmentation method
CN113392968A (en) Micro-training for iterative small sample refinement of neural networks
DE112020000865T5 (en) STORAGE MANAGEMENT SYSTEM
JP2019212243A (en) Learning identification device and learning identification method
CN109191392A (en) A kind of image super-resolution reconstructing method of semantic segmentation driving
DE102021205690A1 (en) Training neural networks with limited data using invertible augmentation operators
JP2020077066A (en) Learning device and method for learning
DE102022121509A1 (en) SINGLE FRAME INVERSE RENDERING
CN106802787A MapReduce optimization method based on GPU sorting
CN106484532B GPGPU parallel computing method for SPH fluid simulation
DE112021000303T5 BARRIER-FREE AND BARRIERLESS SYNCHRONIZATION OF SHARED MEMORY
DE102021116231A1 (en) INTERFERENCE-FREE MULTIPLEXER
CN103942095B Two-dimensional phase unwrapping method based on a heterogeneous acceleration platform
CN108648213A Implementation method of the KCF tracking algorithm on TMS320C6657
CN105869105A GPU acceleration method for the A+ super-resolution technology
JP2021015523A (en) Learning device and learning method
Tan et al. PPEDNet: Pyramid pooling encoder-decoder network for real-time semantic segmentation

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20210402