CN114359683B - Text positioning-oriented single-core HOG efficient heterogeneous acceleration method - Google Patents
- Publication number
- CN114359683B (application CN202111671159.2A)
- Authority
- CN
- China
- Prior art keywords
- cell
- pixels
- hog
- pixel
- hardware
- Prior art date
- Legal status
- Active
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Image Analysis (AREA)
Abstract
The invention discloses a text positioning-oriented single-core HOG efficient heterogeneous acceleration method. A work item is allocated to each pixel; the pixels surrounding each pixel are convolved; the amplitude and phase of the convolved pixel are calculated; the discrete gradient direction of the pixel is calculated through a bilinear interpolation algorithm and stored in the local memory of the hardware; and the work item allocated to the pixel is released. A work item is then allocated to each cell unit and the global index of the hardware is set up; voting results over the discrete gradient directions are calculated, completing the statistics for each row of pixels; and the counted results are normalized and summed to form the HOG feature vector, giving the feature vector of the image. The method is implemented on a heterogeneous platform to achieve heterogeneous acceleration. The invention meets the real-time and low-energy requirements of text positioning and can further improve the reliability of scene character recognition technology.
Description
Technical Field
The invention relates to the field of scene character recognition, in particular to a single-core HOG efficient heterogeneous acceleration method for text positioning.
Background
With the widespread adoption of smart handheld devices and the rapid development of artificial intelligence, images and videos have become the main media for delivering information. Such media contain a large number of natural scenes, and the text in them has important application value. Extracting text information from natural scenes accurately and quickly is therefore of great importance, and text localization technology is a central concern.
Because text localization involves highly complex algorithms and continuously growing volumes of data, the real-time performance of text localization algorithms is challenged. The HOG (Histogram of Oriented Gradients) algorithm is the most commonly used algorithm in text localization. Existing multi-kernel HOG acceleration schemes perform global synchronization across several kernels on the device side to carry out pixel gradient calculation, cell gradient statistics, and block normalization of HOG features. However, this incurs costly loop operations, and the overhead of global synchronization and global memory accesses is also high. When implemented on heterogeneous systems, multi-core acceleration schemes can also cause significant power consumption.
Disclosure of Invention
To address these shortcomings of the prior art, the invention provides a text positioning-oriented single-core HOG efficient heterogeneous acceleration method, which solves the problems of high memory access cost and heavy computation in the prior art.
To achieve this aim, the invention adopts the following technical solution:
the text positioning-oriented single-core HOG efficient heterogeneous acceleration method comprises the following steps:
S1, acquiring the pixels of a grayscale image and allocating one work item to each pixel, where each cell unit is a connected, uniformly sized region of Cx×Cy pixels;
S2, in each work item, performing row convolution and column convolution on the pixels surrounding the pixel using a differential template and its transpose;
S3, calculating the amplitude and phase of the convolved pixel;
S4, using the amplitude and phase to calculate the discrete gradient direction of the pixel through a bilinear interpolation algorithm, storing the discrete gradient direction in the local memory of the hardware, and releasing the work item allocated to the pixel;
S5, allocating one work item to each cell unit and setting up the global index of the hardware;
S6, creating statistical variables for counting the discrete gradient directions of the pixels and adding the discrete gradient direction of each pixel directly and in parallel to these variables, where one group of variables computes the voting result of the discrete gradient directions of each row of pixels in a cell, thereby completing the statistics for each row and obtaining the count of each discrete gradient direction in each cell;
S7, performing hardware local memory synchronization; based on the voting results, having each work item count one row of pixels in the cell and performing a parallel reduction to obtain the gradient statistics of all cell units; storing the resulting discrete gradient statistics in the local memory of the hardware, and releasing the work items allocated to the cell units;
S8, using one work item per cell to normalize the discrete gradients obtained from that cell's gradient statistics and summing the normalized results to obtain a sum value for each cell; caching the sum values of all cells in the same block in the local memory of the hardware, and performing hardware local synchronization to obtain the local oriented gradient of each block, where one image comprises a plurality of blocks;
S9, concatenating the local oriented gradients of all blocks into an HOG feature vector to obtain the feature vector of the image;
S10, deploying the above steps on a heterogeneous platform to achieve heterogeneous acceleration.
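The per-pixel work of steps S2 to S4 can be emulated serially as below. This is a minimal Python sketch, not the patented OpenCL kernel: the function name `pixel_work_item`, the 9 orientation bins over 0 to 180 degrees, the border clamping, and the exact form of the interpolation (a linear split of the vote between the two nearest bin centres, as many HOG implementations do in orientation) are all assumptions, since the text does not specify them.

```python
import math

# Assumptions for illustration only (not stated in the patent): 9 orientation
# bins covering unsigned gradient directions in [0, 180) degrees, and image
# borders handled by clamping coordinates to the nearest valid pixel.
NUM_BINS = 9
BIN_WIDTH = 180.0 / NUM_BINS  # 20 degrees per bin


def pixel_work_item(img, x, y):
    """Emulate the per-pixel work item of steps S2-S4 for pixel (x, y).

    Returns two (bin_index, weight) votes whose weights sum to the magnitude.
    """
    h, w = len(img), len(img[0])

    def px(xx, yy):
        # Clamp coordinates so border pixels reuse their nearest neighbour.
        return img[min(max(yy, 0), h - 1)][min(max(xx, 0), w - 1)]

    # S2: apply the template [-1, 0, 1] along the row and its transpose along the column.
    gx = px(x + 1, y) - px(x - 1, y)
    gy = px(x, y + 1) - px(x, y - 1)

    # S3: amplitude and phase; the phase is folded into [0, 180) degrees.
    magnitude = math.hypot(gx, gy)
    phase = math.degrees(math.atan2(gy, gx)) % 180.0

    # S4 (assumed form): split the vote linearly between the two nearest bin
    # centres, wrapping around so that 0 and 180 degrees are neighbours.
    pos = phase / BIN_WIDTH - 0.5
    lower = math.floor(pos)
    frac = pos - lower
    return [(lower % NUM_BINS, magnitude * (1.0 - frac)),
            ((lower + 1) % NUM_BINS, magnitude * frac)]


# Usage: a horizontal intensity ramp has a purely horizontal gradient.
ramp = [[0, 10, 20]] * 3
print(pixel_work_item(ramp, 1, 1))  # [(8, 10.0), (0, 10.0)]
```

Each call reads only its own 3×3 neighbourhood and writes only its own votes, so the calls are independent of one another, which is consistent with the statement in the detailed description that step S3 needs no synchronization.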
Further, in step S1, the work item is the smallest unit of work in OpenCL; the cell unit is the smallest unit into which the image is divided; each cell is a connected, uniformly sized region of Cx×Cy pixels; the window image contains Wx×Wy pixels in total, and the generated two-dimensional index space is (Wx, Wy).
Further, the differential template in step S2 is [-1, 0, 1].
Further, the global index in step S5 has size (Wx/Cx, Wy/Cy).
Further, the global index size during the parallel reduction in step S7 is (Wx/Cx, Wy), so Wx×Wy/Cx work items are used in total.
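The global index sizes given for steps S1, S5 and S7 can be checked with a short calculation. The helper `ndrange_sizes` and the 64×128 window with 8×8 cells are hypothetical values chosen for illustration; the text does not fix Wx, Wy, Cx or Cy.

```python
def ndrange_sizes(Wx, Wy, Cx, Cy):
    """Return the 2-D global index sizes described for steps S1, S5 and S7."""
    if Wx % Cx or Wy % Cy:
        raise ValueError("the window must tile exactly into Cx x Cy cells")
    s1 = (Wx, Wy)              # S1: one work item per pixel
    s5 = (Wx // Cx, Wy // Cy)  # S5: one work item per cell
    s7 = (Wx // Cx, Wy)        # S7: one work item per pixel row of each cell
    return s1, s5, s7


# Hypothetical example: a 64 x 128 window with 8 x 8 cells.
for step, size in zip(("S1", "S5", "S7"), ndrange_sizes(64, 128, 8, 8)):
    print(step, size, size[0] * size[1], "work items")
# S1 (64, 128) 8192 work items
# S5 (8, 16) 128 work items
# S7 (8, 128) 1024 work items
```

For these values, step S7 uses Wx×Wy/Cx = 1024 work items, Cy = 8 times as many as the one-per-cell index of step S5, which corresponds to the Cy-fold increase in parallelism listed among the beneficial effects.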
Further, in step S7 the first pass of the parallel reduction uses Cy/2 work items in total to count two columns of gradients; the second pass uses Cy/4 work items in total to count two columns of gradients; each subsequent pass uses half as many work items as the previous one, until the gradient statistics are complete.
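A serial emulation of the per-row voting of step S6 followed by the halving reduction of step S7 is sketched below. The names `row_vote` and `tree_reduce`, the 9 directions, the 4×4 cell, and the requirement that Cy be a power of two are assumptions for illustration; following step S6, the sketch counts occurrences of each discrete direction rather than weighting them by magnitude.

```python
NUM_BINS = 9  # assumed number of discrete gradient directions


def row_vote(cell_bins, row):
    """S6 sketch: one work item counts the discrete directions of one cell row.

    Each row writes only to its own group of counters, so no two work items
    ever update the same location.
    """
    counts = [0] * NUM_BINS
    for direction in cell_bins[row]:
        counts[direction] += 1
    return counts


def tree_reduce(partials):
    """S7 sketch: pairwise parallel reduction of the Cy per-row histograms.

    Pass 1 uses Cy/2 work items, pass 2 uses Cy/4, and so on; work item i adds
    partial i + stride into partial i. Assumes Cy is a power of two.
    """
    parts = [list(p) for p in partials]
    n = len(parts)
    if n == 0 or n & (n - 1):
        raise ValueError("Cy is assumed to be a power of two")
    stride = n // 2
    while stride > 0:
        for i in range(stride):  # these iterations are independent work items
            for b in range(NUM_BINS):
                parts[i][b] += parts[i + stride][b]
        # On a device, a local-memory barrier would separate consecutive passes.
        stride //= 2
    return parts[0]


# Hypothetical 4 x 4 cell (Cx = Cy = 4) of discrete direction indices.
cell = [[0, 0, 1, 2],
        [1, 1, 1, 8],
        [0, 2, 2, 2],
        [8, 8, 0, 1]]
hist = tree_reduce([row_vote(cell, r) for r in range(len(cell))])
print(hist)  # [4, 5, 4, 0, 0, 0, 0, 0, 3]
```

The reduction finishes in log2(Cy) passes, and its result equals a direct serial count over the whole cell.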
The beneficial effects of the invention are as follows:
1. in the gradient statistics process, one work item is allocated to each cell instead of one to each pixel, which resolves memory access conflicts;
2. in step S7 each work item counts one row of pixels, which keeps the memory accesses of the work items contiguous, increases parallelism by a factor of Cy, improves GPU resource utilization, fully exploits the parallel processing capability of the GPU, and reduces overhead;
3. in the gradient statistics process, the GPU avoids local memory access conflicts by means of costly atomic functions, while the FPGA avoids them by alternately accessing multiple physical memory blocks;
4. in step S8 the summation results are cached in the local memory of the hardware, which reduces global memory accesses, shortens the reduction time, and saves computation time;
5. through local memory synchronization, the entire computing task is completed by a single device kernel, which reduces resource consumption by more than 50%; compared with a CPU, the energy efficiency ratio of the scheme is 22.8 on the GPU platform and 42.5 on the FPGA platform, effectively reducing device energy consumption;
6. the voting approach avoids the complex reduction operations of conventional statistical methods and reduces the computation time of the algorithm by more than 50%; compared with a CPU, the speedup of the scheme is 28 on the GPU platform and 6.9 on the FPGA platform, effectively reducing computation time;
7. the HOG algorithm takes 25 ms on the GPU platform and 102 ms on the FPGA platform, consuming 4 J and 2.14 J respectively, which meets the real-time and low-energy requirements of text positioning and can further improve the reliability of character recognition in scenes (including images and videos).
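The figures in items 5 to 7 can be cross-checked with simple arithmetic. The text does not define its ratios; assuming that the speedup is CPU time divided by device time and that the energy efficiency ratio is CPU energy divided by device energy, the average device power and the implied CPU baseline are:

$$
\begin{aligned}
\bar P_{\mathrm{GPU}} &= \frac{4\ \mathrm{J}}{0.025\ \mathrm{s}} = 160\ \mathrm{W}, &
\bar P_{\mathrm{FPGA}} &= \frac{2.14\ \mathrm{J}}{0.102\ \mathrm{s}} \approx 21.0\ \mathrm{W},\\
t_{\mathrm{CPU}} &\approx 28 \times 25\ \mathrm{ms} = 700\ \mathrm{ms}, &
t_{\mathrm{CPU}} &\approx 6.9 \times 102\ \mathrm{ms} \approx 704\ \mathrm{ms},\\
E_{\mathrm{CPU}} &\approx 22.8 \times 4\ \mathrm{J} = 91.2\ \mathrm{J}, &
E_{\mathrm{CPU}} &\approx 42.5 \times 2.14\ \mathrm{J} = 90.95\ \mathrm{J}.
\end{aligned}
$$

Under that assumption both platforms point to the same CPU baseline of roughly 0.7 s and 91 J (about 130 W on average); these derived values are not stated in the source.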
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a block diagram of the design of the present invention.
Detailed Description
The following description of the embodiments is provided to help those skilled in the art understand the present invention. It should be understood, however, that the invention is not limited to the scope of these embodiments: to those of ordinary skill in the art, various changes are apparent as long as they fall within the spirit and scope of the invention as defined by the appended claims, and all inventions that make use of the inventive concept are protected.
As shown in FIG. 1 and FIG. 2, the text positioning-oriented single-core HOG efficient heterogeneous acceleration method includes the following steps:
S1, acquiring the pixels of a grayscale image and allocating one work item to each pixel, where each cell unit is a connected, uniformly sized region of Cx×Cy pixels;
S2, in each work item, performing row convolution and column convolution on the pixels surrounding the pixel using a differential template and its transpose;
S3, calculating the amplitude and phase of the convolved pixel;
S4, using the amplitude and phase to calculate the discrete gradient direction of the pixel through a bilinear interpolation algorithm, storing the discrete gradient direction in the local memory of the hardware, and releasing the work item allocated to the pixel;
S5, allocating one work item to each cell unit and setting up the global index of the hardware;
S6, creating statistical variables for counting the discrete gradient directions of the pixels and adding the discrete gradient direction of each pixel directly and in parallel to these variables, where one group of variables computes the voting result of the discrete gradient directions of each row of pixels in a cell, thereby completing the statistics for each row and obtaining the count of each discrete gradient direction in each cell;
S7, performing hardware local memory synchronization; based on the voting results, having each work item count one row of pixels in the cell and performing a parallel reduction to obtain the gradient statistics of all cell units; storing the resulting discrete gradient statistics in the local memory of the hardware, and releasing the work items allocated to the cell units;
S8, using one work item per cell to normalize the discrete gradients obtained from that cell's gradient statistics and summing the normalized results to obtain a sum value for each cell; caching the sum values of all cells in the same block in the local memory of the hardware, and performing hardware local synchronization to obtain the local oriented gradient of each block, where one image comprises a plurality of blocks;
S9, concatenating the local oriented gradients of all blocks into an HOG feature vector to obtain the feature vector of the image;
S10, deploying the above steps on a heterogeneous platform to achieve heterogeneous acceleration.
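Step S8 does not name the normalization it uses. The sketch below assumes the L2 block normalization common in HOG implementations and reads the per-cell "sum value" as the sum of squares of that cell's histogram; the function names `normalize_block` and `hog_feature_vector`, the 2×2-cell block, the 2-bin histograms and `eps` are hypothetical.

```python
import math


def normalize_block(cell_hists, eps=1e-6):
    """S8 sketch: L2-normalise the cell histograms of one block.

    Phase 1 mirrors the per-cell work item: each cell reduces its histogram to
    one sum value (here, its sum of squares), which would be cached in local
    memory. After a local barrier, phase 2 lets every cell divide its own
    histogram by the shared block norm.
    """
    sums = [sum(v * v for v in hist) for hist in cell_hists]  # one per cell
    block_norm = math.sqrt(sum(sums) + eps * eps)  # eps avoids division by zero
    return [[v / block_norm for v in hist] for hist in cell_hists]


def hog_feature_vector(blocks):
    """S9: concatenate the normalised blocks into a single feature vector."""
    return [v for block in blocks for hist in normalize_block(block) for v in hist]


# Hypothetical block of 2 x 2 cells with 2-bin histograms for brevity.
block = [[3, 0], [0, 4], [0, 0], [0, 0]]
print([round(v, 3) for v in hog_feature_vector([block])])
# [0.6, 0.0, 0.0, 0.8, 0.0, 0.0, 0.0, 0.0]
```

Only one scalar per cell has to be shared before the barrier, which is why caching the sum values in local memory, rather than global memory, is enough to complete the block.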
In step S1, the work item is the smallest unit of work in OpenCL; the cell unit is the smallest unit into which the image is divided; each cell is a connected, uniformly sized region of Cx×Cy pixels; the window image contains Wx×Wy pixels in total, and the generated two-dimensional index space is (Wx, Wy).
The differential template in step S2 is [-1, 0, 1].
The global index in step S5 has size (Wx/Cx, Wy/Cy).
The global index size during the parallel reduction in step S7 is (Wx/Cx, Wy), and Wx×Wy/Cx work items are used in total.
In step S7, the first pass of the parallel reduction uses Cy/2 work items to count two rows of gradients; the second pass uses Cy/4 work items in total to count two columns of gradients; each subsequent pass uses half as many work items as the previous one, until the gradient statistics are complete.
In step S3, there is no data interaction between the work items, so neither global nor local synchronization is required.
The OpenCL high-level description of steps S1 to S9 is compiled by AOCL (the Intel FPGA SDK for OpenCL) into a hardware description, from which a dedicated hardware circuit is generated.
The scheme is implemented on both a CPU+GPU and a CPU+FPGA heterogeneous platform. The CPU acts as the host and performs system scheduling, while the GPU or the FPGA acts as the device. The platform and device are first initialized and configured; the host then launches the device to perform the remaining operations; once the results are obtained, the final classification is computed on the host. Experiments show that the scheme meets the real-time and low-energy requirements of text positioning and can further improve the reliability of scene character recognition technology.
According to the invention, one work item is allocated to each cell in the gradient statistics process instead of one to each pixel, which resolves memory access conflicts;
in step S7 each work item counts one row of pixels, which keeps the memory accesses of the work items contiguous, increases parallelism by a factor of Cy, improves GPU resource utilization, fully exploits the parallel processing capability of the GPU, and reduces overhead;
in the gradient statistics process, the GPU avoids local memory access conflicts by means of costly atomic functions. OpenCL atomic functions perform atomic operations on 32-bit signed and unsigned integers in global or local memory: while one work item is accessing a memory location, other work items cannot access it. In step S6, when several pixels in a cell have the same discrete gradient direction, they write to the same memory location in parallel, which causes a race condition and data loss; atomic functions solve this problem;
in the gradient statistics process, the FPGA avoids local memory access conflicts by alternately accessing multiple physical memory blocks: several on-chip M9K memory blocks of the FPGA serve as local memory, so that the work items of the same work group can access them alternately without local memory access conflicts. In step S6, by dividing the work groups appropriately, the pixel voting results are stored in local memory, which allows the FPGA to avoid costly atomic floating-point additions;
in step S8 the summation results are cached in the local memory of the hardware, which reduces global memory accesses, shortens the reduction time, and saves computation time;
through local memory synchronization, the entire computing task is completed by a single device kernel, which reduces resource consumption by more than 50%; compared with a CPU, the energy efficiency ratio of the scheme is 22.8 on the GPU platform and 42.5 on the FPGA platform, effectively reducing device energy consumption;
the voting approach avoids the complex reduction operations of conventional statistical methods and reduces the computation time of the algorithm by more than 50%; compared with a CPU, the speedup of the scheme is 28 on the GPU platform and 6.9 on the FPGA platform, effectively reducing computation time;
the HOG algorithm takes 25 ms on the GPU platform and 102 ms on the FPGA platform, consuming 4 J and 2.14 J respectively, which meets the real-time and low-energy requirements of text positioning and can further improve the reliability of character recognition in scenes (including images and videos).
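The race condition described above for step S6 can be reproduced with a small deterministic emulation. The function `vote_shared` and its worst-case lock-step schedule are hypothetical; on a GPU the corresponding protection is an OpenCL atomic built-in such as `atomic_inc` or `atomic_add` applied to `__local` integer counters, while the FPGA approach avoids sharing by giving work items separate memory blocks.

```python
def vote_shared(directions, atomic):
    """Emulate work items voting into one shared local-memory histogram.

    Non-atomic: in the worst-case lock-step schedule every work item first
    reads its counter and then writes back read_value + 1, so concurrent
    updates to the same direction are lost. Atomic: each read-modify-write
    completes before the next begins, as an atomic increment guarantees.
    """
    hist = {}
    if atomic:
        for d in directions:
            hist[d] = hist.get(d, 0) + 1
        return hist
    snapshot = [(d, hist.get(d, 0)) for d in directions]  # all reads first
    for d, old in snapshot:                                 # then all writes
        hist[d] = old + 1
    return hist


# Four pixels of one cell, three sharing discrete direction 3.
pixels = [3, 3, 3, 5]
print(vote_shared(pixels, atomic=False))  # {3: 1, 5: 1}  (two votes lost)
print(vote_shared(pixels, atomic=True))   # {3: 3, 5: 1}
```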
Claims (6)
1. A text positioning-oriented single-core HOG efficient heterogeneous acceleration method, characterized by comprising the following steps:
S1, acquiring the pixels of a grayscale image and allocating one work item to each pixel, where each cell unit is a connected, uniformly sized region of Cx×Cy pixels;
S2, in each work item, performing row convolution and column convolution on the pixels surrounding the pixel using a differential template and its transpose;
S3, calculating the amplitude and phase of the convolved pixel;
S4, using the amplitude and phase to calculate the discrete gradient direction of the pixel through a bilinear interpolation algorithm, storing the discrete gradient direction in the local memory of the hardware, synchronizing the local memory of the hardware, and releasing the work item allocated to the pixel;
S5, allocating one work item to each cell unit and setting up the global index of the hardware;
S6, creating statistical variables for counting the discrete gradient directions of the pixels and adding the discrete gradient direction of each pixel directly and in parallel to these variables, where one group of variables computes the voting result of the discrete gradient directions of each row of pixels in a cell, thereby completing the statistics for each row and obtaining the count of each discrete gradient direction in each cell;
S7, performing hardware local memory synchronization; based on the voting results, having each work item count one row of pixels in the cell and performing a parallel reduction to obtain the gradient statistics of all cell units; storing the resulting discrete gradient statistics in the local memory of the hardware, and releasing the work items allocated to the cell units;
S8, using one work item per cell to normalize the discrete gradients obtained from that cell's gradient statistics and summing the normalized results to obtain a sum value for each cell; caching the sum values of all cells in the same block in the local memory of the hardware, and performing hardware local synchronization to obtain the local oriented gradient of each block, where one image comprises a plurality of blocks;
S9, concatenating the local oriented gradients of all blocks into an HOG feature vector to obtain the feature vector of the image;
S10, deploying the above steps on a heterogeneous platform to achieve heterogeneous acceleration.
2. The text positioning-oriented single-core HOG efficient heterogeneous acceleration method according to claim 1, wherein in step S1 the work item is the smallest unit of work in OpenCL; the cell unit is the smallest unit into which the image is divided; each cell is a connected, uniformly sized region of Cx×Cy pixels; the window image contains Wx×Wy pixels in total, and the generated two-dimensional index space is (Wx, Wy).
3. The text positioning-oriented single-core HOG efficient heterogeneous acceleration method according to claim 1, wherein the differential template in step S2 is [-1, 0, 1].
4. The text positioning-oriented single-core HOG efficient heterogeneous acceleration method according to claim 2, wherein the global index in step S5 has size (Wx/Cx, Wy/Cy).
5. The text positioning-oriented single-core HOG efficient heterogeneous acceleration method according to claim 4, wherein the global index size during the parallel reduction in step S7 is (Wx/Cx, Wy), using Wx×Wy/Cx work items in total.
6. The text positioning-oriented single-core HOG efficient heterogeneous acceleration method according to claim 5, wherein in step S7 the first parallel reduction uses Cy/2 work items in total to count two columns of gradients; the second parallel reduction uses Cy/4 work items in total to count two columns of gradients; and each subsequent reduction uses half as many work items as the previous one, until the gradient statistics are complete.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111671159.2A CN114359683B (en) | 2021-12-31 | 2021-12-31 | Text positioning-oriented single-core HOG efficient heterogeneous acceleration method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111671159.2A CN114359683B (en) | 2021-12-31 | 2021-12-31 | Text positioning-oriented single-core HOG efficient heterogeneous acceleration method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114359683A | 2022-04-15 |
CN114359683B | 2023-10-20 |
Family
ID=81104866
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111671159.2A Active CN114359683B (en) | 2021-12-31 | 2021-12-31 | Text positioning-oriented single-core HOG efficient heterogeneous acceleration method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114359683B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102750131A (en) * | 2012-06-07 | 2012-10-24 | 中国科学院计算机网络信息中心 | Graphics processing unit (GPU) oriented bitonic merge sort method |
CN104598929A (en) * | 2015-02-03 | 2015-05-06 | 南京邮电大学 | HOG (Histograms of Oriented Gradients) fast feature extraction method |
CN106095583A (en) * | 2016-06-20 | 2016-11-09 | 国家海洋局第海洋研究所 | Master-slave core cooperative computing and programming framework based on the new Sunway processor |
CN106780360A (en) * | 2016-11-10 | 2017-05-31 | 西安电子科技大学 | Fast total variation image denoising method based on the OpenCL standard |
CN109726806A (en) * | 2017-10-30 | 2019-05-07 | 上海寒武纪信息科技有限公司 | Information processing method and terminal device |
CN109767637A (en) * | 2019-02-28 | 2019-05-17 | 杭州飞步科技有限公司 | Method and apparatus for recognizing and processing countdown signal lights |
CN112232372A (en) * | 2020-09-18 | 2021-01-15 | 南京理工大学 | Monocular stereo matching and accelerating method based on OPENCL |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8824742B2 (en) * | 2012-06-19 | 2014-09-02 | Xerox Corporation | Occupancy detection for managed lane enforcement based on localization and classification of windshield images |
US11004205B2 (en) * | 2017-04-18 | 2021-05-11 | Texas Instruments Incorporated | Hardware accelerator for histogram of oriented gradients computation |
- 2021-12-31: application CN202111671159.2A filed in CN; patent CN114359683B, status Active
Non-Patent Citations (4)
Title |
---|
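Guoning Zhang et al. Efficient Heterogeneous Acceleration Using Single-core Histograms of Oriented Gradients. 2021 International Conference on UK-China Emerging Technologies (UCET), 2022, 209-214. * |
刘毅飞 et al. OpenCL implementation of GPU-accelerated shape alignment in shape model segmentation. Information Technology (信息技术), 2016, (03), 28-30+40. * |
胡辉 et al. Research on the implementation of a parallel dimension-extended DFT algorithm based on a multiprocessor platform. Journal of Telemetry, Tracking and Command (遥测遥控), 2002, (02), 44-50. * |
贺江. Research on multi-platform heterogeneous acceleration of key algorithms for scene character recognition. China Master's Theses Full-text Database, Information Science and Technology, 2018, (02), I138-1511. * |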
Also Published As
Publication number | Publication date |
---|---|
CN114359683A (en) | 2022-04-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108765247B (en) | Image processing method, device, storage medium and equipment | |
CN107341127B (en) | Convolutional neural network acceleration method based on OpenCL standard | |
CN108388537B (en) | Convolutional neural network acceleration device and method | |
US9235769B2 (en) | Parallel object detection method for heterogeneous multithreaded microarchitectures | |
CN109885407B (en) | Data processing method and device, electronic equipment and storage medium | |
CN102647588B (en) | GPU (Graphics Processing Unit) acceleration method used for hierarchical searching motion estimation | |
WO2019184888A1 (en) | Image processing method and apparatus based on convolutional neural network | |
Wai et al. | GPU acceleration of real time Viola-Jones face detection | |
KR20200043617A (en) | Artificial neural network module and scheduling method thereof for highly effective operation processing | |
Poostchi et al. | Efficient GPU implementation of the integral histogram | |
CN117785480B (en) | Processor, reduction calculation method and electronic equipment | |
CN109447239B (en) | Embedded convolutional neural network acceleration method based on ARM | |
CN109740619B (en) | Neural network terminal operation method and device for target recognition | |
CN114359683B (en) | Text positioning-oriented single-core HOG efficient heterogeneous acceleration method | |
CN110796244B (en) | Core computing unit processor for artificial intelligence device and accelerated processing method | |
CN108960203B (en) | Vehicle detection method based on FPGA heterogeneous computation | |
Ibrahim et al. | Gaussian Blur through Parallel Computing. | |
CN110322389A (en) | Pond method, apparatus and system, computer readable storage medium | |
CN114600128A (en) | Three-dimensional convolution in a neural network processor | |
Jinguji et al. | Weight sparseness for a feature-map-split-cnn toward low-cost embedded fpgas | |
Li et al. | VNet: a versatile network to train real-time semantic segmentation models on a single GPU | |
CN111860540B (en) | Neural network image feature extraction system based on FPGA | |
CN111612685B (en) | GPU dynamic self-adaptive acceleration method for remote sensing image | |
TWI798591B (en) | Convolutional neural network operation method and device | |
US20220222509A1 (en) | Processing non-power-of-two work unit in neural processor circuit |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |