CN114201726A - Convolution operation optimization method, system, terminal and storage medium - Google Patents

Convolution operation optimization method, system, terminal and storage medium

Info

Publication number
CN114201726A
CN114201726A (application number CN202010986153.3A)
Authority
CN
China
Prior art keywords
data
image data
threads
thread
adjacent threads
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010986153.3A
Other languages
Chinese (zh)
Other versions
CN114201726B (en)
Inventor
王峥
廖健
刘江佾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Zhongke Yuanwuxin Technology Co ltd
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS filed Critical Shenzhen Institute of Advanced Technology of CAS
Priority to CN202010986153.3A priority Critical patent/CN114201726B/en
Priority to PCT/CN2020/127128 priority patent/WO2022057054A1/en
Publication of CN114201726A publication Critical patent/CN114201726A/en
Application granted granted Critical
Publication of CN114201726B publication Critical patent/CN114201726B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/15Correlation function computation including computation of convolution operations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00General purpose image data processing

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Computational Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Memory System Of A Hierarchy Structure (AREA)
  • Image Processing (AREA)

Abstract

The application relates to a convolution operation optimization method, system, terminal and storage medium. The method comprises the following steps: inputting image data from a data memory module into a multithreaded data cache module, and recording the data features of the image data in each thread; when all threads are filled with image data, performing spatiotemporal similarity analysis on the data features of at least two adjacent threads; when the data features of the at least two adjacent threads are spatiotemporally similar, filtering out the image data of at least one of those threads, using the freed threads as idle threads to re-cache image data input from the data memory module, and repeating the spatiotemporal similarity analysis once all threads are filled again; and performing convolution calculation on the cached image data and outputting new image data. The method greatly reduces the actual amount of convolution computation, improves data reusability, shortens overall network computation time, and improves chip performance.

Description

Convolution operation optimization method, system, terminal and storage medium
Technical Field
The application belongs to the technical field of deep learning, and in particular relates to a convolution operation optimization method, system, terminal and storage medium.
Background
In recent years, driven by the spread of big-data applications and advances in computer hardware, deep learning has been used for feature extraction, classification and recursive operations on data, with wide application in computer vision, natural language processing, intelligent system decision-making, and other fields. Convolution is a very important feature-extraction operation in deep learning; mainstream deep neural networks such as LeNet, AlexNet, VGG-16 and VGG-19 are built by stacking convolutional layers, and classification accuracy improves as the number of network layers grows. However, because the computing power and speed of general-purpose computing platforms cannot keep pace with the large amount of computation consumed by convolution itself, dedicated convolution computing chips need to be designed.
In the prior art, chip performance is improved by adding computing nodes, enlarging data caches, converting data types, and similar measures; meanwhile, the rapid growth of parameter counts and computation places ever higher demands on the data bandwidth and computing capacity of hardware platforms. However, existing architectures that increase computing power by raising the operating frequency and adding computation and storage modules already face problems of low compute-module utilization, high implementation cost, limited communication bandwidth, poor scalability, and large energy waste.
Disclosure of Invention
The application provides a convolution operation optimization method, system, terminal and storage medium, aiming to solve, at least to a certain extent, one of the above technical problems in the prior art.
In order to solve the above problems, the present application provides the following technical solutions:
a convolution operation optimization method comprises the following steps:
inputting the image data in the data memory module into the multithreading data cache module, and recording the data characteristics of the image data in each thread;
when all threads in the multi-thread data cache module are filled with image data, respectively carrying out space-time similarity analysis on the data characteristics of at least two adjacent threads, and when the data characteristics of at least two adjacent threads have space-time similarity,
filtering out image data of at least one thread in the at least two adjacent threads, taking the thread after filtering the image data as an idle thread to re-cache the image data input by the data memory module, and re-performing the spatiotemporal similarity analysis when all threads are filled with the image data again until all threads in the multi-thread data cache module are filled with the image data and the data characteristics of the at least two adjacent threads do not have spatiotemporal similarity,
and performing convolution calculation according to the image data cached in the multithreading data caching module, and outputting new image data.
The technical scheme adopted by the embodiment of the application further comprises the following steps: the data features include pixel maxima, minima, and means of the image data.
The technical scheme adopted by the embodiment of the application further comprises the following steps: the performing the spatiotemporal similarity analysis on the data characteristics of at least two adjacent threads in the multithread data cache module respectively comprises:
grouping all threads in the multithreading data caching module, wherein each group comprises at least two adjacent threads;
and performing spatiotemporal similarity analysis on the data features of the at least two adjacent threads in each group respectively; if the differences of the maximum value, the minimum value and the mean value in the data features of the at least two adjacent threads are smaller than a set threshold, judging that the image data in the at least two adjacent threads have spatiotemporal similarity.
The technical scheme adopted by the embodiment of the application further comprises the following steps: the performing spatiotemporal similarity analysis on the data characteristics of at least two adjacent threads in the multithreaded data caching module respectively further comprises:
and if the differences among the maximum value, the minimum value and the mean value of at least two adjacent threads in each group are smaller than a set similarity threshold, judging that the image data in the at least two adjacent threads are similar.
The technical scheme adopted by the embodiment of the application further comprises the following steps: the filtering out the image data of at least one thread of the at least two adjacent threads and continuing to cache the image data according to the idle thread after filtering the image data comprises:
reordering the idle threads after the image data are filtered;
and continuing to cache the image data from the reordered first thread.
The technical scheme adopted by the embodiment of the application further comprises the following steps: the inputting of the image data into the multithreaded data cache module further comprises:
the line program number of each buffer memory with image data is recorded by an address register, and a line program number list for recording the buffer memory position of the image data is generated.
The technical scheme adopted by the embodiment of the application further comprises the following steps: the filtering out image data of at least one of the at least two adjacent threads further comprises:
and deleting the line program number of the filtered data in the line program number list.
The technical scheme adopted by the embodiment of the application further comprises the following steps: the performing convolution calculation according to the image data cached in the multithread data caching module includes:
and outputting the new image data and the line program number list to the data memory module.
Another technical scheme adopted by the embodiment of the application is as follows: a convolution operation optimization system comprising:
a multithreading data caching module: used for caching image data and recording the data features of the image data in each thread;
a data filtering module: when all threads in the multi-thread data cache module are filled with image data, performing spatiotemporal similarity analysis on data features of at least two adjacent threads respectively, filtering out image data of at least one thread in the at least two adjacent threads when the data features of the at least two adjacent threads have spatiotemporal similarity, taking the thread after filtering the image data as an idle thread to re-cache the image data, and performing the spatiotemporal similarity analysis again when all threads are filled with the image data again until all threads in the multi-thread data cache module are filled with the image data and the data features of the at least two adjacent threads do not have spatiotemporal similarity;
a convolution operation module: used for performing convolution calculation according to the image data cached in the multithreaded data cache module and outputting new image data.
The embodiment of the application adopts another technical scheme that: a storage medium storing program instructions executable by a processor to perform the method of optimizing convolution operations.
Compared with the prior art, the embodiments of the application have the following beneficial effects: the convolution operation optimization method, system, terminal and storage medium of the embodiments exploit the temporal and spatial similarity of the input data and screen out a portion of similar data during the data caching stage, thereby greatly reducing the actual amount of convolution computation, improving data reusability, maximizing the utilization of thread resources, shortening overall network computation time, and improving chip performance.
Drawings
FIG. 1 is a flow chart of a convolution operation optimization method according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a convolution operation optimization system according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a storage medium according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Because a dedicated convolution computing chip involves reading image data from memory, and data transfer consumes about 80% of such a chip's energy, optimizing the data cache storage scheme to improve data reusability can greatly reduce chip power consumption and improve chip performance. Based on this, the convolution operation optimization method of the embodiment of the application starts from analyzing the intrinsic characteristics of the data: it exploits the temporal and spatial similarity of the input data to screen and filter out a portion of the data during the data caching stage, greatly reducing the actual amount of computation in the convolution operation, which is the most time-consuming part of the network, thereby reducing energy consumption and improving the performance of the convolution computing chip.
Specifically, please refer to fig. 1, which is a flowchart illustrating a convolution operation optimization method according to an embodiment of the present application. The convolution operation optimization method comprises the following steps:
S1: inputting the image data in the data memory module into the multithreaded data cache module, while recording data features of the image data in each thread, such as the pixel maximum, minimum and mean;
S2: recording, through an address register, the thread number of each cached block of image data, and generating a thread number list recording the cache locations of the image data;
S3: performing spatiotemporal similarity analysis and comparison on the data features of at least two adjacent threads in the multithreaded data cache module, and judging whether the data features in the adjacent threads are similar; if so, executing S4; otherwise, continuing with S3;
In this step, the spatiotemporal similarity analysis is specifically as follows, taking 64 threads as an example. In the time dimension, the 64 threads synchronously record the data features flushed into each cache while caching data, so that when caching finishes, the data features in every thread are fully updated. In the spatial dimension, the 64 threads are divided into 16 groups of 4 adjacent threads, each thread corresponding to one sliding window. Spatiotemporal similarity analysis is then performed on the data features of the 4 adjacent threads (i.e., every 4 adjacent sliding windows) in each group: the data features of the second, third and fourth threads in each group are each compared with those of the first thread. The first thread, together with any threads not similar to it, is marked as a valid thread; threads similar to the first thread are marked as invalid threads, and during subsequent computation the calculation result of the first thread replaces the calculation results of the invalid threads.
The method for judging whether data features are similar is as follows: if the maximum value, minimum value and mean value in the data features of a given thread are each close, within a set threshold, to those of the first thread in its group, that is, if the differences between the maximum, minimum and mean values of that thread and the first thread are all smaller than a set similarity threshold, the image data cached in that thread is considered similar to the image data of the first thread.
It can be understood that the thread grouping can be adjusted according to the actual operation, for example by using fewer or more sliding windows per group for the spatiotemporal similarity comparison, or by comparing every two adjacent sliding windows pairwise.
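As a concrete illustration of the grouping and marking described above, the following Python sketch models comparing each group's second, third and fourth threads against its first thread. The patent describes a hardware implementation; the names here (`ThreadFeature`, `mark_threads`) and the threshold value are illustrative assumptions, not from the patent.

```python
from dataclasses import dataclass

@dataclass
class ThreadFeature:
    # per-thread data features recorded while caching a sliding window
    max_val: float
    min_val: float
    mean_val: float

def mark_threads(features, group_size=4, threshold=8.0):
    """Return one flag per thread: True = valid, False = invalid
    (its result will be replaced by that of its group's first thread)."""
    valid = [True] * len(features)
    for start in range(0, len(features), group_size):
        first = features[start]  # first thread of the group stays valid
        for i in range(start + 1, min(start + group_size, len(features))):
            f = features[i]
            # similar if max, min and mean all differ by less than the threshold
            if (abs(f.max_val - first.max_val) < threshold and
                    abs(f.min_val - first.min_val) < threshold and
                    abs(f.mean_val - first.mean_val) < threshold):
                valid[i] = False
    return valid
```

With 8 threads in groups of 4, a group of near-identical windows is collapsed onto its first thread, while dissimilar windows stay valid.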
S4: filtering out the image data in at least one of the at least two adjacent threads, updating the thread number list (i.e., deleting the numbers of the idle threads whose data was filtered out), and re-executing S3;
S5: judging whether the first round of spatiotemporal similarity comparison over all threads in the multithreaded data cache module is finished; if so, executing S6; otherwise, continuing with S3;
S6: reordering the idle threads left after data filtering in the multithreaded data cache module;
In this step, taking 64 threads as an example, the idle threads left after data filtering are renumbered starting from 65.
S7: continuing to cache image data starting from the first reordered thread, and re-executing S2-S6;
In this step, taking 64 threads as an example, after the first round of similarity comparison finishes, image data continues to be cached starting from the 65th sliding window, and the thread number of each newly cached block of image data continues to be recorded in the thread number list through the address register.
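The thread-number bookkeeping of S2, S4 and S7 can be sketched in software as follows. This is a minimal model under assumptions of my own (a Python list standing in for the address register's record; the class and method names are invented for illustration), not the patent's hardware design.

```python
class ThreadNumberList:
    """Toy model of the address register's thread number list."""

    def __init__(self, num_threads=64):
        self.entries = []                    # thread numbers holding cached data
        self.next_number = num_threads + 1   # renumbering starts after the last thread

    def record_fill(self, thread_number):
        # S2: record the thread number of each newly cached block
        self.entries.append(thread_number)

    def filter_thread(self, thread_number):
        # S4: a filtered (invalid) thread's number is deleted; its slot is idle
        self.entries.remove(thread_number)

    def refill_idle(self, idle_count):
        # S6/S7: idle slots are refilled and renumbered (e.g. from 65 for 64 threads)
        new_numbers = list(range(self.next_number, self.next_number + idle_count))
        self.next_number += idle_count
        self.entries.extend(new_numbers)
        return new_numbers
```

For example, with 4 threads, filtering threads 2 and 3 and refilling the two idle slots yields new windows numbered 5 and 6.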
S8: judging whether all threads in the multithreaded data cache module are full; if not, continuing with S7; otherwise, executing S9;
S9: the convolution calculation module performs convolution calculation according to the image data in the multithreaded data cache module, and outputs the new image data and the thread number list to the data memory module.
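To illustrate how the filtering pays off in S9, the sketch below runs the convolution only for valid threads and reuses the first thread's result for each invalid thread in its group. It is a software analogy with invented names; the patent's convolution module operates on cached windows in hardware.

```python
def convolve_window(window, kernel):
    # window and kernel are equal-length flat lists of values
    return sum(w * k for w, k in zip(window, kernel))

def run_threads(windows, kernel, valid, group_size=4):
    """Per-thread convolution results, substituting the first thread's
    result for threads marked invalid (similar) within each group."""
    results = [0] * len(windows)
    saved_ops = 0  # convolutions skipped thanks to the similarity filter
    for g in range(0, len(windows), group_size):
        first_result = convolve_window(windows[g], kernel)
        results[g] = first_result
        for i in range(g + 1, min(g + group_size, len(windows))):
            if valid[i]:
                results[i] = convolve_window(windows[i], kernel)
            else:
                results[i] = first_result  # reuse: this convolution is skipped
                saved_ops += 1
    return results, saved_ops
```

Each invalid thread costs zero multiply-accumulates, which is where the reduction in actual convolution operations comes from.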
Based on the above, the embodiment of the application performs spatiotemporal similarity analysis on the data and applies it in the data cache module before convolution calculation, greatly reducing the actual amount of convolution computation, improving data reusability, maximizing the use of thread resources, shortening the overall computation time of the network, and improving chip performance. Compared with other conventional convolutional neural network acceleration approaches, the method has a simple algorithm and good practicability, is applicable to various convolution types and acceleration algorithms, allows the similarity threshold to be adjusted for different input data, and can accelerate convolutional neural network operation without loss of accuracy. In addition, starting from the characteristics of the input data, the method suits first convolutional layers whose input data are monotonous and unchanging (such as background images or surveillance-video frames), and holds great potential for deep neural networks with large convolution-kernel depths.
The following experiment demonstrates the feasibility and effectiveness of this scheme. The experimental scheme is implemented in the Verilog HDL language, and the Modelsim simulation tool is used to simulate and verify its feasibility and running time. Specifically: a configuration file is set up for a specific neural network, and the image data is written into memory. The experiment uses 64 threads, an input image of size 28x28 with 64 channels, an output image of size 28x28 with 128 channels, and a 5x5 convolution kernel; the scheme is added to the data cache, and the 64 threads are filled after several rounds of fetching before the convolution operation begins. At the end of the experiment, the memory is observed through Modelsim to record the results, which show that the number of data-cache fills needed to process one picture is reduced, the number of calculations after caching is reduced, and the time to process a single picture is reduced. The experimental results therefore fully verify the advantages of the scheme: a simple algorithm, low complexity, and high efficiency.
Please refer to fig. 2, which is a schematic structural diagram of a convolution operation optimization system according to an embodiment of the present application. The convolution operation optimization system 40 according to the embodiment of the present application includes:
the multithreaded data cache module 41: used for caching image data and recording the data features of the image data in each thread;
the data filtering module 42: when all threads in the multi-thread data cache module are filled with image data, performing spatiotemporal similarity analysis on data features of at least two adjacent threads respectively, filtering out image data of at least one thread in the at least two adjacent threads when the data features of the at least two adjacent threads have spatiotemporal similarity, taking the thread after filtering the image data as an idle thread to re-cache the image data, and performing the spatiotemporal similarity analysis again when all threads are filled with the image data again until all threads in the multi-thread data cache module are filled with the image data and the data features of the at least two adjacent threads do not have spatiotemporal similarity;
convolution operation module 43: used for performing convolution calculation according to the image data cached in the multithreaded data cache module and outputting new image data.
Please refer to fig. 3, which is a schematic structural diagram of a storage medium according to an embodiment of the present application. The storage medium of the embodiment stores a program file 61 capable of implementing all of the methods described above. The program file 61 may be stored in the storage medium in the form of a software product, and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) or a processor to execute all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash disk, a mobile hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, as well as terminal devices such as computers, servers, mobile phones, and tablets.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A convolution operation optimization method is characterized by comprising the following steps:
inputting the image data in the data memory module into the multithreading data cache module, and recording the data characteristics of the image data in each thread;
when all threads in the multi-thread data cache module are filled with image data, respectively carrying out space-time similarity analysis on the data characteristics of at least two adjacent threads, and when the data characteristics of at least two adjacent threads have space-time similarity,
filtering out image data of at least one thread in the at least two adjacent threads, taking the thread after filtering the image data as an idle thread to re-cache the image data input by the data memory module, and re-performing the spatiotemporal similarity analysis when all threads are filled with the image data again until all threads in the multi-thread data cache module are filled with the image data and the data characteristics of the at least two adjacent threads do not have spatiotemporal similarity,
and performing convolution calculation according to the image data cached in the multithreading data caching module, and outputting new image data.
2. The method of claim 1, wherein the data features comprise pixel maxima, minima, and means of the image data.
3. The method of claim 2, wherein the performing spatiotemporal similarity analysis on the data characteristics of at least two adjacent threads in the multithreaded data cache module respectively comprises:
grouping all threads in the multithreading data caching module, wherein each group comprises at least two adjacent threads;
and performing spatiotemporal similarity analysis on the data features of the at least two adjacent threads in each group respectively; if the differences of the maximum value, the minimum value and the mean value in the data features of the at least two adjacent threads are smaller than a set threshold, judging that the image data in the at least two adjacent threads have spatiotemporal similarity.
4. The method of claim 2, wherein the performing spatiotemporal similarity analysis on the data characteristics of at least two adjacent threads in the multithreaded data cache module respectively further comprises:
and if the differences among the maximum value, the minimum value and the mean value of at least two adjacent threads in each group are smaller than a set similarity threshold, judging that the image data in the at least two adjacent threads are similar.
5. The method of claim 1, wherein filtering out image data of at least one of the at least two adjacent threads and continuing to buffer the image data according to an idle thread after filtering the image data comprises:
reordering the idle threads after the image data are filtered;
and continuing to cache the image data from the reordered first thread.
6. The method of any of claims 1 to 5, wherein the inputting image data into a multithreaded data caching module further comprises:
the line program number of each buffer memory with image data is recorded by an address register, and a line program number list for recording the buffer memory position of the image data is generated.
7. The method of claim 6, wherein filtering out image data of at least one of the at least two adjacent threads further comprises:
and deleting the line program number of the filtered data in the line program number list.
8. The method of claim 7, wherein performing convolution calculations based on image data cached in the multithreaded data caching module comprises:
and outputting the new image data and the line program number list to the data memory module.
9. A system for optimizing convolution operations, comprising:
a multithreading data caching module: used for caching image data and recording the data features of the image data in each thread;
a data filtering module: when all threads in the multi-thread data cache module are filled with image data, performing spatiotemporal similarity analysis on data features of at least two adjacent threads respectively, filtering out image data of at least one thread in the at least two adjacent threads when the data features of the at least two adjacent threads have spatiotemporal similarity, taking the thread after filtering the image data as an idle thread to re-cache the image data, and performing the spatiotemporal similarity analysis again when all threads are filled with the image data again until all threads in the multi-thread data cache module are filled with the image data and the data features of the at least two adjacent threads do not have spatiotemporal similarity;
a convolution operation module: used for performing convolution calculation according to the image data cached in the multithreaded data cache module and outputting new image data.
10. A storage medium having stored thereon program instructions executable by a processor to perform the method of optimizing convolution operations according to any one of claims 1 to 8.
CN202010986153.3A 2020-09-18 2020-09-18 Convolution operation optimization method, system, terminal and storage medium Active CN114201726B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010986153.3A CN114201726B (en) 2020-09-18 2020-09-18 Convolution operation optimization method, system, terminal and storage medium
PCT/CN2020/127128 WO2022057054A1 (en) 2020-09-18 2020-11-06 Convolution operation optimization method and system, terminal, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010986153.3A CN114201726B (en) 2020-09-18 2020-09-18 Convolution operation optimization method, system, terminal and storage medium

Publications (2)

Publication Number Publication Date
CN114201726A true CN114201726A (en) 2022-03-18
CN114201726B CN114201726B (en) 2023-02-10

Family

ID=80645316

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010986153.3A Active CN114201726B (en) 2020-09-18 2020-09-18 Convolution operation optimization method, system, terminal and storage medium

Country Status (2)

Country Link
CN (1) CN114201726B (en)
WO (1) WO2022057054A1 (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101807144A (en) * 2010-03-17 2010-08-18 Shanghai University Prospective multi-threaded parallel execution optimization method
US20140072243A1 (en) * 2011-11-21 2014-03-13 Tencent Technology (Shenzhen) Company Limited Method and system for image processing
CN104657436A (en) * 2015-02-02 2015-05-27 Aviation University of Air Force, PLA Static tile pyramid parallel building method based on MapReduce
CN104932956A (en) * 2015-06-19 2015-09-23 South China University of Technology Big-data-oriented cloud disaster-tolerant backup method
CN107229598A (en) * 2017-04-21 2017-10-03 Southeast University Low-power voltage-adjustable convolution computing module for convolutional neural networks
CN107463439A (en) * 2017-08-21 2017-12-12 Shandong Inspur Genersoft Information Technology Co., Ltd. Thread pool implementation method and device
CN110569927A (en) * 2019-09-19 2019-12-13 Zhejiang Dasouche Software Technology Co., Ltd. Method, terminal and computer equipment for scanning and extracting panoramic image of mobile terminal
CN111046092A (en) * 2019-11-01 2020-04-21 Northeastern University Parallel similarity join method based on CPU-GPU heterogeneous architecture
US20200218917A1 (en) * 2019-01-07 2020-07-09 Hcl Technologies Limited Reconfigurable 3d convolution engine
CN111522885A (en) * 2018-01-25 2020-08-11 Qu Yiwen Distributed database system collaborative optimization method based on dynamic programming
CN111597029A (en) * 2020-05-20 2020-08-28 Shanghai SenseTime Intelligent Technology Co., Ltd. Data processing method and device, electronic equipment and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3401846B1 (en) * 2017-05-09 2023-02-01 Nokia Technologies Oy Method and device for analyzing sensor data
CN109886407B (en) * 2019-02-27 2021-10-22 Shanghai SenseTime Intelligent Technology Co., Ltd. Data processing method and device, electronic equipment and computer readable storage medium
CN110059668B (en) * 2019-04-29 2020-12-15 The Second Research Institute of Civil Aviation Administration of China Behavior prediction processing method and device and electronic equipment
CN110458279B (en) * 2019-07-15 2022-05-20 Wuhan Meitong Technology Co., Ltd. FPGA-based binary neural network acceleration method and system
CN111639563B (en) * 2020-05-18 2023-07-18 Zhejiang Gongshang University Multi-task-based online detection method for basketball video events and targets


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
D. KIM et al.: "Physical experimentation with prefetching helper threads on Intel's hyper-threaded processors", International Symposium on Code Generation and Optimization, 2004 (CGO 2004) *
WANG Zheng et al.: "Application of remote thread injection technology in monitoring systems", Computer Technology and Development *

Also Published As

Publication number Publication date
WO2022057054A1 (en) 2022-03-24
CN114201726B (en) 2023-02-10

Similar Documents

Publication Publication Date Title
CN108765247B (en) Image processing method, device, storage medium and equipment
Chen et al. Saliency detection via the improved hierarchical principal component analysis method
EP4156017A1 (en) Action recognition method and apparatus, and device and storage medium
CN110287820B (en) Behavior recognition method, device, equipment and medium based on LRCN network
CN108491856B (en) Image scene classification method based on multi-scale feature convolutional neural network
CN111414910B (en) Small target enhancement detection method and device based on double convolution neural network
CN108881254B (en) Intrusion detection system based on neural network
WO2022007349A1 (en) Neural network tuning method and system, terminal, and storage medium
CN111083933B (en) Data storage and acquisition method and device
CN114882530A (en) Pedestrian detection-oriented lightweight convolutional neural network model
Tao et al. An adaptive interference removal framework for video person re-identification
CN117034100A (en) Self-adaptive graph classification method, system, equipment and medium based on hierarchical pooling architecture
CN109447239B (en) Embedded convolutional neural network acceleration method based on ARM
CN114882278A (en) Tire pattern classification method and device based on attention mechanism and transfer learning
CN112200310B (en) Intelligent processor, data processing method and storage medium
CN114201726B (en) Convolution operation optimization method, system, terminal and storage medium
WO2024074042A1 (en) Data storage method and apparatus, data reading method and apparatus, and device
US20230072445A1 (en) Self-supervised video representation learning by exploring spatiotemporal continuity
CN117223005A (en) Accelerator, computer system and method
CN112084371B (en) Movie multi-label classification method and device, electronic equipment and storage medium
CN105094701B (en) Adaptive read-ahead method and device
CN103631726B (en) File processing method and device for serially-connected streaming computing nodes
US10909417B2 (en) Proposal processing method and related products
CN110490312B (en) Pooling calculation method and circuit
CN114419630A (en) Text recognition method based on neural network search in automatic machine learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240201

Address after: 518000, 18D1, Block C, Central Avenue, Intersection of Xixiang Avenue and Baoyuan Road, Labor Community, Xixiang Street, Bao'an District, Shenzhen, Guangdong Province

Patentee after: Shenzhen Zhongke Yuanwuxin Technology Co.,Ltd.

Country or region after: China

Address before: 1068 Xueyuan Avenue, Shenzhen University Town, Xili, Nanshan District, Shenzhen 518055, Guangdong Province

Patentee before: SHENZHEN INSTITUTES OF ADVANCED TECHNOLOGY

Country or region before: China