CN109711323B - Real-time video stream analysis acceleration method, device and equipment - Google Patents

Real-time video stream analysis acceleration method, device and equipment

Info

Publication number
CN109711323B
CN109711323B (application CN201811585634.2A)
Authority
CN
China
Prior art keywords
cache
decoding
caches
analysis
writable
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811585634.2A
Other languages
Chinese (zh)
Other versions
CN109711323A (en)
Inventor
谈鸿韬
陆辉
刘树惠
杨波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Fiberhome Digtal Technology Co Ltd
Original Assignee
Wuhan Fiberhome Digtal Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Fiberhome Digtal Technology Co Ltd filed Critical Wuhan Fiberhome Digtal Technology Co Ltd
Priority to CN201811585634.2A priority Critical patent/CN109711323B/en
Publication of CN109711323A publication Critical patent/CN109711323A/en
Application granted granted Critical
Publication of CN109711323B publication Critical patent/CN109711323B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention relates to a real-time video stream analysis acceleration method, device and equipment, aimed at optimizing and expanding the number of streams a real-time video stream algorithm can analyze. A GPU is called to decode each stream of real-time video, and the decoding result is returned to the algorithm directly through a video memory address. The algorithm end maintains double caches: one cache accumulates decoded data from multiple streams while the other is handed to the algorithm for GPU batch processing; when a batch completes, the two caches swap roles, minimizing system delay.

Description

Real-time video stream analysis acceleration method, device and equipment
Technical Field
The invention relates to the technical field of video image processing, in particular to a real-time video stream analysis acceleration method, a real-time video stream analysis acceleration device and real-time video stream analysis acceleration equipment.
Background
With the gradual rollout of large security programs and projects such as "Safe City", "Smart City" and the "Sharp Eyes" project, urban video surveillance construction has entered a mature phase. As massive volumes of video data accumulate, simply "watching" video no longer suffices: faced with large numbers of video scenes, traditional manual review consumes enormous manpower and material resources yet often yields little, and cannot meet the case-handling demands of the public security industry. Against this background, intelligent video analysis algorithms structure the people, vehicles and objects in video — for example tripwire detection, target tracking and face detection — extract target features from the videos, replace human eyes with automatic program extraction, and combine technical means such as big data to search by keyword and find clues; this has gradually become the mainstream approach in the security industry.
However, intelligent analysis algorithms face enormous performance pressure in massive video processing scenarios. Taking the currently most widely deployed 1080P H.264 video streams as an example, a mainstream x86 Intel Xeon server can only reach roughly 200 to 300 fps with CPU-based decoding. An intelligent video analysis pipeline runs video stream -> decoding -> YUV/RGB data -> algorithm; once the algorithm stage is added, image algorithms consume so much CPU that the effective decoding throughput drops even lower, so the number of concurrent real-time streams a server can support is hard to raise. Scaling out horizontally by adding analysis servers improves throughput, but the cost is too high and the price-performance ratio too low to support large-scale video analysis scenarios.
Disclosure of Invention
The present invention is directed to overcoming the drawbacks of the prior art and providing a method, an apparatus and a device for accelerating real-time video stream analysis, which can significantly improve the performance of a system based on GPU hardware acceleration.
The invention is realized by the following steps: the invention provides a real-time video stream analysis acceleration method, which comprises the following steps:
1) For each GPU card, set up at least two caches. Each cache carries a flag bit and a decoded-stream count k, which stores the accumulated number of decoded streams. When a cache's flag bit is false, the cache is writable and decoded data may be stored into it; when the flag bit is true, the cache is readable and the multi-stream decoded data it holds may be handed to an algorithm analysis module in one batch for analysis. Initialize the flag bits of all caches of each GPU card to false, and start two monitoring threads: one is a cache-write monitoring thread, the other a double-cache-read monitoring thread;
2) After receiving decoded data, the algorithm end first checks the flag bits of the caches to determine whether a writable cache exists. If at least one cache's flag bit is false, a writable cache exists: randomly select one writable cache whose flag bit is false, store this stream's decoded data in it, and add 1 to that cache's decoded-stream count k. Otherwise, discard this stream's decoded data and return immediately without processing;
3) The cache-write monitoring thread checks the state of the caches at a specified interval; when a cache's decoded-stream count k is greater than or equal to a set value K, the cache is considered readable and its flag bit is set to true, otherwise the flag bit is set to false. Meanwhile, the cache-read monitoring thread checks the state of the caches at a specified interval; when a cache's flag bit is true, the cache is considered readable, the multi-stream decoded data stored in it is handed to the algorithm analysis module in one batch for analysis, and after processing completes the cache's flag bit is reset to false, making it writable again.
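The three steps above amount to a small flag-and-counter protocol. The sketch below is a minimal, illustrative Python rendering of it — the patent does not prescribe an implementation; names such as `Cache`, `store_decoded` and the `capacity` field are assumptions, and real decoded frames would live in GPU video memory rather than a Python list:

```python
import random
import threading

class Cache:
    """One per-GPU buffer: a readable/writable flag bit plus a decoded-stream count k."""
    def __init__(self, capacity):
        self.capacity = capacity    # M: most streams this buffer may hold
        self.readable = False       # flag bit: False = writable, True = readable
        self.k = 0                  # accumulated number of decoded streams
        self.frames = []            # stand-in for decoded data in GPU memory
        self.lock = threading.Lock()

def store_decoded(caches, frame):
    """Step 2): pick a random writable cache; drop the data if none exists."""
    writable = [c for c in caches if not c.readable and c.k < c.capacity]
    if not writable:
        return False                # no writable cache: discard this stream's data
    c = random.choice(writable)
    with c.lock:
        c.frames.append(frame)
        c.k += 1
    return True

def write_monitor_pass(caches, threshold):
    """Step 3), write side: mark a cache readable once k >= K."""
    for c in caches:
        with c.lock:
            if c.k >= threshold:
                c.readable = True

def read_monitor_pass(caches, analyze_batch):
    """Step 3), read side: hand readable caches to the analyzer, then reset to writable."""
    for c in caches:
        if c.readable:
            with c.lock:
                analyze_batch(c.frames)
                c.frames, c.k, c.readable = [], 0, False
```

`write_monitor_pass` and `read_monitor_pass` would each be invoked periodically by its monitoring thread; because a cache is never simultaneously writable and readable, the writer and the batch analyzer never touch the same buffer at the same time.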
Further, two caches are set for each GPU card and bound to it; each double cache is responsible for receiving the decoded data on its corresponding GPU. N GPU cards correspond to N double caches.
Furthermore, each cache is allowed to store at most M streams of decoded data, where M is the maximum number of parallel tasks allowed on each GPU card, obtained through testing.
Further, the set value K is M/2.
Further, each time the main thread of the application program finishes decoding one frame, it passes the decoded-data information to the algorithm end through the algorithm end's data receiving interface.
The algorithm analysis module exposes a data receiving interface for the decoding layer to call, somewhat like a push operation on a data structure. The decoding module and the algorithm analysis module both run mainly on the GPU; the algorithm analysis module analyzes the data. They are the core computation modules of the application, responsible for the decoding and analysis functions respectively, and both depend on the corresponding GPU hardware components: an NVIDIA GPU contains dedicated video codec cores and CUDA cores.
Further, the step of obtaining the maximum parallel processing task number M allowed on each GPU card through testing specifically includes the steps of:
selecting a reference test file;
decoding and analyzing the M test files with a benchmark program and outputting the per-stream analysis frame rate fps; increase M starting from M = 1, 2, 3, ..., and when fps falls to approach a set value Q, record the current M, which is the optimal number of streams a single card can analyze. The benchmark program decodes and algorithmically analyzes the multi-stream video files. The frame rate of a real-time stream is generally 25-30 fps, for example 25. When a file is used for the simulated test, the per-stream fps is large while M is small (for example, fps can reach 200 when M = 2); as M keeps increasing, fps keeps dropping. Once fps falls to 25-30, M can grow no further: if increasing M pushes fps below 25, the real-time requirement can no longer be met. "Approaching Q" means fps is slightly greater than or equal to Q, measured on the minimum per-stream fps (the average generally differs little from it).
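The M-search loop just described can be sketched as follows. `measure_fps` is a hypothetical callback standing in for the benchmark program, which would actually decode and analyze m parallel streams and report the per-stream frame rate:

```python
def find_optimal_m(measure_fps, q=25.0, m_max=64):
    """Increase the number of parallel streams m until per-stream fps
    drops below the real-time floor q; return the last m that still met it.

    measure_fps(m) is an assumed benchmark hook: it runs decoding plus
    algorithm analysis with m parallel streams and returns the minimum
    per-stream analysis frame rate.
    """
    best = 0
    for m in range(1, m_max + 1):
        fps = measure_fps(m)
        if fps < q:        # below real time: the previous m was the optimum
            break
        best = m
    return best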
The analysis speed is highest when the product M × fps is maximal, as the following derivation shows:
(1) assuming that the duration of a video file is T and the frame rate is FR;
(2) defining index analysis acceleration ratio as video recording duration/analysis time to measure analysis efficiency;
(3) for simplicity of the analysis model, assume the GPU server has N GPU cards; the video is first cut evenly across the N cards for analysis, so the duration of the video segment assigned to each card is:
t = T / N
(4) assume the video of duration t on each card is further cut into M segments, so each card effectively analyzes M video streams in parallel; with a per-stream analysis frame rate of fps, the time required to analyze each stream is:
t1 = (t / M) × FR / fps = (T × FR) / (N × M × fps)
the total analysis time of the video can be approximated by t1, so the analysis acceleration ratio is
acceleration ratio = T / t1 = (N × M × fps) / FR
N (the number of GPU cards) and FR (the video frame rate) are fixed values; the only variables are the per-card slice count M and the per-slice analysis frame rate fps, so the analysis speed is highest when their product is maximal.
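As a worked example of the model: the acceleration ratio T / t1 reduces to (N × M × fps) / FR, independent of the video duration T. The function below computes it; the numbers in the usage note are hypothetical, chosen only to illustrate the formula:

```python
def analysis_speedup(n_cards, m_slices, fps, frame_rate):
    """Acceleration ratio R = T / t1 = (N * M * fps) / FR.

    The video duration T cancels out: only the card count N, the per-card
    slice count M, the per-slice analysis frame rate fps, and the video
    frame rate FR matter.
    """
    return n_cards * m_slices * fps / frame_rate
```

For instance, with N = 4 cards, M = 8 slices per card, fps = 25 and FR = 25, the ratio is 4 × 8 × 25 / 25 = 32, i.e. the recording is analyzed 32 times faster than its own duration.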
Step 2) and step 3) are performed asynchronously.
The method also comprises the following step before step 2): calling the GPU to decode each stream of real-time video, and returning the decoding result directly to the algorithm end through the video memory address.
The invention provides a real-time video stream analysis accelerating device, which comprises a decoding data receiving module, a decoding module, a writing module, a cache writing monitoring module and a cache reading monitoring module;
the decoded-data receiving module is used for receiving each stream's decoded data;
the write module is used for checking the flag bits of the corresponding caches and judging whether a writable cache exists; when at least one cache's flag bit is false, a writable cache exists: a writable cache whose flag bit is false is randomly selected, this stream's decoded data is stored in it, and the cache's decoded-stream count k is increased by 1; otherwise this stream's decoded data is discarded;
the cache-write monitoring module is used for checking the state of the caches at a specified interval; when a cache's decoded-stream count k is greater than or equal to a set value K, the cache is considered readable and its flag bit is set to true, otherwise the flag bit is set to false;
the cache-read monitoring module is used for checking the state of the caches at a specified interval; when a cache's flag bit is true, the cache is considered readable, the multi-stream decoded data stored in it is handed to the algorithm analysis module in one batch for algorithm analysis, and after processing completes the cache's flag bit is reset to false, making it writable again.
The invention provides a real-time video stream analysis accelerating device, which comprises a memory for storing a program;
and a processor for implementing the steps of the real-time video stream analysis acceleration method as described above when executing the program.
Compared with the prior art, the invention has the following beneficial effects:
the method aims at the optimization and the expansion of the number of paths analyzed by a real-time video stream algorithm, the GPU is called to decode each path of real-time video, the decoding result is directly returned to the algorithm through a video memory address, the algorithm end is provided with double caches, one cache is used for storing decoding data in multiple paths, the other cache is used for transferring the decoding data to the algorithm for GPU batch processing, after the batch processing is completed, the functions of the two caches are switched, the purpose of minimizing the system delay is achieved, and the problem of time delay when the main performance bottleneck of the real-time video stream is parallel processing of multiple GPU tasks is solved.
The invention provides a corresponding acceleration method aiming at real-time flow analysis, and can obviously improve the system efficiency based on GPU hardware acceleration.
Drawings
FIG. 1 is a diagram of an embodiment for a real-time video analytics task;
FIG. 2 is a diagram illustrating a detailed embodiment of a double-buffer switching procedure according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are clearly and completely described below, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example one
Referring to fig. 1 and fig. 2, the present embodiment provides a real-time video stream analysis acceleration method, including the following steps:
1) the algorithm analysis module sets up two identical GPU caches for each GPU, denoted the first cache and the second cache. Each cache can store decoded data for at most M streams, and carries a flag bit and a decoded-stream count k storing the accumulated number of decoded streams. When the algorithm module starts, the flag bits of both GPU caches are initialized to false. The module also provides a data receiving interface; each decoding stream can hand its decoded data L to the algorithm analysis module by calling this interface.
A. when a double cache's flag bit is false, the cache is writable and multi-stream decoded data may be stored into it;
B. when a double cache's flag bit is true, the cache is readable and the stored multi-stream decoded data may be handed to the algorithm analysis module for batch processing;
C. one double cache corresponds to one GPU; with N GPUs there are N double caches, each bound to its card and responsible for receiving the M streams of decoded data on that GPU. The following steps are described for a single card;
2) the algorithm analysis module starts two threads, a cache-write monitoring thread and a double-cache-read monitoring thread, each performing a monitoring check every 10 ms;
3) when the i-th stream's decoded data arrives (1 ≤ i ≤ M), the data receiving interface of the algorithm analysis module is called;
4) the data receiving interface first checks the double-cache flag bits of the algorithm analysis module; if at least one is false, a writable double cache exists and the next step proceeds, otherwise this stream's decoded data is discarded without any processing;
5) a writable double cache whose flag is false is randomly selected, the i-th stream's decoded data is stored in it, k is increased by 1, and the data receiving interface call completes;
steps 3), 4) and 5) are the flow by which the decoding module calls the data receiving interface of the algorithm analysis module; step 6) and the subsequent processing inside the algorithm analysis module execute asynchronously;
6) the cache-write monitoring thread of the algorithm analysis module checks the double-cache state every 10 ms; when the number of decoded streams stored in a cache reaches half its maximum (k ≥ M/2), the cache is considered readable and its flag is set to true;
7) the cache-read monitoring thread of the algorithm analysis module checks the double-cache state every 10 ms; when a double cache's flag bit is true it is considered readable, the whole cache is handed to the analysis module for batch processing, and after processing completes the flag is reset to false, making the cache writable again.
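Steps 6) and 7) describe two polling threads with a ~10 ms period. Below is a minimal sketch using a dict as a stand-in for one GPU cache; `make_buffer`, `start_monitors` and the field names are illustrative assumptions, not names from the patent:

```python
import threading
import time

def make_buffer():
    """A minimal stand-in for one GPU cache: frames, count k, and the flag bit."""
    return {"frames": [], "k": 0, "readable": False, "lock": threading.Lock()}

def start_monitors(buffers, threshold, analyze_batch, period=0.01):
    """Start the two monitoring threads of embodiment steps 6)-7).

    Every ~10 ms (period), the write monitor flips a buffer to readable
    once k >= threshold, and the read monitor drains readable buffers
    into analyze_batch and resets them to writable. Returns an Event
    that stops both daemon threads when set.
    """
    stop = threading.Event()

    def write_monitor():
        while not stop.is_set():
            for b in buffers:
                with b["lock"]:
                    if b["k"] >= threshold:
                        b["readable"] = True
            stop.wait(period)

    def read_monitor():
        while not stop.is_set():
            for b in buffers:
                if b["readable"]:
                    with b["lock"]:
                        analyze_batch(b["frames"])   # hand the batch over
                        b["frames"], b["k"], b["readable"] = [], 0, False
            stop.wait(period)

    threading.Thread(target=write_monitor, daemon=True).start()
    threading.Thread(target=read_monitor, daemon=True).start()
    return stop
```

Once a producer has filled a buffer past the threshold, the two threads cooperate without further coordination: the write monitor publishes the buffer by flipping its flag, and the read monitor consumes and recycles it.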
The scheduling and decoding steps before the real-time video stream analysis are as follows:
(1) detecting and managing various GPU models, and automatically identifying the card types and the number;
(2) for a specified GPU card type, use a mainstream H.264 or H.265 1080P real-time video stream as the benchmark test source;
(3) writing a benchmark test analysis program, realizing the decoding and algorithm analysis functions of a plurality of paths of video files, and outputting the analysis frame rate fps of each path;
(4) connect M real-time streams to a single card while printing the analysis frame rate fps of the algorithm stage; increase M starting from M = 1, 2, 3, ..., and when fps drops to approach the value Q, e.g. Q = 25 (25 fps is the most common real-time video stream frame rate in video surveillance; the Q value can be adjusted to the actual frame rate), record the current M as the optimal number of streams a single card supports;
(5) the GPU scheduler initializes the maximum parallel task count on each GPU card to P = M and the running task count to C = 0;
(6) for each real-time stream analysis task K, traverse the N cards in order; when the count C of tasks being analyzed on the i-th card is less than P, return the i-th card's id to the algorithm for processing and add 1 to C; if the traversal finishes with no idle card (C ≥ P on every GPU), wait;
(7) when a task finishes analyzing, release the corresponding GPU resource id, subtract 1 from the count C being analyzed on the i-th card, and assign the freed resource to a waiting task;
(8) use the GPU scheduler to obtain the corresponding GPU card id j and the analysis task Ti (1 ≤ i ≤ M);
(9) call the GPU decoder SDK to hard-decode analysis task Ti on GPU j, and store the decoded data in GPU video memory L;
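The scheduler of steps (5)-(7) is essentially a counting allocator: each card carries a capacity P = M and a running count C. A possible sketch — the class and method names are assumptions, and a real implementation would hand the returned card id to the GPU decoder SDK:

```python
import threading

class GpuScheduler:
    """Allocate analysis tasks across n_cards GPUs, at most p_max tasks
    per card (p_max = M from the benchmark step)."""

    def __init__(self, n_cards, p_max):
        self.running = [0] * n_cards   # C: tasks currently analyzing per card
        self.p_max = p_max             # P: per-card parallel task limit
        self.cond = threading.Condition()

    def acquire(self):
        """Traverse the cards in order and return the id of the first one
        with a free slot; block until a slot frees up if all are busy."""
        with self.cond:
            while True:
                for gpu_id, c in enumerate(self.running):
                    if c < self.p_max:
                        self.running[gpu_id] += 1
                        return gpu_id
                self.cond.wait()       # every card at capacity: wait

    def release(self, gpu_id):
        """A task on gpu_id finished: free its slot and wake any waiters."""
        with self.cond:
            self.running[gpu_id] -= 1
            self.cond.notify_all()
```

Usage mirrors steps (6)-(8): each incoming stream calls `acquire()` to get a card id, runs decode plus analysis there, and calls `release(gpu_id)` when done.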
The decoding result is returned to the algorithm directly through the video memory address; the algorithm end sets at least two caches for each GPU card, and double-cache switching and batch processing of the decoded data follow the steps of embodiment one, as shown in FIG. 2.
Example two
The embodiment provides a real-time video stream analysis accelerating device, which comprises a decoding data receiving module, a decoding module, a writing module, a cache writing monitoring module and a cache reading monitoring module;
the decoded-data receiving module is used for receiving each stream's decoded data;
the write module is used for checking the flag bits of the corresponding caches and judging whether a writable cache exists; when at least one cache's flag bit is false, a writable cache exists: a writable cache whose flag bit is false is randomly selected, this stream's decoded data is stored in it, and the cache's decoded-stream count k is increased by 1; otherwise this stream's decoded data is discarded;
the cache-write monitoring module is used for checking the state of the caches at a specified interval; when a cache's decoded-stream count k is greater than or equal to a set value K, the cache is considered readable and its flag bit is set to true, otherwise the flag bit is set to false;
the cache-read monitoring module is used for checking the state of the caches at a specified interval; when a cache's flag bit is true, the cache is considered readable, the multi-stream decoded data stored in it is handed to the algorithm analysis module in one batch for algorithm analysis, and after processing completes the cache's flag bit is reset to false, making it writable again.
EXAMPLE III
The embodiment provides a real-time video stream analysis accelerating device, comprising a memory for storing a program;
and a processor for implementing the steps of the real-time video stream analysis acceleration method as described above when executing the program.
The invention adopts double buffering for real-time video (whose frame rate is fixed by the live source, generally 25-30 fps), and emphasizes supporting as many streams as possible (generally 10-30) while still meeting the real-time requirement. With many streams, however, data transfer and latency between the CPU and the GPU, and within the GPU itself, become a major bottleneck, which the double-cache batch processing is designed to relieve.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (6)

1. A real-time video stream analysis acceleration method is characterized by comprising the following steps:
1) setting at least two caches for each GPU card, wherein each cache carries a flag bit and a decoded-stream count k, the count k storing the accumulated number of decoded streams; when a cache's flag bit is false, the cache is writable and decoded data may be stored into the writable cache; when a cache's flag bit is true, the cache is readable and the multi-stream decoded data stored in it may be handed to an algorithm analysis module in one batch for analysis; the flag bits of the caches corresponding to each GPU card are initialized to false, and two monitoring threads are started, one being a cache-write monitoring thread and the other a double-cache-read monitoring thread; each cache allows at most M streams of decoded data to be stored, where M is the maximum number of parallel tasks allowed on each GPU card, obtained through testing;
the step of testing the maximum parallel processing task number M allowed on each GPU card specifically comprises the following steps:
selecting a reference test file;
decoding and analyzing the M test files through a benchmark program and outputting the analysis frame rate fps, with M = 1, 2, 3, ...; M is increased starting from M = 1, and when fps falls to approach a set value Q, the current M is recorded as the optimal number of streams a single card can analyze; the benchmark program decodes and algorithmically analyzes the multi-stream video files;
2) calling a GPU to decode each stream of real-time video and returning the decoding result directly to the algorithm end through a video memory address; after receiving decoded data, the algorithm end first checks the flag bits of the caches to determine whether a writable cache exists; when at least one cache's flag bit is false, a writable cache exists: a writable cache whose flag bit is false is randomly selected, this stream's decoded data is stored in it, and the cache's decoded-stream count k is increased by 1; otherwise this stream's decoded data is discarded and the call returns without processing;
3) the cache-write monitoring thread checks the state of the caches at a specified interval; when a cache's decoded-stream count k is greater than or equal to a set value K, the cache is considered readable and its flag bit is set to true, otherwise the flag bit is set to false; meanwhile, the cache-read monitoring thread checks the state of the caches at a specified interval; when a cache's flag bit is true, the cache is considered readable, the multi-stream decoded data stored in it is handed to the algorithm analysis module in one batch for analysis, and after processing completes the cache's flag bit is reset to false, making it writable again.
2. The method of claim 1, wherein: two caches are set for each GPU card; the two caches are bound to the corresponding GPU card; each double cache is responsible for receiving the decoded data on the corresponding GPU.
3. The method of claim 1, wherein: the set value K is M/2.
4. The method of claim 1, wherein: step 2) and step 3) are performed asynchronously.
5. A real-time video stream analysis acceleration apparatus, characterized by: the device comprises a decoding data receiving module, a decoding module, a writing module, a cache writing monitoring module and a cache reading monitoring module;
the decoded-data receiving module is used for receiving each stream's decoded data;
the write module is used for checking the flag bits of the corresponding caches and judging whether a writable cache exists; when at least one cache's flag bit is false, a writable cache exists: a writable cache whose flag bit is false is randomly selected, the decoded data is stored in it, and the cache's decoded-stream count k is increased by 1; otherwise the decoded data is discarded;
the cache-write monitoring module is used for checking the state of the caches at a specified interval; when a cache's decoded-stream count k is greater than or equal to a set value K, the cache is considered readable and its flag bit is set to true, otherwise the flag bit is set to false;
the cache-read monitoring module is used for checking the state of the caches at a specified interval; when a cache's flag bit is true, the cache is considered readable, the multi-stream decoded data stored in it is handed to the algorithm analysis module in one batch for algorithm analysis, and after processing completes the cache's flag bit is reset to false, making it writable again.
6. A real-time video stream analysis acceleration apparatus, characterized by: comprises a memory for storing a program;
and a processor for implementing the steps of the real-time video stream analysis acceleration method according to any one of claims 1 to 4 when executing said program.
CN201811585634.2A 2018-12-25 2018-12-25 Real-time video stream analysis acceleration method, device and equipment Active CN109711323B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811585634.2A CN109711323B (en) 2018-12-25 2018-12-25 Real-time video stream analysis acceleration method, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811585634.2A CN109711323B (en) 2018-12-25 2018-12-25 Real-time video stream analysis acceleration method, device and equipment

Publications (2)

Publication Number Publication Date
CN109711323A CN109711323A (en) 2019-05-03
CN109711323B true CN109711323B (en) 2021-06-15

Family

ID=66257447

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811585634.2A Active CN109711323B (en) 2018-12-25 2018-12-25 Real-time video stream analysis acceleration method, device and equipment

Country Status (1)

Country Link
CN (1) CN109711323B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111988561B (en) * 2020-07-13 2022-05-03 浙江大华技术股份有限公司 Adaptive adjustment method and device for video analysis, computer equipment and medium
CN112528961B (en) * 2020-12-28 2023-03-10 山东巍然智能科技有限公司 Video analysis method based on Jetson Nano
CN112822494A (en) * 2020-12-30 2021-05-18 稿定(厦门)科技有限公司 Double-buffer coding system and control method thereof
CN113572997A (en) * 2021-07-22 2021-10-29 中科曙光国际信息产业有限公司 Video stream data analysis method, device, equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105163127A (en) * 2015-09-07 2015-12-16 浙江宇视科技有限公司 Video analysis method and device
WO2015192806A1 (en) * 2014-06-20 2015-12-23 Tencent Technology (Shenzhen) Company Limited Model parallel processing method and apparatus based on multiple graphic processing units
CN105224410A (en) * 2015-10-19 2016-01-06 成都卫士通信息产业股份有限公司 A kind of GPU of scheduling carries out method and the device of batch computing
CN105912479A (en) * 2016-04-07 2016-08-31 武汉数字派特科技有限公司 Concurrent data caching method and structure
CN107784001A (en) * 2016-08-26 2018-03-09 北京计算机技术及应用研究所 Parallel spatial querying method based on CUDA

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9268596B2 (en) * 2012-02-02 2016-02-23 Intel Corporation Instruction and logic to test transactional execution status

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015192806A1 (en) * 2014-06-20 2015-12-23 Tencent Technology (Shenzhen) Company Limited Model parallel processing method and apparatus based on multiple graphic processing units
CN105163127A (en) * 2015-09-07 2015-12-16 浙江宇视科技有限公司 Video analysis method and device
CN105224410A (en) * 2015-10-19 2016-01-06 成都卫士通信息产业股份有限公司 A kind of GPU of scheduling carries out method and the device of batch computing
CN105912479A (en) * 2016-04-07 2016-08-31 武汉数字派特科技有限公司 Concurrent data caching method and structure
CN107784001A (en) * 2016-08-26 2018-03-09 北京计算机技术及应用研究所 Parallel spatial querying method based on CUDA

Also Published As

Publication number Publication date
CN109711323A (en) 2019-05-03

Similar Documents

Publication Publication Date Title
CN109769115B (en) Method, device and equipment for optimizing intelligent video analysis performance
CN109711323B (en) Real-time video stream analysis acceleration method, device and equipment
CN110537194B (en) Power efficient deep neural network processor and method configured for layer and operation protection and dependency management
CN113221706B (en) AI analysis method and system for multi-process-based multi-path video stream
CN110610510A (en) Target tracking method and device, electronic equipment and storage medium
CN110298213B (en) Video analysis system and method
CN107451066A (en) Interim card treating method and apparatus, storage medium, terminal
CN113613065A (en) Video editing method and device, electronic equipment and storage medium
CN107870928A (en) File reading and device
WO2022152104A1 (en) Action recognition model training method and device, and action recognition method and device
CN113591674B (en) Edge environment behavior recognition system for real-time video stream
CN102810133A (en) Ray query method for network game, and scene server
CN109086737B (en) Convolutional neural network-based shipping cargo monitoring video identification method and system
CN113535366A (en) High-performance distributed combined multi-channel video real-time processing method
CN114973152B (en) Monitoring method, device and medium of micromolecule recyclable fracturing fluid storage tank based on neural network
CN109039804B (en) File reading method and electronic equipment
CN115022645A (en) Video compression method and device, electronic equipment and machine-readable storage medium
CN109274966A (en) A kind of monitor video content De-weight method and system based on motion vector
CN107749065A (en) VIBE background modeling methods based on CUDA
CN110826471B (en) Video tag labeling method, device, equipment and computer readable storage medium
CN114330675A (en) Chip, accelerator card, electronic equipment and data processing method
CN107169480B (en) Distributed character recognition system of real-time video stream
CN111917600A (en) Spark performance optimization-based network traffic classification device and classification method
CN113627354B (en) A model training and video processing method, which comprises the following steps, apparatus, device, and storage medium
CN112188215B (en) Video decoding method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant