CN113436232B - Hardware acceleration method based on tracking algorithm - Google Patents


Info

Publication number
CN113436232B
Authority
CN
China
Prior art keywords
video data
algorithm
processing
data stream
target
Prior art date
Legal status
Active
Application number
CN202110723521.XA
Other languages
Chinese (zh)
Other versions
CN113436232A (en)
Inventor
胡铭德
Current Assignee
Shanghai Lexin Information Technology Co ltd
Original Assignee
Shanghai Lexin Information Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Lexin Information Technology Co ltd
Priority to CN202110723521.XA
Publication of CN113436232A
Application granted
Publication of CN113436232B
Legal status: Active (current)
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/277 Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/20 Processor architectures; Processor configuration, e.g. pipelining
    • G06T5/73
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/215 Motion-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a hardware acceleration method based on a tracking algorithm, comprising: S1, hardware receives and segments the data stream information; S2, the CPU distributes the compressed video data to the GPU and the APU for processing; S3, the GPU and the APU post-process the video data stream with their own algorithms; S4, algorithm parallelization is applied to the processing of the video data stream; S5, the CPU receives and plays the processed video data stream. The invention segments the data information so that it can be processed as a plurality of small blocks, which effectively accelerates hardware operation, and the algorithm parallelization, data parallelization and operation parallelization adopted during processing further improve the running speed and efficiency of the hardware.

Description

Hardware acceleration method based on tracking algorithm
Technical Field
The invention belongs to the technical field of hardware acceleration, and particularly relates to a hardware acceleration method based on a tracking algorithm.
Background
Hardware acceleration refers to a technique in which a computer reduces the workload of the central processing unit by assigning very computationally intensive jobs to dedicated hardware. It is used especially in image processing. The central processor is structured so that it can execute a wide variety of different instructions in a short time, and which instructions it can process is limited mainly by software. Because of this structure, however, some repetitive tasks cannot be handled efficiently and quickly by the central processing unit. Dedicated hardware elements do not have to be as flexible as the central processor, so their design can be optimized for these specific problems, leaving the central processor time to handle other tasks. Some tasks can be solved very efficiently by breaking them down into thousands of smaller tasks, such as Fourier-transforming a certain frequency band or rendering a small image region. These small tasks can be computed in parallel, independently of one another. Massively parallel computation, i.e., using a large number of small processors running in parallel, can greatly increase the overall speed of such tasks; in many cases the speed increases linearly with the number of parallel processors. Parallel computation is also significant for energy efficiency: energy use increases linearly with the number of parallel processors but with the square of the processor frequency. Parallel arithmetic processors therefore need not run at a high frequency, and the energy they use is relatively small. Nevertheless, the various hardware acceleration solutions on the market still have various problems.
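The energy argument above can be checked with a small calculation. The sketch below takes the background's stated model at face value (power proportional to the number of processors and to the square of the clock frequency) and additionally assumes ideal linear speed-up; the function name and the normalized units are illustrative only.

```python
def energy_for_job(work, n_processors, frequency):
    """Energy to finish `work` under the stated model (arbitrary, normalised units)."""
    power = n_processors * frequency ** 2     # linear in processor count, quadratic in frequency
    time = work / (n_processors * frequency)  # assumes ideal linear speed-up
    return power * time


# One processor at full clock vs. four processors at a quarter of the clock.
print(energy_for_job(100, 1, 1.0))   # 100.0
print(energy_for_job(100, 4, 0.25))  # 25.0
```

Both configurations finish the job in the same time (100 units), but under this model the four slower processors use a quarter of the energy.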
Although the target detection hardware accelerator and acceleration method disclosed in CN112230884B can reduce the time and power the accelerator spends on data transport and improve its working efficiency, existing hardware acceleration techniques cannot acquire a target through a tracking algorithm, process the detected target accurately, and then accelerate the hardware on that basis. The hardware acceleration method based on a tracking algorithm is proposed to solve these problems.
Disclosure of Invention
The present invention is directed to a hardware acceleration method based on a tracking algorithm, so as to solve the problems set forth in the above background art.
To achieve this purpose, the invention provides the following technical solution: a hardware acceleration method based on a tracking algorithm, comprising the following steps:
S1, hardware receives and segments the data stream information: the CPU receives the data stream information, separates the compressed video data from it, places the separated compressed video data in system memory, and divides the compressed video data into a plurality of small parts;
S2, the CPU distributes the compressed video data to the GPU and the APU for processing: the CPU transmits the compressed video data, divided into a plurality of small parts, to the GPU and the APU for data-parallel and operation-parallel processing, so that the compressed video data stream is decompressed, and the decompressed data information is stored in the sound card;
S3, the GPU and the APU post-process the video data stream with their own algorithms: the GPU and the APU lock onto characteristic objects or persons in the video through a tracking algorithm, then track and locate the target through Kalman filtering, particle filtering, meanshift, camshift, MOSSE, CSK, KCF, BACF or SAMF; the small video data streams of the region containing the target are then finely processed so that this region is rendered at higher definition, while the small video data streams of non-target regions are processed in a blurred (low-detail) mode, which further increases the hardware processing speed;
S4, algorithm parallelization is applied to the algorithmic processing of the video data stream: when the GPU and the APU run their algorithms on the video data stream, the algorithms are parallelized so that the GPU and the APU can each use their own processing capacity simultaneously, which effectively increases the running speed and efficiency of the hardware and completes the accelerated processing of the video data stream;
S5, the CPU receives and plays the processed video data stream: when the GPU and the APU finish processing the video data stream, the CPU receives the video data, splices the video data stream according to the order in which it was segmented, arranges it using a bubble sort, and then plays the video data stream back quickly.
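The S1 to S5 flow can be pictured with a small, hypothetical sketch: the payload is split into numbered parts, each part is compressed independently, worker threads stand in for the GPU and APU and decompress the parts concurrently, and the results are spliced back by sequence number. zlib stands in for a video codec and a plain sort stands in for the reordering step; real video bitstreams have inter-frame dependencies that this sketch ignores, and all function names are illustrative.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor, as_completed


def segment(payload: bytes, n_parts: int):
    """S1 (sketch): split into numbered parts, each compressed independently."""
    size = -(-len(payload) // n_parts)  # ceiling division so every byte is covered
    return [(i, zlib.compress(payload[i * size:(i + 1) * size])) for i in range(n_parts)]


def process_part(part):
    """S2/S3 stand-in: decompress one part; real post-processing would go here."""
    index, blob = part
    return index, zlib.decompress(blob)


def run_pipeline(payload: bytes, n_parts: int = 4, workers: int = 4) -> bytes:
    parts = segment(payload, n_parts)
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(process_part, p) for p in parts]
        for fut in as_completed(futures):  # parts may complete in any order
            results.append(fut.result())
    results.sort(key=lambda r: r[0])       # S5: restore the original sequence
    return b"".join(chunk for _, chunk in results)


data = bytes(range(256)) * 10
assert run_pipeline(data, n_parts=7, workers=3) == data
```

The sequence number attached in `segment` is what makes the final splice independent of the order in which workers finish.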
Preferably, the segmentation processing in S1 uses a spatial-domain algorithm and a temporal-domain algorithm. The spatial-domain algorithm operates on macroblocks: each pixel is processed locally in the spatial domain, pixels are processed in sequence to generate the output, and the output for one pixel does not accumulate results from the preceding or following pixel;
the temporal-domain algorithm looks for where a particular pixel or pixel region changes or stays similar between frames, and converts interlaced fields to progressive format by line doubling or filtering.
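A minimal sketch of the temporal-domain operations, assuming frames are lists of rows of integer grey levels: line doubling repeats each field line, a simple filtering variant averages adjacent field lines, and a frame difference marks pixels that changed between frames. The threshold and all function names are illustrative assumptions.

```python
def line_double(field):
    """Interlaced field -> progressive frame by repeating every field line."""
    frame = []
    for row in field:
        frame.append(list(row))
        frame.append(list(row))
    return frame


def line_interpolate(field):
    """Filtering variant: each missing line is the average of the field lines around it."""
    frame = []
    for i, row in enumerate(field):
        frame.append(list(row))
        nxt = field[i + 1] if i + 1 < len(field) else row  # last line: repeat it
        frame.append([(a + b) // 2 for a, b in zip(row, nxt)])
    return frame


def changed_pixels(prev, curr, threshold=10):
    """Temporal check: (y, x) positions whose value changed by more than `threshold`."""
    return [(y, x)
            for y, (r0, r1) in enumerate(zip(prev, curr))
            for x, (a, b) in enumerate(zip(r0, r1))
            if abs(a - b) > threshold]
```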
Preferably, the macroblock algorithm low-pass filters a pixel using the average computed from the neighboring pixels around it, i.e., a 5 × 5 two-dimensional convolution kernel is used to detect edge information in the image, drawing related information from a large number of pixels around the pixel. Combining spatial and temporal processing forms a new category called "space-time processing": each image frame is decomposed into a plurality of macroblocks, i.e., regions of 16 × 16 pixels, and the macroblocks are then tracked and compared frame by frame to extract approximate motion estimation and compensation values.
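The two macroblock operations can be sketched as follows, assuming greyscale frames stored as lists of rows: a 5 × 5 box average (the averaging, low-pass part of the description) and full-search block matching of one 16 × 16 macroblock. The sum of absolute differences (SAD) criterion, the ±4-pixel search range and clamped borders are common choices assumed here, not taken from the source.

```python
def mean_filter(img, k=5):
    """Low-pass filter: each pixel becomes the mean of its k x k neighbourhood (borders clamped)."""
    r = k // 2
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += img[yy][xx]
            out[y][x] = acc / (k * k)
    return out


def sad(prev, curr, by, bx, dy, dx, n):
    """SAD between the n x n block at (by, bx) in curr and the block displaced by (dy, dx) in prev."""
    return sum(abs(curr[by + i][bx + j] - prev[by + dy + i][bx + dx + j])
               for i in range(n) for j in range(n))


def motion_vector(prev, curr, by, bx, n=16, search=4):
    """Full-search block matching for one macroblock; returns the (dy, dx) with the lowest SAD.

    Assumes at least one candidate position lies fully inside `prev`.
    """
    h, w = len(prev), len(prev[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            if 0 <= by + dy and by + dy + n <= h and 0 <= bx + dx and bx + dx + n <= w:
                cost = sad(prev, curr, by, bx, dy, dx, n)
                if best is None or cost < best[0]:
                    best = (cost, dy, dx)
    return best[1], best[2]
```

For real frames the search is usually restricted or hierarchical, because full search costs (2 · search + 1)² SAD evaluations per macroblock.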
Preferably, the data parallelization in S2 divides a data block into a plurality of small blocks that can be processed simultaneously, with block sizes of 16 × 16 and 32 × 32. Data parallelization requires a Stream object: either its parallel method is called to give the Stream parallel operation capability, or a Stream is created from a collection class by calling parallelStream, which directly yields a Stream with parallel capability.
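The Stream wording matches Java's Stream API (`Stream.parallel()` and `Collection.parallelStream()`). To keep all examples in one language, the sketch below shows the same data-parallel idea in Python, using `concurrent.futures` as an analogue: the frame is cut into 16 × 16 (or 32 × 32) tiles and each tile is processed independently. The per-tile work (a sum) is a placeholder, and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def tiles(height, width, size=16):
    """Yield (y, x, h, w) for each size x size tile; right/bottom edge tiles may be smaller."""
    for y in range(0, height, size):
        for x in range(0, width, size):
            yield y, x, min(size, height - y), min(size, width - x)


def tile_sum(img, tile):
    """Placeholder per-tile work: sum of the tile's pixels."""
    y, x, h, w = tile
    return sum(img[r][c] for r in range(y, y + h) for c in range(x, x + w))


def parallel_tile_sums(img, size=16, workers=4):
    """Process all tiles concurrently; results come back in tile order."""
    jobs = list(tiles(len(img), len(img[0]), size))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda t: tile_sum(img, t), jobs))
```

`Executor.map` returns results in submission order, so each result stays aligned with its tile. For CPU-bound pure-Python work, threads are limited by the global interpreter lock; a process pool or native code would be needed for real speed-up.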
Preferably, the operation parallelization in S2 is a detailed optimization of the algorithmic processing that minimizes the creation of intermediate variables and performs each calculation in a single step wherever possible.
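Operation parallelization as described (fewer intermediate variables, one-step calculation) resembles operation fusion. A small illustration using a hypothetical gain/offset adjustment: the two-pass version builds two intermediate lists, while the fused version scales, shifts and clamps each pixel in a single expression and produces the same result.

```python
def adjust_two_pass(pixels, gain, offset):
    scaled = [p * gain for p in pixels]     # intermediate list 1
    shifted = [s + offset for s in scaled]  # intermediate list 2
    return [min(255, max(0, v)) for v in shifted]


def adjust_fused(pixels, gain, offset):
    # One pass, no intermediate lists: scale, shift and clamp in a single expression.
    return [min(255, max(0, p * gain + offset)) for p in pixels]


print(adjust_fused([0, 10, 100, 200], 2, -5))  # [0, 15, 195, 255]
```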
Preferably, the target in S3 is tracked and located by particle filtering, which comprises the following steps:
S301, initialization stage, extraction of target features: select a target and extract the features of the target region, namely the target color histogram;
S302, particle initialization: either a) scatter particles uniformly over the entire image, or b) scatter particles in a Gaussian distribution around the target position in the previous frame;
S303, search stage: compute the color histogram of each particle, compare it with the color histogram of the target model, calculate the weight from the Bhattacharyya distance, and normalize the weights so that the weights of all particles sum to 1;
S304, particle resampling: place few particles where the similarity is low and many particles where the similarity is high, discarding particles with low weight;
S305, state transition: calculate the position of each particle at the next moment according to
$s_t = A s_{t-1} + w_{t-1}$;
S306, observation stage: calculate the similarity between each particle and the target features, and update the particle weights;
S307, decision stage: compute the similarity-weighted average of the particle coordinates to obtain the position of the tracked target in the next frame;
S308, repeat S303, S304, S305, S306 and S307 from the predicted position.
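Steps S301 to S308 can be sketched as a small particle filter. Assumptions not in the source: greyscale frames instead of color, an 8-bin histogram over a square patch, a Gaussian weighting of the Bhattacharyya distance, multinomial resampling, and A = I in the state transition (a random-walk motion model). All names and parameter values are illustrative.

```python
import math
import random

BINS = 8  # hypothetical grey-level histogram resolution


def patch_histogram(img, cx, cy, half):
    """S301: normalised grey-level histogram of the square patch centred at (cx, cy)."""
    hist = [0.0] * BINS
    for y in range(cy - half, cy + half + 1):
        for x in range(cx - half, cx + half + 1):
            if 0 <= y < len(img) and 0 <= x < len(img[0]):
                hist[img[y][x] * BINS // 256] += 1
    total = sum(hist) or 1.0
    return [v / total for v in hist]


def bhattacharyya_distance(p, q):
    coeff = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return math.sqrt(max(0.0, 1.0 - coeff))


def init_particles(n, prev, shape, rng, uniform=False, spread=5.0):
    """S302: a) uniform over the image, or b) Gaussian around the previous target position."""
    rows, cols = shape
    if uniform:
        return [(rng.uniform(0, cols - 1), rng.uniform(0, rows - 1)) for _ in range(n)]
    return [(rng.gauss(prev[0], spread), rng.gauss(prev[1], spread)) for _ in range(n)]


def weigh(img, model, particles, half, sigma):
    """S303/S306: Gaussian of the Bhattacharyya distance, normalised so the weights sum to 1."""
    wts = []
    for x, y in particles:
        d = bhattacharyya_distance(patch_histogram(img, round(x), round(y), half), model)
        wts.append(math.exp(-d * d / (2 * sigma * sigma)))
    total = sum(wts)
    return [v / total for v in wts]


def track_frame(img, model, particles, rng, half=3, sigma=0.2, noise=2.0):
    """One pass of S303 to S307 on a single frame; returns (particles, estimated position)."""
    n = len(particles)
    wts = weigh(img, model, particles, half, sigma)            # S303: search stage
    particles = rng.choices(particles, weights=wts, k=n)       # S304: resampling
    # S305: s_t = A s_{t-1} + w_{t-1}, with A = identity and Gaussian noise (assumed)
    particles = [(x + rng.gauss(0, noise), y + rng.gauss(0, noise)) for x, y in particles]
    wts = weigh(img, model, particles, half, sigma)            # S306: observation stage
    est_x = sum(wi * x for wi, (x, _) in zip(wts, particles))  # S307: weighted average
    est_y = sum(wi * y for wi, (_, y) in zip(wts, particles))
    return particles, (est_x, est_y)


if __name__ == "__main__":
    # Synthetic frame: dark background with a bright 7 x 7 square centred at (25, 20).
    frame = [[30] * 40 for _ in range(40)]
    for yy in range(17, 24):
        for xx in range(22, 29):
            frame[yy][xx] = 220
    model_hist = patch_histogram(frame, 25, 20, 3)
    gen = random.Random(0)
    cloud = init_particles(300, (20, 20), (40, 40), gen)
    for _ in range(5):  # S308: repeat from the predicted position
        cloud, estimate = track_frame(frame, model_hist, cloud, gen)
    print(estimate)
```

On this synthetic frame, starting from particles scattered around (20, 20), a few calls to `track_frame` move the estimate close to the square's centre.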
Preferably, the algorithm parallelization in S4 uses the PRAM (parallel random-access machine) model, which has a centralized shared memory (SM) and an instruction controller; processors exchange data by reading from and writing to the SM, and computation proceeds in implicit synchronous steps.
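To illustrate the PRAM idea of processors communicating only through a shared memory in lock-step, the sketch below simulates a synchronous PRAM computing prefix sums with the Hillis-Steele scan. Each round is split into a read phase and a write phase, which models the implicit synchronization; concurrent reads are allowed (a CREW PRAM). The choice of prefix sum as the workload is an assumption made for illustration.

```python
def pram_prefix_sum(values):
    """Simulated synchronous PRAM: inclusive prefix sums in ceil(log2 n) rounds."""
    shared = list(values)  # centralised shared memory (SM)
    n = len(shared)
    step = 1
    while step < n:
        # Read phase: processor i reads SM[i] and, if it exists, SM[i - step].
        reads = [(shared[i], shared[i - step] if i >= step else 0) for i in range(n)]
        # Write phase: every processor writes its result in the same synchronous step.
        for i, (own, other) in enumerate(reads):
            shared[i] = own + other
        step *= 2
    return shared


print(pram_prefix_sum([1, 2, 3, 4, 5]))  # [1, 3, 6, 10, 15]
```

Separating reads from writes is what makes the result independent of processor order; without it, a processor could read a value another processor had already overwritten in the same round.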
Preferably, the comparison steps of the bubble sort in S5 are as follows:
S501, compare adjacent elements, and if the first is larger than the second, swap them;
S502, do the same for every pair of adjacent elements, from the first pair to the last pair; when this pass is complete, the last element is the largest;
S503, repeat the above steps for all elements except the last one;
S504, keep repeating the above steps on fewer and fewer elements each time until no pair of numbers remains to be compared.
Preferably, during segmentation each divided segment of the video data stream is numbered, and the bubble sort then orders the segments by these numbers so that the video data stream is played back in sequence.
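A sketch of S501 to S504 applied to numbered segments, assuming each processed segment arrives as a hypothetical `(sequence_number, payload)` pair in arbitrary order. The early exit when a pass makes no swap is a common refinement, not part of the source.

```python
def bubble_sort_segments(segments):
    """Order (sequence_number, payload) pairs by sequence number, following S501 to S504."""
    items = list(segments)
    end = len(items)
    while end > 1:
        swapped = False
        for i in range(end - 1):       # S501/S502: compare every adjacent pair
            if items[i][0] > items[i + 1][0]:
                items[i], items[i + 1] = items[i + 1], items[i]
                swapped = True
        end -= 1                       # S503: the last element is now in place
        if not swapped:                # S504: stop once no pair needs swapping
            break
    return items


arrived = [(2, "c"), (0, "a"), (3, "d"), (1, "b")]
print([p for _, p in bubble_sort_segments(arrived)])  # ['a', 'b', 'c', 'd']
```

Bubble sort is O(n²) in the worst case, but segments from parallel workers usually arrive nearly in order, where the early exit makes it close to a single linear pass.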
Preferably, the high-definition processing in S3 scales the video signal by cubic convolution: each pixel of the output image is computed from 16 pixels (a 4 × 4 neighborhood) of the original image, i.e., in cubic convolution interpolation the value at a target point is obtained by resampling the values of the 16 known pixels surrounding it.
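A minimal sketch of cubic convolution interpolation on a greyscale image, assuming the widely used Keys kernel with a = -0.5 and clamped borders (the source does not specify the kernel). Each output value is a weighted sum of the 4 × 4 = 16 neighboring source pixels; the function names are illustrative.

```python
import math


def keys_kernel(t, a=-0.5):
    """Cubic convolution kernel (Keys form); a = -0.5 is assumed."""
    t = abs(t)
    if t <= 1:
        return (a + 2) * t ** 3 - (a + 3) * t ** 2 + 1
    if t < 2:
        return a * t ** 3 - 5 * a * t ** 2 + 8 * a * t - 4 * a
    return 0.0


def bicubic_sample(img, x, y):
    """Value at fractional (x, y) from the 16 surrounding pixels (borders clamped)."""
    h, w = len(img), len(img[0])
    x0, y0 = math.floor(x), math.floor(y)
    value = 0.0
    for j in range(-1, 3):
        wy = keys_kernel(y - (y0 + j))
        yy = min(max(y0 + j, 0), h - 1)
        for i in range(-1, 3):
            wx = keys_kernel(x - (x0 + i))
            xx = min(max(x0 + i, 0), w - 1)
            value += wx * wy * img[yy][xx]
    return value


def scale(img, factor):
    """Resize by `factor`, mapping output pixel (u, v) back to source (u / factor, v / factor)."""
    h, w = len(img), len(img[0])
    return [[bicubic_sample(img, u / factor, v / factor) for u in range(int(w * factor))]
            for v in range(int(h * factor))]
```

Because this kernel reproduces linear ramps exactly, sampling the ramp 2x + 3y at (2.5, 1.5) returns 9.5, and sampling at an integer position returns the original pixel.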
Compared with the prior art, the invention has the beneficial effects that:
the invention segments the data information so that it can be processed as a plurality of small blocks, finely processes the target region identified by the tracking algorithm while blurring the other regions, and applies algorithm parallelization, data parallelization and operation parallelization during hardware processing; together, the blurring and the parallelized algorithms accelerate the hardware and improve its running speed and efficiency.
Drawings
FIG. 1 is a schematic flow chart of the steps of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to FIG. 1, the present invention provides a technical solution: a hardware acceleration method based on a tracking algorithm, comprising the following steps:
S1, hardware receives and segments the data stream information: the CPU receives the data stream information, separates the compressed video data from it, places the separated compressed video data in system memory, and divides the compressed video data into a plurality of small parts;
S2, the CPU distributes the compressed video data to the GPU and the APU for processing: the CPU transmits the compressed video data, divided into a plurality of small parts, to the GPU and the APU for data-parallel and operation-parallel processing, so that the compressed video data stream is decompressed, and the decompressed data information is stored in the sound card;
S3, the GPU and the APU post-process the video data stream with their own algorithms: the GPU and the APU lock onto characteristic objects or persons in the video through a tracking algorithm, then track and locate the target through Kalman filtering, particle filtering, meanshift, camshift, MOSSE, CSK, KCF, BACF or SAMF; the small video data streams of the region containing the target are then finely processed so that this region is rendered at higher definition, while the small video data streams of non-target regions are processed in a blurred (low-detail) mode, which further increases the hardware processing speed;
S4, algorithm parallelization is applied to the algorithmic processing of the video data stream: when the GPU and the APU run their algorithms on the video data stream, the algorithms are parallelized so that the GPU and the APU can each use their own processing capacity simultaneously, which effectively increases the running speed and efficiency of the hardware and completes the accelerated processing of the video data stream;
S5, the CPU receives and plays the processed video data stream: when the GPU and the APU finish processing the video data stream, the CPU receives the video data, splices the video data stream according to the order in which it was segmented, arranges it using a bubble sort, and then plays the video data stream back quickly.
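The region-dependent processing of S3 can be sketched as follows, assuming the tracker supplies the target as a bounding box. Inside the box the pixels keep full detail; outside it, each 4 × 4 cell is replaced by its mean, a simple stand-in for the blurred (low-detail) mode. The block size, the averaging and the function name are assumptions; in a real pipeline the saving would come from skipping expensive enhancement outside the box.

```python
def roi_process(img, box, block=4):
    """Full detail inside `box`; outside it, each block x block cell is replaced by its mean."""
    x0, y0, x1, y1 = box  # target bounding box from the tracker, half-open [x0, x1) x [y0, y1)
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for by in range(0, h, block):
        for bx in range(0, w, block):
            # Non-target pixels of this cell; target pixels are left untouched.
            cells = [(y, x)
                     for y in range(by, min(by + block, h))
                     for x in range(bx, min(bx + block, w))
                     if not (y0 <= y < y1 and x0 <= x < x1)]
            if cells:
                mean = sum(img[y][x] for y, x in cells) // len(cells)
                for y, x in cells:
                    out[y][x] = mean
    return out
```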
In this embodiment, preferably, the segmentation processing in S1 uses a spatial-domain algorithm and a temporal-domain algorithm. The spatial-domain algorithm operates on macroblocks: each pixel is processed locally in the spatial domain, pixels are processed in sequence to generate the output, and the output for one pixel does not accumulate results from the preceding or following pixel;
the temporal-domain algorithm looks for where a particular pixel or pixel region changes or stays similar between frames, and converts interlaced fields to progressive format by line doubling or filtering.
In this embodiment, preferably, the macroblock algorithm low-pass filters a pixel using the average computed from the neighboring pixels around it, i.e., a 5 × 5 two-dimensional convolution kernel is used to detect edge information in the image, drawing related information from a large number of pixels around the pixel. Combining spatial and temporal processing forms a new category called "space-time processing": each image frame is decomposed into a plurality of macroblocks, i.e., regions of 16 × 16 pixels, and the macroblocks are then tracked and compared frame by frame to extract approximate motion estimation and compensation values.
In this embodiment, preferably, the data parallelization in S2 divides a data block into a plurality of small blocks that can be processed simultaneously, with block sizes of 16 × 16 and 32 × 32. Data parallelization requires a Stream object: either its parallel method is called to give the Stream parallel operation capability, or a Stream is created from a collection class by calling parallelStream, which directly yields a Stream with parallel capability.
In this embodiment, preferably, the operation parallelization in S2 is a detailed optimization of the algorithmic processing that minimizes the creation of intermediate variables and performs each calculation in a single step wherever possible.
In this embodiment, preferably, the target in S3 is tracked and located by particle filtering, which comprises the following steps:
S301, initialization stage, extraction of target features: select a target and extract the features of the target region, namely the target color histogram;
S302, particle initialization: either a) scatter particles uniformly over the entire image, or b) scatter particles in a Gaussian distribution around the target position in the previous frame;
S303, search stage: compute the color histogram of each particle, compare it with the color histogram of the target model, calculate the weight from the Bhattacharyya distance, and normalize the weights so that the weights of all particles sum to 1;
S304, particle resampling: place few particles where the similarity is low and many particles where the similarity is high, discarding particles with low weight;
S305, state transition: calculate the position of each particle at the next moment according to
$s_t = A s_{t-1} + w_{t-1}$;
S306, observation stage: calculate the similarity between each particle and the target features, and update the particle weights;
S307, decision stage: compute the similarity-weighted average of the particle coordinates to obtain the position of the tracked target in the next frame;
S308, repeat S303, S304, S305, S306 and S307 from the predicted position.
In this embodiment, preferably, the algorithm parallelization in S4 uses the PRAM (parallel random-access machine) model, which has a centralized shared memory (SM) and an instruction controller; processors exchange data by reading from and writing to the SM, and computation proceeds in implicit synchronous steps.
In this embodiment, preferably, the comparison steps of the bubble sort in S5 are as follows:
S501, compare adjacent elements, and if the first is larger than the second, swap them;
S502, do the same for every pair of adjacent elements, from the first pair to the last pair; when this pass is complete, the last element is the largest;
S503, repeat the above steps for all elements except the last one;
S504, keep repeating the above steps on fewer and fewer elements each time until no pair of numbers remains to be compared.
In this embodiment, preferably, during segmentation each divided segment of the video data stream is numbered, and the bubble sort then orders the segments by these numbers so that the video data stream is played back in sequence.
In this embodiment, preferably, the high-definition processing in S3 scales the video signal by cubic convolution: each pixel of the output image is computed from 16 pixels (a 4 × 4 neighborhood) of the original image, i.e., in cubic convolution interpolation the value at a target point is calculated by resampling the values of the 16 known pixels surrounding it.
The working principle and use of the invention are as follows:
First, hardware receives and segments the data stream information: the CPU receives the data stream information, separates the compressed video data from it, places the separated compressed video data in system memory, and divides the compressed video data into a plurality of small parts;
Second, the CPU distributes the compressed video data to the GPU and the APU for processing: the CPU transmits the compressed video data, divided into a plurality of small parts, to the GPU and the APU for data-parallel and operation-parallel processing, so that the compressed video data stream is decompressed, and the decompressed data information is stored in the sound card;
Third, the GPU and the APU post-process the video data stream with their own algorithms: the GPU and the APU lock onto characteristic objects or persons in the video through a tracking algorithm, then track and locate the target through Kalman filtering, particle filtering, meanshift, camshift, MOSSE, CSK, KCF, BACF or SAMF; the small video data streams of the region containing the target are then finely processed so that this region is rendered at higher definition, while the small video data streams of non-target regions are processed in a blurred (low-detail) mode, which further increases the hardware processing speed;
Fourth, algorithm parallelization is applied to the algorithmic processing of the video data stream: when the GPU and the APU run their algorithms on the video data stream, the algorithms are parallelized so that the GPU and the APU can each use their own processing capacity simultaneously, which effectively increases the running speed and efficiency of the hardware and completes the accelerated processing of the video data stream;
Fifth, the CPU receives and plays the processed video data stream: when the GPU and the APU finish processing the video data stream, the CPU receives the video data, splices the video data stream according to the order in which it was partitioned, arranges it using a bubble sort, and then plays the video data stream back quickly.
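Kalman filtering is one of the tracking options listed in the third step. As a generic illustration rather than the patent's specific filter, the sketch below runs a textbook constant-velocity Kalman filter on one-dimensional position measurements; the noise parameters `q` and `r`, the simplified process noise Q = qI and the initial covariance are assumptions.

```python
def kalman_track(measurements, dt=1.0, q=1e-3, r=1.0):
    """Constant-velocity Kalman filter on 1-D positions; returns the filtered positions."""
    x = [float(measurements[0]), 0.0]  # state: [position, velocity]
    P = [[1.0, 0.0], [0.0, 1.0]]       # state covariance
    out = []
    for z in measurements:
        # Predict: x = F x, P = F P F^T + Q, with F = [[1, dt], [0, 1]] and Q = q * I
        x = [x[0] + dt * x[1], x[1]]
        p00 = P[0][0] + dt * (P[1][0] + P[0][1]) + dt * dt * P[1][1] + q
        p01 = P[0][1] + dt * P[1][1]
        p10 = P[1][0] + dt * P[1][1]
        p11 = P[1][1] + q
        # Update with measurement matrix H = [1, 0]
        s = p00 + r                    # innovation covariance
        k0, k1 = p00 / s, p10 / s      # Kalman gain
        innovation = z - x[0]
        x = [x[0] + k0 * innovation, x[1] + k1 * innovation]
        P = [[(1 - k0) * p00, (1 - k0) * p01],
             [p10 - k1 * p00, p11 - k1 * p01]]
        out.append(x[0])
    return out
```

For a tracker in image coordinates, the same filter would typically be run on x and y separately or extended to a four-dimensional state.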
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (10)

1. A hardware acceleration method based on a tracking algorithm, characterized by comprising the following steps:
S1, hardware receives and segments the data stream information: the CPU receives the data stream information, separates the compressed video data from it, and places the separated compressed video data in system memory, where the compressed video data is divided into a plurality of small parts;
S2, the CPU distributes the compressed video data to the GPU and the APU for processing: the CPU transmits the compressed video data, divided into a plurality of small parts, to the GPU and the APU for data-parallel and operation-parallel processing, so that the compressed video data stream is decompressed, and the decompressed data information is stored in the sound card;
S3, the GPU and the APU post-process the video data stream with their own algorithms: the GPU and the APU lock onto characteristic objects or persons in the video through a tracking algorithm, then track and locate the target through Kalman filtering, particle filtering, meanshift, camshift, MOSSE, CSK, KCF, BACF or SAMF; the small video data streams of the region containing the target are then finely processed so that this region is rendered at higher definition, while the small video data streams of non-target regions are processed in a blurred (low-detail) mode, which further increases the hardware processing speed;
S4, algorithm parallelization is applied to the algorithmic processing of the video data stream: when the GPU and the APU run their algorithms on the video data stream, the algorithms are parallelized so that the GPU and the APU can each use their own processing capacity simultaneously, which effectively increases the running speed and efficiency of the hardware and completes the accelerated processing of the video data stream;
S5, the CPU receives and plays the processed video data stream: when the GPU and the APU finish processing the video data stream, the CPU receives the video data, splices the video data stream according to the order in which it was partitioned, arranges it using a bubble sort, and then plays the video data stream back quickly.
2. The hardware acceleration method based on a tracking algorithm of claim 1, characterized in that: the segmentation processing in S1 uses a spatial-domain algorithm and a temporal-domain algorithm; the spatial-domain algorithm operates on macroblocks, each pixel is processed locally in the spatial domain, pixels are processed in sequence to generate the output, and the output for one pixel does not accumulate results from the preceding or following pixel;
the temporal-domain algorithm looks for where a particular pixel or pixel region changes or stays similar between frames, and converts interlaced fields to progressive-scan format by line doubling or filtering.
3. The hardware acceleration method based on a tracking algorithm of claim 2, characterized in that: the spatial-domain algorithm low-pass filters a pixel using the average computed from the neighboring pixels around it, i.e., a 5 × 5 two-dimensional convolution kernel is used to detect edge information in the image, drawing related information from the surrounding pixels; the spatial-domain and temporal-domain algorithms are combined to form a new category called space-time processing, in which each image frame is decomposed into a plurality of macroblocks, i.e., regions of 16 × 16 pixels, the macroblocks are tracked and compared frame by frame, and approximate motion estimation and compensation values are extracted from them.
4. The hardware acceleration method based on a tracking algorithm of claim 1, characterized in that: the data parallelization in S2 divides the data into blocks and divides the data blocks into a plurality of small blocks that can be processed simultaneously, the data blocks being of two sizes, 16 × 16 and 32 × 32; data parallelization requires a Stream object: either its parallel method is called to give the Stream object parallel operation capability, or a Stream is created from a collection class and the parallel method is called to directly obtain a Stream with parallel capability.
5. The hardware acceleration method based on a tracking algorithm of claim 1, characterized in that: the operation parallelization in S2 is a detailed optimization of the algorithmic processing that minimizes the creation of intermediate variables and completes each calculation in a single step wherever possible.
6. The hardware acceleration method based on tracking algorithm of claim 1, characterized in that: and the target in the S3 is tracked, positioned and selected to be subjected to particle filtering, and the particle filtering comprises the following operation steps:
S301, initialization stage, extraction of target features: select a target and extract the features of the target region, namely the target color histogram;
S302, particle initialization: a) scatter particles evenly over the entire image, or b) scatter particles with a Gaussian distribution around the target position in the previous frame;
S303, search stage: compute the color histogram of each particle, compare it with the color histogram of the target model, calculate each weight from the Bhattacharyya distance, and normalize the weights so that all particle weights sum to 1;
S304, particle resampling: place a small number of particles where the similarity to the target model color histogram is low and a large number where it is high, discarding particles with low weights;
S305, state transition: calculate the position of each particle at the next moment according to $s_t = A s_{t-1} + w_{t-1}$, where $A$ is the state transition matrix and $w_{t-1}$ is the noise term;
S306, observation stage: calculate the similarity between each particle and the target features, and update the particle weights;
S307, decision stage: compute the similarity-weighted average of the particle coordinates to obtain the position of the tracked target in the next frame;
S308, repeat S303, S304, S305, S306 and S307 based on the predicted position.
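Steps S301 to S308 follow the usual particle-filter loop. The sketch below is a simplified version under several assumptions: frames are grayscale, so an 8-bin grey-level histogram over a 9 × 9 window stands in for the color histogram; A is taken as the identity matrix, so the state transition is a Gaussian random walk; weights are computed as exp(-d² / (2σ²)) from the Bhattacharyya distance d, a common choice the claim does not specify; and resampling is multinomial via `random.choices`. All function names, bin counts and noise parameters are hypothetical. Each call to `track_frame` performs transition, weighting, estimation and resampling for one frame.

```python
import math
import random
from typing import List, Sequence, Tuple

Image = List[List[int]]          # grayscale frame, values 0..255
Particle = Tuple[float, float]   # (y, x) center of a candidate window
BINS, HALF = 8, 4                # 8-bin histogram over a 9 x 9 window (assumed)


def histogram(img: Image, cy: float, cx: float) -> List[float]:
    """S301/S303: normalized grey-level histogram of the window at (cy, cx)."""
    h, w = len(img), len(img[0])
    cy, cx = int(round(cy)), int(round(cx))
    hist = [0.0] * BINS
    for y in range(cy - HALF, cy + HALF + 1):
        for x in range(cx - HALF, cx + HALF + 1):
            if 0 <= y < h and 0 <= x < w:
                hist[img[y][x] * BINS // 256] += 1
    total = sum(hist) or 1.0
    return [v / total for v in hist]


def bhattacharyya_distance(p: Sequence[float], q: Sequence[float]) -> float:
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))  # Bhattacharyya coefficient
    return math.sqrt(max(0.0, 1.0 - bc))


def init_particles(center: Particle, n: int, spread: float) -> List[Particle]:
    """S302 option b): Gaussian scatter around the previous target position."""
    cy, cx = center
    return [(random.gauss(cy, spread), random.gauss(cx, spread)) for _ in range(n)]


def track_frame(img: Image, particles: List[Particle], target_hist: List[float],
                sigma_w: float = 3.0, sigma_d: float = 0.2):
    # S305: s_t = A s_{t-1} + w_{t-1}, here with A = identity (random walk, assumed)
    moved = [(y + random.gauss(0, sigma_w), x + random.gauss(0, sigma_w))
             for y, x in particles]
    # S303/S306: weight each particle by histogram similarity to the target
    weights = [math.exp(-bhattacharyya_distance(histogram(img, y, x), target_hist) ** 2
                        / (2 * sigma_d ** 2)) for y, x in moved]
    total = sum(weights)
    weights = [w / total for w in weights]  # normalize so the weights sum to 1
    # S307: the estimate is the weighted mean of the particle coordinates
    est = (sum(w * y for w, (y, _) in zip(weights, moved)),
           sum(w * x for w, (_, x) in zip(weights, moved)))
    # S304: multinomial resampling duplicates high-weight particles, drops low-weight ones
    resampled = random.choices(moved, weights=weights, k=len(moved))
    return resampled, est
```

To track across a video, call `init_particles` once around the selected target (S301, S302), then call `track_frame` on each new frame, feeding back the returned particles (S308).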
7. The hardware acceleration method based on tracking algorithm of claim 1, characterized in that: the algorithm parallelization in S4 adopts the PRAM model, which has a centralized shared memory (SM) and an instruction controller; data are exchanged by reading from and writing to the SM, and computation proceeds with implicit synchronization.
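A PRAM machine is usually described as processors executing in lock-step against a shared memory. The hypothetical `pram_sum` below simulates that on one CPU for a tree-structured sum: in each round every active processor first reads its two shared-memory cells, and only then do all processors write, which models the implicit synchronization in claim 7. The choice of a sum reduction and the exclusive-read, exclusive-write (EREW) access pattern are assumptions made for illustration.

```python
from typing import List, Tuple


def pram_sum(values: List[int]) -> Tuple[int, int]:
    """Simulate a lock-step EREW-PRAM tree reduction.

    Returns (sum, number_of_synchronous_rounds)."""
    shared = list(values)  # the centralized shared memory (SM)
    n = len(shared)
    stride, rounds = 1, 0
    while stride < n:
        active = range(0, n - stride, 2 * stride)  # one processor per pair of cells
        # Read phase: every processor reads its two SM cells.
        reads = [(i, shared[i], shared[i + stride]) for i in active]
        # Write phase: all writes happen only after every read has finished,
        # which models the implicit synchronization between PRAM steps.
        for i, a, b in reads:
            shared[i] = a + b
        stride *= 2
        rounds += 1
    return (shared[0] if shared else 0), rounds
```

Summing n values this way takes ⌈log₂ n⌉ synchronous rounds, e.g. 3 rounds for 8 values.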
8. The hardware acceleration method based on tracking algorithm of claim 1, characterized in that: the bubble sorting method in S5 comprises the following steps:
S501, compare adjacent elements and, if the first element is larger than the second, swap them;
S502, do the same for each pair of adjacent elements, from the first pair to the last pair at the end, after which the last element is the largest;
S503, repeat the above steps for all elements except the last one;
S504, repeat the above steps for fewer and fewer elements each time until no pair of numbers needs to be compared.
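Steps S501 to S504 are the standard bubble sort. A direct Python rendering follows; the early exit when a pass makes no swaps is a common optimization added here, not part of the claimed steps, and the function name is illustrative.

```python
from typing import List


def bubble_sort(items: List[int]) -> List[int]:
    a = list(items)
    for end in range(len(a) - 1, 0, -1):   # S503/S504: shrink the unsorted range
        swapped = False
        for i in range(end):               # S502: every adjacent pair up to `end`
            if a[i] > a[i + 1]:            # S501: compare, swap if out of order
                a[i], a[i + 1] = a[i + 1], a[i]
                swapped = True
        if not swapped:                    # no swaps means the list is already sorted
            break
    return a
```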
9. The hardware acceleration method based on tracking algorithm of claim 8, characterized in that: during the segmentation processing, the segmented video data stream segments are numbered, and the bubble sorting method then sorts them by number so that the video data stream is played back in sequence.
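Claim 9 numbers the segments at split time and restores their order by bubble-sorting on those numbers. In this sketch the stream is modeled as a `bytes` object and each segment as a hypothetical `(sequence_number, payload)` tuple; the segment size and function names are arbitrary assumptions.

```python
from typing import List, Tuple

Segment = Tuple[int, bytes]  # (sequence number assigned at segmentation, payload)


def segment_stream(data: bytes, size: int) -> List[Segment]:
    """Split the stream into fixed-size segments and number them in order."""
    return [(i, data[off:off + size]) for i, off in enumerate(range(0, len(data), size))]


def reorder_segments(segments: List[Segment]) -> List[Segment]:
    """Bubble-sort segments by their sequence number so they play back in order."""
    seg = list(segments)
    for end in range(len(seg) - 1, 0, -1):
        for i in range(end):
            if seg[i][0] > seg[i + 1][0]:
                seg[i], seg[i + 1] = seg[i + 1], seg[i]
    return seg
```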
10. The hardware acceleration method based on tracking algorithm of claim 1, characterized in that: in the high-definition processing in S3, the video signal is scaled using the cubic convolution method, in which each pixel of the output image is computed from 16 pixels of the original image; when interpolating with the cubic convolution method, the value of a target point is obtained by resampling the values of the 16 known pixels (a 4 × 4 neighborhood) around it.
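Cubic convolution is commonly implemented with the Keys kernel, in which each output sample is a weighted sum of a 4 × 4 block of source pixels. The sketch assumes the kernel parameter a = -0.5, edge-pixel clamping at image borders, and pixel-center alignment when resizing; none of these choices are stated in the claim, and the function names are illustrative.

```python
import math
from typing import List

Image = List[List[float]]


def keys_kernel(t: float, a: float = -0.5) -> float:
    """Keys cubic convolution kernel; nonzero only for |t| < 2."""
    t = abs(t)
    if t <= 1:
        return (a + 2) * t ** 3 - (a + 3) * t ** 2 + 1
    if t < 2:
        return a * t ** 3 - 5 * a * t ** 2 + 8 * a * t - 4 * a
    return 0.0


def sample_cubic(img: Image, y: float, x: float) -> float:
    """Interpolate at (y, x) from the 4 x 4 = 16 surrounding source pixels."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    total = 0.0
    for m in range(-1, 3):                    # 4 rows
        wy = keys_kernel(y - (y0 + m))
        yy = min(max(y0 + m, 0), h - 1)       # clamp at borders (assumed)
        for n in range(-1, 3):                # 4 columns
            wx = keys_kernel(x - (x0 + n))
            xx = min(max(x0 + n, 0), w - 1)
            total += wy * wx * img[yy][xx]
    return total


def resize_cubic(img: Image, out_h: int, out_w: int) -> Image:
    """Scale a frame to out_h x out_w, mapping output pixel centers to the source."""
    h, w = len(img), len(img[0])
    sy, sx = h / out_h, w / out_w
    return [[sample_cubic(img, (i + 0.5) * sy - 0.5, (j + 0.5) * sx - 0.5)
             for j in range(out_w)] for i in range(out_h)]
```

With a = -0.5 the kernel returns the source pixels exactly at integer positions and reproduces linear ramps between them.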
CN202110723521.XA 2021-06-29 2021-06-29 Hardware acceleration method based on tracking algorithm Active CN113436232B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110723521.XA CN113436232B (en) 2021-06-29 2021-06-29 Hardware acceleration method based on tracking algorithm

Publications (2)

Publication Number Publication Date
CN113436232A (en) 2021-09-24
CN113436232B (en) 2023-03-24

Family

ID=77757427

Country Status (1)

Country Link
CN (1) CN113436232B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112732637A (en) * 2021-01-22 2021-04-30 湖南师范大学 Bayesian resampling-based FPGA hardware implementation method and device for particle filtering, and target tracking method

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
CA2546427A1 (en) * 2003-11-19 2005-06-02 Reuven Bakalash Method and system for multiple 3-d graphic pipeline over a pc bus
CN112927127A (en) * 2021-03-11 2021-06-08 华南理工大学 Video privacy data fuzzification method running on edge device
CN112669196B (en) * 2021-03-16 2021-06-08 浙江欣奕华智能科技有限公司 Method and equipment for optimizing data by factor graph in hardware acceleration engine

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant