CN113436232A - Hardware acceleration method based on tracking algorithm - Google Patents

Hardware acceleration method based on tracking algorithm

Info

Publication number
CN113436232A
CN113436232A
Authority
CN
China
Prior art keywords
video data
algorithm
processing
data stream
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110723521.XA
Other languages
Chinese (zh)
Other versions
CN113436232B (en)
Inventor
Hu Mingde (胡铭德)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Lexin Information Technology Co ltd
Original Assignee
Shanghai Lexin Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Lexin Information Technology Co ltd filed Critical Shanghai Lexin Information Technology Co ltd
Priority to CN202110723521.XA priority Critical patent/CN113436232B/en
Publication of CN113436232A publication Critical patent/CN113436232A/en
Application granted granted Critical
Publication of CN113436232B publication Critical patent/CN113436232B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/277 Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/20 Processor architectures; Processor configuration, e.g. pipelining
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/73 Deblurring; Sharpening
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/215 Motion-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention discloses a hardware acceleration method based on a tracking algorithm: S1, the hardware receives and divides the data stream information; S2, the CPU distributes the compressed video data to the GPU and the APU for processing; S3, the GPU and the APU post-process the video data stream with their own algorithms; S4, the algorithm processing of the video data stream is parallelized; S5, the processed video data stream is received and played by the CPU. The invention divides the data information into many small blocks for processing, which effectively improves the accelerated operation of the hardware, and the algorithm parallelization, data parallelization and operation parallelization adopted during hardware processing effectively improve the running speed and efficiency of the hardware.

Description

Hardware acceleration method based on tracking algorithm
Technical Field
The invention belongs to the technical field of hardware acceleration, and particularly relates to a hardware acceleration method based on a tracking algorithm.
Background
Hardware acceleration is a technique in which a very computationally intensive job is offloaded from the central processing unit to dedicated hardware in order to reduce the CPU workload. The technique is used above all in image processing. The structure of the central processor allows it to carry out a wide variety of instructions in a short time; which instructions it can process is mainly determined by software. Because of its structure, however, the central processor cannot handle certain repetitive tasks very efficiently or quickly. Dedicated hardware units do not have to be as flexible as the central processor, so their hardware design can be optimized for these special problems, leaving the central processor time for other tasks. Some tasks can be solved very efficiently by breaking them down into thousands of smaller tasks, for example a Fourier transform over a certain frequency band or the rendering of a small image. These small tasks can be computed in parallel, independently of one another. Processing such special tasks with a large number of small processors running in parallel greatly increases the overall computation speed, and in many cases the speed increases linearly with the number of parallel processors. Such parallel computation is also significant from the viewpoint of efficient energy use: energy consumption grows linearly with the number of parallel processors but with the square of the processor frequency, so the frequency of the parallel processors does not have to be high and the energy used remains relatively small. Nevertheless, the various hardware acceleration schemes on the market still have a number of problems.
Although the target detection hardware accelerator and acceleration method disclosed in grant publication No. CN112230884B can reduce the time and power consumption required for data transport and improve the accelerator's working efficiency, they do not solve the problem that existing hardware acceleration technology cannot acquire a target through a tracking algorithm, process the target to be detected precisely, and thereby accelerate the hardware. We therefore propose a hardware acceleration method based on a tracking algorithm.
Disclosure of Invention
The present invention is directed to a hardware acceleration method based on a tracking algorithm, so as to solve the problems set forth in the above background art.
In order to achieve the purpose, the invention provides the following technical scheme: a hardware acceleration method based on a tracking algorithm comprises the following steps:
S1, the hardware receives and divides the data stream information: the CPU receives the data stream information, separates and compresses it to obtain video data, places the compressed video data in the separation area of the system memory, and divides the compressed video data into a number of small parts;
S2, the CPU distributes the compressed video data to the GPU and the APU for processing: the CPU transmits the compressed video data, divided into a number of small parts, to the GPU and the APU for data-parallel and operation-parallel processing, so that the compressed video data stream is decompressed and the decompressed data information is stored in the sound card;
S3, the GPU and the APU post-process the video data stream with their own algorithms: the GPU and the APU lock onto characteristic objects or persons in the video with a tracking algorithm, then track and position the target with Kalman filtering, particle filtering, Meanshift, Camshift, MOSSE, CSK, KCF, BACF or SAMF, then apply fine processing to the small video data stream of the area where the target is located so that it is processed at higher definition, and apply blurring to the small video data streams of the non-target areas, thereby increasing the speed of hardware processing;
S4, algorithm parallelization is adopted for the algorithm processing of the video data stream: when the GPU and the APU process the video data stream with the algorithm, the algorithm is parallelized so that the GPU and the APU can use their processing space simultaneously, which effectively increases the running speed and efficiency of the hardware and completes the accelerated processing of the video data stream;
S5, the processed video data stream is received and played by the CPU: when the GPU and the APU finish processing the video data stream, the CPU receives the video data, splices the video data stream according to the order of the segmented data, arranges it with a bubble sort, and then plays the video data stream quickly.
Preferably, the segmentation processing in S1 adopts a spatial-domain algorithm and a temporal-domain algorithm. The spatial-domain algorithm works macroblock by macroblock: the processing of each pixel is carried out locally in the spatial domain, each pixel is processed in turn to produce an output, and no result from the previous or next pixel is accumulated into that output;
the temporal-domain algorithm looks for changes or similarities in a particular pixel or pixel region between frames and converts interlaced fields to progressive format by line doubling or filtering.
Preferably, the macroblock algorithm low-pass filters a pixel with the average value computed from the neighbouring pixels around it, i.e. a 5 × 5 two-dimensional convolution kernel is used to detect edge information in the image and to draw the relevant information from the large number of surrounding pixels; spatial processing and temporal processing are combined into a new category called "spatio-temporal processing": each image frame is decomposed into a number of macroblocks, i.e. regions of 16 × 16 pixels, which are then tracked and compared frame by frame to extract approximate motion estimation and compensation values.
Preferably, the data parallelization in S2 divides the data into several small blocks that can be processed simultaneously, where the data blocks can be 16 by 16 or 32 by 32; data parallelization requires a Stream object whose parallel method is called to give it the capability of parallel operation, or a stream created from a collection class through parallelStream, which immediately yields a stream capable of parallel execution.
Preferably, the operation parallelization in S2 is a detail-level optimization of the algorithm processing, reducing the generation of intermediate variables and computing in a single step wherever possible.
Preferably, the target in S3 is tracked and positioned by particle filtering, the operation steps of which are as follows:
s301, initialization stage-extraction of target features: selecting a target, and extracting the characteristics of a target area, namely a target color histogram;
s302, initializing particles: a) scattering particles evenly over the entire image, b) scattering particles in a gaussian distribution near the target of the previous frame;
s303, a search stage: counting the color histogram of each particle, comparing it with the color histogram of the target model, calculating the weight from the Bhattacharyya distance, and normalizing the weights so that the weights of all particles sum to 1;
s304, particle resampling: few particles are kept where the similarity is low and many particles are placed where the similarity is high, and particles with low weight are discarded;
s305, state transition: the position of each particle at the next moment is calculated according to the state transition equation s_t = A·s_{t-1} + w_{t-1};
s306, an observation stage: calculating the similarity between each particle and the target characteristic, and updating the weight of the particle;
s307, decision stage: calculating a weighted average value of the coordinates and the similarity to obtain the position of the next frame of the tracking target;
s308, repeating S303, S304, S305, S306 and S307 according to the predicted position.
Preferably, the algorithm parallelization in S4 adopts the PRAM model, which has a centralized shared memory (SM) and an instruction controller and performs implicitly synchronized computation by exchanging data through reads and writes (R/W) of the shared memory.
Preferably, the step of comparing the bubble sort method in S5 is as follows:
s501, comparing adjacent elements, and if the first element is larger than the second element, exchanging the two elements;
s502, performing the same comparison for every pair of adjacent elements, from the first pair at the beginning to the last pair at the end; when this pass is completed, the last element is the largest;
s503, repeating the steps for all the elements except the last element;
s504, repeating the above steps for fewer and fewer elements each time until no pair of numbers needs to be compared.
Preferably, the dividing process numbers the divided video data stream segments during division, and the bubble sorting is then carried out according to these numbers, so that the video data stream is played back in sequence.
Preferably, the high-definition processing in S3 uses a cubic convolution method for scaling the video signal; each pixel of the output image is the result of an operation on 16 pixels of the original image, and when cubic convolution interpolation is used, the value of a target point is calculated by resampling the values of the 16 known surrounding pixels.
Compared with the prior art, the invention has the beneficial effects that:
the invention realizes the segmentation of the data information, so that the data information can be divided into a plurality of small blocks for processing, then realizes the fine processing of the target area through the tracking algorithm, and performs the fuzzy processing on other areas, thereby effectively improving the accelerated operation of hardware, and effectively improving the operation speed of the hardware through the algorithm parallelization, the data parallelization and the operation parallelization adopted when the hardware is processed, thereby improving the acceleration of the hardware through the fuzzy processing and the processing of the hardware algorithm, and improving the operation speed and the efficiency of the hardware.
Drawings
FIG. 1 is a schematic flow chart of the steps of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, the present invention provides a technical solution: a hardware acceleration method based on a tracking algorithm comprises the following steps:
S1, the hardware receives and divides the data stream information: the CPU receives the data stream information, separates and compresses it to obtain video data, places the compressed video data in the separation area of the system memory, and divides the compressed video data into a number of small parts;
S2, the CPU distributes the compressed video data to the GPU and the APU for processing: the CPU transmits the compressed video data, divided into a number of small parts, to the GPU and the APU for data-parallel and operation-parallel processing, so that the compressed video data stream is decompressed and the decompressed data information is stored in the sound card;
S3, the GPU and the APU post-process the video data stream with their own algorithms: the GPU and the APU lock onto characteristic objects or persons in the video with a tracking algorithm, then track and position the target with Kalman filtering, particle filtering, Meanshift, Camshift, MOSSE, CSK, KCF, BACF or SAMF, then apply fine processing to the small video data stream of the area where the target is located so that it is processed at higher definition, and apply blurring to the small video data streams of the non-target areas, thereby increasing the speed of hardware processing;
S4, algorithm parallelization is adopted for the algorithm processing of the video data stream: when the GPU and the APU process the video data stream with the algorithm, the algorithm is parallelized so that the GPU and the APU can use their processing space simultaneously, which effectively increases the running speed and efficiency of the hardware and completes the accelerated processing of the video data stream;
S5, the processed video data stream is received and played by the CPU: when the GPU and the APU finish processing the video data stream, the CPU receives the video data, splices the video data stream according to the order of the segmented data, arranges it with a bubble sort, and then plays the video data stream quickly.
In this embodiment, it is preferable that the segmentation processing in S1 adopts a spatial-domain algorithm and a temporal-domain algorithm. The spatial-domain algorithm works macroblock by macroblock: the processing of each pixel is carried out locally in the spatial domain, each pixel is processed in turn to produce an output, and no result from the previous or next pixel is accumulated into that output;
the temporal-domain algorithm looks for changes or similarities in a particular pixel or pixel region between frames and converts interlaced fields to progressive format by line doubling or filtering.
In this embodiment, preferably, the macroblock algorithm low-pass filters a pixel with the average value computed from the neighbouring pixels around it, i.e. a 5 × 5 two-dimensional convolution kernel is used to detect edge information in the image and to draw the relevant information from the large number of surrounding pixels; spatial processing and temporal processing are combined into a new category called "spatio-temporal processing": each image frame is decomposed into a number of macroblocks, i.e. regions of 16 × 16 pixels, which are then tracked and compared frame by frame to extract approximate motion estimation and compensation values.
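As an illustration of the two per-pixel operations just described, the following is a minimal Java sketch that decomposes an 8-bit grayscale frame (stored row-major in an int array) into 16 × 16 macroblocks and applies a 5 × 5 averaging (low-pass) kernel; the class and method names are illustrative assumptions, not part of the patent.

```java
// MacroblockDemo.java: illustrative sketch, assumes 8-bit grayscale frames in a row-major int[].
public class MacroblockDemo {
    static final int MB = 16;   // macroblock size (16 x 16 pixels) as described in the text
    static final int K = 5;     // 5 x 5 averaging (low-pass) kernel

    // Split a width x height frame into 16 x 16 macroblocks (edge remainders are ignored here).
    static int[][][] toMacroblocks(int[] frame, int width, int height) {
        int bx = width / MB, by = height / MB;
        int[][][] blocks = new int[by * bx][MB][MB];
        for (int b = 0; b < by * bx; b++) {
            int ox = (b % bx) * MB, oy = (b / bx) * MB;
            for (int y = 0; y < MB; y++)
                for (int x = 0; x < MB; x++)
                    blocks[b][y][x] = frame[(oy + y) * width + (ox + x)];
        }
        return blocks;
    }

    // 5 x 5 mean filter: each output pixel is the average of its neighbourhood (borders handled by skipping).
    static int[] meanFilter5x5(int[] frame, int width, int height) {
        int[] out = new int[frame.length];
        int r = K / 2;
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int sum = 0, n = 0;
                for (int dy = -r; dy <= r; dy++)
                    for (int dx = -r; dx <= r; dx++) {
                        int yy = y + dy, xx = x + dx;
                        if (yy >= 0 && yy < height && xx >= 0 && xx < width) { sum += frame[yy * width + xx]; n++; }
                    }
                out[y * width + x] = sum / n;
            }
        }
        return out;
    }
}
```

The macroblocks returned by toMacroblocks are the units that would then be tracked and compared frame by frame for motion estimation and compensation.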
In this embodiment, preferably, the data parallelization in S2 divides the data into several small blocks that can be processed simultaneously, where the data blocks can be 16 by 16 or 32 by 32; data parallelization requires a Stream object whose parallel method is called to give it the capability of parallel operation, or a stream created from a collection class through parallelStream, which immediately yields a stream capable of parallel execution.
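The passage above describes the Java Stream API (parallel / parallelStream). A minimal sketch of cutting a video segment into fixed-size data blocks and processing them through a parallel stream follows; the chunk size and the placeholder processChunk body are assumptions made for illustration only.

```java
// ParallelChunks.java: illustrative sketch of the data-parallel step using parallelStream().
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ParallelChunks {
    // Placeholder for the real per-block work (e.g. decompressing or filtering one 32 x 32 block).
    static byte[] processChunk(byte[] chunk) {
        return chunk;
    }

    public static void main(String[] args) {
        byte[] data = new byte[1 << 20];   // stand-in for one compressed video segment
        int chunkSize = 32 * 32;           // the 32-by-32 data block mentioned in the text

        // Cut the buffer into fixed-size chunks.
        List<byte[]> chunks = IntStream.iterate(0, off -> off < data.length, off -> off + chunkSize)
                .mapToObj(off -> Arrays.copyOfRange(data, off, Math.min(off + chunkSize, data.length)))
                .collect(Collectors.toList());

        // parallelStream() (equivalently stream().parallel()) processes the chunks concurrently
        // on the common fork/join pool.
        List<byte[]> processed = chunks.parallelStream()
                .map(ParallelChunks::processChunk)
                .collect(Collectors.toList());

        System.out.println("processed " + processed.size() + " chunks");
    }
}
```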
In this embodiment, preferably, the operation parallelization in S2 is a detail-level optimization of the algorithm processing, reducing the generation of intermediate variables and computing in a single step wherever possible.
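A tiny sketch of what such a detail-level optimisation can look like: two per-pixel passes with a throw-away intermediate array are folded into a single pass. The gain and offset values are made up for the example.

```java
// FusedOps.java: illustrative sketch of "operation parallelization" as detail optimisation.
public class FusedOps {
    // Naive version: two passes and one intermediate array.
    static int[] twoPass(int[] px) {
        int[] tmp = new int[px.length];
        for (int i = 0; i < px.length; i++) tmp[i] = px[i] * 2;                    // pass 1: scale
        int[] out = new int[px.length];
        for (int i = 0; i < px.length; i++) out[i] = Math.min(255, tmp[i] + 10);   // pass 2: offset and clamp
        return out;
    }

    // Fused version: the same result in one pass, with no intermediate variable or array.
    static int[] fused(int[] px) {
        int[] out = new int[px.length];
        for (int i = 0; i < px.length; i++) out[i] = Math.min(255, px[i] * 2 + 10);
        return out;
    }
}
```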
In this embodiment, preferably, the target in S3 is tracked and positioned by particle filtering, the operation steps of which are as follows (a code sketch follows these steps):
s301, initialization stage-extraction of target features: selecting a target, and extracting the characteristics of a target area, namely a target color histogram;
s302, initializing particles: a) scattering particles evenly over the entire image, b) scattering particles in a gaussian distribution near the target of the previous frame;
s303, a search stage: counting the color histogram of each particle, comparing it with the color histogram of the target model, calculating the weight from the Bhattacharyya distance, and normalizing the weights so that the weights of all particles sum to 1;
s304, particle resampling: few particles are kept where the similarity is low and many particles are placed where the similarity is high, and particles with low weight are discarded;
s305, state transition: the position of each particle at the next moment is calculated according to the state transition equation s_t = A·s_{t-1} + w_{t-1};
s306, an observation stage: calculating the similarity between each particle and the target characteristic, and updating the weight of the particle;
s307, decision stage: calculating a weighted average value of the coordinates and the similarity to obtain the position of the next frame of the tracking target;
s308, repeating S303, S304, S305, S306 and S307 according to the predicted position.
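The following is a minimal, self-contained Java sketch of steps S301 to S308, assuming a two-dimensional position state and an abstract similarity(x, y) score that stands in for the colour-histogram comparison via the Bhattacharyya distance; all constants, names and the noise model are illustrative assumptions.

```java
// ParticleFilterSketch.java: illustrative sketch of the particle filter loop (S301-S308).
import java.util.Arrays;
import java.util.Random;
import java.util.function.DoubleBinaryOperator;

public class ParticleFilterSketch {
    static final int N = 200;                  // number of particles (assumed)
    final Random rnd = new Random(42);
    double[] px = new double[N], py = new double[N], w = new double[N];

    // S302 b): spread particles as a Gaussian around the target position of the previous frame.
    void init(double tx, double ty) {
        for (int i = 0; i < N; i++) {
            px[i] = tx + rnd.nextGaussian() * 5.0;
            py[i] = ty + rnd.nextGaussian() * 5.0;
            w[i] = 1.0 / N;
        }
    }

    // S305: state transition s_t = A*s_{t-1} + w_{t-1}; here A is the identity and w is Gaussian noise.
    void predict() {
        for (int i = 0; i < N; i++) {
            px[i] += rnd.nextGaussian() * 2.0;
            py[i] += rnd.nextGaussian() * 2.0;
        }
    }

    // S303/S306: score each particle against the target model and normalise the weights to sum to 1.
    void weight(DoubleBinaryOperator similarity) {
        double sum = 0;
        for (int i = 0; i < N; i++) { w[i] = similarity.applyAsDouble(px[i], py[i]); sum += w[i]; }
        for (int i = 0; i < N; i++) w[i] = (sum > 0) ? w[i] / sum : 1.0 / N;
    }

    // S307: the weighted average of the particle positions is the estimate for the next frame.
    double[] estimate() {
        double ex = 0, ey = 0;
        for (int i = 0; i < N; i++) { ex += w[i] * px[i]; ey += w[i] * py[i]; }
        return new double[] {ex, ey};
    }

    // S304: resampling; many copies survive where the weight is high, few where it is low.
    void resample() {
        double[] nx = new double[N], ny = new double[N], cdf = new double[N];
        double c = 0;
        for (int i = 0; i < N; i++) { c += w[i]; cdf[i] = c; }
        for (int i = 0; i < N; i++) {
            double u = rnd.nextDouble() * c;
            int j = 0;
            while (j < N - 1 && cdf[j] < u) j++;
            nx[i] = px[j]; ny[i] = py[j];
        }
        px = nx; py = ny;
        Arrays.fill(w, 1.0 / N);
    }
}
```

One tracking step (S308) would call predict(), weight(...), estimate() and resample() in that order for every new frame.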
In this embodiment, preferably, the model adopted for the algorithm parallelization in S4 is the PRAM model, which has a centralized shared memory (SM) and an instruction controller and performs implicitly synchronized computation by exchanging data through reads and writes (R/W) of the shared memory.
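The PRAM model is an abstract machine, so the sketch below only simulates its flavour: a set of notional processors share one memory array and advance in synchronous rounds, exchanging data solely through shared-memory reads and writes. It is a toy reduction, not a claim about the patent's actual hardware mapping.

```java
// PramReduceSketch.java: toy simulation of a PRAM-style synchronous parallel reduction.
public class PramReduceSketch {
    static int parallelSum(int[] sharedMem) {
        int n = sharedMem.length;
        // Each pass of the outer loop is one synchronous PRAM step; each iteration of the inner
        // loop is the work of one notional processor reading two cells and writing one.
        for (int stride = 1; stride < n; stride *= 2) {
            for (int i = 0; i + stride < n; i += 2 * stride) {
                sharedMem[i] = sharedMem[i] + sharedMem[i + stride];
            }
        }
        return sharedMem[0];
    }

    public static void main(String[] args) {
        int[] mem = {3, 1, 4, 1, 5, 9, 2, 6};
        System.out.println(parallelSum(mem)); // prints 31
    }
}
```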
In this embodiment, preferably, the step of comparing the bubble sort method in S5 is as follows:
s501, comparing adjacent elements, and if the first element is larger than the second element, exchanging the two elements;
s502, performing the same comparison for every pair of adjacent elements, from the first pair at the beginning to the last pair at the end; when this pass is completed, the last element is the largest;
s503, repeating the steps for all the elements except the last element;
s504, repeating the above steps for fewer and fewer elements each time until no pair of numbers needs to be compared.
In this embodiment, preferably, the dividing process numbers the divided video data stream segments during division, and the bubble sorting is then carried out according to these numbers, so that the video data stream is played back in sequence, as sketched below.
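A direct Java sketch of steps S501 to S504, here applied to segment records ordered by the sequence number assigned at division time; the record type and field names are assumptions made for the example.

```java
// BubbleSortSegments.java: illustrative bubble sort of numbered video-data segments (S501-S504).
public class BubbleSortSegments {
    record Segment(int seq, byte[] payload) {}   // seq is the number assigned when the stream was divided

    static void bubbleSort(Segment[] a) {
        // After each outer pass the largest remaining sequence number has bubbled to the end,
        // so the compared range shrinks by one element each time (S503/S504).
        for (int end = a.length - 1; end > 0; end--) {
            for (int i = 0; i < end; i++) {
                if (a[i].seq() > a[i + 1].seq()) {                    // S501: compare adjacent elements
                    Segment t = a[i]; a[i] = a[i + 1]; a[i + 1] = t;  // swap if out of order
                }
            }
        }
    }
}
```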
In this embodiment, preferably, the high-definition processing in S3 uses a cubic convolution method for scaling the video signal; each pixel of the output image is the result of an operation on 16 pixels of the original image, and when cubic convolution interpolation is used, the value of a target point is calculated by resampling the values of the 16 known surrounding pixels.
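A minimal sketch of cubic-convolution (bicubic) resampling in which every output pixel is computed from the 4 × 4 = 16 surrounding source pixels. The kernel coefficient a = -0.5 (a Catmull-Rom style kernel) and the grayscale, clamped-border handling are assumptions; the patent does not fix these details.

```java
// CubicConvolutionSketch.java: illustrative bicubic (cubic convolution) scaling of a grayscale image.
public class CubicConvolutionSketch {
    // Standard cubic convolution kernel with a = -0.5 (assumed coefficient).
    static double cubic(double x) {
        double a = -0.5;
        x = Math.abs(x);
        if (x <= 1) return (a + 2) * x * x * x - (a + 3) * x * x + 1;
        if (x < 2)  return a * x * x * x - 5 * a * x * x + 8 * a * x - 4 * a;
        return 0;
    }

    static int clamp(int v, int lo, int hi) { return Math.max(lo, Math.min(hi, v)); }

    // Resample the source image at a non-integer position (sx, sy) from its 16 known neighbours.
    static double sample(int[][] img, double sx, double sy) {
        int x0 = (int) Math.floor(sx), y0 = (int) Math.floor(sy);
        double val = 0;
        for (int m = -1; m <= 2; m++)            // 4 rows ...
            for (int n = -1; n <= 2; n++) {      // ... times 4 columns = 16 known pixels
                int yy = clamp(y0 + m, 0, img.length - 1);
                int xx = clamp(x0 + n, 0, img[0].length - 1);
                val += img[yy][xx] * cubic(sx - (x0 + n)) * cubic(sy - (y0 + m));
            }
        return val;
    }

    // Scale a grayscale image to newW x newH by resampling every target pixel.
    static int[][] scale(int[][] img, int newW, int newH) {
        int h = img.length, w = img[0].length;
        int[][] out = new int[newH][newW];
        for (int y = 0; y < newH; y++)
            for (int x = 0; x < newW; x++)
                out[y][x] = clamp((int) Math.round(sample(img, x * (double) w / newW, y * (double) h / newH)), 0, 255);
        return out;
    }
}
```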
The working principle and the using process of the invention are as follows:
Firstly, the hardware receives and divides the data stream information: the CPU receives the data stream information, separates and compresses it to obtain video data, places the compressed video data in the separation area of the system memory, and divides the compressed video data into a number of small parts;
secondly, the CPU distributes the compressed video data to the GPU and the APU for processing: the CPU transmits the compressed video data, divided into a number of small parts, to the GPU and the APU for data-parallel and operation-parallel processing, so that the compressed video data stream is decompressed and the decompressed data information is stored in the sound card;
thirdly, the GPU and the APU post-process the video data stream with their own algorithms: the GPU and the APU lock onto characteristic objects or persons in the video with a tracking algorithm, then track and position the target with Kalman filtering, particle filtering, Meanshift, Camshift, MOSSE, CSK, KCF, BACF or SAMF, then apply fine processing to the small video data stream of the area where the target is located so that it is processed at higher definition, and apply blurring to the small video data streams of the non-target areas, thereby increasing the speed of hardware processing;
fourthly, algorithm parallelization is adopted for the algorithm processing of the video data stream: when the GPU and the APU process the video data stream with the algorithm, the algorithm is parallelized so that the GPU and the APU can use their processing space simultaneously, which effectively increases the running speed and efficiency of the hardware and completes the accelerated processing of the video data stream;
fifthly, the processed video data stream is received and played by the CPU: when the GPU and the APU finish processing the video data stream, the CPU receives the video data, splices the video data stream according to the order of the segmented data, arranges it with a bubble sort, and then plays the video data stream quickly.
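Pulling the five steps together, the following high-level Java sketch splits a buffer on the CPU, processes the numbered pieces concurrently, and reassembles them in order. The worker body is a placeholder and the thread pool merely stands in for the GPU/APU offload, which in practice would go through a native or OpenCL/CUDA binding; the class, record and method names are assumptions.

```java
// PipelineSketch.java: high-level illustration of the split / parallel-process / reassemble flow.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PipelineSketch {
    record Piece(int seq, byte[] data) {}

    static byte[] process(byte[] data) { return data; }   // placeholder for the per-piece work

    public static void main(String[] args) throws Exception {
        byte[] compressed = new byte[64 * 1024];           // stand-in for the received, compressed stream
        int chunk = 16 * 1024;

        // Step 1: the CPU splits the compressed stream into numbered pieces.
        List<Piece> pieces = new ArrayList<>();
        for (int off = 0, seq = 0; off < compressed.length; off += chunk, seq++)
            pieces.add(new Piece(seq, Arrays.copyOfRange(compressed, off, Math.min(off + chunk, compressed.length))));

        // Steps 2-4: the pieces are processed concurrently (decompression, tracking, sharpening/blurring).
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<Piece>> futures = new ArrayList<>();
        for (Piece p : pieces)
            futures.add(pool.submit(() -> new Piece(p.seq(), process(p.data()))));

        // Step 5: the CPU collects the results and restores the original order by sequence number
        // (ordering here uses the library sort for brevity; the patent's bubble sort is sketched earlier).
        List<Piece> done = new ArrayList<>();
        for (Future<Piece> f : futures) done.add(f.get());
        done.sort(Comparator.comparingInt(Piece::seq));
        pool.shutdown();
        System.out.println("reassembled " + done.size() + " pieces in order");
    }
}
```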
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (10)

1. A hardware acceleration method based on a tracking algorithm is characterized by comprising the following steps:
S1, the hardware receives and divides the data stream information: the CPU receives the data stream information, separates and compresses it to obtain video data, places the compressed video data in the separation area of the system memory, and divides the compressed video data into a number of small parts;
S2, the CPU distributes the compressed video data to the GPU and the APU for processing: the CPU transmits the compressed video data, divided into a number of small parts, to the GPU and the APU for data-parallel and operation-parallel processing, so that the compressed video data stream is decompressed and the decompressed data information is stored in the sound card;
S3, the GPU and the APU post-process the video data stream with their own algorithms: the GPU and the APU lock onto characteristic objects or persons in the video with a tracking algorithm, then track and position the target with Kalman filtering, particle filtering, Meanshift, Camshift, MOSSE, CSK, KCF, BACF or SAMF, then apply fine processing to the small video data stream of the area where the target is located so that it is processed at higher definition, and apply blurring to the small video data streams of the non-target areas, thereby increasing the speed of hardware processing;
S4, algorithm parallelization is adopted for the algorithm processing of the video data stream: when the GPU and the APU process the video data stream with the algorithm, the algorithm is parallelized so that the GPU and the APU can use their processing space simultaneously, which effectively increases the running speed and efficiency of the hardware and completes the accelerated processing of the video data stream;
S5, the processed video data stream is received and played by the CPU: when the GPU and the APU finish processing the video data stream, the CPU receives the video data, splices the video data stream according to the order of the segmented data, arranges it with a bubble sort, and then plays the video data stream quickly.
2. The hardware acceleration method based on a tracking algorithm of claim 1, characterized in that: the segmentation processing in S1 adopts a spatial-domain algorithm and a temporal-domain algorithm; the spatial-domain algorithm works macroblock by macroblock, the processing of each pixel is carried out locally in the spatial domain, each pixel is processed in turn to produce an output, and no result from the previous or next pixel is accumulated into that output;
the temporal-domain algorithm looks for changes or similarities in a particular pixel or pixel region between frames and converts interlaced fields to progressive format by line doubling or filtering.
3. The hardware acceleration method based on a tracking algorithm of claim 2, characterized in that: the macroblock algorithm low-pass filters a pixel with the average value computed from the neighbouring pixels around it, i.e. a 5 × 5 two-dimensional convolution kernel is used to detect edge information in the image and to draw the relevant information from the large number of surrounding pixels; spatial processing and temporal processing are combined into a new category called "spatio-temporal processing": each image frame is decomposed into a number of macroblocks, i.e. regions of 16 × 16 pixels, which are then tracked and compared frame by frame to extract approximate motion estimation and compensation values.
4. The hardware acceleration method based on a tracking algorithm of claim 1, characterized in that: the data parallelization in S2 divides the data into blocks that can be processed simultaneously, where the data blocks can be 16 by 16 or 32 by 32; data parallelization requires a Stream object whose parallel method is called to give it the capability of parallel operation, or a stream created from a collection class through parallelStream, which immediately yields a stream capable of parallel execution.
5. The hardware acceleration method based on a tracking algorithm of claim 1, characterized in that: the operation parallelization in S2 is a detail-level optimization of the algorithm processing, reducing the generation of intermediate variables and computing in a single step wherever possible.
6. The hardware acceleration method based on a tracking algorithm of claim 1, characterized in that: the target in S3 is tracked and positioned by particle filtering, the operation steps of which are as follows:
s301, initialization stage-extraction of target features: selecting a target, and extracting the characteristics of a target area, namely a target color histogram;
s302, initializing particles: a) scattering particles evenly over the entire image, b) scattering particles in a gaussian distribution near the target of the previous frame;
s303, a search stage: counting the color histogram of each particle, comparing it with the color histogram of the target model, calculating the weight from the Bhattacharyya distance, and normalizing the weights so that the weights of all particles sum to 1;
s304, particle resampling: few particles are kept where the similarity is low and many particles are placed where the similarity is high, and particles with low weight are discarded;
s305, state transition: the position of each particle at the next moment is calculated according to the state transition equation s_t = A·s_{t-1} + w_{t-1};
s306, an observation stage: calculating the similarity between each particle and the target characteristic, and updating the weight of the particle;
s307, decision stage: calculating a weighted average value of the coordinates and the similarity to obtain the position of the next frame of the tracking target;
s308, repeating S303, S304, S305, S306 and S307 according to the predicted position.
7. The hardware acceleration method based on a tracking algorithm of claim 1, characterized in that: the algorithm parallelization in S4 adopts the PRAM model, which has a centralized shared memory (SM) and an instruction controller and performs implicitly synchronized computation by exchanging data through reads and writes (R/W) of the shared memory.
8. The hardware acceleration method based on tracking algorithm of claim 1, characterized in that: the step of comparing the bubble sort method in S5 is as follows:
s501, comparing adjacent elements, and if the first element is larger than the second element, exchanging the two elements;
s502, performing the same comparison for every pair of adjacent elements, from the first pair at the beginning to the last pair at the end; when this pass is completed, the last element is the largest;
s503, repeating the steps for all the elements except the last element;
s504, repeating the above steps for fewer and fewer elements each time until no pair of numbers needs to be compared.
9. The hardware acceleration method based on a tracking algorithm of claim 8, characterized in that: the dividing process numbers the divided video data stream segments during division, and the bubble sorting is then carried out according to these numbers, so that the video data stream is played back in sequence.
10. The hardware acceleration method based on a tracking algorithm of claim 1, characterized in that: the high-definition processing in S3 uses a cubic convolution method for scaling the video signal; each pixel of the output image is the result of an operation on 16 pixels of the original image, and when cubic convolution interpolation is used, the value of a target point is calculated by resampling the values of the 16 known surrounding pixels.
CN202110723521.XA 2021-06-29 2021-06-29 Hardware acceleration method based on tracking algorithm Active CN113436232B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110723521.XA CN113436232B (en) 2021-06-29 2021-06-29 Hardware acceleration method based on tracking algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110723521.XA CN113436232B (en) 2021-06-29 2021-06-29 Hardware acceleration method based on tracking algorithm

Publications (2)

Publication Number Publication Date
CN113436232A true CN113436232A (en) 2021-09-24
CN113436232B CN113436232B (en) 2023-03-24

Family

ID=77757427

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110723521.XA Active CN113436232B (en) 2021-06-29 2021-06-29 Hardware acceleration method based on tracking algorithm

Country Status (1)

Country Link
CN (1) CN113436232B (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070279411A1 (en) * 2003-11-19 2007-12-06 Reuven Bakalash Method and System for Multiple 3-D Graphic Pipeline Over a Pc Bus
CN112732637A (en) * 2021-01-22 2021-04-30 湖南师范大学 Bayesian resampling-based FPGA hardware implementation method and device for particle filtering, and target tracking method
CN112927127A (en) * 2021-03-11 2021-06-08 华南理工大学 Video privacy data fuzzification method running on edge device
CN112669196A (en) * 2021-03-16 2021-04-16 浙江欣奕华智能科技有限公司 Method and equipment for optimizing data by factor graph in hardware acceleration engine

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114418824A (en) * 2022-01-27 2022-04-29 支付宝(杭州)信息技术有限公司 Image processing method, device and storage medium

Also Published As

Publication number Publication date
CN113436232B (en) 2023-03-24

Similar Documents

Publication Publication Date Title
CN105205441B (en) Method and apparatus for extracting feature regions from point cloud
JP2005011005A (en) Information processor
Ttofis et al. High-quality real-time hardware stereo matching based on guided image filtering
CN109191364A (en) Accelerate the hardware structure of artificial intelligence process device
CN113989276B (en) Detection method and detection device based on depth image and camera equipment
CN113436232B (en) Hardware acceleration method based on tracking algorithm
Mahmoudi et al. Multi-gpu based event detection and localization using high definition videos
Chen et al. Efficient parallel connected component labeling with a coarse-to-fine strategy
Su et al. Artificial intelligence design on embedded board with edge computing for vehicle applications
CN105303519A (en) Method and apparatus for generating temporally consistent superpixels
Li et al. Gpu and cpu cooperative accelaration for face detection on modern processors
Wasala et al. Real-time HOG+ SVM based object detection using SoC FPGA for a UHD video stream
Mahmoudi et al. Multi-CPU/multi-GPU based framework for multimedia processing
CN114937159A (en) Binocular matching method based on GPU acceleration
CN111680619A (en) Pedestrian detection method based on convolutional neural network and double-attention machine mechanism
Mahmoudi et al. Taking advantage of heterogeneous platforms in image and video processing
Jia et al. NSLIC: SLIC superpixels based on nonstationarity measure
Messom et al. Stream processing of integral images for real-time object detection
CN109493349B (en) Image feature processing module, augmented reality equipment and corner detection method
CN110942416B (en) General morphological acceleration method for GPU
CA2780710A1 (en) Video segmentation method
Denoulet et al. Implementing motion markov detection on general purpose processor and associative mesh
Ramirez-Martinez et al. Dynamic management of a partial reconfigurable hardware architecture for pedestrian detection in regions of interest
Gu et al. High frame-rate tracking of multiple color-patterned objects
Cabido et al. High speed articulated object tracking using GPUs: A particle filter approach

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant