CN108846364A - FPGA-based video feature detection method and system - Google Patents

FPGA-based video feature detection method and system Download PDF

Info

Publication number
CN108846364A
CN108846364A (application CN201810653311.6A)
Authority
CN
China
Prior art keywords
feature point
fpga
video
surf
cluster
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810653311.6A
Other languages
Chinese (zh)
Other versions
CN108846364B (en)
Inventor
张良
徐杰
陈训逊
何跃鹰
党向磊
李建强
张晓明
刘刚
朱缓
郭敬林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Surfilter Technology Development Co ltd
National Computer Network and Information Security Management Center
Original Assignee
Shenzhen Surfilter Technology Development Co ltd
National Computer Network and Information Security Management Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Surfilter Technology Development Co ltd, National Computer Network and Information Security Management Center filed Critical Shenzhen Surfilter Technology Development Co ltd
Priority to CN201810653311.6A priority Critical patent/CN108846364B/en
Publication of CN108846364A publication Critical patent/CN108846364A/en
Application granted granted Critical
Publication of CN108846364B publication Critical patent/CN108846364B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an FPGA-based video feature detection method, comprising: selecting feature point cluster sets from video streams in a video library; training on the feature point cluster sets to obtain a classification network; and solidifying the classification network on an FPGA to perform video feature comparison. By implementing a neural network architecture on the FPGA that approximates SIFT and SURF features, video feature detection is realized. Traditional SIFT and SURF algorithms perform comparison by searching a feature library, whereas the present invention completes feature generation and comparison with a neural network on the FPGA, eliminating the feature-library lookup step and improving comparison efficiency. By combining deep learning techniques, the invention optimizes the SIFT and SURF algorithms so that they are suitable for large-scale systems and accelerates the computation with FPGA hardware, thereby avoiding lookups in a massive feature library and improving detection efficiency.

Description

FPGA-based video feature detection method and system
Technical field
The present invention relates to the technical field of video processing, and more particularly to an FPGA-based video feature detection method and system.
Background art
With the popularization of Internet applications, a large number of video applications are active on the Internet, among which some violent, terrorist, pornographic, or reactionary videos are mixed in, seriously threatening national and social security and the daily life of the public. Real-time video detection technology under high throughput is therefore an indispensable means of managing such applications.
For massive video detection under high throughput, traditional video detection methods place stringent requirements on computing power and network transmission capacity. Taking HD video as the benchmark (vertical resolution 720p or 1080i), each processing unit should detect no fewer than 150 video streams in real time. The standard resolution of 720p is 1280*720. After the received video is decompressed and restored to raw frames, the color depth is usually 32 bit (8 bits each for red, green, and blue plus 8 bits of luminance). Computing at 5 frames per second (about 20 fps is needed for human viewing, but the rate can be reduced appropriately for detection), the scale of one second of 150 video streams is:
1280*720*32bit*5*150=22118400000bit
Converted to bytes, this is 22118400000 / 8 = 2,764,800,000 bytes, approximately 2.76 GB per second; the short calculation below reproduces this figure.
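To make the arithmetic explicit, here is a minimal Python check of the data volume quoted above, using the text's assumptions of 32-bit color depth, 5 fps sampling, and 150 concurrent 720p streams:

```python
# Per-second raw data volume for 150 concurrent 720p streams,
# under the assumptions stated above (32-bit color depth, 5 fps sampling).
width, height = 1280, 720        # 720p frame size
bits_per_pixel = 32              # 8 bits each for R, G, B plus 8 bits of luminance
frames_per_second = 5            # reduced sampling rate used for detection
streams = 150                    # streams handled by one processing unit

bits_per_second = width * height * bits_per_pixel * frames_per_second * streams
bytes_per_second = bits_per_second // 8

print(bits_per_second)                       # 22118400000 bit/s
print(f"{bytes_per_second / 1e9:.2f} GB/s")  # about 2.76 GB/s
```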
In addition, when operations such as Gaussian pyramid construction are performed by SIFT-type algorithms, the volume of intermediate data grows by a factor of ten or more. Data at this scale places stringent requirements on computing power and network capacity; a server-cluster-plus-GPU approach cannot achieve full pipelining and has high power consumption. Furthermore, once features have been generated, they must be compared against a massive feature library; when the library exceeds one billion features, the query cost is enormous. Traditional schemes are therefore difficult to apply in large-scale systems.
Summary of the invention
The purpose of the present invention is to provide an FPGA-based video feature detection method and system.
In one aspect, an embodiment of the present invention provides an FPGA-based video feature detection method, comprising the following steps:
selecting feature point cluster sets from video streams in a video library;
training on the feature point cluster sets to obtain a classification network;
solidifying the classification network on an FPGA to perform video feature comparison.
In the FPGA-based video feature detection method of the invention, the step of selecting feature point cluster sets from video streams in a video library comprises:
extracting a plurality of key frames from the video stream;
for each key frame, generating corresponding SIFT feature points and SURF feature points;
comparing the SIFT feature points and the SURF feature points within the same frame image, and selecting the set of pixels at which the SIFT feature points and the SURF feature points coincide;
clustering and labeling the pixel set to generate the feature point cluster sets.
In the FPGA-based video feature detection method of the invention, in the step of generating corresponding SIFT feature points and SURF feature points for each key frame, the SIFT feature points are generated by the following steps:
performing scale-space extremum detection on the key frame to determine the SIFT feature points of the key frame;
accurately locating the SIFT feature points to determine the pixel coordinates of the SIFT feature points.
In the FPGA-based video feature detection method of the invention, in the step of generating corresponding SIFT feature points and SURF feature points for each key frame, the SURF feature points are generated by the following steps:
constructing the Hessian matrix;
generating the scale space;
determining the SURF feature points using non-maximum suppression;
accurately locating the SURF feature points to determine the pixel coordinates of the SURF feature points.
In the FPGA-based video feature detection method of the invention, the step of training on the feature point cluster sets to obtain a classification network comprises:
constructing the framework of the classification network based on the Darknet network architecture;
training with the key frames corresponding to the pixels in the feature point cluster sets as the training set, to obtain the weights of the classification network.
Correspondingly, the present invention also provides an FPGA-based video feature detection system, comprising:
a feature point cluster set generation module, configured to select feature point cluster sets from video streams in a video library;
a classification network generation module, configured to train on the feature point cluster sets to obtain a classification network;
a video feature comparison module, configured to solidify the classification network on an FPGA to perform video feature comparison.
In the FPGA-based video feature detection system of the invention, the feature point cluster set generation module comprises:
an extraction unit, configured to extract a plurality of key frames from the video stream;
a feature point generation unit, configured to generate, for each key frame, corresponding SIFT feature points and SURF feature points;
a comparison unit, configured to compare the SIFT feature points and the SURF feature points within the same frame image and select the set of pixels at which the SIFT feature points and the SURF feature points coincide;
a feature point cluster set generation unit, configured to cluster and label the pixel set to generate the feature point cluster sets.
In the FPGA-based video feature detection system of the invention, the feature point generation unit comprises a SIFT feature point generation subunit configured to:
perform scale-space extremum detection on the key frame to determine the SIFT feature points of the key frame;
accurately locate the SIFT feature points to determine the pixel coordinates of the SIFT feature points.
In the FPGA-based video feature detection system of the invention, the feature point generation unit comprises a SURF feature point generation subunit configured to:
construct the Hessian matrix;
generate the scale space;
determine the SURF feature points using non-maximum suppression;
accurately locate the SURF feature points to determine the pixel coordinates of the SURF feature points.
In the FPGA-based video feature detection system of the invention, the classification network generation module comprises:
a classification network framework construction unit, configured to construct the framework of the classification network based on the Darknet network architecture;
a training unit, configured to train with the key frames corresponding to the pixels in the feature point cluster sets as the training set, to obtain the weights of the classification network.
Implementation of the embodiments of the present invention has the following beneficial effects. The invention selects feature point cluster sets from video streams in a video library, trains on the feature point cluster sets to obtain a classification network, and solidifies the classification network on an FPGA to perform video feature comparison. By implementing a neural network architecture on the FPGA that approximates SIFT and SURF features, video feature detection is realized. Traditional SIFT and SURF algorithms perform comparison by searching a feature library, whereas the present invention completes feature generation and comparison with a neural network on the FPGA, eliminating the feature-library lookup step and improving comparison efficiency. By combining deep learning techniques, the invention optimizes the SIFT and SURF algorithms so that they are suitable for large-scale systems, accelerates the computation with FPGA hardware, avoids lookups in a massive feature library, improves detection efficiency, and thereby achieves real-time and accurate detection of Internet video features.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of the FPGA-based video feature detection method provided by Embodiment 1 of the present invention;
Fig. 2 is a flowchart of step S1 shown in Fig. 1;
Fig. 3 is a flowchart of step S2 shown in Fig. 1;
Fig. 4 is a schematic diagram of the FPGA-based video feature detection system provided by Embodiment 2 of the present invention;
Fig. 5 is a schematic diagram of the feature point cluster set generation module shown in Fig. 4;
Fig. 6 is a schematic diagram of the classification network generation module shown in Fig. 4.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention will now be described clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
Embodiment 1
This embodiment provides an FPGA-based video feature detection method. Referring to Fig. 1, the FPGA-based video feature detection method comprises the following steps:
Step S1: selecting feature point cluster sets from video streams in the video library.
At present, limited by computing power, video feature detection usually extracts image features from video key frames and then performs feature comparison on that basis. Image features are traditionally divided into global features and local features. Global features describe overall properties of the image; common examples include color features, texture features, and shape features, such as intensity histograms. Local features are extracted from local regions of the image, including edges, corners, lines, curves, and regions with special attributes. Common local feature descriptions fall into two broad classes: corner-based and region-based. Compared with global image features such as line, texture, and structural features, local image features are abundant in the image, have little correlation with one another, and do not prevent the detection and matching of the remaining features when some features disappear under occlusion.
Among the many local feature descriptors, SIFT and SURF are the most widely applied. The core issues in describing local image features are invariance (robustness) and distinctiveness. Local feature descriptors are typically used where various image transformations must be handled robustly, so invariance is the first problem to consider when constructing and designing a descriptor. In wide-baseline matching, the descriptor needs to be invariant to viewpoint changes, scale changes, and rotation; in shape recognition and object retrieval, it needs to be invariant to shape. However, the distinctiveness of a descriptor often conflicts with its invariance: a descriptor with many invariances is somewhat weaker at distinguishing local image content, while a descriptor that easily distinguishes different local image content tends to be less robust. Multiple methods therefore need to be used together. Specifically, SIFT and SURF features are chosen in this application.
Therefore, as shown in Fig. 2, step S1 includes:
Step S11: extracting a plurality of key frames from the video stream.
Step S12: for each key frame, generating corresponding SIFT feature points and SURF feature points.
Specifically, both the SIFT and SURF algorithms are computationally expensive, so to improve processing speed this application compares feature points by their pixel coordinates. Accordingly, this application implements trimmed versions of the SIFT and SURF algorithms in which feature point description is not performed.
Optionally, the SIFT feature points are generated by the following steps (a code sketch follows the list):
performing scale-space extremum detection on the key frame to determine the SIFT feature points of the key frame;
accurately locating the SIFT feature points to determine the pixel coordinates of the SIFT feature points.
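A minimal Python sketch of this detection-only SIFT stage is shown below. It uses OpenCV's built-in SIFT detector as a stand-in for the trimmed algorithm described above (scale-space extremum detection and sub-pixel localization only, no descriptor computation); the function name and the rounding to integer pixel coordinates are illustrative assumptions.

```python
import cv2
import numpy as np

def sift_keypoint_coords(key_frame_bgr):
    """Detection-only SIFT: scale-space extrema plus accurate localization.
    The descriptor stage is skipped, as in the trimmed algorithm described
    above; only the pixel coordinates of the keypoints are returned."""
    gray = cv2.cvtColor(key_frame_bgr, cv2.COLOR_BGR2GRAY)
    sift = cv2.SIFT_create()             # DoG-based SIFT detector
    keypoints = sift.detect(gray, None)  # detect() only, no compute()
    # Round the refined (sub-pixel) locations to integer pixel coordinates.
    return np.array([(int(round(kp.pt[0])), int(round(kp.pt[1])))
                     for kp in keypoints], dtype=np.int32)
```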
Optionally, the SURF feature points are generated by the following steps (a code sketch follows the list):
constructing the Hessian matrix;
generating the scale space;
determining the SURF feature points using non-maximum suppression;
accurately locating the SURF feature points to determine the pixel coordinates of the SURF feature points.
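Analogously, a sketch of the detection-only SURF stage, assuming opencv-contrib-python built with the non-free xfeatures2d module; the Hessian threshold of 400 is an illustrative default, not a value from the patent.

```python
import cv2
import numpy as np

def surf_keypoint_coords(key_frame_bgr, hessian_threshold=400):
    """Detection-only SURF: Hessian-determinant responses over an
    integral-image scale space, non-maximum suppression, then interpolation
    of the keypoint location. Descriptors are not computed."""
    gray = cv2.cvtColor(key_frame_bgr, cv2.COLOR_BGR2GRAY)
    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=hessian_threshold)
    keypoints = surf.detect(gray, None)
    return np.array([(int(round(kp.pt[0])), int(round(kp.pt[1])))
                     for kp in keypoints], dtype=np.int32)
```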
Step S13: comparing the SIFT feature points and the SURF feature points within the same frame image, and selecting the set of pixels at which the SIFT feature points and the SURF feature points coincide.
Step S14: clustering and labeling the pixel set to generate the feature point cluster sets.
Specifically, the K-means method is used with a 32*32 specification to cluster and label the coincident pixels; if there are many clusters, they are sorted by the number of coincident feature points they contain and the top 15 are retained (a sketch of this step follows).
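The following sketch illustrates one plausible reading of steps S13 and S14: coincident pixels are those reported by both detectors, K-means groups them, the 15 most populated clusters are kept, and a 32*32 patch around each retained cluster centre becomes a labelled sample. The coincidence test, the number of K-means clusters, and the patch interpretation of the 32*32 specification are assumptions of this sketch.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_coincident_points(sift_pts, surf_pts, key_frame, n_clusters=32, top_k=15):
    """Cluster pixels detected by both SIFT and SURF, keep the top_k clusters
    with the most coincident points, and return a 32x32 patch around each
    retained cluster centre as a labelled sample."""
    sift_set = {tuple(p) for p in np.asarray(sift_pts)}
    coincident = np.array([p for p in np.asarray(surf_pts) if tuple(p) in sift_set])
    if len(coincident) == 0:
        return []

    k = min(n_clusters, len(coincident))
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(coincident)

    # Sort clusters by the number of coincident feature points; keep the top 15.
    counts = np.bincount(labels, minlength=k)
    keep = np.argsort(counts)[::-1][:top_k]

    h, w = key_frame.shape[:2]
    patches = []
    for c in keep:
        cx, cy = coincident[labels == c].mean(axis=0).astype(int)
        x0 = int(np.clip(cx - 16, 0, w - 32))
        y0 = int(np.clip(cy - 16, 0, h - 32))
        patches.append(key_frame[y0:y0 + 32, x0:x0 + 32])
    return patches
```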
Step S2: training on the feature point cluster sets to obtain a classification network.
According to the universal approximation theorem, a feedforward network with a linear output layer and at least one hidden layer with any "squashing" activation function (such as the logistic sigmoid) can approximate, with arbitrary precision, any Borel measurable function from one finite-dimensional space to another, provided the network has enough hidden units. It follows that shallow image features can in fact be realized by a suitable convolutional neural network. SIFT and SURF are both shallow features and can therefore be approximated by a neural network; one standard statement of the theorem is given below.
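For reference, one standard single-hidden-layer form of the theorem (the patent text invokes the more general Borel-measurable formulation) can be written as:

```latex
% Universal approximation theorem (Cybenko/Hornik, single hidden layer)
\textbf{Theorem.} Let $\sigma$ be a non-constant, bounded, continuous activation
function and $K \subset \mathbb{R}^{n}$ compact. For every continuous
$f : K \to \mathbb{R}$ and every $\varepsilon > 0$ there exist
$N \in \mathbb{N}$ and parameters $w_i \in \mathbb{R}^{n}$, $v_i, b_i \in \mathbb{R}$
such that the one-hidden-layer network
\[
  F(x) \;=\; \sum_{i=1}^{N} v_i \, \sigma\!\left(w_i^{\top} x + b_i\right)
\]
satisfies $\sup_{x \in K} \lvert F(x) - f(x) \rvert < \varepsilon$.
```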
Specifically, as shown in Fig. 3, step S2 includes:
Step S21: constructing the framework of the classification network based on the Darknet network architecture.
Step S22: training with the key frames corresponding to the pixels in the feature point cluster sets as the training set, to obtain the weights of the classification network.
Specifically, a 19-layer neural network structure is constructed based on the Darknet network architecture and trained with the key frames in the video library labeled by the coincident-pixel clusters as the training set, yielding the classification network. The weights are trained on a GPU, which makes parameter tuning convenient; once the parameters are fixed, they are ported to the FPGA. A condensed sketch of such a classifier follows.
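As a rough illustration, a condensed Darknet-style classifier for the 32*32 cluster-labelled patches might look as follows. This is a PyTorch sketch, not the granted 19-layer topology; the depth, channel widths, and the choice of 15 output classes (one per retained cluster) are assumptions.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, kernel_size):
    """Darknet-style block: convolution + batch norm + leaky ReLU."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size, padding=kernel_size // 2, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.1, inplace=True),
    )

class ClusterClassifier(nn.Module):
    """Condensed Darknet-style classifier for 32x32 cluster-labelled patches."""
    def __init__(self, num_classes=15):
        super().__init__()
        self.features = nn.Sequential(
            conv_block(3, 32, 3),  nn.MaxPool2d(2),   # 32x32 -> 16x16
            conv_block(32, 64, 3), nn.MaxPool2d(2),   # 16x16 -> 8x8
            conv_block(64, 128, 3),
            conv_block(128, 64, 1),                   # 1x1 bottleneck, as in Darknet
            conv_block(64, 128, 3), nn.MaxPool2d(2),  # 8x8 -> 4x4
        )
        self.classifier = nn.Sequential(
            nn.Conv2d(128, num_classes, 1),           # 1x1 prediction head
            nn.AdaptiveAvgPool2d(1),                  # global average pooling
            nn.Flatten(),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# Typical GPU training step (the weights are frozen afterwards and ported to the FPGA):
# model = ClusterClassifier().cuda()
# logits = model(patch_batch.cuda())
# loss = nn.CrossEntropyLoss()(logits, labels.cuda())
```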
Step S3: solidifying the classification network on an FPGA to perform video feature comparison.
An FPGA (Field-Programmable Gate Array) is a further development of programmable devices such as PAL, GAL, and CPLD. It emerged as a semi-custom circuit in the field of application-specific integrated circuits (ASICs), overcoming both the inflexibility of fully custom circuits and the limited gate count of earlier programmable devices. An FPGA is generally slower than an ASIC and occupies more circuit area for the same function, but it can be brought to a finished product quickly and, being a hardware-reconfigurable architecture, serves as a small-volume substitute for an ASIC. Therefore, after the classification network is generated, it is solidified on the FPGA to improve processing speed, and video feature comparison is carried out in the FPGA pipeline. This eliminates the feature-library lookup step and improves comparison efficiency. A sketch of a typical weight-quantization step for such an FPGA port follows.
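The patent only states that the trained weights are frozen and ported to the FPGA. A common preparatory step for such a port is fixed-point quantization of the weights, sketched below; the Q8.8 format and the helper name are assumptions, not part of the disclosed design.

```python
import numpy as np

def quantize_for_fpga(weights, frac_bits=8, total_bits=16):
    """Convert trained floating-point weights to signed fixed-point integers
    (here Q8.8), as typically done before mapping a network onto FPGA
    multipliers. On the FPGA, each value is interpreted as quantized / scale."""
    scale = 1 << frac_bits
    q_min = -(1 << (total_bits - 1))
    q_max = (1 << (total_bits - 1)) - 1
    quantized = np.clip(np.round(weights * scale), q_min, q_max).astype(np.int16)
    return quantized, scale

# Example: q_weights, scale = quantize_for_fpga(trained_layer_weights)
```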
The present invention selects feature point cluster sets from video streams in a video library, trains on the feature point cluster sets to obtain a classification network, and solidifies the classification network on an FPGA to perform video feature comparison. By implementing a neural network architecture on the FPGA that approximates SIFT and SURF features, video feature detection is realized. Traditional SIFT and SURF algorithms perform comparison by searching a feature library, whereas the present invention completes feature generation and comparison with a neural network on the FPGA, eliminating the feature-library lookup step and improving comparison efficiency. By combining deep learning techniques, the invention optimizes the SIFT and SURF algorithms so that they are suitable for large-scale systems, accelerates the computation with FPGA hardware, avoids lookups in a massive feature library, improves detection efficiency, and thereby achieves real-time and accurate detection of Internet video features.
Embodiment 2
This embodiment provides an FPGA-based video feature detection system. Referring to Fig. 4, the FPGA-based video feature detection system comprises:
a feature point cluster set generation module 10, configured to select feature point cluster sets from video streams in a video library.
Specifically, as described above, SIFT and SURF features are chosen in this application. Therefore, as shown in Fig. 5, the feature point cluster set generation module 10 comprises:
an extraction unit 110, configured to extract a plurality of key frames from the video stream;
a feature point generation unit 120, configured to generate, for each key frame, corresponding SIFT feature points and SURF feature points;
a comparison unit 130, configured to compare the SIFT feature points and the SURF feature points within the same frame image and select the set of pixels at which the SIFT feature points and the SURF feature points coincide;
a feature point cluster set generation unit 140, configured to cluster and label the pixel set to generate the feature point cluster sets.
Specifically, both the SIFT and SURF algorithms are computationally expensive, so to improve processing speed this application compares feature points by their pixel coordinates and implements trimmed versions of the SIFT and SURF algorithms in which feature point description is not performed. Accordingly, the feature point generation unit comprises a SIFT feature point generation subunit and a SURF feature point generation subunit.
Further, the SIFT feature point generation subunit is configured to:
perform scale-space extremum detection on the key frame to determine the SIFT feature points of the key frame;
accurately locate the SIFT feature points to determine the pixel coordinates of the SIFT feature points.
Further, the SURF feature point generation subunit is configured to:
construct the Hessian matrix;
generate the scale space;
determine the SURF feature points using non-maximum suppression;
accurately locate the SURF feature points to determine the pixel coordinates of the SURF feature points.
a classification network generation module 20, configured to train on the feature point cluster sets to obtain a classification network.
As described above, SIFT and SURF are both shallow features and can therefore be approximated by a neural network. Therefore, as shown in Fig. 6, the classification network generation module 20 comprises:
a classification network framework construction unit 210, configured to construct the framework of the classification network based on the Darknet network architecture;
a training unit 220, configured to train with the key frames corresponding to the pixels in the feature point cluster sets as the training set, to obtain the weights of the classification network.
Specifically, a 19-layer neural network structure is constructed based on the Darknet network architecture and trained with the key frames in the video library labeled by the coincident-pixel clusters as the training set, yielding the classification network. The weights are trained on a GPU, which makes parameter tuning convenient; once the parameters are fixed, they are ported to the FPGA.
a video feature comparison module 30, configured to solidify the classification network on an FPGA to perform video feature comparison.
Specifically, after the classification network is generated, it is solidified on the FPGA to improve processing speed, and video feature comparison is carried out in the FPGA pipeline, eliminating the feature-library lookup step and improving comparison efficiency.
The present invention selects feature point cluster sets from video streams in a video library, trains on the feature point cluster sets to obtain a classification network, and solidifies the classification network on an FPGA to perform video feature comparison. By implementing a neural network architecture on the FPGA that approximates SIFT and SURF features, video feature detection is realized. Traditional SIFT and SURF algorithms perform comparison by searching a feature library, whereas the present invention completes feature generation and comparison with a neural network on the FPGA, eliminating the feature-library lookup step and improving comparison efficiency. By combining deep learning techniques, the invention optimizes the SIFT and SURF algorithms so that they are suitable for large-scale systems, accelerates the computation with FPGA hardware, avoids lookups in a massive feature library, improves detection efficiency, and thereby achieves real-time and accurate detection of Internet video features.
A person of ordinary skill in the art will appreciate that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk, or an optical disc.
The above disclosure is merely a preferred embodiment of the present invention and certainly cannot limit the scope of rights of the present invention. Those skilled in the art can understand all or part of the processes for implementing the above embodiments, and equivalent variations made according to the claims of the present invention still fall within the scope covered by the invention.

Claims (10)

1. An FPGA-based video feature detection method, characterized by comprising the following steps:
selecting feature point cluster sets from video streams in a video library;
training on the feature point cluster sets to obtain a classification network; and
solidifying the classification network on an FPGA to perform video feature comparison.
2. The FPGA-based video feature detection method according to claim 1, characterized in that the step of selecting feature point cluster sets from video streams in a video library comprises:
extracting a plurality of key frames from the video stream;
for each key frame, generating corresponding SIFT feature points and SURF feature points;
comparing the SIFT feature points and the SURF feature points within the same frame image, and selecting the set of pixels at which the SIFT feature points and the SURF feature points coincide; and
clustering and labeling the pixel set to generate the feature point cluster sets.
3. The FPGA-based video feature detection method according to claim 2, characterized in that, in the step of generating corresponding SIFT feature points and SURF feature points for each key frame, the SIFT feature points are generated by the following steps:
performing scale-space extremum detection on the key frame to determine the SIFT feature points of the key frame; and
accurately locating the SIFT feature points to determine the pixel coordinates of the SIFT feature points.
4. The FPGA-based video feature detection method according to claim 2, characterized in that, in the step of generating corresponding SIFT feature points and SURF feature points for each key frame, the SURF feature points are generated by the following steps:
constructing the Hessian matrix;
generating the scale space;
determining the SURF feature points using non-maximum suppression; and
accurately locating the SURF feature points to determine the pixel coordinates of the SURF feature points.
5. The FPGA-based video feature detection method according to claim 1, characterized in that the step of training on the feature point cluster sets to obtain a classification network comprises:
constructing the framework of the classification network based on the Darknet network architecture; and
training with the key frames corresponding to the pixels in the feature point cluster sets as the training set, to obtain the weights of the classification network.
6. An FPGA-based video feature detection system, characterized by comprising:
a feature point cluster set generation module, configured to select feature point cluster sets from video streams in a video library;
a classification network generation module, configured to train on the feature point cluster sets to obtain a classification network; and
a video feature comparison module, configured to solidify the classification network on an FPGA to perform video feature comparison.
7. The FPGA-based video feature detection system according to claim 6, characterized in that the feature point cluster set generation module comprises:
an extraction unit, configured to extract a plurality of key frames from the video stream;
a feature point generation unit, configured to generate, for each key frame, corresponding SIFT feature points and SURF feature points;
a comparison unit, configured to compare the SIFT feature points and the SURF feature points within the same frame image and select the set of pixels at which the SIFT feature points and the SURF feature points coincide; and
a feature point cluster set generation unit, configured to cluster and label the pixel set to generate the feature point cluster sets.
8. The FPGA-based video feature detection system according to claim 7, characterized in that the feature point generation unit comprises a SIFT feature point generation subunit configured to:
perform scale-space extremum detection on the key frame to determine the SIFT feature points of the key frame; and
accurately locate the SIFT feature points to determine the pixel coordinates of the SIFT feature points.
9. The FPGA-based video feature detection system according to claim 7, characterized in that the feature point generation unit comprises a SURF feature point generation subunit configured to:
construct the Hessian matrix;
generate the scale space;
determine the SURF feature points using non-maximum suppression; and
accurately locate the SURF feature points to determine the pixel coordinates of the SURF feature points.
10. The FPGA-based video feature detection system according to claim 6, characterized in that the classification network generation module comprises:
a classification network framework construction unit, configured to construct the framework of the classification network based on the Darknet network architecture; and
a training unit, configured to train with the key frames corresponding to the pixels in the feature point cluster sets as the training set, to obtain the weights of the classification network.
CN201810653311.6A 2018-06-22 2018-06-22 FPGA-based video feature detection method and system Active CN108846364B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810653311.6A CN108846364B (en) 2018-06-22 2018-06-22 FPGA-based video feature detection method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810653311.6A CN108846364B (en) 2018-06-22 2018-06-22 FPGA-based video feature detection method and system

Publications (2)

Publication Number Publication Date
CN108846364A true CN108846364A (en) 2018-11-20
CN108846364B CN108846364B (en) 2022-05-03

Family

ID=64203093

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810653311.6A Active CN108846364B (en) 2018-06-22 2018-06-22 FPGA-based video feature detection method and system

Country Status (1)

Country Link
CN (1) CN108846364B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105718890A (en) * 2016-01-22 2016-06-29 北京大学 Method for detecting specific videos based on convolution neural network
US20180032846A1 (en) * 2016-08-01 2018-02-01 Nvidia Corporation Fusing multilayer and multimodal deep neural networks for video classification
CN106708949A (en) * 2016-11-25 2017-05-24 成都三零凯天通信实业有限公司 Identification method of harmful content of video

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
SAVITA SINGLA et al.: "Medical Image Stitching Using Hybrid Of Sift & Surf Techniques", International Journal of Advanced Research in Electronics and Communication Engineering (IJARECE) *
余奇 (Yu Qi): "Design and Implementation of an FPGA-based Deep Learning Accelerator", China Masters' Theses Full-text Database, Information Science and Technology Series (Monthly), Automation Technology *
王双印 (Wang Shuangyin): "Research on Face Recognition Based on Convolutional Neural Networks", China Masters' Theses Full-text Database, Information Science and Technology Series (Monthly), Computer Software and Computer Applications *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020142971A1 (en) * 2019-01-10 2020-07-16 Alibaba Group Holding Limited Systems and methods for providing database acceleration using a programmable logic device (pld)
CN111860781A (en) * 2020-07-10 2020-10-30 逢亿科技(上海)有限公司 Convolutional neural network feature decoding system realized based on FPGA
CN111832720A (en) * 2020-09-21 2020-10-27 电子科技大学 Configurable neural network reasoning and online learning fusion calculation circuit

Also Published As

Publication number Publication date
CN108846364B (en) 2022-05-03

Similar Documents

Publication Publication Date Title
Guo et al. Scene-driven multitask parallel attention network for building extraction in high-resolution remote sensing images
JP2022529557A (en) Medical image segmentation methods, medical image segmentation devices, electronic devices and computer programs
CN106126585B (en) The unmanned plane image search method combined based on quality grading with perceived hash characteristics
CN109978918A (en) A kind of trajectory track method, apparatus and storage medium
CN109657610A (en) A kind of land use change survey detection method of high-resolution multi-source Remote Sensing Images
CN110751209B (en) Intelligent typhoon intensity determination method integrating depth image classification and retrieval
CN104182765A (en) Internet image driven automatic selection method of optimal view of three-dimensional model
CN103714181A (en) Stratification specific figure search method
Liu et al. Subtler mixed attention network on fine-grained image classification
CN108846364A (en) A kind of video features detection method and system based on FPGA
CN112818849B (en) Crowd density detection algorithm based on context attention convolutional neural network for countermeasure learning
Chen et al. ASF-Net: Adaptive screening feature network for building footprint extraction from remote-sensing images
CN109376787A (en) Manifold learning network and computer visual image collection classification method based on it
CN115861619A (en) Airborne LiDAR (light detection and ranging) urban point cloud semantic segmentation method and system of recursive residual double-attention kernel point convolution network
CN111524140B (en) Medical image semantic segmentation method based on CNN and random forest method
Qian et al. Classification of rice seed variety using point cloud data combined with deep learning
CN113435254A (en) Sentinel second image-based farmland deep learning extraction method
CN113988147A (en) Multi-label classification method and device for remote sensing image scene based on graph network, and multi-label retrieval method and device
Zheng et al. Feature enhancement for multi-scale object detection
CN111368775A (en) Complex scene dense target detection method based on local context sensing
Zhao et al. Image dehazing based on haze degree classification
CN112668675B (en) Image processing method and device, computer equipment and storage medium
CN112528058B (en) Fine-grained image classification method based on image attribute active learning
CN113989291A (en) Building roof plane segmentation method based on PointNet and RANSAC algorithm
CN111339950B (en) Remote sensing image target detection method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant