CN107886120A

CN107886120A - Method and apparatus for target detection tracking

Info

Publication number: CN107886120A
Application number: CN201711070712.0A
Authority: CN
Inventors: 王德祥
Original assignee: Beijing Rui Qing Dimension Airlines Technology Development Co Ltd
Current assignee: Beijing Rui Qing Dimension Airlines Technology Development Co Ltd
Priority date: 2017-11-03
Filing date: 2017-11-03
Publication date: 2018-04-06

Abstract

Embodiments of the present invention provide a method and apparatus for target detection and tracking. The method includes: acquiring an infrared video and constructing, from the infrared video, a test data set for detecting the target; obtaining, using an optical flow method, a first candidate region of each frame image in the test data set; and inputting the first candidate region into a trained Faster-RCNN detector to obtain a detection result. Because the first candidate region of the image to be detected is obtained by the optical flow method, the input to the Faster-RCNN detector is only the first candidate region of each frame image in the test data set rather than the whole frame. This significantly reduces the amount of image data processed to detect the target and increases the speed of target detection. Embodiments of the present invention further provide an apparatus, an electronic device and a computer-readable storage medium for target detection and tracking.

Description

Method and apparatus for target detection tracking

Technical field

Embodiments of the present invention relate to the field of computer vision and, more specifically, to a method, an apparatus, an electronic device and a computer-readable storage medium for target detection and tracking.

Background art

This section is intended to provide background or context for the embodiments of the present invention recited in the claims. The description herein is not admitted to be prior art merely because it is included in this section.

Target detection and tracking are important research problems in computer vision. The purpose of target detection is to give the position and category of targets of interest in an image, which has very broad prospects in practical applications. In autonomous driving, surrounding pedestrians, vehicles and other moving objects must be identified and tracked effectively to achieve automatic obstacle avoidance; surveillance systems likewise need to mark pedestrians and vehicles in video so that relevant segments can be retrieved quickly from massive amounts of footage.

The above information is provided only to enhance understanding of the background of the present disclosure; it may therefore contain information that does not constitute prior art already known to a person of ordinary skill in the art.

Disclosure of the invention

At present, deep learning techniques based on convolutional neural networks have achieved great success in tasks such as image classification and target detection, with significant improvements in both speed and accuracy.

However, under visible-light imaging, images are easily affected by weather and illumination, which degrades target detection. Moreover, the targets handled by existing algorithms tend to occupy a large part of the image, and the prior art performs poorly when detecting distant, small targets.

Compared with visible light, infrared thermal imaging offers a long operating range and the ability to work at night, and is therefore widely used in tasks such as video surveillance, early warning and target tracking.

Consequently, in implementing video surveillance, early-warning and similar systems based on infrared thermal imaging, automatic detection and recognition of targets in infrared video sequences is a major problem.

An improved method and apparatus for target detection and tracking is therefore highly desirable: one that avoids the problems caused by weather and illumination under visible-light imaging, and also avoids the degraded detection and recognition of distant, small targets, so that targets in infrared video can be detected and recognized automatically.

In this context, embodiments of the present invention are intended to provide a method, an apparatus, an electronic device and a computer-readable storage medium for target detection and tracking.

In a first aspect of embodiments of the present invention, a method for target detection and tracking is provided, including: acquiring an infrared video and constructing, from the infrared video, a test data set for detecting the target; obtaining, using an optical flow method, a first candidate region of each frame image in the test data set; and inputting the first candidate region into a Faster-RCNN detector to obtain a detection result.

In one embodiment of the present invention, the method further includes a training step, the training step including: constructing, from the infrared video, a training data set for detecting the target, the training data set including a classification data set and a detection data set; training a pre-designed target classification network on the classification data set; and training the Faster-RCNN detector according to the trained target classification network and the detection data set.

In another embodiment of the present invention, constructing the training data set for detecting the target from the infrared video includes: obtaining positive samples with their positive-sample labels and negative samples with their negative-sample labels from each frame image of the infrared video used to construct the training data set, to build the classification data set; obtaining, using the optical flow method, the pixels that move in each frame image of the infrared video used to construct the training data set; and, according to the moving pixels of each frame image, obtaining the local image containing the target in each frame image of the training data set together with its coordinate information, to form the detection data set.

In yet another embodiment of the present invention, training the pre-designed target classification network on the classification data set includes: designing a convolutional neural network as the target classification network; and training the target classification network with stochastic gradient descent according to the designed convolutional neural network and the classification data set.

In yet another embodiment of the present invention, training the Faster-RCNN detector according to the trained target classification network and the detection data set includes: designing the convolutional layer structure of the Faster-RCNN detector to be identical to the convolutional layer structure of the target classification network; initializing the convolutional layer parameters of the Faster-RCNN detector with the convolutional layer parameters of the trained target classification network; and inputting the detection data set into the designed Faster-RCNN detector and training the Faster-RCNN detector by joint training.

In yet another embodiment of the present invention, training the Faster-RCNN detector according to the trained target classification network and the detection data set further includes: modifying the anchor settings in the RPN (region proposal network) of the Faster-RCNN detector according to the target to be detected.
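The document does not list the anchor values it uses. As a minimal sketch of what modifying the anchor settings can mean, the function below generates RPN-style anchors for a set of sizes and aspect ratios centered on one location. The original Faster R-CNN paper uses three areas (128², 256² and 512² pixels) and three aspect ratios (1:1, 1:2, 2:1); the smaller `SMALL_TARGET_SIZES` values are a hypothetical choice for a distant UAV covering only a few pixels, not values taken from this document. The names `make_anchors`, `DEFAULT_SIZES`, `SMALL_TARGET_SIZES` and `RATIOS` are illustrative.

```python
import math


def make_anchors(cx, cy, sizes, ratios):
    """Return RPN-style anchors as (x1, y1, x2, y2) boxes centered on (cx, cy).

    sizes  -- square root of each anchor's area in pixels (16 means area 16 * 16)
    ratios -- height / width aspect ratios
    """
    anchors = []
    for size in sizes:
        area = float(size * size)
        for r in ratios:
            w = math.sqrt(area / r)  # chosen so that w * h == area and h / w == r
            h = w * r
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


# Areas used in the original Faster R-CNN paper (128^2, 256^2, 512^2 pixels).
DEFAULT_SIZES = (128, 256, 512)
# Hypothetical smaller sizes for a distant UAV that covers only a few pixels.
SMALL_TARGET_SIZES = (8, 16, 32)
RATIOS = (0.5, 1.0, 2.0)
```

For instance, `make_anchors(100, 100, SMALL_TARGET_SIZES, RATIOS)` returns nine boxes, the square 8-pixel one being `(96.0, 96.0, 104.0, 104.0)`. Shrinking the sizes lets more anchors overlap tiny targets strongly enough to be counted as positives during RPN training; in practice the values would be chosen from the size distribution of targets in the detection data set.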

In yet another embodiment of the present invention, the method further includes: estimating an equation of motion of the target from the target's motion trajectory over a preset number of preceding frames; predicting, according to the equation of motion of the target, a second candidate region of each frame image in the test data set; and inputting the second candidate region into the trained Faster-RCNN detector to obtain the detection result.
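The document does not specify the form of the equation of motion. A minimal sketch, assuming a constant-velocity model fitted by least squares to the target centers from the preceding K frames, is shown below; the predicted center is then enlarged by a hypothetical `margin` factor to form the second candidate region. The function names and the margin value are illustrative.

```python
def fit_constant_velocity(track):
    """Least-squares fit of x(t) = x0 + vx * t and y(t) = y0 + vy * t.

    track -- (x, y) target centers from the preceding K frames, oldest first (K >= 2);
             t = 0 is the oldest frame. Returns (x0, vx, y0, vy).
    """
    k = len(track)
    if k < 2:
        raise ValueError("need at least two past positions")
    t_mean = (k - 1) / 2
    var_t = sum((t - t_mean) ** 2 for t in range(k))
    params = []
    for axis in (0, 1):
        vals = [p[axis] for p in track]
        v_mean = sum(vals) / k
        slope = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(vals)) / var_t
        params.extend([v_mean - slope * t_mean, slope])
    return tuple(params)


def predict_candidate(track, box_w, box_h, margin=2.0):
    """Second candidate region: a box around the predicted next-frame center.

    box_w, box_h -- size of the target's last detected box; margin -- hypothetical
    enlargement factor that leaves room for deviations from the motion model.
    """
    x0, vx, y0, vy = fit_constant_velocity(track)
    t_next = len(track)  # one frame after the newest observation
    cx, cy = x0 + vx * t_next, y0 + vy * t_next
    w, h = box_w * margin, box_h * margin
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
```

For a track moving 2 px right and 1 px down per frame, such as `[(10, 20), (12, 21), (14, 22), (16, 23)]`, `predict_candidate(track, 8, 8)` centers the next search box at (18, 24). A constant-acceleration model could be fitted the same way with an added quadratic term if targets maneuver sharply.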

In yet another embodiment of the present invention, the method further includes: tracking the target according to the detection result and a preset rule.

In yet another embodiment of the present invention, the preset rule is as follows. The initial state of the system is set to a first state. In the first state, if the target is detected in the candidate region of the next infrared video frame, the system jumps to a second state. In the second state, if the target is detected in neither the candidate region of the next frame nor its adjacent area, the system jumps back to the first state; if the target is detected in the candidate region and/or the adjacent area of the next frame, the system remains in the second state. When the system has been in the second state for a first preset number of consecutive frames, it jumps to a third state. In the third state, if the target is detected in the candidate region and/or the adjacent area of the next frame, the system remains in the third state; if the target is detected in neither the candidate region of the next frame nor its adjacent area, the system jumps to a fourth state. In the fourth state, if the target is detected in the candidate region and/or the adjacent area of the next frame, the system jumps back to the third state; if the target is detected in neither, the system remains in the fourth state. When the system has been in the fourth state for a second preset number of consecutive frames, it jumps to the first state.
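The four-state rule above can be sketched as a small state machine. In this sketch the state names (`IDLE`, `CANDIDATE`, `TRACKING`, `LOST`) are hypothetical labels for the first to fourth states, `n_confirm` and `n_drop` stand for the first and second preset frame numbers, and `detected` means the target was found in the candidate region and/or its adjacent area of the next frame. It assumes that the frame on which a state is entered counts as the first of the consecutive frames spent in it, a detail the document does not settle.

```python
from enum import Enum


class TrackState(Enum):
    IDLE = 1       # first state: no target
    CANDIDATE = 2  # second state: target seen but not yet confirmed
    TRACKING = 3   # third state: confirmed target
    LOST = 4       # fourth state: confirmed target temporarily missing


class TargetTracker:
    def __init__(self, n_confirm, n_drop):
        self.n_confirm = n_confirm  # first preset frame number
        self.n_drop = n_drop        # second preset frame number
        self.state = TrackState.IDLE
        self.count = 0              # consecutive frames spent in the current state

    def _go(self, new_state):
        if new_state is self.state:
            self.count += 1
        else:
            self.state, self.count = new_state, 1

    def update(self, detected):
        """Advance by one frame and return the new state."""
        if self.state in (TrackState.IDLE, TrackState.CANDIDATE):
            self._go(TrackState.CANDIDATE if detected else TrackState.IDLE)
        else:  # TRACKING or LOST
            self._go(TrackState.TRACKING if detected else TrackState.LOST)
        # Promotion or demotion after enough consecutive frames in an intermediate state.
        if self.state is TrackState.CANDIDATE and self.count >= self.n_confirm:
            self._go(TrackState.TRACKING)
        elif self.state is TrackState.LOST and self.count >= self.n_drop:
            self._go(TrackState.IDLE)
        return self.state
```

With `n_confirm=3` and `n_drop=2`, the detection sequence True, True, True, False, False moves the tracker through CANDIDATE, CANDIDATE, TRACKING, LOST and back to IDLE. The second and fourth states act as hysteresis: a single spurious detection does not start a track, and a single missed frame does not end one.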

In a second aspect of embodiments of the present invention, an apparatus for target detection and tracking is provided, including: a test data set construction module, configured to acquire an infrared video and construct, from the infrared video, a test data set for detecting the target; a candidate region acquisition module, configured to obtain, using an optical flow method, a first candidate region of each frame image in the test data set; and a target detection module, configured to input the first candidate region into a trained Faster-RCNN detector to obtain a detection result.

In one embodiment of the present invention, the apparatus further includes a training module, the training module including: a training data set construction unit, configured to construct, from the infrared video, a training data set for detecting the target, the training data set including a classification data set and a detection data set; a classifier training unit, configured to train a pre-designed target classification network on the classification data set; and a detector training unit, configured to train the Faster-RCNN detector according to the trained target classification network and the detection data set.

In another embodiment of the present invention, the training data set construction unit includes: a classification data set construction subunit, configured to obtain positive samples with their positive-sample labels and negative samples with their negative-sample labels from each frame image of the infrared video used to construct the training data set, to build the classification data set; a moving pixel acquisition subunit, configured to obtain, using the optical flow method, the pixels that move in each frame image of the infrared video used to construct the training data set; and a detection data set construction subunit, configured to obtain, according to the moving pixels of each frame image, the local image containing the target in each frame image of the training data set together with its coordinate information, to form the detection data set.

In yet another embodiment of the present invention, the classifier training unit includes: a classifier structure design subunit, configured to design a convolutional neural network as the target classification network; and a classifier training subunit, configured to train the target classification network with stochastic gradient descent according to the designed convolutional neural network and the classification data set.

In yet another embodiment of the present invention, the detector training unit includes: a detector structure design subunit, configured to design the convolutional layer structure of the Faster-RCNN detector to be identical to the convolutional layer structure of the target classification network; a detector initialization subunit, configured to initialize the convolutional layer parameters of the Faster-RCNN detector with the convolutional layer parameters of the trained target classification network; and a detector training subunit, configured to input the detection data set into the designed Faster-RCNN detector and train the Faster-RCNN detector by joint training.

In yet another embodiment of the present invention, the detector training unit further includes: an anchor setting subunit, configured to modify the anchor settings in the RPN of the Faster-RCNN detector according to the target to be detected.

In yet another embodiment of the present invention, the apparatus further includes a second candidate region generation module, the second candidate region generation module including: a motion equation generation unit, configured to estimate the equation of motion of the target from the target's motion trajectory over a preset number of preceding frames; a motion-estimation candidate region generation unit, configured to predict, according to the equation of motion of the target, a second candidate region of each frame image in the test data set; and a motion-estimation detection unit, configured to input the second candidate region into the trained Faster-RCNN detector to obtain the detection result.

In yet another embodiment of the present invention, the apparatus further includes: a target tracking module, configured to track the target according to the detection result and a preset rule.

In yet another embodiment of the present invention, the preset rule is as follows. The initial state of the system is set to a first state. In the first state, if the target is detected in the candidate region of the next infrared video frame, the system jumps to a second state. In the second state, if the target is detected in neither the candidate region of the next frame nor its adjacent area, the system jumps back to the first state; if the target is detected in the candidate region and/or the adjacent area of the next frame, the system remains in the second state. When the system has been in the second state for a first preset number of consecutive frames, it jumps to a third state. In the third state, if the target is detected in the candidate region and/or the adjacent area of the next frame, the system remains in the third state; if the target is detected in neither the candidate region of the next frame nor its adjacent area, the system jumps to a fourth state. In the fourth state, if the target is detected in the candidate region and/or the adjacent area of the next frame, the system jumps back to the third state; if the target is detected in neither, the system remains in the fourth state. When the system has been in the fourth state for a second preset number of consecutive frames, it jumps to the first state.

In a third aspect of embodiments of the present invention, an electronic device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the method for target detection and tracking according to any of the above embodiments.

In a fourth aspect of embodiments of the present invention, a computer-readable storage medium is provided, on which a computer program is stored, wherein the program, when executed by a processor, implements the method for target detection and tracking according to any of the above embodiments.

With the method, apparatus, electronic device and computer-readable storage medium for target detection and tracking according to embodiments of the present invention, the first candidate region of the image to be recognized is extracted by the optical flow method and then input into the trained Faster-RCNN detector for classification and recognition. Because detection and recognition need not be performed on the entire image, the amount of data processed for target detection is substantially reduced, which significantly lowers the complexity of the detection method and allows targets in infrared video to be detected and recognized more quickly and effectively.

In addition, according to some embodiments, the method, apparatus, electronic device and computer-readable storage medium for target detection and tracking combine the optical flow method with Faster-RCNN to achieve efficient detection and tracking of distant, small targets. Moreover, because the detection algorithm of the present invention operates on infrared video, detection can be performed efficiently both by day and at night.

It should be understood that the foregoing general description and the following detailed description are exemplary only and do not limit the present invention.

Other features and advantages of the present invention will be set forth in the following description and will in part become apparent from the description or be understood by practicing the invention. The objectives and other advantages of the invention may be realized and obtained by the structures particularly pointed out in the description, the claims and the accompanying drawings.

Brief description of the drawings

The above and other objects, features and advantages of exemplary embodiments of the present invention will become readily understood by reading the following detailed description with reference to the accompanying drawings, in which several embodiments of the present invention are shown by way of example and not limitation, wherein:

Fig. 1 schematically shows an application scenario in which embodiments of the present invention may be implemented;

Fig. 2 schematically shows a flowchart of a method for target detection and tracking according to an embodiment of the present invention;

Fig. 3 schematically shows a flowchart of a method for target detection and tracking according to another embodiment of the present invention;

Fig. 4 schematically shows a neural network used as the target classification network according to an embodiment of the present invention;

Fig. 5 schematically shows detection of an unmanned aerial vehicle (UAV) using the method according to an embodiment of the present invention;

Fig. 6 schematically shows detection of a UAV using the method according to another embodiment of the present invention;

Fig. 7 schematically shows detection of a UAV using the method according to yet another embodiment of the present invention;

Fig. 8 schematically shows a flowchart of a method for target detection and tracking according to yet another embodiment of the present invention;

Fig. 9 schematically shows a state transition diagram for target detection and tracking according to an embodiment of the present invention;

Fig. 10 schematically shows the structure of an apparatus for target detection and tracking according to an embodiment of the present invention;

Fig. 11 schematically shows the structure of an electronic device according to an embodiment of the present invention;

Fig. 12 schematically shows a computer-readable storage medium according to an embodiment of the present invention.

In the drawings, identical or corresponding reference numerals denote identical or corresponding parts.

Detailed description of the embodiments

The principle and spirit of the present invention are described below with reference to several exemplary embodiments. It should be understood that these embodiments are provided only to enable those skilled in the art to better understand and thereby implement the present invention, and not to limit its scope in any way. Rather, they are provided so that this disclosure will be thorough and complete and will fully convey the scope of the disclosure to those skilled in the art.

Those skilled in the art will appreciate that embodiments of the present invention may be implemented as a system, apparatus, device, method or computer program product. Accordingly, the present disclosure may take the form of entirely hardware, entirely software (including firmware, resident software, microcode, etc.), or a combination of hardware and software.

According to embodiments of the present invention, a method, an apparatus, an electronic device and a computer-readable storage medium for target detection and tracking are proposed.

Herein, the term "small target" refers to a target to be recognized or detected whose size is small relative to the image to be recognized or detected, i.e., the target occupies relatively few pixels in that image. For example, when an ordinary infrared acquisition device captures a distant unmanned aerial vehicle (UAV) as the target to be recognized or detected, the UAV may be regarded as a small target. "Long distance" in the embodiments of the present invention is relative to the target to be recognized or detected: in the following embodiments, in which a UAV is the target, a UAV more than 1.5 km from the infrared acquisition device is regarded as a distant small target, and the specific embodiments use 2.6 km as an example. When the method of the embodiments is applied to different scenarios, the definitions of "small target" and "long distance" may be adjusted accordingly, and the present invention is not limited in this respect. Any number of elements in the drawings is for illustration rather than limitation, and any naming is used only for distinction and carries no limiting meaning.

The principle and spirit of the present invention are explained in detail below with reference to several representative embodiments.

Summary of the invention

The inventors found that under visible-light imaging, images to be recognized are easily affected by weather, illumination and the like, which degrades target detection. Moreover, existing target detection and tracking algorithms are mainly aimed at larger or closer targets that occupy most of the image to be recognized, and they perform poorly on distant, small targets. In addition, some existing detection algorithms feed the entire image to be recognized directly into a classifier for classification and recognition, which involves a large amount of data processing and results in low recognition efficiency.

To address the large data-processing volume and degraded detection in the prior art, the present invention provides a method, an apparatus, an electronic device and a computer-readable storage medium for target detection and tracking: an infrared video is acquired; a test data set for detecting the target is constructed from the infrared video; a first candidate region of each frame image in the test data set is obtained using an optical flow method; and the first candidate region is input into a Faster-RCNN detector to obtain a detection result. In this way, embodiments of the present invention use the optical flow method to reduce the amount of data processed for target detection and thereby achieve more efficient detection. The technical solution provided by the embodiments therefore facilitates automatic and efficient detection and localization of the target to be recognized.

Having described the basic principle of the present invention, various non-limiting embodiments of the invention are described in detail below.

Application scenarios overview

Referring first to Fig. 1, an application scenario in which embodiments of the present invention may be implemented is shown schematically.

In Fig. 1, the infrared acquisition device is equipped with an infrared detection component and an optical imaging component capable of capturing infrared video for target detection and tracking. The server communicates with the infrared acquisition device by wireless and/or wired means, receives the infrared video captured by the device, and performs the target detection and tracking processing of the embodiments of the present invention on it. However, those skilled in the art will appreciate that the applicable scenarios of embodiments of the present invention are not limited in any respect by this architecture. For example, although only one server and one infrared acquisition device are shown, the numbers of servers and infrared acquisition devices in practical application scenarios are not limited. The server and the infrared acquisition device may also be integrated into a single device.

Illustrative methods

The method for target detection and tracking according to exemplary embodiments of the present invention is described below with reference to Figs. 2 to 9, in conjunction with the application scenario of Fig. 1. It should be noted that this application scenario is shown only to facilitate understanding of the spirit and principles of the present invention, and embodiments of the present invention are not limited in this respect. Rather, embodiments of the present invention may be applied to any applicable scenario.

Figs. 2 to 9 schematically illustrate the method for target detection and tracking according to embodiments of the present invention. The method is generally executed on a device capable of running a computer program, for example a desktop computer or a server, but it may of course also be executed on a laptop or even a tablet computer.

The method of embodiments of the present invention may include steps S200, S210 and S220. Optionally, it may further include steps S300, S310, S320 and S330, or alternatively steps S800, S810, S820 and S830.

As shown in Fig. 2, in step S200, an infrared video is acquired, and a test data set for detecting the target is constructed from the infrared video.

As an example, the infrared video may be acquired by an infrared acquisition device such as an infrared remote-sensing camera. The infrared video may be preprocessed, including analog-to-digital conversion, to obtain a two-dimensional image, a three-dimensional image or an image sequence. To meet the requirements of subsequent image processing, operations such as subsampling, smoothing and denoising, and contrast enhancement may also be performed.
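The optional preprocessing operations named above can be sketched on a single gray-level frame held as a list of rows. Decimation, a 3×3 box filter and a linear min-max stretch are simple choices assumed here for illustration; the document does not say which subsampling, denoising or contrast-enhancement methods are used, and the function names are hypothetical.

```python
def subsample(img, step=2):
    """Keep every `step`-th row and column (simple decimation)."""
    return [row[::step] for row in img[::step]]


def mean_filter3(img):
    """3x3 box filter for smoothing/denoising; border pixels average the neighbors that exist."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


def stretch_contrast(img, lo=0.0, hi=255.0):
    """Linear min-max stretch of gray levels to the range [lo, hi]."""
    flat = [v for row in img for v in row]
    vmin, vmax = min(flat), max(flat)
    if vmax == vmin:  # flat image: nothing to stretch
        return [[lo for _ in row] for row in img]
    scale = (hi - lo) / (vmax - vmin)
    return [[lo + (v - vmin) * scale for v in row] for row in img]


def preprocess(frame):
    """Subsample, smooth, then stretch contrast (order chosen for illustration)."""
    return stretch_contrast(mean_filter3(subsample(frame)))
```

For example, `stretch_contrast([[10, 20], [30, 40]])` maps the gray levels to `[[0.0, 85.0], [170.0, 255.0]]`.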

In step S210, the first candidate region of each frame image in the test data set is obtained using the optical flow method.

As an example, obtaining the candidate region of each frame image in the test data set using the optical flow method in embodiments of the present invention may include the following steps: obtaining, using the optical flow method, the pixels of each frame image in the test data set that move relative to each of a preset number of subsequent frame images; obtaining, from these pixels, the pixels that move in each frame image of the test data set; and obtaining, from the moving pixels of each frame image, the first candidate region of each frame image in the test data set.
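A minimal sketch of these three sub-steps is given below, assuming the dense flow fields between frame t and each of the next N frames have already been computed (for example with a dense method such as OpenCV's `cv2.calcOpticalFlowFarneback`). A pixel is treated as moving if its flow magnitude exceeds a threshold with respect to any of those frames; this union rule, the threshold, 4-connectivity and the box padding are assumptions, and the function names are hypothetical. Each connected blob of moving pixels yields one padded bounding box as a first candidate region.

```python
import math
from collections import deque


def moving_mask(flows, thresh):
    """Mark pixels whose flow magnitude exceeds `thresh` with respect to ANY later frame.

    flows -- list of flow fields; flows[k][y][x] = (u, v) between frame t and frame t+k+1.
    """
    h, w = len(flows[0]), len(flows[0][0])
    return [[any(math.hypot(*f[y][x]) > thresh for f in flows) for x in range(w)]
            for y in range(h)]


def candidate_boxes(mask, pad=2, min_pixels=1):
    """Group moving pixels into 4-connected blobs; return inclusive padded boxes (x1, y1, x2, y2)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            queue, pixels = deque([(y, x)]), []
            seen[y][x] = True
            while queue:  # breadth-first flood fill of one blob
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(pixels) < min_pixels:
                continue  # drop blobs too small to be a target
            ys = [p[0] for p in pixels]
            xs = [p[1] for p in pixels]
            boxes.append((max(0, min(xs) - pad), max(0, min(ys) - pad),
                          min(w - 1, max(xs) + pad), min(h - 1, max(ys) + pad)))
    return boxes
```

Pooling several later frames may help with slow, distant targets whose displacement between consecutive frames falls below the threshold; the padded boxes, rather than the full frame, are what would be passed to the detector in step S220.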

Optical flow is a simple and practical representation of image motion. It is usually defined as the apparent motion of brightness patterns in an image sequence, i.e., the projection onto the imaging plane of the vision sensor of the velocity of points on the surfaces of objects in space. This definition regards optical flow as representing purely geometric change. In 1998, Negahdaripour redefined optical flow as a comprehensive representation of both the geometric change and the radiometric change of dynamic images. The study of optical flow uses the temporal variation and correlation of pixel intensity data in an image sequence to determine the "motion" at each pixel location, that is, it studies the relationship between the temporal change of image gray levels and the structure and motion of objects in the scene. In general, optical flow is caused by camera motion, by the motion of targets in the scene, or by the relative motion arising when both move.
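A common way to make this definition quantitative (a standard formulation, not stated in the source) is the brightness-constancy assumption: a point keeps its gray level I while moving by (u δt, v δt) in time δt. A first-order Taylor expansion then yields the optical flow constraint equation, in which I_x, I_y and I_t are the partial derivatives of I and (u, v) is the flow velocity at the pixel:

```latex
I(x + u\,\delta t,\; y + v\,\delta t,\; t + \delta t) = I(x, y, t)
\quad\Longrightarrow\quad
I_x\,u + I_y\,v + I_t = 0
```

This single equation has two unknowns per pixel (the aperture problem), so practical methods add a further assumption, for example that the flow is constant over a small window (Lucas-Kanade) or varies smoothly across the image (Horn-Schunck).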

The basic principle of detecting moving objects with the optical flow method is as follows. Each pixel in the image is assigned a velocity vector, forming an image motion field. At a particular instant of motion, points in the image correspond one-to-one with points on three-dimensional objects, and this correspondence can be obtained from the projection relationship; from the velocity feature of each pixel, the image can then be analyzed dynamically. If there is no moving object in the image, the optical flow vectors vary continuously over the entire image region. When a moving object is present, the target and the image background move relative to each other, so the velocity vectors formed by the moving object necessarily differ from those of the neighboring background, which allows the moving object and its position to be detected. An advantage of the optical flow method is that the flow carries not only the motion information of moving objects but also rich information about the three-dimensional structure of the scene, so moving objects can be detected without any prior knowledge of the scene.
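The principle above, that a moving object's velocity vectors differ from those of the surrounding background, can be sketched as follows, assuming a dense flow field is already available. Taking the per-component median of the flow as the background motion is an illustrative choice that also absorbs a uniform camera pan (though not camera rotation or zoom); the threshold and the function name are hypothetical.

```python
import math
from statistics import median


def foreground_by_flow(flow, thresh):
    """Flag pixels whose flow differs from the dominant (background) flow by more than `thresh`.

    flow -- flow[y][x] = (u, v). The per-component median over the whole image is used
    as the background motion, so a uniform camera pan is not flagged as foreground.
    """
    bu = median(u for row in flow for (u, _) in row)
    bv = median(v for row in flow for (_, v) in row)
    return [[math.hypot(u - bu, v - bv) > thresh for (u, v) in row] for row in flow]
```

For a 5×5 field in which every pixel moves by (1, 0) because of a camera pan except one pixel moving by (4, 0), only that pixel is flagged when `thresh=1.0`.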

In step S220, the first candidate region is input into the Faster-RCNN detector to obtain the detection result.

In a preferred embodiment, the method may further include a training step, which may further include: constructing, from the infrared video, a training data set for detecting the target, the training data set including a classification data set and a detection data set; training a pre-designed target classification network on the classification data set; and training the Faster-RCNN detector according to the trained target classification network and the detection data set.

In another preferred embodiment, building the training dataset for detecting the target from the infrared video may include: obtaining, from each frame of the infrared video used to build the training dataset, positive samples with their positive-sample labels and negative samples with their negative-sample labels, to build the classification dataset; using optical flow to obtain the pixels that move in each frame of the infrared video used to build the training dataset; and, from the moving pixels of each frame, obtaining the local images of each frame in the training dataset that contain the target, together with their corresponding coordinate information, to form the detection dataset.

As an example, training the pre-designed target classification network on the classification dataset in embodiments of the invention may include: designing a convolutional neural network as the target classification network; and training the target classification network by stochastic gradient descent using the designed convolutional neural network and the classification dataset.

Although stochastic gradient descent is chosen above to train the target classification network, the disclosure is not limited to this; in other embodiments the target classification network may be trained by other methods, and the above is merely illustrative.

In a preferred embodiment, training the Faster-RCNN detector using the trained target classification network and the detection dataset may include: designing the convolutional-layer structure of the Faster-RCNN detector to be identical to that of the target classification network; initializing the convolutional-layer parameters of the Faster-RCNN detector with those of the trained target classification network; and inputting the detection dataset to the designed Faster-RCNN detector and training the Faster-RCNN detector by joint training.

In a further preferred embodiment, training the Faster-RCNN detector using the trained target classification network and the detection dataset may include: modifying the anchor settings in the RPN of the Faster-RCNN detector according to the target to be detected.

As an example, the method of embodiments of the invention may further include: estimating the target's equation of motion from its trajectory over a preset number of preceding frames; predicting, from the target's equation of motion, a second candidate region for each frame in the test dataset; and inputting the second candidate region to the trained Faster-RCNN detector to obtain the detection result.

The target detection method provided by embodiments of the invention extracts the first candidate region of the image to be recognized by optical flow and then inputs that first candidate region into the trained Faster-RCNN detector. This narrows the part of the image that has to be recognized and greatly reduces the area to be examined, which lowers algorithmic complexity and improves detection efficiency. In addition, in some embodiments the first candidate regions are classified by combining optical flow with the Faster-RCNN detector, so that distant small targets can be located efficiently. Moreover, the detection method of embodiments of the invention can be applied to infrared video and detects efficiently both by day and at night.

In the following embodiments, the method above is described in detail using the detection of distant small targets as an example, with optical flow combined with the Faster-RCNN detector.

As shown in Fig. 3, in step S300 the classification dataset and the detection dataset are built: from the infrared video collected by an infrared acquisition device, a classification dataset for classifying small targets and a detection dataset for detecting small targets are constructed.

In a preferred embodiment, step S300 may comprise the following steps.

In step S301, in each frame of the collected, annotated infrared video (i.e. the training dataset, which comprises the classification dataset and the detection dataset), the actual target image is taken as a positive sample (for example, its positive-sample label may be set to +1, although the disclosure is not limited to this), and background regions (image blocks) of suitable size are randomly cropped from the infrared video as negative samples (for example, the negative-sample label may be set to -1, or alternatively to 0), thereby building the classification dataset.

In a preferred embodiment, the small target is taken to be an unmanned aerial vehicle (UAV) for illustration, although the disclosure is not limited to this; the region inside the annotated target box in each video frame, which contains the UAV, is then used as a positive sample.

As an example, the preset size of the cropped background regions in embodiments of the invention may be chosen according to the size of the image to be recognized and the size of the target to be recognized, for example 15*22 as in the example below.
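Step S301 can be sketched as follows, assuming boxes are inclusive pixel rectangles (x0, y0, x1, y1), that "15*22" means 15 rows by 22 columns, and that a background patch must not touch any annotated target. The function names, the seed and the retry limit are hypothetical choices made for illustration only.

```python
import random


def boxes_overlap(a, b):
    """True if inclusive pixel boxes (x0, y0, x1, y1) share at least one pixel."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def build_classification_samples(img_h, img_w, target_boxes, n_negative,
                                 patch_h=15, patch_w=22, seed=0, max_tries=10000):
    """Return (box, label) pairs: annotated targets as +1, random background as -1."""
    samples = [(tuple(b), +1) for b in target_boxes]   # positive samples
    rng = random.Random(seed)
    negatives = 0
    for _ in range(max_tries):
        if negatives == n_negative:
            break
        y0 = rng.randint(0, img_h - patch_h)
        x0 = rng.randint(0, img_w - patch_w)
        box = (x0, y0, x0 + patch_w - 1, y0 + patch_h - 1)
        # Keep only background patches that do not touch any annotated target.
        if not any(boxes_overlap(box, t) for t in target_boxes):
            samples.append((box, -1))
            negatives += 1
    return samples
```

In practice the positive boxes would also be resized to the classifier input size before training; that step is omitted here.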

In step S302, for each frame of the collected annotated infrared video, optical flow is used to obtain the pixels of the current frame that move relative to each of several subsequent frames, and the union of these sets is taken as the final set of moving pixels.

In a preferred embodiment, denote the current frame of the training dataset by f0, the next frame by f1, the fifth frame by f5 and the tenth frame by f10. The optical flow maps between f0 and each of f1, f5 and f10 are computed. A threshold t0 = 0.3 may be set, and every pixel whose ratio of flow magnitude to the maximum flow magnitude exceeds t0 is marked as moving. Denote the pixels of f0 that move relative to f1 by R₀₁, those that move relative to the fifth frame by R₀₅, and those that move relative to the tenth frame by R₀₍₁₀₎; the final set of moving pixels is then mov_map = R₀₁ ∪ R₀₅ ∪ R₀₍₁₀₎.
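The thresholding and union just described can be sketched in Python. This assumes each flow map is already available as a 2-D grid of (u, v) vectors (how it is computed is left open, as in the text) and represents each set R₀ₖ as a set of (row, col) pixels; the function names are illustrative.

```python
import math


def moving_pixels(flow, t0=0.3):
    """Pixels whose flow magnitude / maximum magnitude exceeds t0.

    flow: 2-D list of (u, v) vectors between f0 and one later frame.
    """
    mags = [[math.hypot(u, v) for (u, v) in row] for row in flow]
    peak = max(max(row) for row in mags)
    if peak == 0:
        return set()   # no motion at all in this frame pair
    return {(r, c) for r, row in enumerate(mags)
            for c, m in enumerate(row) if m / peak > t0}


def motion_map(flows_by_offset, t0=0.3):
    """mov_map = R_01 U R_05 U R_0(10): union over all frame offsets."""
    mov_map = set()
    for flow in flows_by_offset.values():
        mov_map |= moving_pixels(flow, t0)
    return mov_map
```

Normalizing by the per-pair maximum makes t0 a relative threshold, so the same value works whether the target moved far (f0 to f10) or only slightly (f0 to f1).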

It should be noted that the first, fifth and tenth frames above are parameters chosen for one example; other frame offsets may be selected in other embodiments, and the disclosure is not limited in this respect.

In step S303, the morphological opening and dilation operations of image processing are applied to the moving pixels of each frame of the training dataset to obtain the local images in each frame where motion may have occurred. The local images that contain the small target are retained, together with their corresponding coordinate information, to form the detection dataset.

In a preferred embodiment, a 5*5 sliding window is used (the window size is an adjustable parameter that may be chosen according to the size of the image to be recognized and the size of the target region). The set of all points at which the translated sliding window is entirely contained in the moving flow map mov_map is the erosion (Erosion) of the moving flow map by the window; the set of all points at which the translated 5*5 sliding window has a non-empty intersection with mov_map is the dilation (Dilate) of the moving flow map by the window. The opening operation (Open) is erosion followed by dilation.
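The erosion, dilation and opening definitions above can be written directly in Python. This minimal sketch assumes mov_map has been rasterized to a 0/1 grid and that pixels outside the image count as background; a production system would more likely call an image-processing library, but the pure-Python form keeps the set definitions visible.

```python
def _window_test(mask, k, predicate):
    """Apply predicate (all / any) to the k*k window centred on every pixel."""
    h, w, r = len(mask), len(mask[0]), k // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Pixels outside the image count as background (0).
            vals = [mask[y + dy][x + dx] if 0 <= y + dy < h and 0 <= x + dx < w else 0
                    for dy in range(-r, r + 1) for dx in range(-r, r + 1)]
            out[y][x] = int(predicate(vals))
    return out


def erode(mask, k=5):
    """Keep (y, x) if the window centred there lies entirely inside the foreground."""
    return _window_test(mask, k, all)


def dilate(mask, k=5):
    """Keep (y, x) if the window centred there intersects the foreground."""
    return _window_test(mask, k, any)


def opening(mask, k=5):
    """Opening = erosion followed by dilation; removes blobs smaller than the window."""
    return dilate(erode(mask, k), k)
```

For a 5*5 square of moving pixels plus one isolated noise pixel, erosion leaves only the square's centre, and the following dilation restores the full 25-pixel square while the noise pixel stays removed.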

Open operation and expansive working here, can utilize, by too small connected region remove and it is smooth after, the company that will obtain The frame in logical region takes out.If there is common factor in the region and the region of small objects such as unmanned plane mark, its bag that occurs simultaneously is taken out The picture contained is as the detection data set, coordinate information of the relative position of unmanned plane as target in the picture.

In embodiments of the invention, if the region obtained after the opening operation alone cannot completely cover the small target, one additional dilation gives a better result.

In step S310, the classifier training stage: the target classification network is trained using the classification dataset.

As an example, step S310 of embodiments of the invention may comprise the following steps.

In step S311, a suitable convolutional neural network is designed as the target classification network.

As an example, the target classification network of embodiments of the invention may be the neural network shown in Fig. 4. Assuming a 15*22 input image, the image passes through two 5*5*64 convolutional layers, two 3*3 max-pooling layers with stride 2 and two fully connected layers, after which a softmax classifier produces two outputs, i.e. whether the detection result is a target or background.
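Fig. 4 is not reproduced here, so the following sketch only traces tensor shapes through the layers listed above, under stated assumptions: the layers alternate conv then pool, each 5*5 convolution uses stride 1 with padding 2 ("same"), pooling uses no padding, and "15*22" means 15 rows by 22 columns. The width of the hidden fully connected layer is not given in the text and is therefore not traced; the 2-way softmax is shown separately.

```python
import math


def conv_out(size, kernel, stride=1, pad=0):
    """Output length of a convolution or pooling window along one axis."""
    return (size + 2 * pad - kernel) // stride + 1


def trace_classifier_shapes(h=15, w=22, channels=64):
    """Assumed layout: [conv 5x5 (pad 2) -> maxpool 3x3/2] x 2 -> flatten."""
    shapes = [("input", h, w, 1)]
    for i in (1, 2):
        h, w = conv_out(h, 5, pad=2), conv_out(w, 5, pad=2)
        shapes.append((f"conv{i} 5x5x{channels}", h, w, channels))
        h, w = conv_out(h, 3, stride=2), conv_out(w, 3, stride=2)
        shapes.append((f"maxpool{i} 3x3/2", h, w, channels))
    shapes.append(("flatten", 1, 1, h * w * channels))
    return shapes


def softmax(logits):
    """Two logits -> probabilities of (target, background)."""
    m = max(logits)   # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    for name, hh, ww, cc in trace_classifier_shapes():
        print(f"{name:<18} {hh} x {ww} x {cc}")
```

Under these assumptions the feature map shrinks to 7*10 after the first pool and 3*4 after the second, so the first fully connected layer receives 3*4*64 = 768 values. Different padding choices would change these numbers.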

It should be noted that the parameters of the convolutional neural network in embodiments of the invention may be chosen in practice according to the actual image and target sizes; the target classification network is not limited to the specific values above.

In step S312, the target classification network is trained by stochastic gradient descent, using the neural network designed in step S311 and the classification dataset built in step S300.

In a preferred embodiment, the target classification network is trained with cross entropy as the loss function, and an L2-norm regularization term on the weights may also be added; the regularization prevents the target classification network from overfitting. With stochastic gradient descent, the batch size may be set to 256 samples and the initial learning rate to 0.1; the learning rate is divided by 10 after every 17 passes (epochs) over the classification dataset, and training stops after 12k iterations. These values are parameters from one implementation, given as examples, and do not limit the disclosure.
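The step learning-rate schedule and the regularized loss described above can be sketched as follows. The batch size (256), base rate (0.1), decay factor (10), 17-epoch step and 12k-iteration limit come from the text; the weight-decay coefficient default of 1e-4 is an assumption, since the text gives no value, and the helper names are illustrative.

```python
import math

BATCH_SIZE = 256
BASE_LR = 0.1
MAX_ITERATIONS = 12_000   # training stops here


def iters_per_epoch(num_samples, batch_size=BATCH_SIZE):
    """Mini-batches needed for one pass over the classification dataset."""
    return math.ceil(num_samples / batch_size)


def learning_rate(iteration, iters_per_epoch, base_lr=BASE_LR,
                  factor=10.0, epochs_per_step=17):
    """Step schedule: divide the rate by `factor` every `epochs_per_step` epochs."""
    epoch = iteration // iters_per_epoch
    return base_lr / factor ** (epoch // epochs_per_step)


def regularized_cross_entropy(probs, label, weights, weight_decay=1e-4):
    """-log p(label) plus an L2 penalty 0.5 * lambda * ||w||^2 on the weights."""
    return -math.log(probs[label]) + 0.5 * weight_decay * sum(w * w for w in weights)
```

Because the schedule is expressed in epochs while the stopping rule is in iterations, how many decays actually occur before iteration 12k depends on the dataset size: with 25,600 samples (100 iterations per epoch), the rate drops at iterations 1,700, 3,400 and so on.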

In step S320, the detector training stage: the convolutional-layer parameters of the target classification network trained in step S310 are used to initialize the convolutional layers of a Faster-RCNN detection network, which is then trained with the detection dataset.

As an example, the other parameters of the designed Faster-RCNN detection network are described in the embodiments below; any parameter not explicitly stated is the same as in the original Faster-RCNN algorithm.

In a preferred embodiment, step S320 may comprise the following steps.

In step S321, a suitable Faster-RCNN convolutional neural network is designed as the Faster-RCNN detector for detecting small targets. The convolutional-layer structure of the Faster-RCNN detector is identical to that of the target classification network described above, and the convolutional-layer parameters of the Faster-RCNN detector are initialized with those of the trained target classification network.

Faster-RCNN uses a region proposal network that shares convolutional parameters with the detection network to generate candidate regions from the convolutional features, and uses region-of-interest pooling so that feature computation within the same input image is shared. The input image passes through several convolutional layers to produce a feature map. The region proposal network (Region Proposal Network, RPN) produces a series of candidate boxes of different sizes, each of which yields a feature map of identical size after the region-of-interest pooling layer. These feature maps pass through a series of fully connected layers and then split into two branches: one produces classification results through a softmax layer, and the other performs bounding-box regression to obtain corrected bounding boxes. Candidate regions on the feature map can be mapped back to the original image according to the geometry of the convolutional network, giving the final detection result.

In a preferred embodiment, every image input to the Faster-RCNN convolutional neural network may be rescaled so that its shorter side is 38 pixels (a parameter that may be set according to the size of the input images in the application scenario and is not limited to this value). Because a distant UAV is a small target, the anchor (Anchors) settings in the RPN are modified; for example, the anchor sizes may be changed to 8*8, 16*16 and 24*24 (the anchor sizes may be chosen according to the sizes of the image and target to be recognized and are not limited to these values). Since the original Faster-RCNN algorithm was not designed for small targets, the corresponding parameters must be modified to improve the detection of small targets.
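The rescaling rule and the modified anchor sizes can be sketched as follows. The sketch generates square anchors only and assumes a feature stride of 4 in the usage below (two stride-2 pooling layers in the shared backbone); the original Faster-RCNN additionally uses several aspect ratios with default scales of 128, 256 and 512 pixels, which the text replaces for small targets. Function names are hypothetical.

```python
def resize_scale(height, width, short_side=38):
    """Scale factor that makes the shorter image side equal to `short_side`."""
    return short_side / min(height, width)


def square_anchors(feat_h, feat_w, stride, sizes=(8, 16, 24)):
    """(x0, y0, x1, y1) anchors in input-image pixels, one set per feature cell."""
    anchors = []
    for fy in range(feat_h):
        for fx in range(feat_w):
            cx, cy = (fx + 0.5) * stride, (fy + 0.5) * stride   # cell centre
            for s in sizes:
                half = s / 2
                anchors.append((cx - half, cy - half, cx + half, cy + half))
    return anchors


if __name__ == "__main__":
    print(resize_scale(480, 640))                # scale for a 480x640 frame
    print(len(square_anchors(10, 13, stride=4)))  # 10*13 cells x 3 sizes
```

With the shorter side fixed at 38 pixels, 8, 16 and 24 pixel anchors cover roughly a fifth to two thirds of the image height, which is far more appropriate for a distant UAV than the default large anchors.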

Because the candidate regions largely overlap one another, non-maximum suppression may preferably be applied to the output candidate regions to reduce redundancy and improve speed. In one embodiment, the intersection-over-union (IoU) threshold may be set to 0.7 (an example value that does not limit the disclosure). After non-maximum suppression, the 50 candidate regions with the highest confidence (an example value that does not limit the disclosure) may be used for target detection.
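A minimal greedy non-maximum suppression with the example values above (IoU threshold 0.7, at most 50 kept regions) might look like this. Boxes are assumed to be continuous (x0, y0, x1, y1) coordinates; the function names are illustrative.

```python
def box_area(b):
    return (b[2] - b[0]) * (b[3] - b[1])


def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.7, top_k=50):
    """Indices of kept boxes, highest confidence first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
            if len(keep) == top_k:
                break
    return keep
```

For example, two 10*10 boxes offset by one pixel have IoU 90/110, about 0.82, so the lower-scoring one is suppressed, while a distant third box is kept.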

In step S322, the detection dataset is fed into the Faster-RCNN network designed in step S321. Using joint training, the region proposal network and the detection network (here the detection network means the convolutional layers plus the detection part; in Faster-RCNN the region proposal network and the detection network share the preceding convolutional layers) are treated as one network, and the loss functions of the two networks are combined into the loss function of the whole network, which is then optimized. Of the two loss functions, one is the loss for candidate-region generation and the other is the loss for detection; their sum is optimized together as the total loss.

As an example, during training the region proposal network likewise produces candidate boxes, whose parameters are treated as fixed values when passed to the detection network, and the combined loss of the two networks is optimized. An L2 weight-decay term may also be added as a regularizer to prevent overfitting.
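The structure of the combined loss can be sketched for a single sampled anchor and a single proposal. The log loss plus smooth-L1 box loss for each part follows the original Faster-RCNN formulation; the dictionary interface and the weight-decay default of 5e-4 are assumptions for illustration, not values stated in the text.

```python
import math


def smooth_l1(pred, target, beta=1.0):
    """Smooth-L1 box-regression loss summed over the box offsets."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total


def log_loss(p_true):
    """Classification loss: negative log-probability of the true class."""
    return -math.log(p_true)


def joint_loss(rpn, det, weights, weight_decay=5e-4):
    """Total loss = RPN loss + detection loss + L2 weight decay.

    rpn, det: dicts with keys 'p_true', 'box', 'box_gt' for one sampled
    anchor / proposal each (hypothetical minimal interface).
    """
    l_rpn = log_loss(rpn["p_true"]) + smooth_l1(rpn["box"], rpn["box_gt"])
    l_det = log_loss(det["p_true"]) + smooth_l1(det["box"], det["box_gt"])
    l2 = 0.5 * weight_decay * sum(w * w for w in weights)
    return l_rpn + l_det + l2
```

Treating the proposal coordinates as fixed, as the text describes, means that in an autodiff framework no gradient flows from the detection loss back through the box coordinates into the RPN; the shared convolutional layers still receive gradients from both terms.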

In step S330, the detection stage: optical flow is used to generate, for each frame of an unannotated infrared video, candidate regions that may contain a target. The candidate-region images are fed directly into the trained Faster-RCNN detector, which outputs whether each candidate region contains a small target and, if so, the coordinate range of the small target in the whole image.

In a preferred embodiment, step S330 may comprise the following steps.

In step S331, if a target has already been detected and tracked earlier in the unannotated infrared video, the target's equation of motion is estimated from its trajectory over the previous two frames or the previous N frames (N being a positive integer greater than 2).

In step S332, for each frame of the infrared video to be detected, the possible position in the current frame of the target detected in the previous frame is predicted from the target's equation of motion, and this position is used as a second candidate-region image that may contain a small target.

In a preferred embodiment, the average velocity of the target over a segment of its motion may be calculated from the change in the target's position across three consecutive frames and used as the estimate of the target's velocity in the current frame. The possible position of the target in the current frame is then obtained from this velocity and the target's position in the previous frame, producing a second candidate-region image that may contain a small target. The disclosure is not limited to this, however.
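The constant-velocity prediction just described can be sketched as follows, assuming target positions are given as box centres (x, y) and boxes as (x0, y0, x1, y1); the function name is hypothetical.

```python
def predict_second_candidate(centers, last_box):
    """Shift the last box by the target's average per-frame velocity.

    centers: target centres (x, y) in consecutive frames, oldest first
             (three frames in the example above).
    last_box: (x0, y0, x1, y1) of the target in the most recent frame.
    """
    if len(centers) < 2:
        raise ValueError("need at least two positions to estimate velocity")
    steps = len(centers) - 1
    vx = (centers[-1][0] - centers[0][0]) / steps   # average velocity, x
    vy = (centers[-1][1] - centers[0][1]) / steps   # average velocity, y
    x0, y0, x1, y1 = last_box
    return (x0 + vx, y0 + vy, x1 + vx, y1 + vy)
```

For centres (10, 10), (12, 11), (14, 12) the average velocity is (2, 1) pixels per frame, so a last box of (12, 10, 16, 14) is predicted to move to (14, 11, 18, 15) in the current frame.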

In step S333, for each frame of the infrared video to be detected, optical flow is used to obtain the pixels of the frame that move relative to several subsequent frames, and their union is taken as the final set of moving pixels.

In a preferred embodiment, denote the current frame of the test dataset by f0, the next frame by f1, the fifth frame by f5 and the tenth frame by f10. The optical flow maps between f0 and each of f1, f5 and f10 are computed. A threshold t0 = 0.3 may be set, and every pixel whose ratio of flow magnitude to the maximum flow magnitude exceeds t0 is marked as moving. Denote the pixels of f0 that move relative to f1 by R₀₁, those that move relative to the fifth frame by R₀₅, and those that move relative to the tenth frame by R₀₍₁₀₎; the final set of moving pixels is then mov_map = R₀₁ ∪ R₀₅ ∪ R₀₍₁₀₎.

In step S334, the morphological opening and dilation operations of image processing are applied to the moving pixels of each frame of the test dataset to obtain the first candidate-region images in each frame where motion may have occurred.

The opening and dilation operations remove connected regions that are too small and smooth the remainder; the bounding boxes of the resulting connected regions are then extracted as the first candidate regions of the frame.

Because the candidate regions (here including the first candidate regions and/or the second candidate regions above) largely overlap one another, non-maximum suppression may preferably be applied to the output candidate regions to reduce redundancy and improve speed. In one embodiment, the IoU threshold may be set to 0.7 (an example value that does not limit the disclosure). After non-maximum suppression, the 50 candidate regions with the highest confidence (an example value that does not limit the disclosure) may be used for target detection.

In step S335, for each frame of the test dataset, the generated first and/or second candidate-region images are fed directly into the trained Faster-RCNN detector, which outputs whether each first and/or second candidate region contains a small target and, if so, may also output the coordinate range of the small target.

Fig. 5 shows two UAVs at a distance of 2.6 km detected against a simple background using the method of embodiments of the invention; Fig. 6 shows one UAV at 2.6 km detected against a complex background; Fig. 7 shows one UAV at 2.6 km detected against a simple background.

As shown in Fig. 8, in step S800 an infrared video is collected, and a test dataset for detecting the target is built from the infrared video.

In step S810, optical flow is used to obtain the first candidate region of each frame in the test dataset.

In step S820, the first candidate region is input to the trained Faster-RCNN detector to obtain a detection result.

For steps S800-S820, refer to steps S200-S220 of Fig. 2 above; they are not repeated here.

In a preferred embodiment, the method may further include: estimating the target's equation of motion from its trajectory over a preset number of preceding frames; predicting, from the target's equation of motion, a second candidate region for each frame in the test dataset; and inputting the second candidate region to the trained Faster-RCNN detector to obtain the detection result.

In step S830, the target is tracked according to the detection result and a preset rule.

As an example, if the detection result shows that a small target has indeed been detected, the target is tracked according to the preset rule.

As an example, the preset rule of embodiments of the invention may be as follows. The initial state of the system is set to a first state. When the system is in the first state and the target is detected in a candidate region of the next frame of infrared video, the system jumps to a second state. When the system is in the second state and the target is detected in neither the candidate region nor its adjacent region of the next frame, the system jumps back to the first state; when the target is detected in the candidate region and/or the adjacent region of the next frame, the system remains in the second state. When the system has been in the second state for a first preset number of consecutive frames, it jumps to a third state. When the system is in the third state and the target is detected in the candidate region and/or the adjacent region of the next frame, the system remains in the third state; when the target is detected in neither the candidate region nor its adjacent region of the next frame, the system jumps to a fourth state. When the system is in the fourth state and the target is detected in the candidate region and/or the adjacent region of the next frame, the system jumps back to the third state; when the target is detected in neither the candidate region nor its adjacent region of the next frame, the system remains in the fourth state. When the system has been in the fourth state for a second preset number of consecutive frames, it jumps to the first state.

A specific example of the preset rule of embodiments of the invention is described below with reference to Fig. 9. Assume that the first state is the not-tracking state, the second state is the pre-tracking state, the third state is the tracking state and the fourth state is the pre-lost state. The description below takes a small target as the target to be recognized, but the disclosure is not limited to this and can be applied to targets of any size.

As shown in Fig. 9, each target appearing in the infrared video is assumed to be in one of four possible states: not tracking, pre-tracking, tracking and pre-lost. The initial state of the system may be set to not tracking. When a small target is first detected in some candidate region of the infrared video during the detection stage, the system immediately jumps to the pre-tracking state.

When the system is in the not-tracking state and the small target is not detected in the next frame of the infrared video in the test dataset, the system remains in the not-tracking state. When the system is in the not-tracking state and the small target is detected in the next frame, the system jumps from the not-tracking state to the pre-tracking state. When the system is in the pre-tracking state and the small target is detected in the next frame, the system remains in the pre-tracking state; when the system has been in the pre-tracking state for N consecutive frames (N being a positive integer), it jumps from the pre-tracking state to the tracking state. When the system is in the pre-tracking state and the small target is not detected in the next frame, the system jumps from the pre-tracking state to the not-tracking state. When the system is in the tracking state and the small target is detected in the next frame, the system remains in the tracking state. When the system is in the tracking state and the small target is not detected in the next frame, the system jumps from the tracking state to the pre-lost state. When the system is in the pre-lost state and the small target is detected in the next frame, the system jumps from the pre-lost state back to the tracking state. When the system is in the pre-lost state and the small target is not detected in the next frame, the system remains in the pre-lost state; when it has been in the pre-lost state for M consecutive frames (M being a positive integer), it jumps from the pre-lost state to the not-tracking state. The small target is considered to be tracked whenever the system is in the tracking state or the pre-lost state.

Regarding Fig. 9, it should be noted that when the next frame of the infrared video in the test dataset is examined for the same small target, the interval between frames is short, a typical infrared video contains few small targets, and their trajectories are continuous. Relative to the previous frame, the small target in the next frame therefore moves into the vicinity of the candidate region it occupied in the previous frame, so the target can be tracked by checking whether it is present in the next frame within the adjacent region of the candidate region where it was found in the previous frame. The extent of the adjacent region may be adjusted according to the size of the image to be recognized and the size of the target to be recognized. For example, when the system is in the pre-tracking state and no small target is detected in the adjacent region of the next frame, the system jumps to the not-tracking state; if a small target is still detected in the adjacent region of the next frame, the system remains in the pre-tracking state. Likewise, when the system is in the tracking state and no small target is detected in the adjacent region of the next frame, the system jumps to the pre-lost state; if a small target is still detected there, the system remains in the tracking state. When the system is in the pre-lost state and a small target is detected again in the adjacent region of the next frame, the system jumps back to the tracking state; if no small target is detected there, the system remains in the pre-lost state.

It should further be noted that the adjacent region above applies to moving small targets, but a small target may sometimes not move at all; for example, a UAV may hover in the air. In that case the decision can be made by checking whether the small target is present in the next frame within the candidate region it occupied in the previous frame.
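The adjacent-region check, including the hovering case, can be sketched as follows. The "adjacent region" is modelled here as the previous candidate box grown by a fixed pixel margin; the margin value of 10 and the function names are assumptions, since the text leaves the extent to be chosen from image and target sizes.

```python
def expand_box(box, margin):
    """Grow an (x0, y0, x1, y1) box by `margin` pixels on every side."""
    x0, y0, x1, y1 = box
    return (x0 - margin, y0 - margin, x1 + margin, y1 + margin)


def detected_near(prev_box, detections, margin=10):
    """True if any detection overlaps prev_box grown by `margin` pixels.

    Because the grown zone contains prev_box itself, a hovering target
    that stays in its previous candidate region is also accepted.
    """
    zx0, zy0, zx1, zy1 = expand_box(prev_box, margin)
    return any(d[0] <= zx1 and d[2] >= zx0 and d[1] <= zy1 and d[3] >= zy0
               for d in detections)
```

The boolean returned here is the per-frame "detected in the candidate region and/or adjacent region" signal that drives the state transitions.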

The values of M and N in the above embodiments may be chosen reasonably according to the application scenario. A larger N makes the tracking state more reliable, but an excessively large N may prevent the system from ever reaching the tracking state, leaving it alternating between the not-tracking state and the pre-tracking state. Similarly, a larger M makes the tracking state more stable, but an excessively large M can leave the algorithm continually switching between the tracking state and the pre-lost state, which degrades tracking.

As an example, N = 4 and M = 4 are chosen in embodiments of the invention: when the system has been in the pre-tracking state for 4 consecutive frames, it jumps to the tracking state, and when it has been in the pre-lost state for 4 consecutive frames, it jumps to the not-tracking state.
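The four-state rule of Fig. 9 with N = M = 4 can be sketched as a small state machine. The `detected` flag stands for "the target was found in the candidate region and/or its adjacent region of this frame". The class and method names are hypothetical, and counting the frame that triggers entry into the pre-tracking or pre-lost state as the first of the N (or M) frames is an interpretive assumption.

```python
from enum import Enum


class State(Enum):
    NOT_TRACKING = 1
    PRE_TRACKING = 2
    TRACKING = 3
    PRE_LOST = 4


class TrackState:
    """Per-target state machine: N frames to confirm, M frames to drop."""

    def __init__(self, n=4, m=4):
        self.n, self.m = n, m
        self.state = State.NOT_TRACKING
        self.count = 0   # consecutive frames spent in PRE_TRACKING or PRE_LOST

    def update(self, detected):
        s = self.state
        if s is State.NOT_TRACKING:
            if detected:
                self.state, self.count = State.PRE_TRACKING, 1
        elif s is State.PRE_TRACKING:
            if not detected:
                self.state, self.count = State.NOT_TRACKING, 0
            else:
                self.count += 1
                if self.count >= self.n:
                    self.state, self.count = State.TRACKING, 0
        elif s is State.TRACKING:
            if not detected:
                self.state, self.count = State.PRE_LOST, 1
        else:   # State.PRE_LOST
            if detected:
                self.state, self.count = State.TRACKING, 0
            else:
                self.count += 1
                if self.count >= self.m:
                    self.state, self.count = State.NOT_TRACKING, 0
        return self.state

    @property
    def is_tracked(self):
        """Tracking and pre-lost both count as 'target is being tracked'."""
        return self.state in (State.TRACKING, State.PRE_LOST)
```

With the defaults, four consecutive detections confirm a track, a single missed frame only moves it to pre-lost, and four consecutive misses drop it, which is the hysteresis that makes the tracker tolerant of momentary detector failures.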

Exemplary apparatus

After the method for exemplary embodiment of the invention is described, next, exemplary to the present invention with reference to figure 10 The device for being used for target detection tracking of embodiment illustrates.

Fig. 10 schematically shows the structure of an apparatus for target detection and tracking according to an embodiment of the invention. The apparatus is typically deployed in a device capable of running computer programs; for example, in embodiments of the invention the apparatus may be deployed in a desktop computer or a server, and it may of course also be deployed in a notebook computer or even a tablet computer.

The apparatus of embodiments of the invention may include a test dataset construction module 1000, a candidate region acquisition module 1010 and a target detection module 1020. Each module of the apparatus is described below.

The test dataset construction module 1000 may be used to collect an infrared video and build, from the infrared video, a test dataset for detecting the target. For its specific content and forms, refer to the related description in the method embodiments above; it is not repeated here.

The candidate region acquisition module 1010 may be used to obtain, by optical flow, the first candidate region of each frame in the test dataset.

The target detection module 1020 may be used to input the first candidate region to the trained Faster-RCNN detector to obtain a detection result.

As an example, the apparatus may further include a training module, which may in turn include: a training dataset construction unit, for building from the infrared video a training dataset for detecting the target, the training dataset including a classification dataset and a detection dataset; a classifier training unit, for training the pre-designed target classification network on the classification dataset; and a detector training unit, for training the Faster-RCNN detector using the trained target classification network and the detection dataset.

Optionally, the training dataset construction unit may include: a classification dataset construction subunit, for obtaining, from each frame of the infrared video used to build the training dataset, positive samples with their positive-sample labels and negative samples with their negative-sample labels, and building the classification dataset; a moving-pixel acquisition subunit, for using optical flow to obtain the pixels that move in each frame of the infrared video used to build the training dataset; and a detection dataset construction subunit, for obtaining, from the moving pixels of each frame, the local images in each frame of the training dataset that contain the target, together with their corresponding coordinate information, to form the detection dataset.

Optionally, the classifier training unit may include: a classifier structure design subunit, for designing a convolutional neural network as the target classification network; and a classifier training subunit, for training the target classification network by stochastic gradient descent using the designed convolutional neural network and the classification dataset.

Optionally, the detector training unit may include: a detector structure design subunit, configured to design the convolutional layers of the Faster-RCNN detector so that their structure is identical to that of the convolutional layers of the target classification network; a detector initialization subunit, configured to initialize the convolutional-layer parameters of the Faster-RCNN detector with the convolutional-layer parameters of the trained target classification network; and a detector training subunit, configured to input the detection dataset into the designed Faster-RCNN detector and train it by joint training.
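The initialization subunit amounts to copying the trained classifier's convolutional weights into the identically structured detector backbone before joint training. Below is a framework-neutral sketch. It assumes parameters are kept in dictionaries keyed by layer name and stored as flat lists, so only lengths can be checked; the `conv` prefix and the layer names are hypothetical.

```python
from typing import Dict, List

Params = Dict[str, List[float]]   # layer name -> flattened parameter values


def init_detector_from_classifier(detector: Params, classifier: Params,
                                  prefix: str = "conv") -> List[str]:
    """Copy convolutional-layer parameters from the classifier into the detector.

    Raises if a classifier conv layer is missing from the detector or differs in size,
    since the two convolutional structures are required to be identical.
    Non-conv layers (classifier head, RPN, detection head) are left untouched.
    """
    copied = []
    for name, values in classifier.items():
        if not name.startswith(prefix):
            continue                            # e.g. the classifier's fully connected head
        if name not in detector:
            raise KeyError(f"detector has no layer {name!r}")
        if len(detector[name]) != len(values):
            raise ValueError(f"size mismatch for {name!r}")
        detector[name] = list(values)           # copy so later training does not alias
        copied.append(name)
    return copied
```

In a framework such as PyTorch, the analogous step would filter the classifier's `state_dict()` down to the shared convolutional keys and pass the result to the detector's `load_state_dict(..., strict=False)`.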

In a preferred embodiment, the detector training unit may further include an anchor setting subunit, configured to modify the anchor settings in the RPN (region proposal network) of the Faster-RCNN detector according to the target to be detected.
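Modifying the anchor settings typically means changing the anchor sizes and aspect ratios that the RPN places at each feature-map position. The original Faster R-CNN paper uses three box areas (128², 256² and 512² pixels) and three aspect ratios (1:2, 1:1, 2:1); small infrared targets may call for smaller sizes. The sketch below generates anchors for arbitrary sizes and ratios; the sizes (8, 16, 32) and the stride of 16 used in the tests are hypothetical values, not figures from the source.

```python
import math
from typing import List, Sequence, Tuple

Anchor = Tuple[float, float, float, float]   # (x1, y1, x2, y2)


def base_anchors(sizes: Sequence[float], ratios: Sequence[float]) -> List[Anchor]:
    """Anchors centred at the origin; each has area size**2 and height/width = ratio."""
    anchors = []
    for s in sizes:
        for r in ratios:
            w = s / math.sqrt(r)
            h = s * math.sqrt(r)
            anchors.append((-w / 2, -h / 2, w / 2, h / 2))
    return anchors


def shifted_anchors(base: List[Anchor], feat_w: int, feat_h: int, stride: int) -> List[Anchor]:
    """Replicate the base anchors at the image-space centre of every feature-map cell."""
    out = []
    for j in range(feat_h):
        for i in range(feat_w):
            cx, cy = (i + 0.5) * stride, (j + 0.5) * stride
            out.extend((cx + x1, cy + y1, cx + x2, cy + y2) for x1, y1, x2, y2 in base)
    return out
```

With three sizes and three ratios, `shifted_anchors(base_anchors([8, 16, 32], [0.5, 1.0, 2.0]), feat_w, feat_h, 16)` places nine anchors per feature-map cell, matching the nine-anchor layout of the standard RPN but at scales closer to a small target.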

In another preferred embodiment, the apparatus may further include a second candidate region generation module, which includes: a motion equation generation unit, configured to estimate the equation of motion of the target from its trajectory over a preset number of preceding frames; a motion-estimation candidate region generation unit, configured to predict, from the equation of motion of the target, a second candidate region for each frame of the test dataset; and a motion-estimation detection unit, configured to input the second candidate regions into the trained Faster-RCNN detector to obtain the detection results.
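The text does not specify the form of the equation of motion. As one plausible reading, the sketch below assumes a constant-velocity model: it fits the box centre's x and y coordinates as linear functions of the frame index by least squares over the preceding frames, extrapolates one frame ahead, and enlarges the last observed box by a hypothetical `margin` factor to form the second candidate region. The function names are illustrative.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2)


def fit_line(ts: Sequence[float], vs: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit v = a + b*t; returns (a, b). A single point gives zero velocity."""
    n = len(ts)
    mt, mv = sum(ts) / n, sum(vs) / n
    var = sum((t - mt) ** 2 for t in ts)
    if var == 0:
        return mv, 0.0
    slope = sum((t - mt) * (v - mv) for t, v in zip(ts, vs)) / var
    return mv - slope * mt, slope


def predict_candidate(track: List[Tuple[int, Box]], margin: float = 1.5) -> Box:
    """Predict the next-frame candidate box from (frame_index, box) history."""
    ts = [t for t, _ in track]
    cxs = [(b[0] + b[2]) / 2 for _, b in track]
    cys = [(b[1] + b[3]) / 2 for _, b in track]
    ax, bx = fit_line(ts, cxs)                 # x(t) = ax + bx * t
    ay, by = fit_line(ts, cys)                 # y(t) = ay + by * t
    t_next = ts[-1] + 1
    cx, cy = ax + bx * t_next, ay + by * t_next
    x1, y1, x2, y2 = track[-1][1]
    hw, hh = (x2 - x1) * margin / 2, (y2 - y1) * margin / 2   # enlarged half-size
    return (cx - hw, cy - hh, cx + hw, cy + hh)
```

For a target moving 2 pixels per frame in x and 1 in y with a 4x4 box, the predicted centre continues the line and the candidate grows to 6x6. A higher-order model (for example constant acceleration) could replace `fit_line` without changing the interface.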

In a preferred embodiment, the apparatus may further include a target tracking module, configured to track the target according to the detection results and preset rules.

In a preferred embodiment, the preset rules may be as follows:

The initial state of the system is set to a first state.

When the system is in the first state and the target is detected in the candidate region of the next infrared video frame, the system jumps to a second state.

When the system is in the second state and the target is detected in neither the candidate region of the next infrared video frame nor its adjacent area, the system jumps back to the first state.

When the system is in the second state and the target is detected in the candidate region and/or adjacent area of the next infrared video frame, the system remains in the second state.

When the system has been in the second state for a first preset number of consecutive frames, the system jumps to a third state.

When the system is in the third state and the target is detected in the candidate region and/or adjacent area of the next infrared video frame, the system remains in the third state.

When the system is in the third state and the target is detected in neither the candidate region of the next infrared video frame nor its adjacent area, the system jumps to a fourth state.

When the system is in the fourth state and the target is detected in the candidate region and/or adjacent area of the next infrared video frame, the system jumps back to the third state.

When the system is in the fourth state and the target is detected in neither the candidate region of the next infrared video frame nor its adjacent area, the system remains in the fourth state.

When the system has been in the fourth state for a second preset number of consecutive frames, the system jumps to the first state.
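The preset rules form a four-state machine. The sketch below encodes them under stated assumptions. A single boolean `detected` per frame stands for "target found in the candidate region and/or its adjacent area" (the source tests only the candidate region when leaving the first state). The consecutive-frame counter is taken to include the frame on which a state was entered, so the jump to the third (or first) state happens on the frame where the run reaches `n1` (or `n2`). Class and parameter names are hypothetical.

```python
from enum import IntEnum


class State(IntEnum):
    FIRST = 1
    SECOND = 2
    THIRD = 3
    FOURTH = 4


class TrackStateMachine:
    def __init__(self, n1: int, n2: int):
        self.n1 = n1               # first preset number of frames (second -> third state)
        self.n2 = n2               # second preset number of frames (fourth -> first state)
        self.state = State.FIRST   # initial state is the first state
        self.run = 0               # consecutive frames spent in the current state

    def _go(self, new_state: State) -> None:
        """Enter new_state, or extend the run if already in it."""
        if new_state != self.state:
            self.state, self.run = new_state, 1
        else:
            self.run += 1

    def step(self, detected: bool) -> State:
        """Advance by one frame given whether the target was detected."""
        s = self.state
        if s == State.FIRST:
            self._go(State.SECOND if detected else State.FIRST)
        elif s == State.SECOND:
            self._go(State.SECOND if detected else State.FIRST)
            if self.state == State.SECOND and self.run >= self.n1:
                self._go(State.THIRD)
        elif s == State.THIRD:
            self._go(State.THIRD if detected else State.FOURTH)
        else:  # State.FOURTH
            self._go(State.THIRD if detected else State.FOURTH)
            if self.state == State.FOURTH and self.run >= self.n2:
                self._go(State.FIRST)
        return self.state
```

With `n1 = n2 = 2`, the detection sequence detected, detected, detected, missed, detected, missed, missed, missed produces states 2, 3, 3, 4, 3, 4, 1, 1: a target must be seen on consecutive frames before it is confirmed, and a confirmed target survives brief dropouts before the tracker resets.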

For the specific operations performed by each module, unit and/or subunit, refer to the description of the corresponding steps in the method embodiments above; they are not repeated here.

Figure 11 shows a block diagram of an exemplary computer system/server 110 suitable for implementing embodiments of the present invention. The computer system/server 110 shown in Figure 11 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.

As shown in Figure 11, the computer system/server 110 takes the form of a general-purpose computing device. Its components may include, but are not limited to: one or more processors or processing units 1101, a system memory 1102, and a bus 1103 connecting the different system components (including the system memory 1102 and the processing unit 1101).

The computer system/server 110 typically includes a variety of computer-system-readable media. These may be any available media accessible by the computer system/server 110, including volatile and non-volatile media and removable and non-removable media.

The system memory 1102 may include computer-system-readable media in the form of volatile memory, such as random access memory (RAM) 11021 and/or cache memory 11022. The computer system/server 110 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, ROM 11023 may be used to read from and write to non-removable, non-volatile magnetic media (not shown in Figure 11, commonly referred to as a "hard disk drive"). Although not shown in Figure 11, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (such as a "floppy disk") and an optical disk drive for reading from and writing to a removable non-volatile optical disk (such as a CD-ROM, DVD-ROM or other optical media) may also be provided. In these cases, each drive may be connected to the bus 1103 through one or more data media interfaces. The system memory 1102 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of the present invention.

A program/utility 11025 having a set of (at least one) program modules 11024 may be stored, for example, in the system memory 1102. Such program modules 11024 include, but are not limited to, an operating system, one or more application programs, other program modules and program data; each of these examples, or some combination of them, may include an implementation of a network environment. The program modules 11024 generally perform the functions and/or methods of the embodiments described in the present invention.

The computer system/server 110 may also communicate with one or more external devices 1104 (such as a keyboard, a pointing device, a display, etc.). Such communication may take place through an input/output (I/O) interface 1105. The computer system/server 110 may also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN) and/or a public network such as the Internet) through a network adapter 1106. As shown in Figure 11, the network adapter 1106 communicates with the other modules of the computer system/server 110 (such as the processing unit 1101) through the bus 1103. It should be understood that, although not shown in Figure 11, other hardware and/or software modules may be used in conjunction with the computer system/server 110.

The processing unit 1101 performs various functional applications and data processing by running the computer programs stored in the system memory 1102, for example by executing instructions that implement the steps of the method embodiments above. Specifically, the processing unit 1101 may execute a computer program stored in the system memory 1102; when the program is executed, the following instructions are run: acquiring infrared video and constructing, from the infrared video, a test dataset for detecting the target; obtaining a first candidate region for each frame of the test dataset by the optical flow method; and inputting the first candidate regions into the trained Faster-RCNN detector to obtain detection results. Optionally, the following instruction may also be executed: tracking the target according to the detection results and preset rules. For the specific implementation of each step, refer to the embodiments above; it is not repeated here.

A specific example of the computer-readable storage medium of an embodiment of the present invention is shown in Figure 12.

The medium in Figure 12 is an optical disc 120 on which a computer program (i.e., a program product) is stored. When executed by a processor, the program can implement the steps described in the method embodiments above, for example: acquiring infrared video and constructing, from the infrared video, a test dataset for detecting the target; obtaining a first candidate region for each frame of the test dataset by the optical flow method; and inputting the first candidate regions into the trained Faster-RCNN detector to obtain detection results. Optionally, the following step may also be implemented: tracking the target according to the detection results and preset rules. For the specific implementation of each step, refer to the embodiments above; it is not repeated here.

It should be noted that, although several units/modules or subunits/modules of the apparatus for target detection and tracking are mentioned in the detailed description above, this division is merely exemplary and not mandatory. In fact, according to embodiments of the present invention, the features and functions of two or more of the units/modules described above may be embodied in a single unit/module. Conversely, the features and functions of one unit/module described above may be further divided among and embodied by multiple units/modules.

In addition, although the operations of the method of the present invention are described in a particular order in the drawings, this does not require or imply that the operations must be performed in that order, or that all of the illustrated operations must be performed to achieve the desired result. Additionally or alternatively, some steps may be omitted, multiple steps may be combined into one step, and/or one step may be decomposed into multiple steps.

Although the spirit and principles of the present invention have been described with reference to several embodiments, it should be understood that the invention is not limited to the disclosed embodiments, and the division into aspects does not mean that features in these aspects cannot be combined to advantage; the division is merely for convenience of description. The invention is intended to cover the various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims

1. A method for target detection and tracking, comprising:

Acquiring infrared video, and constructing, from the infrared video, a test dataset for detecting the target;

Obtaining, by an optical flow method, a first candidate region for each frame of the test dataset; and

Inputting the first candidate regions into a Faster-RCNN detector to obtain detection results.

2. The method according to claim 1, wherein the method further comprises a training step, the training step comprising:

Constructing, from the infrared video, a training dataset for detecting the target, the training dataset comprising a classification dataset and a detection dataset;

Training a pre-designed target classification network on the classification dataset; and

Training the Faster-RCNN detector according to the trained target classification network and the detection dataset.

3. The method according to claim 2, wherein constructing, from the infrared video, the training dataset for detecting the target comprises:

Obtaining, for each frame of the infrared video used to construct the training dataset, positive samples with their positive-sample labels and negative samples with their negative-sample labels, to construct the classification dataset;

Obtaining, by the optical flow method, the moving pixels in each frame of the infrared video used to construct the training dataset; and

Obtaining, according to the moving pixels of each frame, the local images containing the target in each frame of the training dataset together with their coordinate information, to form the detection dataset.

4. The method according to claim 2, wherein training the Faster-RCNN detector according to the trained target classification network and the detection dataset comprises:

Designing the convolutional layers of the Faster-RCNN detector to have the same structure as the convolutional layers of the target classification network;

Initializing the convolutional-layer parameters of the Faster-RCNN detector with the convolutional-layer parameters of the trained target classification network; and

Inputting the detection dataset into the designed Faster-RCNN detector, and training the Faster-RCNN detector by joint training.

5. The method according to claim 1, wherein the method further comprises:

Estimating the equation of motion of the target from its trajectory over a preset number of preceding frames;

Predicting, from the equation of motion of the target, a second candidate region for each frame of the test dataset; and

Inputting the second candidate regions into the trained Faster-RCNN detector to obtain the detection results.

6. The method according to claim 1, wherein the method further comprises:

Tracking the target according to the detection results and preset rules.

7. The method according to claim 6, wherein the preset rules are:

Setting the initial state of the system to a first state;

When the system is in the first state and the target is detected in the candidate region of the next infrared video frame, the system jumps to a second state;

When the system is in the second state and the target is detected in neither the candidate region of the next infrared video frame nor its adjacent area, the system jumps to the first state;

When the system is in the second state and the target is detected in the candidate region and/or adjacent area of the next infrared video frame, the system remains in the second state;

When the system has been in the second state for a first preset number of consecutive frames, the system jumps to a third state;

When the system is in the third state and the target is detected in the candidate region and/or adjacent area of the next infrared video frame, the system remains in the third state;

When the system is in the third state and the target is detected in neither the candidate region of the next infrared video frame nor its adjacent area, the system jumps to a fourth state;

When the system is in the fourth state and the target is detected in the candidate region and/or adjacent area of the next infrared video frame, the system jumps to the third state;

When the system is in the fourth state and the target is detected in neither the candidate region of the next infrared video frame nor its adjacent area, the system remains in the fourth state; and

When the system has been in the fourth state for a second preset number of consecutive frames, the system jumps to the first state.

8. An apparatus for target detection and tracking, comprising:

A test data construction module, configured to acquire infrared video and construct, from the infrared video, a test dataset for detecting the target;

A candidate region acquisition module, configured to obtain, by an optical flow method, a first candidate region for each frame of the test dataset; and

A target detection module, configured to input the first candidate regions into a trained Faster-RCNN detector to obtain detection results.

9. The apparatus according to claim 8, wherein the apparatus further comprises a training module, the training module comprising:

A training dataset construction unit, configured to construct, from the infrared video, a training dataset for detecting the target, the training dataset comprising a classification dataset and a detection dataset;

A classifier training unit, configured to train a pre-designed target classification network on the classification dataset; and

A detector training unit, configured to train the Faster-RCNN detector according to the trained target classification network and the detection dataset.

10. The apparatus according to claim 9, wherein the training dataset construction unit comprises:

A classification dataset construction subunit, configured to obtain, for each frame of the infrared video used to construct the training dataset, positive samples with their positive-sample labels and negative samples with their negative-sample labels, to construct the classification dataset;

A moving-pixel acquisition subunit, configured to obtain, by the optical flow method, the moving pixels in each frame of the infrared video used to construct the training dataset; and

A detection dataset construction subunit, configured to obtain, according to the moving pixels of each frame, the local images containing the target in each frame of the training dataset together with their coordinate information, to form the detection dataset.

11. The apparatus according to claim 9, wherein the detector training unit further comprises an anchor setting subunit, configured to modify the anchor settings in the RPN network of the Faster-RCNN detector according to the target to be detected.

12. The apparatus according to claim 8, wherein the apparatus further comprises a second candidate region generation module, the second candidate region generation module comprising:

A motion equation generation unit, configured to estimate the equation of motion of the target from its trajectory over a preset number of preceding frames;

A motion-estimation candidate region generation unit, configured to predict, from the equation of motion of the target, a second candidate region for each frame of the test dataset; and

A motion-estimation detection unit, configured to input the second candidate regions into the trained Faster-RCNN detector to obtain the detection results.

13. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the method according to any one of claims 1-7.

14. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method according to any one of claims 1-7.