Summary of the invention
Embodiments of the invention provide a method and a system for detecting targets in unmanned aerial vehicle (UAV) images, at least to solve the technical problem in the prior art that processing accuracy and efficiency cannot both be achieved when performing target detection and recognition on UAV images.
According to one aspect of an embodiment of the present invention, a method for detecting UAV images is provided, characterized by: judging whether the difference value between a reference frame and the current frame of the image exceeds a threshold, wherein the reference frame is the adjacent previous frame of the current frame, and the image is composed of multiple targets; extracting the respective features of the reference frame and the current frame if the difference value exceeds the threshold; propagating the feature of the reference frame to the current frame through an optical flow network model; combining the feature of the current frame with the feature propagated from the reference frame according to different preset weights to form an enhanced feature, wherein the weights are spatial weights that are fixed across the feature channels; and detecting the enhanced feature to obtain the target detection and recognition result and the semantic segmentation result of the image.
Further, judging whether the difference value between the reference frame and the current frame exceeds the threshold includes: performing an inter-frame difference operation on the reference frame and the current frame to obtain the difference value; judging whether the difference value exceeds the threshold; and, if the difference value is below the threshold, propagating the detection result of the reference frame to the current frame.
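The inter-frame difference operation described above can be sketched in a few lines of plain Python. The function names and the mean-squared-difference choice are assumptions for illustration, not the claimed implementation:

```python
def frame_difference(ref, cur):
    """Mean squared grayscale difference between two equally sized
    frames, given as 2-D lists of pixel intensities."""
    n = sum(len(row) for row in ref)
    return sum((r - c) ** 2
               for ref_row, cur_row in zip(ref, cur)
               for r, c in zip(ref_row, cur_row)) / n

def is_valid_frame(ref, cur, threshold):
    """A frame is 'valid' (sent through full detection) when the
    difference value exceeds the threshold; otherwise the reference
    frame's detection result is simply propagated to it."""
    return frame_difference(ref, cur) > threshold
```

Frames below the threshold reuse the reference frame's detection result, which is what saves computation on nearly static scenes.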
Further, extracting the respective features of the reference frame and the current frame includes: extracting the respective features of the reference frame and the current frame through a feature extraction network.
Further, detecting the enhanced feature to obtain the target detection and recognition result and the semantic segmentation result includes: detecting the enhanced feature through a high-precision detection network to obtain the detection and recognition result and the semantic segmentation result of the image, wherein the high-precision detection network is an algorithm for detecting and recognizing still images.
Further, detecting the enhanced feature to obtain the target detection and recognition result and the semantic segmentation result of the image includes: improving the precision of the target detection, recognition and semantic segmentation results of the image through inter-frame suppression, high-confidence tracking and spatial position correction, wherein the inter-frame suppression statistically analyzes the detection results over the whole video, sorts all detection windows by confidence, distinguishes high-scoring windows from low-scoring ones, and subtracts a specific value from the low-scoring windows, so as to separate correct detection windows from false ones; the high-confidence tracking uses a tracking algorithm to choose a high-confidence target from the detection results as a tracking starting point, tracks forward and backward over the whole video from that starting point to generate a tracking trajectory, and stops tracking when a change of the tracked target during tracking causes its confidence to fall below a preset threshold; a high-confidence target is then chosen from the remaining targets as a new tracking starting point, and the above operations are repeated; if a window coincides with a window that already appeared on a previous trajectory, it is skipped directly; finally, the target detection and recognition results and the semantic segmentation results are corrected using the tracking trajectories; the spatial position correction compares, based on the tracking result, the detection results around each position, and a position with IOU > 0.5 is taken as the final position of the target, where IOU is the area of the overlap of the two regions divided by the area of their union.
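The IOU criterion used for spatial position correction can be computed as follows; this is a standard intersection-over-union sketch for axis-aligned boxes, and the (x1, y1, x2, y2) box format is an assumption:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2):
    the overlapping area divided by the area of the union."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # overlap rectangle (empty overlaps clamp to zero width/height)
    inter = (max(0.0, min(ax2, bx2) - max(ax1, bx1))
             * max(0.0, min(ay2, by2) - max(ay1, by1)))
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1) - inter)
    return inter / union if union > 0 else 0.0
```

Positions whose detection windows reach IOU > 0.5 against the tracked window would then be kept as the final target position.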
Further, the reference frame is chosen as follows: the first frame serves as the reference frame, the second frame for which the enhanced feature has been synthesized then replaces the reference frame, and so on; alternatively, a time interval t_in is given, the first frame serves as the reference frame at first, and after time t_in the reference frame at time t is the frame at time t − t_in.
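Under the fixed-interval variant, the index of the reference frame for each frame can be computed as below (an illustrative sketch; the function name is an assumption):

```python
def reference_frame_index(t, t_in):
    """Reference frame for frame t under a fixed interval t_in: the
    first frame until t_in frames have passed, then the frame t_in
    steps earlier (t - t_in)."""
    return 0 if t < t_in else t - t_in
```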
Further, propagating the feature of the reference frame to the current frame through the optical flow network model includes: estimating the optical flow field M_{y→x} = F(x, y) through the optical flow network model, where x is the reference frame and y is the current frame; and propagating the feature of the reference frame to the current frame according to an optical flow propagation function, wherein the optical flow propagation function is f_{x→y} = W(f_x, M_{y→x}) = W(f_x, F(x, y)), where W(·) is a bilinear warping function applied to all positions in each channel of the feature, and f_{x→y} denotes the feature propagated from frame x to frame y.
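The bilinear warping W(·), applied per channel, can be sketched as follows. This is a plain-Python, single-channel sketch with border-clamped sampling; the flow convention (each entry giving a (dy, dx) displacement into the reference frame) is an assumption for illustration:

```python
import math

def warp_feature(feat, flow):
    """Warp one feature channel by a per-position flow field using
    bilinear sampling, i.e. a sketch of f_{x->y} = W(f_x, M_{y->x})."""
    h, w = len(feat), len(feat[0])

    def sample(y, x):
        # clamp to the border so out-of-range taps stay defined
        return feat[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            dy, dx = flow[i][j]
            sy, sx = i + dy, j + dx
            y0, x0 = math.floor(sy), math.floor(sx)
            wy, wx = sy - y0, sx - x0
            # blend the four nearest reference-frame positions
            out[i][j] = ((1 - wy) * (1 - wx) * sample(y0, x0)
                         + (1 - wy) * wx * sample(y0, x0 + 1)
                         + wy * (1 - wx) * sample(y0 + 1, x0)
                         + wy * wx * sample(y0 + 1, x0 + 1))
    return out
```

With zero flow the feature map is returned unchanged; fractional displacements interpolate between neighbouring positions.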
According to another aspect of an embodiment of the present invention, a system for detecting targets in UAV images is also provided, including: a judgment module for judging whether the difference value between a reference frame and the current frame of the image exceeds a threshold, wherein the reference frame is the adjacent previous frame of the current frame; an extraction module for extracting the respective features of the reference frame and the current frame if the difference value exceeds the threshold; a propagation module for propagating the feature of the reference frame to the current frame through an optical flow network model; a feature enhancement module for combining the feature of the current frame with the feature propagated from the reference frame according to different preset weights to form an enhanced feature, wherein the weights are spatial weights fixed across the feature channels; and a detection module for detecting the enhanced feature to obtain the target detection and recognition result and the semantic segmentation result of the image.
According to another aspect of an embodiment of the present invention, a storage medium is also provided, on which a computer program is stored, the program executing the above method when run.
According to another aspect of an embodiment of the present invention, a processor is also provided, which executes the above method when the program is run.
In the embodiments of the present invention, the following approach is adopted: judging whether the difference value between a reference frame and the current frame exceeds a threshold, wherein the reference frame is the adjacent previous frame of the current frame; extracting the respective features of the reference frame and the current frame if the difference value exceeds the threshold; propagating the feature of the reference frame to the current frame through an optical flow network model; combining the feature of the current frame with the feature propagated from the reference frame according to different preset weights to form an enhanced feature, wherein the weights are spatial weights fixed across the feature channels; and detecting the enhanced feature to obtain the target detection and recognition result and the semantic segmentation result. This realizes efficient target detection and recognition of UAV remote sensing images in complex environments while guaranteeing precision, and a fine contour can be drawn for each detected target. While improving efficiency, the approach is insensitive to motion blur, occlusion, video defocus, diversity of shape changes, diversity of illumination changes and the like, and has good robustness, generality and real-time performance, thereby solving the technical problem in the prior art that processing accuracy and efficiency cannot both be achieved when performing target detection and recognition on UAV images.
Specific embodiment
In order to enable those skilled in the art to better understand the solution of the present invention, the technical solutions in the embodiments of the present invention will be described clearly and completely below in conjunction with the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the scope of protection of the present invention.
It should be noted that the terms "first", "second", etc. in the description, claims and drawings of this specification are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that data so used are interchangeable under appropriate circumstances, so that the embodiments of the present invention described herein can be implemented in sequences other than those illustrated or described herein. In addition, the terms "comprising" and "having" and any variations thereof are intended to cover a non-exclusive inclusion; for example, a process, method, system, product or device that comprises a series of steps or units is not necessarily limited to the steps or units explicitly listed, but may include other steps or units that are not explicitly listed or that are inherent to such a process, method, product or device.
According to an embodiment of the present invention, a method embodiment for detecting targets in UAV images is provided. It should be noted that the steps shown in the flowcharts of the accompanying drawings can be executed in a computer system such as a set of computer-executable instructions, and, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in an order different from that herein.
Fig. 1 shows a method for detecting UAV images according to an embodiment of the present invention. As shown in Fig. 1, the method includes the following steps:
Step S102: judging whether the difference value between a reference frame and the current frame of the image exceeds a threshold, wherein the reference frame is the adjacent previous frame of the current frame;
Step S104: extracting the respective features of the reference frame and the current frame if the difference value exceeds the threshold;
Step S106: propagating the feature of the reference frame to the current frame through an optical flow network model;
Step S108: combining the feature of the current frame with the feature propagated from the reference frame according to different preset weights to form an enhanced feature, wherein the weights are spatial weights fixed across the feature channels;
Step S110: detecting the enhanced feature to obtain the target detection and recognition result and the semantic segmentation result of the image.
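The control flow of steps S102 through S110 can be sketched as follows; every callable (diff, extract, propagate, combine, detect) is a hypothetical stand-in for the corresponding module, not the claimed networks:

```python
def detect_video(frames, diff, threshold, extract, propagate, combine, detect):
    """Run detection only on frames whose difference from the reference
    frame exceeds the threshold; other frames inherit the reference
    frame's result (steps S102-S110)."""
    results, ref_frame, ref_feat, last_result = [], None, None, None
    for frame in frames:
        if ref_frame is None or diff(ref_frame, frame) > threshold:
            cur_feat = extract(frame)                      # S104
            if ref_feat is None:                           # very first frame
                enhanced = cur_feat
            else:                                          # S106 + S108
                enhanced = combine(propagate(ref_feat), cur_feat)
            last_result = detect(enhanced)                 # S110
            ref_frame, ref_feat = frame, cur_feat          # reference updated
        results.append(last_result)   # below threshold: propagate the result
    return results
```

With toy stand-ins (e.g. scalar "frames" and lambdas for the networks), the second of two identical frames reuses the first frame's result, while a changed frame triggers a fresh detection.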
The above method can be applied to UAV-based marine target detection and recognition, mine management, ground traffic control, oil spill detection in sea areas, target extraction from UAV remote sensing images, automatic driving, medical image processing, and the like. Through the above method, image information can be acquired simply and efficiently, and adverse effects on the detection result caused by dropped frames, poor quality, blur and the like in existing videos can be reduced. By checking whether the difference value between the reference frame and the current frame exceeds the threshold, the UAV detects only image frames with large changes, thereby reducing the amount of computation and accelerating the computation speed; and propagating features through an optical flow network can, to a certain extent, resolve the adverse effect on the detection result caused by ignoring temporal and contextual information in videos. Target detection and recognition of UAV remote sensing images in complex environments can thus be performed efficiently while guaranteeing precision, and the above method can also draw a fine contour for each detected target. While improving efficiency, it is insensitive to motion blur, occlusion, video defocus, diversity of shape changes, diversity of illumination changes and the like, and has good robustness, generality and real-time performance, thereby solving the technical problem in the prior art that processing accuracy and efficiency cannot both be achieved when detecting and recognizing targets in images.
When the above UAV performs valid-frame detection on the images, the corresponding module includes two parts: difference-based valid-frame extraction (feature extraction) and reference frame selection. The purpose of this module is to improve the efficiency of the whole network so that it can meet the requirement of real-time detection on non-top-end hardware. For the reference frame selection, in an optional embodiment, the first frame serves as the reference frame, the second frame for which the enhanced feature has been synthesized then replaces the reference frame, and so on; alternatively, a time interval t_in can be given, the first frame serves as the reference frame at first, and after time t_in the reference frame at time t is the frame at time t − t_in.
For the difference-based valid-frame extraction method, in an optional embodiment, an inter-frame difference operation is performed on the reference frame and the current frame to obtain a difference value; whether the difference value exceeds the threshold is judged; and if the difference value is below the threshold, the detection result of the reference frame is propagated to the current frame.
In an optional embodiment, the respective features of the reference frame and the current frame can be extracted through a feature extraction network, which can be ResNet101+FPN.
After the features of the reference frame and the current frame are extracted through the above steps, the feature of the current frame is enhanced using the feature (information) of the reference frame, and target detection and recognition are performed using an advanced high-precision detection network; at the same time, the specific contour of each object, also called semantic segmentation, can be obtained, so that detection reaches higher precision.
The valid-frame detection of UAV images is illustrated below through an optional embodiment:
The UAV image valid-frame detector compares the reference frame with the current frame and judges whether there is a large difference. When choosing the reference frame, one can choose at a fixed time interval, or use the frame on which enhanced-feature target detection and recognition was last performed as the reference frame. The UAV image is input into the valid-frame detector, which judges whether it is the first frame; if so, it is directly sent to the enhanced-feature target detection and recognition module for processing, and the obtained result is output directly. Taking the first frame as reference frame x, it is input into the valid-frame detector together with a subsequent frame y, (x − y) is computed, and whether the difference between the two frames passes is judged against the preset threshold. If the difference value between the two frames is below the preset threshold, the current frame does not need to be passed into the enhanced-feature target detection and recognition module; the detection result of the reference frame is simply propagated to the current frame. If the difference value between the two frames exceeds the preset threshold, the current frame is passed into the enhanced-feature target detection and recognition module. The processing flow of the UAV image valid-frame detector is shown in Fig. 2. Figs. 3 to 5 illustrate the necessity of valid-frame detection using a marine example: Fig. 3 is the image frame at a certain moment, Fig. 4 is the image frame at the next moment, and Fig. 5 is the next adjacent reference frame. Obviously, Fig. 3 and Fig. 4 show no large change, and the distance moved by the hull in the image is almost negligible; performing target detection and recognition separately on Fig. 3 and Fig. 4 at this moment would waste a lot of computing resources. Therefore, the embodiment of the present invention propagates the target detection result of Fig. 3 to Fig. 4, and passes Fig. 5 to the enhanced-feature target detection and recognition module for detection.
In an optional embodiment, detecting the enhanced feature to obtain the target detection and recognition result and the semantic segmentation result can be performed through a high-precision detection network, wherein the high-precision detection network is an algorithm for detecting and recognizing still images; if applied directly to video, however, the detection effect would be poor, because the temporal and contextual information in the video would not be used. The optical flow network, in order to avoid the impact of motion blur and the like on detection in videos, aggregates the features of preceding and succeeding frames along motion paths into the feature of the current frame, thereby enhancing the feature of the current frame and improving the precision of target detection and recognition. The above high-precision detection network is a two-stage target detection and recognition method: the first stage is called a Region Proposal Network (RPN) and is used to extract the bounding boxes of candidate targets; the second stage extracts features using region-of-interest pooling on the candidate boxes, then performs classification and bounding box regression, while generating a binary mask for each region of interest. The mask encodes the spatial layout of an object; unlike class labels or boxes, the spatial structure can be extracted with the mask through the pixel-to-pixel alignment of convolutions.
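As an illustration of the first stage, the candidate boxes that an RPN scores are laid out as anchors on the feature map. The sketch below shows only that layout; the specific scales, ratios and stride are illustrative values, not taken from the embodiment:

```python
def generate_anchors(feat_h, feat_w, stride, scales, ratios):
    """Enumerate candidate boxes (x1, y1, x2, y2) centred on every
    feature-map position, one per (scale, aspect-ratio) pair, as an
    RPN's first-stage anchor layout."""
    anchors = []
    for i in range(feat_h):
        for j in range(feat_w):
            # centre of this feature-map cell in image coordinates
            cy, cx = (i + 0.5) * stride, (j + 0.5) * stride
            for s in scales:
                for r in ratios:
                    h = s * (r ** 0.5)
                    w = s / (r ** 0.5)
                    anchors.append((cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2))
    return anchors
```

The RPN then scores and regresses these anchors to produce the candidate boxes passed to the second stage.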
The high-precision target detection and recognition network has the structure of Mask R-CNN: it has a parallel branch that performs semantic segmentation on the targets while target detection and recognition are carried out. This whole scheme is also called instance segmentation.
In an optional embodiment, propagating the feature of the reference frame to the current frame through the optical flow network model can include: estimating the optical flow field M_{y→x} = F(x, y) through the optical flow network model, where x is the reference frame and y is the current frame; and propagating the feature of the reference frame to the current frame according to the optical flow propagation function, wherein the optical flow propagation function is f_{x→y} = W(f_x, M_{y→x}) = W(f_x, F(x, y)), where W(·) is a bilinear warping function applied to all positions in each channel of the feature, and f_{x→y} denotes the feature propagated from frame x to frame y.
As shown in Fig. 6, the feature of the current frame is enhanced using the information of the adjacent preceding reference frame, and the enhanced feature is then input into the final detection network for target detection and recognition. First, a feature extraction network is used to extract the respective features of the current frame and the reference frame; to enhance the feature of the current frame, a specific optical flow network is used to estimate the motion between the adjacent preceding reference frame and the current frame, and the feature of the adjacent preceding reference frame is then propagated to the current frame according to the optical flow motion; afterwards, the feature of the adjacent preceding reference frame and the feature of the current frame are combined through a preset weighting network; finally, the combined feature is sent into an advanced high-precision detection network to perform target detection and recognition and semantic segmentation on the current frame.
The above process is illustrated below through an optional embodiment:
The feature of the current frame is enhanced through the optical flow network. For a reference frame x and a current frame y, the optical flow field M_{y→x} = F(x, y) is estimated by a specific optical flow network; propagating the feature of the reference frame to the current frame according to the optical flow propagation function can be defined as f_{x→y} = W(f_x, M_{y→x}) = W(f_x, F(x, y)), where W(·) is a bilinear warping function applied to all positions in each channel of the feature, and f_{x→y} denotes the feature propagated from frame x to frame y. The feature propagated from the reference frame is combined with the feature of the current frame, thereby avoiding the influence of illumination, posture, viewing angle, non-rigid deformation and the like. When combining, this embodiment takes different weights at different spatial positions, with all feature channels sharing the same spatial weights; the combined feature is f'_y = w_1·f_{x→y} + w_2·f_y, where w_1 and w_2 indicate the importance at each spatial position. The combined feature is then sent into the subsequent detector.
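The combination f'_y = w_1·f_{x→y} + w_2·f_y with channel-shared spatial weights can be sketched as follows (the dict-of-channels representation is an assumption for illustration):

```python
def combine_features(f_prop, f_cur, w1, w2):
    """Weighted per-position combination of the propagated feature and
    the current-frame feature; w1 and w2 are 2-D spatial weight maps
    shared by every channel, as in f'_y = w1*f_{x->y} + w2*f_y."""
    combined = {}
    for ch, cur_map in f_cur.items():
        combined[ch] = [
            [w1[i][j] * f_prop[ch][i][j] + w2[i][j] * cur_map[i][j]
             for j in range(len(cur_map[0]))]
            for i in range(len(cur_map))
        ]
    return combined
```

Because the weights vary by position but not by channel, a position judged reliable in the reference frame contributes strongly across all channels at once.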
After the enhanced feature is detected to obtain the target detection and recognition result and the semantic segmentation result, in an optional embodiment, the precision of the target detection and recognition result and the semantic segmentation result can be further improved through inter-frame suppression, high-confidence tracking and spatial position correction, wherein the inter-frame suppression statistically analyzes the detection results over the whole video, sorts all detection windows by confidence, distinguishes high-scoring windows from low-scoring ones, and subtracts a specific value from the low-scoring windows so as to separate correct detection windows from false ones; the high-confidence tracking uses a tracking algorithm to choose a high-confidence target from the detection results as a tracking starting point, tracks forward and backward over the whole video from the starting point to generate a tracking trajectory, and stops tracking if a change of the tracked target during tracking causes its confidence to fall below a preset threshold; a high-confidence target is then chosen from the remaining targets as a new tracking starting point, and the above operations are repeated; if a window coincides with a window that already appeared on a previous trajectory, it is skipped directly; finally, the detection and recognition results and the semantic segmentation results are corrected using the tracking trajectories; the spatial position correction compares, based on the tracking result, the detection results around each position, and a position with IOU > 0.5 is the final position of the target, where IOU is the area of the overlap of the two regions divided by the area of their union.
The above inter-frame suppression, high-confidence tracking and spatial position correction can be combined with one another arbitrarily; through such result correction, the influence on the result of motion blur, occlusion, video defocus, diversity of shape changes, diversity of illumination changes and the like can be further reduced.
Since false detections may occur in a small number of frames of the video, and single-frame information cannot distinguish these false detections, while most correct detection results have high scores in each frame of the video, this embodiment treats false detections as outliers and suppresses their confidence. The detection results over the whole video are statistically analyzed, all detection windows are sorted by confidence, and higher-scoring and lower-scoring windows are distinguished; a lower-scoring window is likely a false detection, so a specific value is subtracted from its score, separating correct detection windows from false ones and improving the precision of target detection.
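The score suppression step can be sketched as follows; the threshold and penalty values are illustrative, the claimed method only requires that a specific value be subtracted from low-scoring windows:

```python
def suppress_low_scores(detections, score_threshold, penalty):
    """Treat windows scoring below the threshold as likely false
    detections and subtract a fixed penalty from their confidence,
    pulling correct and false windows apart."""
    return [(window, score - penalty if score < score_threshold else score)
            for window, score in detections]
```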
Missed detections are reduced by means of inter-frame feature enhancement, but this is not effective for detections missed over many consecutive frames, so this embodiment applies high-confidence tracking to the detection results. The embodiment chooses the highest-scoring detected target as the tracking starting point, tracks forward and backward over the whole video from this starting point, and generates a tracking trajectory; the highest-scoring target is then selected from the remaining targets for tracking. It should be noted that if this window has already appeared in a previous trajectory, it is skipped directly and the next target is selected for tracking; the algorithm iterates, using a score threshold as the termination condition. The obtained trajectories can improve the target recall rate and apply a certain correction to the detection results. Finally, the detection results around each position are compared with the tracking result, and a position with IOU > 0.5 is considered the final position of the target.
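The seed-selection loop of the high-confidence tracking can be sketched as follows (the trajectory generation itself is elided; identifying windows by hashable ids is an assumption for illustration):

```python
def choose_tracking_seeds(detections, score_threshold, already_tracked):
    """Repeatedly pick the highest-scoring remaining detection as a
    tracking starting point, skipping windows that already appeared on
    a trajectory; the score threshold is the termination condition."""
    seeds = []
    for window, score in sorted(detections, key=lambda d: d[1], reverse=True):
        if score < score_threshold:
            break                      # termination condition reached
        if window in already_tracked:
            continue                   # already on a previous trajectory
        seeds.append((window, score))
        already_tracked.add(window)
    return seeds
```

Each returned seed would then be handed to the tracker, which extends it forward and backward through the video.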
After combining multiple result correction approaches, the finally obtained precision is improved to a certain extent compared with the result obtained by target detection and recognition alone; the amount of computation of target detection and recognition can be reduced under the premise of guaranteeing precision, thereby reducing computation cost. At the same time, the impact on precision caused by motion blur, occlusion, diversity of shape changes, diversity of illumination changes and the like in the video can be reduced, and the latent information in the video (such as temporal information and information from preceding and succeeding frames) is fully exploited.
The entire recognition process above is illustrated below with an optional embodiment, with reference to Fig. 7:
The feature of the current frame is enhanced using the reference frame information, and target detection and recognition are performed using an advanced high-precision detection network; at the same time, since the embodiment of the present invention uses an advanced high-precision detection network, the specific contour of each target can be obtained. The fused result correction (arbitrary combinations of inter-frame suppression, high-confidence tracking and spatial position correction) fully exploits the useful information in the video and corrects the target detection and recognition results. Through the present invention, the amount of computation of target detection and recognition can be reduced under the premise of guaranteeing precision, thereby reducing computation cost; at the same time, the impact on precision caused by motion blur, occlusion, diversity of shape changes, diversity of illumination changes and the like can be reduced, and the latent information in the video (such as temporal information and information from preceding and succeeding frames) is fully exploited.
A difference operation is performed on the two frames through the inter-frame difference operation: corresponding pixels of the two frames are subtracted and the squared grayscale difference, i.e. (x − y)^2, is computed, where x and y denote the reference frame and the current frame respectively; the result is then judged against a set threshold. If the current frame exceeds the set threshold, it is passed into the enhanced-feature target detection and recognition module for target detection and recognition; otherwise, the detection result of the reference frame is propagated to the current frame. There are two ways of choosing the reference frame: one takes the first frame undergoing enhanced-feature target detection and recognition as the reference frame, then replaces the reference frame with the second frame undergoing enhanced-feature target detection and recognition, and so on; the other gives a time interval t_in, the first frame serves as the reference frame at first, and after time t_in the reference frame at time t is the frame at time t − t_in.
In videos shot by UAVs there are many approximately static scenes; for example, a remote sensing video shot over the sea may contain only the sea for tens of seconds, without any object, and performing target detection and recognition frame by frame at such times would waste enormous computing resources. The embodiment of the present invention is not interested in the sea itself, but in the various objects present on the sea. Therefore, the embodiment of the present invention uses the UAV image valid-frame detector to choose valid frames for target detection and recognition, and for non-valid frames the detection result is propagated from the adjacent preceding reference frame. The purpose of the UAV image valid-frame detector is to accelerate image processing. Video has temporal locality, that is, two adjacent frames differ little, so detecting frame by frame with a convolutional neural network in the traditional way would make detection inefficient and appears unnecessary. The principle of the inter-frame difference operation used in the embodiment of the present invention is simple, its amount of computation is small, and it can quickly detect moving targets in the scene.
The feature of the current frame is enhanced with information from the adjacent preceding reference frame, and the enhanced feature is then fed into the final detection network for target detection and recognition. Specifically, a feature extraction network first extracts the respective features of the current frame and the reference frame. To enhance the feature of the current frame, a specific optical flow network estimates the motion between the adjacent preceding reference frame and the current frame, and the feature of the reference frame is then propagated to the current frame according to the optical flow. Given a reference frame x and a current frame y, the optical flow field M_{y→x} = F(x, y) is estimated by the specific optical flow network. Propagating the feature of the reference frame to the current frame can be defined by the flow propagation function f_{x→y} = W(f_x, M_{y→x}) = W(f_x, F(x, y)), where W(·) is a bilinear propagation function applied to all positions in every channel of the feature, and f_{x→y} denotes the feature propagated from frame x to frame y. The feature of the adjacent preceding reference frame and the feature of the current frame are then combined through a preset weighting network, and the combined feature is passed to the final detection network to perform target detection and recognition on the current frame. Moreover, because the embodiment of the present invention uses an advanced high-precision detection network, the embodiment of the present invention can simultaneously obtain the specific contour of the target.
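The bilinear propagation function W(·) above can be sketched with plain numpy. The (C, H, W) tensor layout and the convention that the flow gives, for each current-frame position, a pixel offset into the reference frame are assumptions of this sketch; in practice this warping is done on GPU (for example via a grid-sampling primitive).

```python
import numpy as np

def bilinear_sample(feat, x, y):
    """Bilinearly sample a 2-D feature map at continuous coordinates (x, y),
    clamping to the border."""
    h, w = feat.shape
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    x0, y0 = max(min(x0, w - 1), 0), max(min(y0, h - 1), 0)
    dx, dy = x - np.floor(x), y - np.floor(y)
    top = (1 - dx) * feat[y0, x0] + dx * feat[y0, x1]
    bot = (1 - dx) * feat[y1, x0] + dx * feat[y1, x1]
    return (1 - dy) * top + dy * bot

def propagate_features(f_x, flow):
    """f_{x->y} = W(f_x, M_{y->x}): for each position p of the current frame,
    sample the reference-frame feature map at p + M_{y->x}(p), for every
    channel. flow has shape (2, H, W) holding (u, v) pixel offsets."""
    c, h, w = f_x.shape
    out = np.zeros_like(f_x)
    for ch in range(c):
        for j in range(h):
            for i in range(w):
                u, v = flow[0, j, i], flow[1, j, i]
                out[ch, j, i] = bilinear_sample(f_x[ch], i + u, j + v)
    return out

f_x = np.arange(16.0).reshape(1, 4, 4)   # one-channel reference feature
flow = np.zeros((2, 4, 4))
flow[0] += 0.5                           # uniform half-pixel motion to the right
f_xy = propagate_features(f_x, flow)
```

Because the sampling is bilinear, a half-pixel shift interpolates each feature value from its two horizontal neighbours, which is exactly the sub-pixel behaviour the propagation function needs when the estimated motion is not integer-valued.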
Methods of target detection and recognition can be divided into two classes: fully convolutional methods that treat the localization and identification stages as a single regression task, and two-stage methods that separate detection localization from target identification. The two-stage methods are relatively accurate but comparatively slow; with the efficient frame detector for unmanned aerial vehicle images designed by the embodiment of the present invention, the speed of two-stage target detection and recognition is greatly improved. Objects in video are affected by motion blur, occlusion, video defocus, diverse shape changes, varied illumination, and the like, so the embodiment of the present invention uses a specific optical flow network to transmit the information of the adjacent previous frame to the current frame for feature enhancement, and then feeds the enhanced feature into the target detection and recognition network for target extraction.
The result correction of the fusion method uses several forms of fusion, including before-and-after-frame suppression, high-confidence tracking, and spatial position correction. Before-and-after-frame suppression statistically analyzes the detection results over the whole image sequence: all detection windows are sorted by confidence, the higher-scoring and lower-scoring windows are identified, and a specific value is subtracted from the scores of the lower-scoring windows, so that correct and incorrect detection windows are pulled further apart. High-confidence tracking uses a tracking algorithm: the target with the highest confidence in the detection results is chosen as the starting point of tracking, tracking is performed over the entire image sequence forward and backward from that starting point, and a tracking trajectory is generated; during tracking, if changes in the tracked target cause the confidence to fall below a certain threshold (for example 0.1), tracking stops. The target with the highest confidence among the remaining targets is then chosen as the next starting point, and the above operations are repeated; if a window already appeared on an earlier trajectory, it is skipped directly in subsequent tracking. Finally, these tracking trajectories are used to correct the results. Spatial position correction compares each tracked target position with the targets detected around it, and a detection with IOU > 0.5 is taken as the final position of the target. After combining these correction modes, the precision finally obtained shows a certain improvement over the result obtained by target detection and recognition alone.
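The spatial position correction above hinges on the intersection-over-union (IOU) test. A minimal sketch of that matching rule, with (x1, y1, x2, y2) box coordinates assumed for the example:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2).
    Used to illustrate the spatial-position-correction rule: a nearby
    detection matching the tracked target with IOU > 0.5 is taken as the
    target's final position."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

tracked = (0, 0, 10, 10)
detected = (2, 0, 12, 10)          # slightly shifted detection of the same target
match = iou(tracked, detected) > 0.5
```

A tracked box and a detection shifted by two pixels overlap on 80 of their 120 union pixels (IOU ≈ 0.67), so the detection would be accepted as the final position under the 0.5 rule.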
The embodiment of the present invention also provides a system for detecting targets in unmanned aerial vehicle images. The system can realize its functions through a judgment module 82, an extraction module 84, a transfer module 86, a feature enhancement module 88, and a detection module 810. It should be noted that the system of the embodiment of the present invention can be used to execute the method for detecting targets in unmanned aerial vehicle images provided by the embodiment of the present invention, and the method for detecting targets in unmanned aerial vehicle images of the embodiment of the present invention can also be executed by the system provided by the embodiment of the present invention. Fig. 8 is a schematic diagram of a system for detecting targets in unmanned aerial vehicle images according to an embodiment of the present invention. As shown in Fig. 8, the system for detecting targets in unmanned aerial vehicle images is characterized by comprising:
a judgment module 82, configured to judge whether the difference value between the reference frame and the current frame of the target exceeds a threshold, wherein the reference frame is the adjacent previous frame of the current frame;
an extraction module 84, configured to extract the respective features of the reference frame and the current frame if the difference value exceeds the threshold;
a transfer module 86, configured to transmit the feature of the reference frame to the current frame through an optical flow network model;
a feature enhancement module 88, configured to combine the feature of the current frame with the feature transmitted from the reference frame according to different preset weights to form an enhanced feature, wherein each weight is a spatial weight that is fixed across feature channels;
a detection module 810, configured to detect the enhanced feature and obtain the result of target detection and recognition and the result of semantic segmentation.
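The control flow tying the five modules together can be sketched as follows. The callables passed in are stand-ins for the networks described above; their names and the scalar stand-in frames are hypothetical, chosen only to make the module interaction explicit.

```python
def detect_frame(ref_frame, cur_frame, threshold, cached_result,
                 extract, flow_warp, fuse, detect):
    """Sketch of the per-frame pipeline: judgment module 82 decides
    whether to run the full path; modules 84/86/88/810 extract,
    propagate, fuse, and detect. All callables are hypothetical
    stand-ins for the networks in the text."""
    diff = abs(cur_frame - ref_frame)        # judgment module 82
    if diff <= threshold:
        return cached_result                 # reuse the reference frame's result
    f_ref, f_cur = extract(ref_frame), extract(cur_frame)  # extraction module 84
    f_prop = flow_warp(f_ref)                # transfer module 86 (optical flow)
    enhanced = fuse(f_cur, f_prop)           # feature enhancement module 88
    return detect(enhanced)                  # detection module 810

stubs = dict(extract=lambda f: f * 10,
             flow_warp=lambda f: f,
             fuse=lambda a, b: (a + b) / 2,
             detect=lambda e: e)
result = detect_frame(1.0, 5.0, 2.0, "cached", **stubs)   # difference exceeds threshold
reused = detect_frame(1.0, 1.5, 2.0, "cached", **stubs)   # below threshold: reuse
```

The early return on a small difference value is the point of the judgment module: static frames never reach the expensive extraction, flow, and detection networks.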
The above embodiment of the system for detecting targets in unmanned aerial vehicle images corresponds to the embodiment of the method for detecting targets in unmanned aerial vehicle images, so its beneficial effects are not repeated here.
The embodiment of the present invention provides a storage medium. The storage medium includes a stored program, wherein when the program runs, the device where the storage medium is located is controlled to execute the above method.
The embodiment of the present invention provides a processor. The processor is configured to run a program, wherein when the program runs, the device where the processor is located is controlled to execute the above method.
The serial numbers of the above embodiments of the present invention are for description only and do not represent the advantages or disadvantages of the embodiments.
In the above embodiments of the present invention, the description of each embodiment has its own emphasis. For any part not described in detail in a certain embodiment, reference can be made to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed technical content can be realized in other ways. The apparatus embodiments described above are merely exemplary. For example, the division of the units is only a logical function division, and there may be other division manners in actual implementation; for example, multiple units or components can be combined or integrated into another system, or some features can be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, units, or modules, and may be electrical or in other forms.
The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; they may be located in one place, or may be distributed over multiple units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the various embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The above integrated unit can be realized either in the form of hardware or in the form of a software functional unit.
If the integrated unit is realized in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the existing technology, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions used to make a computer device (which can be a personal computer, a server, a network device, or the like) execute all or part of the steps of the method of each embodiment of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a mobile hard disk, a magnetic disk, or an optical disk.
The above is only a preferred embodiment of the present invention. It should be noted that, for those of ordinary skill in the art, various improvements and modifications may be made without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.