CN107066990B

CN107066990B - A kind of method for tracking target and mobile device

Info

Publication number: CN107066990B
Application number: CN201710309346.3A
Authority: CN
Inventors: 徐展; 万鹏飞; 张长定; 张伟
Original assignee: Xiamen Meitu Technology Co Ltd
Current assignee: Xiamen Meitu Technology Co Ltd
Priority date: 2017-05-04
Filing date: 2017-05-04
Publication date: 2019-10-11
Anticipated expiration: 2037-05-04
Also published as: CN107066990A

Abstract

The invention discloses a kind of method for tracking target, this method executes in the mobile device for having camera function, comprising: the target position determined in initial frame is inputted according to user, wherein target position is expressed as the target frame at a surrounding target center；Tracker and detector are generated based on the target position training in initial frame, wherein tracker is suitable for tracking the target in shooting video frame by frame, and detector is suitable for detecting the target in shooting video frame by frame；To subsequent each picture frame in shooting video: tracking to obtain target position and the output tracking response of the picture frame using tracker；Judge whether tracking response value is greater than or equal to threshold value, if then continuing the target following of next image frame；If otherwise start-up detector, the target position of detector output correspondence image frame is utilized；After keeping the predetermined frame number of detector continuous service, switches to tracker and continue target following.The present invention discloses corresponding mobile device together.

Description

A kind of method for tracking target and mobile device

Technical field

The present invention relates to technical field of image processing, especially a kind of method for tracking target and mobile device.

Background technique

When carrying out video capture using mobile devices such as mobile phones, photographer, which is often desirable to photographic subjects, can be always maintained at clearly It is clear.For this reason, it may be necessary to guarantee that the camera focus of mobile device is directed at subject goal always in shooting process.In reality, due to Hiding relation between target erratic motion and different objects, it is extremely difficult to the judgement of target position, cause in many feelings Under condition, the subject goal of shooting thickens because out of focus.

In film shooting, the adjusting of focus is manually completed by experienced photographer, this manual adjustment method Obviously demand easy to operate in unsuitable mobile device.The method that some existing mobile devices use Face datection, Neng Goushi Focus is simultaneously navigated to corresponding position by face in other video.However, this method application field is very limited, only for face Etc. certain objects it is effective, and continuity is insufficient in time-domain, and the variation of focus is not smooth enough, can there is jitter phenomenon；It is another Aspect, when there are when multiple identical type objects in video, it is difficult to determine which is only the interested target of user.

Track algorithm provides a kind of feasible program for auto-focusing.But existing tracking is in accuracy and real-time On all need to be improved.For example, being easy to be based on losing target when target is blocked based on the track algorithm of correlation filter The algorithm of re-detection tracking performance when deformation occurs for target is not good enough, and the method based on space constraint item or deep learning is being imitated Meet demand is difficult on rate and portability.

Summary of the invention

For this purpose, the present invention provides target following scheme, to try hard to solve or at least alleviate above existing at least one A problem.

According to an aspect of the invention, there is provided a kind of method for tracking target, this method is in the shifting for having camera function It is executed in dynamic equipment, comprising steps of inputting the target position determined in initial frame according to user, wherein target position is expressed as one The target frame at surrounding target center；Tracker and detector are generated based on the target position training in initial frame, wherein tracker Suitable for tracking to the target in shooting video, detector is suitable for detecting the target in shooting video；Shooting is regarded Subsequent each picture frame in frequency: it tracks to obtain the target position of the picture frame, and output tracking response using tracker；Sentence Whether disconnected tracking response value is greater than or equal to threshold value, if then continuing the target following of next image frame；If otherwise starting detection Device utilizes the target position of detector output correspondence image frame；And it after keeping the predetermined frame number of detector continuous service, switches to Tracker continues target following.

Optionally, in method for tracking target according to the present invention, the target position determined in initial frame is inputted according to user The step of setting includes: the area-of-interest based on user's input, utilizes multiple candidates of RPN network model output current image frame Target frame；Identification is carried out by Fast R-CNN network model and position returns, and exports the confidence level of each candidate target frame；With And after non-maxima suppression, target of the highest candidate target frame of confidence level as target position in characterization initial frame is chosen Frame.

Optionally, in method for tracking target according to the present invention, the target position training based on initial image frame is generated The step of tracker includes: the circular matrix collecting sample using the target frame peripheral region of initial image frame；And using most The initial trace template of the small two optimization method output tracking devices multiplied.

Optionally, in method for tracking target according to the present invention, the target position training based on initial image frame is generated The step of detector includes: the target frame according to initial image frame, exports multiple sample boxes according to the sliding window of predetermined dimension, Generate sample queue.

Optionally, in method for tracking target according to the present invention, the sliding window of predetermined dimension are as follows: at the beginning of sliding window Beginning scale is the 10% of original image, step-size in search scale be adjacent scale the first prearranged multiple or the second prearranged multiple and Value interval is [0.1 times of initial gauges, 10 times of initial gauges].

Optionally, it in method for tracking target according to the present invention, tracks to obtain its target position using tracker, and defeated The step of tracking response value includes: to generate trace template by the target position of a upper picture frame, using tracker out；According to upper The target position of one picture frame generates the region of search of the picture frame；By the neighborhood of each pixel in trace template and region of search Convolution algorithm is carried out, the response of each pixel is obtained；Target's center of the maximum pixel of response as the picture frame is chosen, And maximum response is exported as tracking response value；And it is true by the size of the target's center and the target frame of a upper picture frame The target position of the fixed picture frame.

Optionally, in method for tracking target according to the present invention, which is generated according to the target position of a upper picture frame As frame region of search the step of include: the above picture frame target frame center be search center, with its each ruler of target frame Very little twice is search range, the region of search as the picture frame.

Optionally, in method for tracking target according to the present invention, which is generated according to the target position of a upper picture frame As frame region of search the step of further include: processing is zoomed in and out to the picture frame according to predetermined zoom factor, obtains multiple contractings Picture frame after putting；And the center of the target frame of the above picture frame is search center, with twice of its each size of target frame Region of search for search range, as picture frame after multiple scalings.

Optionally, in method for tracking target according to the present invention, by each pixel in trace template and region of search The step of neighborhood progress convolution algorithm includes: using trace template and each pixel in the region of search of picture frame after multiple scalings Neighborhood carry out convolution algorithm, obtain the response under the different zoom factor.

Optionally, in method for tracking target according to the present invention, pass through the target of the target's center and a upper picture frame The size of frame determines the step of target position of the picture frame further include: with the scaling of the affiliated picture frame of the maximum pixel of response The target frame of a picture frame zooms in and out processing on factor pair, the target frame size as the picture frame；And according to calculating The target frame size of target's center and the picture frame determines the target position of the picture frame.

Optionally, in method for tracking target according to the present invention, the target position of detector output correspondence image frame is utilized The step of setting includes: multiple candidate samples that target in the picture frame is generated according to multiple sample boxes in sample queue；Pass through Three-stage cascade classification is filtered multiple candidate samples, exports the target position of the picture frame.

Optionally, in method for tracking target according to the present invention, further include the steps that update tracker: obtain it is each After the target frame of picture frame, the trace template of the picture frame is calculated according to the content frame；And the tracking to the picture frame Template and the trace template of a upper picture frame are weighted, and obtain updated trace template.

Optionally, in method for tracking target according to the present invention, the weighting coefficient point of the picture frame and a upper picture frame It Wei 0.015 and 0.985.

Optionally, in method for tracking target according to the present invention, further include the steps that updating detector: calculate by detecting The IoU index for multiple candidate samples that device generates；And sample queue is screened according to IoU index.

According to another aspect of the invention, a kind of mobile device is provided, comprising: camera sub-system is suitable for shooting video Image；One or more processors；Memory；One or more programs, wherein one or more programs store in memory And be configured as being executed by one or more of processors, one or more of programs include for executing side as described above The instruction of method either in method.

According to another aspect of the invention, a kind of computer-readable storage medium for storing one or more programs is provided Matter, one or more of programs include instruction, and described instruction is when calculating equipment execution, so that calculating equipment executes institute as above Method either in the method stated.

Target following scheme according to the present invention provides user-friendly friendship compared to existing Atomatic focusing method Mutual mode, user only need simply to put touching on the touchscreen or delineate, and can judge automatically user's area-of-interest, and generate opposite Accurate fine target position, to guarantee the accurate of subsequent tracking.

Further, it is contemplated that the factors such as real-time and accuracy of target following, to every in subsequent shooting video One picture frame tracks target using tracker, and when mistake occurs for target following or is that the target tracked disappears When, start spare detector and target is detected, to ensure that the robustness of long video tracking.

Detailed description of the invention

To the accomplishment of the foregoing and related purposes, certain illustrative sides are described herein in conjunction with following description and drawings Face, these aspects indicate the various modes that can practice principles disclosed herein, and all aspects and its equivalent aspect It is intended to fall in the range of theme claimed.Read following detailed description in conjunction with the accompanying drawings, the disclosure it is above-mentioned And other purposes, feature and advantage will be apparent.Throughout the disclosure, identical appended drawing reference generally refers to identical Component or element.

Fig. 1 shows the organigram of mobile device 100 according to an embodiment of the invention；

Fig. 2 shows the flow charts of method for tracking target 200 according to an embodiment of the invention；And

Fig. 3, which is shown, according to an embodiment of the invention to be tracked to obtain the process of picture frame target position using tracker Figure.

Specific embodiment

Exemplary embodiments of the present disclosure are described in more detail below with reference to accompanying drawings.Although showing the disclosure in attached drawing Exemplary embodiment, it being understood, however, that may be realized in various forms the disclosure without should be by embodiments set forth here It is limited.On the contrary, these embodiments are provided to facilitate a more thoroughly understanding of the present invention, and can be by the scope of the present disclosure It is fully disclosed to those skilled in the art.

Fig. 1 shows the organigram of mobile device 100 according to an embodiment of the invention.Referring to Fig.1, movement is set Standby 100 include: memory interface 102, one or more data processors, image processor and/or central processing unit 104, And peripheral interface 106.Memory interface 102, one or more processors 104 and/or peripheral interface 106 are either discrete Element also can integrate in one or more integrated circuits.In mobile device 100, various elements can by one or A plurality of communication bus or signal wire couple.Sensor, equipment and subsystem may be coupled to peripheral interface 106, to help Realize multiple functions.For example, motion sensor 110, optical sensor 112 and range sensor 114 may be coupled to peripheral interface 106, to facilitate the functions such as orientation, illumination and ranging.Other sensors 116 can equally be connected with peripheral interface 106, such as fixed Position system (such as GPS receiver), angular-rate sensor, temperature sensor, biometric sensor or other sensor devices, by This can help to implement relevant function.

Camera sub-system 120 and optical sensor 122 can be used for the camera of convenient such as record photos and video clips The realization of function, wherein camera sub-system and optical sensor for example can be charge-coupled device (CCD) or complementary metal oxygen Compound semiconductor (CMOS) optical sensor.

It can help to realize communication function by one or more radio communication subsystems 124, wherein wireless communication System may include radio-frequency transmitter and transmitter and/or light (such as infrared) Receiver And Transmitter.Radio communication subsystem 124 particular design and embodiment can depend on one or more communication networks that mobile device 100 is supported.For example, Mobile device 100 may include be designed to support GSM network, GPRS network, EDGE network, Wi-Fi or WiMax network and Bluebooth^TMThe communication subsystem 124 of network.Audio subsystem 126 can be with 130 phase coupling of loudspeaker 128 and microphone It closes, to help the function of implementing to enable voice, such as speech recognition, speech reproduction, digital record and telephony feature.

I/O subsystem 140 may include touch screen controller 142 and/or other one or more input controllers 144. Touch screen controller 142 may be coupled to touch screen 146.For example, the touch screen 146 and touch screen controller 142 can be with The contact carried out therewith and movement or pause are detected using any one of a variety of touch-sensing technologies, wherein sensing skill Art includes but is not limited to capacitive character, resistive, infrared and surface acoustic wave technique.Other one or more input controllers 144 May be coupled to other input/control devicess 148, for example, one or more buttons, rocker switch, thumb wheel, infrared port, The pointer device of USB port, and/or stylus etc.One or more button (not shown)s may include for controlling loudspeaking The up/down button of 130 volume of device 128 and/or microphone.

Memory interface 102 can be coupled with memory 150.The memory 150 may include that high random access is deposited Reservoir and/or nonvolatile memory, such as one or more disk storage equipments, one or more optical storage apparatus, and/ Or flash memories (such as NAND, NOR).Memory 150 can store an operating system 152, for example, Android, IOS or The operating system of Windows Phone etc.The operating system 152 may include for handling basic system services and execution The instruction of task dependent on hardware.Memory 150 can also be stored using 154.These applications in operation, can be from memory 150 are loaded on processor 104, and run on the operating system run via processor 104, and utilize operating system And the interface that bottom hardware provides realizes the various desired functions of user, such as instant messaging, web page browsing, pictures management. Using can be independently of operating system offer, it is also possible to what operating system carried.In some implementations, it applies 154 can be one or more programs.

Implementation according to the present invention, by storing corresponding one or more in the memory 150 of mobile device 100 A program come realize camera sub-system 120 acquire video image when target following function, that is, methods described below 200. It should be noted that the signified mobile device 100 of the present invention can be mobile phone with above-mentioned construction, plate, camera etc..

Fig. 2 shows the flow charts of method for tracking target 200 according to an embodiment of the invention.As shown in Fig. 2, the party Method 200 starts from step S210, and when opening the progress video capture of camera sub-system 120, user can be for example, by the touchscreen The mode input region of interest clicked/delineated either interested target, is optimized by the input to user, is obtained The initially target position in (image) frame, wherein target position is expressed as the target frame at a surrounding target center.

According to one embodiment, the area-of-interest that user inputs is input to the deep learning model of training under line, it is defeated Target position out.Specifically, first with RPN network (Region Proposal Network) model output current image frame Multiple candidate target frames；Identification is carried out by Fast R-CNN network model again and position returns, exports each candidate target frame Confidence level；Finally, choosing the highest candidate target frame of confidence level as characterization initial image frame after non-maxima suppression The target frame of target position.Introduction about Fast R-CNN network model can refer to following paper description: Ren, Shaoqing, et al."Faster R-CNN:Towards real-time object detection with region proposal Networks. " Advances in neural information processing systems.2015, it is no longer superfluous herein It states.

Then in step S220, the target position training based on initial image frame generates tracker and detector.

Wherein, tracker is suitable for tracking the target in shooting video.Optionally, the present embodiment is using discriminate Tracking distinguishes target and ambient enviroment.In the track, what a classifier is trained to need great amount of samples, this is just meaned A large amount of time loss.According to one embodiment of present invention, to the target frame and peripheral region of initial image frame using volume Product matrix (circulant matrix) Lai Shengcheng training sample, that is, the image pattern based on cycle spinning, the benefit done so It is to determine to complete using more efficient frequency domain method to sample set；Then, defeated using the optimization method of least square The initial trace template of tracker out.

Detector is suitable for detecting the target in shooting video.According to an embodiment of the invention, the training of detector The method of sampling based on sliding window, it is multiple according to the sliding window output of predetermined dimension according to the target frame of initial image frame Sample boxes generate sample queue.Optionally, the initial gauges of sliding window take the 10% of original image size, step-size in search ruler Degree is the first prearranged multiple (e.g., 1.2 times) of adjacent scale or the second prearranged multiple (e.g., 0.8 times) and value interval are [just 0.1 times of beginning scale, 10 times of initial gauges], particularly, reject window of the area less than 20 pixels.According to sample boxes and mesh Multiple sample boxes of output are divided into positive and negative two class by the size for marking frame overlapping region, wherein overlap proportion is greater than 50% sampling Frame is stored in positive sample queue, and sample boxes of the overlap proportion less than 20% are stored in negative sample queue.

It can be seen from the above description that detector is computationally intensive in tracker.In view of target following real-time and The factors such as accuracy, to it is subsequent shooting video in each picture frame, target is tracked using tracker, and when target with When track occurs mistake or is the target disappearance of tracking, then start-up detector detects target.

Detailed process is described as follows.

In step S230, track to obtain the target position of the picture frame (for example, the 2nd frame image) using tracker, and Output tracking response.

It such as Fig. 3, shows according to an embodiment of the invention, tracks to obtain picture frame target position using tracker Flow chart.

In step S2302, trace template is generated by the target position of a upper picture frame, using tracker.Namely It says, after tracking obtains the target of each picture frame, the trace template of the picture frame is generated using tracker, to be used for next figure As the tracking of frame.

Then in step S2304, the region of search of the picture frame is generated according to the target position of a upper picture frame.It is optional Ground, the center of the target frame of the above picture frame are search center, with twice of its each size of target frame (that is, wide, high size) Region of search for search range, as the picture frame.For example, the target position of a upper picture frame be expressed as with pixel (200, 500) for target's center, having a size of 100 × 100 target frame, then according to the target position generate the picture frame search Region is exactly centered on pixel (200,500), having a size of 200 × 200 search boxes.

Then in step S2306, the neighborhood of each pixel in trace template and region of search is subjected to convolution algorithm (etc. Valence carries out dot product in trace template and region of search to be transformed on frequency domain), the response of each pixel is obtained, response indicates Each pixel is the probability of final goal central point.

Then in step S2308, target's center of the maximum pixel of response as the picture frame is chosen, and export most Big response is as tracking response value.

Then in step S2310, which is determined by the size of the target's center and the target frame of a upper picture frame The target position of frame.That is, the target position of the picture frame indicates are as follows: using tracking response value corresponding pixel points as in target The heart, the above picture frame target frame having a size of the target frame of size.

In the specific implementation process, due to the variation of shooting focal length etc. or the movement of target object, it may result in mesh It marks object and dimensional variation occurs, therefore, embodiment according to the present invention carries out above-mentioned steps using several different scales respectively S2302 to S2310.

That is, processing is zoomed in and out to this picture frame according to predetermined zoom factor before executing step S2304, Picture frame after obtaining multiple scalings, then according still further to step S2304, the center of the target frame of the above picture frame is in search The heart, twice with its each size of target frame is search range, the region of search as picture frame after multiple scalings.According to the present invention Embodiment, predetermined zoom factor includes one or more of following array: 0.82,0.88,0.94,1.06,1.12, 1.2}。

In step S2306, the neighbour of each pixel in the region of search of picture frame after trace template and multiple scalings is used Domain carries out convolution algorithm, obtains the response under the different zoom factor.

In subsequent step S2308 and S2310, with the zoom factor of the affiliated picture frame of the maximum pixel of response to upper one The target frame of picture frame zooms in and out processing, the target frame size as the picture frame；According to the target's center of calculating and the figure As the target frame size of frame determines the target position of the picture frame.

Then in step S240, judge whether tracking response value is greater than or equal to threshold value, optionally, threshold value is set as 0.27.If tracking response value >=0.27, return step S230 continues the target following of next image frame.

If tracking response value is less than threshold value, then it is assumed that tracking result inaccuracy executes step S250, start-up detector, benefit With the target position of detector output correspondence image frame.According to an embodiment of the invention, when the target position of tracking is got too close to When image border, it is believed that target may disappear, at this time also start-up detector, execute step S250.

Specifically, the detector generated according to step S220 training, being generated according to multiple sample boxes in sample queue should Multiple candidate samples of target in picture frame, since candidate samples quantity is larger, it is lower to directly adopt arest neighbors matching efficiency, because This uses the method for three-stage cascade classification, is filtered to multiple candidate samples, exports the target position of the picture frame.According to one Kind embodiment, the first order filter candidate samples by Variance Constraints, and the second level is classified by random fern further filters candidate Sample, the final third level carry out arest neighbors matching, and the candidate samples of highest scoring are considered as the output of detector.

Target disappears or will not reappear immediately after being blocked in many cases, and the detection of short-time duty can not Correct discovery target.Therefore in step S260, after keeping the predetermined frame number of detector continuous service, tracker is switched to, is continued Target following is carried out using tracker, that is, return step S230 continues to execute target following process.Optionally, predetermined frame number is set For 50 frames.

Embodiment according to the present invention, after judging to target position in each picture frame, system is according to the frame Content is updated tracker and detector.

Specifically, the method for tracker is updated are as follows: after obtaining the target frame of each picture frame, calculate according to the content frame Obtain the trace template of the picture frame；Fortune is weighted to the trace template of the trace template of the picture frame and a upper picture frame again It calculates (that is, linear superposition), obtains updated trace template.Optionally, the weighting coefficient of the picture frame and a upper picture frame point It Wei 0.015 and 0.985.

Equally, update the method for detector are as follows: calculate by detector maturation multiple candidate samples IoU index, according to IoU index screens sample queue.In other words, target is judged as by detector in a new frame according to IoU index The biggish sample of probability is classified, if IoU index is greater than 0.65, then it is assumed that the sample is the sample high with tracking result registration This, is classified to positive sample queue, if IoU index is less than 0.2, then it is assumed that the sample is the sample low with tracking result registration This, is added into negative sample queue.It is random to forget part sample to avoid sample queue too long, maintain population sample quantity Stablize.

To sum up, target following scheme according to the present invention, compared to existing Atomatic focusing method, first there is provided with The interactive mode of family close friend, user only need simply to put touching on the touchscreen or delineate, and can judge automatically user's area-of-interest, And relatively accurate fine target position is generated, to guarantee the accurate of subsequent tracking；It is followed secondly, the tracker of this programme uses The image pattern of ring translation, situations such as obscuring target deformation, motion blur, background, better discriminates between ability, and tracks and calculate Method has real-time speed, can rapidly and accurately judge the position of target object and corresponding scale in picture frame；Finally, working as target When the case where temporary extinction occur or being blocked, this programme provides spare detector, is conceived to the long-term of target appearance Memory, breaks through constraint spatially, its position can be judged again after target reappears, to ensure that long video tracks Robustness.

Various technologies described herein are realized together in combination with hardware or software or their combination.To the present invention Method and apparatus or the process and apparatus of the present invention some aspects or part can take insertion tangible media, such as it is soft The form of program code (instructing) in disk, CD-ROM, hard disk drive or other any machine readable storage mediums, Wherein when program is loaded into the machine of such as computer etc, and is executed by the machine, the machine becomes to practice this hair Bright equipment.

In the case where program code executes on programmable computers, mobile device generally comprises processor, processor Readable storage medium (including volatile and non-volatile memory and or memory element), at least one input unit, and extremely A few output device (as shown in Figure 1).Wherein, memory is configured for storage program code；Processor is configured for According to the instruction in the said program code stored in the memory, method for tracking target of the invention is executed.

By way of example and not limitation, computer-readable medium includes computer storage media and communication media.It calculates Machine readable medium includes computer storage media and communication media.Computer storage medium storage such as computer-readable instruction, The information such as data structure, program module or other data.Communication media is generally modulated with carrier wave or other transmission mechanisms etc. Data-signal processed passes to embody computer readable instructions, data structure, program module or other data including any information Pass medium.Above any combination is also included within the scope of computer-readable medium.

It should be appreciated that in order to simplify the disclosure and help to understand one or more of the various inventive aspects, it is right above In the description of exemplary embodiment of the present invention, each feature of the invention be grouped together into sometimes single embodiment, figure or In person's descriptions thereof.However, the disclosed method should not be interpreted as reflecting the following intention: i.e. claimed hair Bright requirement is than feature more features expressly recited in each claim.More precisely, as the following claims As book reflects, inventive aspect is all features less than single embodiment disclosed above.Therefore, it then follows specific real Thus the claims for applying mode are expressly incorporated in the specific embodiment, wherein each claim itself is used as this hair Bright separate embodiments.

Those skilled in the art should understand that the module of the equipment in example disclosed herein or unit or groups Part can be arranged in equipment as depicted in this embodiment, or alternatively can be positioned at and the equipment in the example In different one or more equipment.Module in aforementioned exemplary can be combined into a module or furthermore be segmented into multiple Submodule.

Those skilled in the art will understand that can be carried out adaptively to the module in the equipment in embodiment Change and they are arranged in one or more devices different from this embodiment.It can be the module or list in embodiment Member or component are combined into a module or unit or component, and furthermore they can be divided into multiple submodule or subelement or Sub-component.Other than such feature and/or at least some of process or unit exclude each other, it can use any Combination is to all features disclosed in this specification (including adjoint claim, abstract and attached drawing) and so disclosed All process or units of what method or apparatus are combined.Unless expressly stated otherwise, this specification is (including adjoint power Benefit require, abstract and attached drawing) disclosed in each feature can carry out generation with an alternative feature that provides the same, equivalent, or similar purpose It replaces.

The present invention discloses together:

A9, the method as described in A8, wherein described to roll up trace template and the neighborhood of pixel each in region of search The step of product operation includes: using the trace template and each pixel in the region of search of picture frame after the multiple scaling Neighborhood carries out convolution algorithm, obtains the response under the different zoom factor.

A10, the method as described in A9, wherein determined by the size of the target's center and the target frame of a upper picture frame The step of target position of the picture frame further include: with the zoom factor of the affiliated picture frame of the maximum pixel of response to a upper figure As the target frame of frame zooms in and out processing, the target frame size as the picture frame；And according to the target's center of calculating and this The target frame size of picture frame determines the target position of the picture frame.

A11, the method as described in any one of A4-10, wherein utilize the target position of detector output correspondence image frame The step of include: multiple candidate samples that target in the picture frame is generated according to multiple sample boxes in sample queue；Pass through three Grade cascade sort is filtered the multiple candidate samples, exports the target position of the picture frame.

A12, the method as described in any one of A1-11 further include the steps that updating tracker: obtaining each picture frame Target frame after, the trace template of the picture frame is calculated according to the content frame；And to the trace template of the picture frame with The trace template of a upper picture frame is weighted, and obtains updated trace template.

A13, the method as described in A12, wherein the weighting coefficient of the picture frame and a upper picture frame is respectively 0.015 He 0.985。

A14, the method as described in any one of A1-13 further include the steps that updating detector: calculate by detector maturation Multiple candidate samples IoU index；And the sample queue is screened according to the IoU index.

In addition, it will be appreciated by those of skill in the art that although some embodiments described herein include other embodiments In included certain features rather than other feature, but the combination of the feature of different embodiments mean it is of the invention Within the scope of and form different embodiments.For example, in the following claims, embodiment claimed is appointed Meaning one of can in any combination mode come using.

In addition, be described as herein can be by the processor of computer system or by executing by some in the embodiment The combination of method or method element that other devices of the function are implemented.Therefore, have for implementing the method or method The processor of the necessary instruction of element forms the device for implementing this method or method element.In addition, Installation practice Element described in this is the example of following device: the device be used for implement as in order to implement the purpose of the invention element performed by Function.

As used in this, unless specifically stated, come using ordinal number " first ", " second ", " third " etc. Description plain objects, which are merely representative of, is related to the different instances of similar object, and is not intended to imply that the object being described in this way must Must have the time it is upper, spatially, sequence aspect or given sequence in any other manner.

Although the embodiment according to limited quantity describes the present invention, above description, the art are benefited from It is interior it is clear for the skilled person that in the scope of the present invention thus described, it can be envisaged that other embodiments.Additionally, it should be noted that Language used in this specification primarily to readable and introduction purpose and select, rather than in order to explain or limit Determine subject of the present invention and selects.Therefore, without departing from the scope and spirit of the appended claims, for this Many modifications and changes are obvious for the those of ordinary skill of technical field.For the scope of the present invention, to this Invent done disclosure be it is illustrative and not restrictive, it is intended that the scope of the present invention be defined by the claims appended hereto.

Claims

1. a kind of method for tracking target, the method execute in the mobile device for having camera function, comprising steps of

The target position determined in initial frame is inputted according to user, wherein the target position is expressed as a surrounding target center Target frame；

Tracker and detector are generated based on the target position training in initial frame, wherein the tracker is suitable for shooting video In target tracked, the detector be suitable for shooting video in target detect；

To subsequent each picture frame in shooting video:

It tracks to obtain the target position of the picture frame, and output tracking response using the tracker, comprising:

Trace template is generated by the target position of a upper picture frame, using tracker；

The region of search of the picture frame is generated according to the target position of a upper picture frame；

The neighborhood of each pixel in the trace template and region of search is subjected to convolution algorithm, obtains the response of each pixel Value；

Target's center of the maximum pixel of response as the picture frame is chosen, and exports maximum response as tracking response Value；And

The target position of the picture frame is determined by the size of the target's center and the target frame of a upper picture frame；

Judge whether the tracking response value is greater than or equal to threshold value, if then continuing the target following of next image frame；

If otherwise starting the detector, the target position of detector output correspondence image frame is utilized；And

After keeping the predetermined frame number of detector continuous service, switches to the tracker and continue target following.

2. the method for claim 1, wherein described input the step of determining the target position in initial frame according to user Include:

Based on the area-of-interest of user's input, multiple candidate target frames of RPN network model output current image frame are utilized；

Identification is carried out by Fast R-CNN network model and position returns, and exports the confidence level of each candidate target frame；And

After non-maxima suppression, the highest candidate target frame of confidence level is chosen as characterization initial image frame target position Target frame.

3. method according to claim 2, wherein the step of target position training based on initial image frame generates tracker Include:

Use the circular matrix collecting sample of the target frame peripheral region of initial image frame；And

Using the initial trace template of the optimization method output tracking device of least square.

4. method as claimed in claim 3, wherein the step of generating detector based on the target position training in initial frame is wrapped It includes:

According to the target frame of the initial image frame, multiple sample boxes are exported according to the sliding window of predetermined dimension, generate sample Queue.

5. method as claimed in claim 4, wherein the sliding window of the predetermined dimension are as follows:

The initial gauges of sliding window are the 10% of original image, step-size in search scale be adjacent scale the first prearranged multiple or Second prearranged multiple and value interval are [0.1 times of initial gauges, 10 times of initial gauges].

6. method as claimed in claim 5, wherein the target position according to a upper picture frame generates searching for the picture frame The step of rope region includes:

The center of the target frame of the above picture frame is search center, and twice with its each size of target frame is search range, is made For the region of search of the picture frame.

7. method as claimed in claim 5, wherein the target position according to a upper picture frame generates searching for the picture frame The step of rope region further include:

Processing is zoomed in and out to the picture frame according to predetermined zoom factor, the picture frame after obtaining multiple scalings；And

The center of the target frame of the above picture frame is search center, and twice with its each size of target frame is search range, is made For the region of search of picture frame after multiple scalings.

8. the method for claim 7, wherein by the neighborhood of each pixel in the trace template and region of search into The step of row convolution algorithm includes:

Convolution is carried out using the neighborhood of each pixel in the region of search of picture frame after the trace template and the multiple scaling Operation obtains the response under the different zoom factor.

9. method according to claim 8, wherein determined by the size of the target's center and the target frame of a upper picture frame The step of target position of the picture frame further include:

Processing is zoomed in and out to the target frame of a upper picture frame with the zoom factor of the affiliated picture frame of the maximum pixel of response, is made For the target frame size of the picture frame；And

The target position of the picture frame is determined according to the target frame size of the target's center of calculating and the picture frame.

10. method as claimed in claim 9, wherein the step of exporting the target position of correspondence image frame using detector is wrapped It includes:

Multiple candidate samples of target in the picture frame are generated according to multiple sample boxes in sample queue；

The multiple candidate samples are filtered by three-stage cascade classification, export the target position of the picture frame.

11. further including the steps that updating tracker such as method of any of claims 1-10:

After obtaining the target frame of each picture frame, the trace template of the picture frame is calculated according to the content frame；And

The trace template of the trace template and a upper picture frame of the picture frame is weighted, updated tracking mould is obtained Plate.

12. method as claimed in claim 11, wherein the weighting coefficient of the picture frame and a upper picture frame is respectively 0.015 With 0.985.

13. further including the steps that updating detector such as method of any of claims 1-10:

It calculates by the IoU index of multiple candidate samples of detector maturation；And

The sample queue is screened according to the IoU index.

14. a kind of mobile device, comprising:

Camera sub-system is suitable for shooting video image；

One or more processors；

Memory；

One or more programs, wherein one or more of programs are stored in the memory and are configured as by described one A or multiple processors execute, and one or more of programs include for executing in -13 the methods according to claim 1 The instruction of either method.

15. a kind of computer readable storage medium for storing one or more programs, one or more of programs include instruction, Described instruction is when calculating equipment execution, so that the calculating equipment is executed according in the method as described in claim 1-13 Either method.