CN110287874A - Target tracking method and device, electronic equipment and storage medium - Google Patents

Target tracking method and device, electronic equipment and storage medium

Info

Publication number
CN110287874A
Authority
CN
China
Prior art keywords
frame image
feature
target object
current frame
obtains
Prior art date
Legal status
Granted
Application number
CN201910555741.9A
Other languages
Chinese (zh)
Other versions
CN110287874B (en)
Inventor
战赓 (Geng Zhan)
庄博涵 (Bohan Zhuang)
孙书洋 (Shuyang Sun)
欧阳万里 (Wanli Ouyang)
Current Assignee
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Sensetime Technology Development Co Ltd
Priority to CN202110826777.3A (publication CN113538519A)
Priority to CN201910555741.9A (publication CN110287874B)
Priority to CN202110825287.1A (publication CN113538517B)
Publication of CN110287874A
Application granted
Publication of CN110287874B

Classifications

    • G06T7/246: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/248: Analysis of motion using feature-based methods involving reference images or patches
    • G06T2207/10016: Video; image sequence
    • G06T2207/20084: Artificial neural networks [ANN]

Abstract

The present disclosure relates to a target tracking method and apparatus, an electronic device, and a storage medium. The method includes: for any current frame image after an initial frame image in a video stream, obtaining a first position of a target object in the previous frame image of the current frame image; and obtaining, based on the first position and a predicted feature of the target object in the current frame image, position information of the target object in the current frame image, wherein the predicted feature of the target object in the current frame image is obtained based on the initial frame image of the video stream and the previous frame image of the current frame. Embodiments of the present disclosure can achieve accurate target tracking.

Description

Target tracking method and device, electronic equipment and storage medium
Technical field
The present disclosure relates to the field of computer vision, and in particular to a target tracking method and apparatus, an electronic device, and a storage medium.
Background
Video object tracking is a key problem in computer vision that has been studied for decades. It plays an important role in many computer vision sub-fields, such as video pose tracking, video image segmentation, and video object detection.
In recent years, deep-learning-based tracking algorithms have achieved notable results, but existing methods struggle to adapt quickly to drastic appearance changes of objects in a video, which limits their performance.
Summary of the invention
The present disclosure proposes a technical solution for target tracking.
According to a first aspect of the present disclosure, there is provided a target tracking method, comprising:
for any current frame image after an initial frame image in a video stream, obtaining a first position of a target object in the previous frame image of the current frame image;
obtaining, based on the first position and a predicted feature of the target object in the current frame image, position information of the target object in the current frame image, wherein the predicted feature of the target object in the current frame image is obtained based on the initial frame image of the video stream and the previous frame image of the current frame.
In some possible embodiments, obtaining the predicted feature of the current frame image comprises:
obtaining the predicted feature of the target object in the current frame image based on a first feature corresponding to the first position of the target object in the previous frame image of the current frame image and a second feature corresponding to a second position of the target object in the initial frame image.
In some possible embodiments, before obtaining, for any current frame image after the initial frame image in the video stream, the first position of the target object in the previous frame image of the current frame image, the method further comprises:
obtaining the second position of the target object in the initial frame image and the second feature corresponding to the second position.
In some possible embodiments, obtaining the second position of the target object in the initial frame image includes at least one of the following:
obtaining a position mask map for the target object in the initial frame image, and determining the second position of the target object based on the mask map;
receiving a box-selection operation for the initial frame image, and determining the second position of the target object based on the position region corresponding to the box-selection operation;
performing a target detection operation on the initial frame image, and determining the second position of the target object based on the detection result of the target detection operation.
In some possible embodiments, obtaining the predicted feature of the target object in the current frame image based on the first feature corresponding to the first position of the target object in the previous frame image of the current frame image and the second feature corresponding to the second position of the target object in the initial frame image comprises:
performing convolution processing on the first feature and the second feature respectively, to obtain a first transition feature of the first feature and a second transition feature of the second feature;
performing first cross-correlation encoding processing and graph convolution processing on the first transition feature and the second transition feature, to obtain a third feature;
obtaining the predicted feature based on feature fusion processing of the third feature, the first transition feature, and the second feature.
In some possible embodiments, performing the first cross-correlation encoding processing and the graph convolution processing on the first transition feature and the second transition feature to obtain the third feature comprises:
performing the first cross-correlation encoding processing on the first transition feature and the second transition feature, to obtain a first encoded feature;
inputting the first encoded feature into a graph neural network to perform graph convolution processing, obtaining the third feature.
In some possible embodiments, performing the first cross-correlation encoding processing on the first transition feature and the second transition feature to obtain the first encoded feature comprises:
performing a matrix multiplication operation on the first transition feature and the second transition feature, to obtain the first encoded feature.
In some possible embodiments, obtaining the predicted feature based on the feature fusion processing of the third feature, the first transition feature, and the second feature comprises:
performing cross-correlation decoding processing on the third feature based on the first transition feature, to obtain a fourth feature;
performing summation processing on the fourth feature and the second feature, to obtain the predicted feature.
In some possible embodiments, obtaining the position information of the target object in the current frame image based on the first position and the predicted feature of the target object in the current frame image comprises:
determining, based on the first position, a search region for the target object in the current frame image, and a fifth feature corresponding to the search region;
performing, with the predicted feature as a convolution kernel, second cross-correlation encoding processing on the fifth feature, to obtain a second encoded feature;
performing target detection processing for the target object based on the second encoded feature, to obtain the position information of the target object in the current frame image.
In some possible embodiments, determining, based on the first position, the search region for the target object in the current frame image comprises:
enlarging the first position by a preset multiple, centered on the first position, to obtain the search region for the target object in the current frame image.
In some possible embodiments, performing, with the predicted feature as a convolution kernel, the second cross-correlation encoding processing on the fifth feature comprises:
performing convolution processing on the fifth feature with the predicted feature as the convolution kernel, to obtain the second encoded feature.
In some possible embodiments, performing the target detection processing for the target object based on the second encoded feature to obtain the position information of the target object in the current frame image comprises:
inputting the second encoded feature into a target detection network, to obtain the position information of the target object within the search region.
In some possible embodiments, the target tracking method is applied in a Siamese (twin) neural network, the Siamese neural network comprising a first branch network, a second branch network, a feature update network, and a target detection network, wherein the first branch network and the second branch network are identical;
the first branch network is configured to detect the second position of the target object in the initial frame image and the second feature corresponding to the second position;
the second branch network is configured to detect the first position of the target object in the previous frame image of any current frame image after the initial frame image, and the first feature corresponding to the first position;
the feature update network is configured to obtain the predicted feature based on the initial frame image and the previous frame image of the current frame image;
the target detection network is configured to obtain the position information of the target object in the current frame image based on the first position and the predicted feature of the current frame image.
In some possible embodiments, the method further comprises:
highlighting the position information of the target object in the image frames of the video stream.
According to a second aspect of the present disclosure, there is provided a target tracking apparatus, comprising:
a detection module configured to obtain, for any current frame image after an initial frame image in a video stream, a first position of a target object in the previous frame image of the current frame image;
a tracking module configured to obtain, based on the first position and a predicted feature of the target object in the current frame image, position information of the target object in the current frame image, wherein the predicted feature of the target object in the current frame image is obtained based on the initial frame image of the video stream and the previous frame image of the current frame.
In some possible embodiments, the tracking module comprises:
a prediction unit configured to obtain the predicted feature of the current frame image, by obtaining the predicted feature of the target object in the current frame image based on the first feature corresponding to the first position of the target object in the previous frame image of the current frame image and the second feature corresponding to the second position of the target object in the initial frame image.
In some possible embodiments, the detection module is further configured to obtain the second position of the target object in the initial frame image and the second feature corresponding to the second position.
In some possible embodiments, the detection module obtains the second position of the target object in the initial frame image by at least one of the following:
obtaining a position mask map for the target object in the initial frame image, and determining the second position of the target object based on the mask map;
receiving a box-selection operation for the initial frame image, and determining the second position of the target object based on the position region corresponding to the box-selection operation;
performing a target detection operation on the initial frame image, and determining the second position of the target object based on the detection result of the target detection operation.
In some possible embodiments, the prediction unit is further configured to: perform convolution processing on the first feature and the second feature respectively, to obtain the first transition feature of the first feature and the second transition feature of the second feature;
perform first cross-correlation encoding processing and graph convolution processing on the first transition feature and the second transition feature, to obtain a third feature;
and obtain the predicted feature based on feature fusion processing of the third feature, the first transition feature, and the second feature.
In some possible embodiments, the prediction unit is further configured to: perform the first cross-correlation encoding processing on the first transition feature and the second transition feature, to obtain a first encoded feature;
and input the first encoded feature into a graph neural network to perform graph convolution processing, obtaining the third feature.
In some possible embodiments, the prediction unit is further configured to perform a matrix multiplication operation on the first transition feature and the second transition feature, to obtain the first encoded feature.
In some possible embodiments, the prediction unit is further configured to: perform cross-correlation decoding processing on the third feature based on the first transition feature, to obtain a fourth feature;
and perform summation processing on the fourth feature and the second feature, to obtain the predicted feature.
In some possible embodiments, the tracking module further comprises a tracking unit configured to: determine, based on the first position, a search region for the target object in the current frame image, and a fifth feature corresponding to the search region;
perform, with the predicted feature as a convolution kernel, second cross-correlation encoding processing on the fifth feature, to obtain a second encoded feature;
and perform target detection processing for the target object based on the second encoded feature, to obtain the position information of the target object in the current frame image.
In some possible embodiments, the tracking unit is further configured to enlarge the first position by a preset multiple, centered on the first position, to obtain the search region for the target object in the current frame image.
In some possible embodiments, the tracking unit is further configured to perform convolution processing on the fifth feature with the predicted feature as the convolution kernel, to obtain the second encoded feature.
In some possible embodiments, the tracking unit is further configured to input the second encoded feature into a target detection network, to obtain the position information of the target object within the search region.
In some possible embodiments, the target tracking apparatus comprises a Siamese neural network, the detection module comprising the first branch network and the second branch network of the Siamese neural network, and the tracking module comprising the feature update network and the target detection network of the Siamese neural network, wherein the first branch network and the second branch network are identical;
the first branch network is configured to detect the second position of the target object in the initial frame image and the second feature corresponding to the second position;
the second branch network is configured to detect the first position of the target object in the previous frame image of any current frame image after the initial frame image, and the first feature corresponding to the first position;
the feature update network is configured to obtain the predicted feature based on the initial frame image and the previous frame image of the current frame image;
the target detection network is configured to obtain the position information of the target object in the current frame image based on the first position and the predicted feature of the current frame image.
In some possible embodiments, the apparatus further comprises a display module configured to highlight the position information of the target object in the image frames of the video stream.
According to a third aspect of the present disclosure, there is provided an electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to call the instructions stored in the memory to execute the method of any one of the first aspect.
According to a fourth aspect of the present disclosure, there is provided a computer-readable storage medium having computer program instructions stored thereon, wherein the computer program instructions, when executed by a processor, implement the method of any one of the first aspect.
In the embodiments of the present disclosure, the positions of the target object in subsequent images can be obtained sequentially from the position information of the target object in the initial frame image. For any current frame image, the predicted feature of the target object in the current frame image can be obtained from its previous frame image and the initial frame image, and the position of the target object in the current frame image can be determined from the first position in the previous frame image and the obtained predicted feature. In this way, the target object can be tracked accurately by means of efficient forward propagation, while quickly adapting to drastic appearance changes of the object.
It should be understood that the above general description and the following detailed description are exemplary and explanatory only, and do not limit the present disclosure.
Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Brief description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the technical solutions of the disclosure.
Fig. 1 shows a flowchart of a target tracking method according to an embodiment of the present disclosure;
Fig. 2 shows a flowchart of obtaining the predicted feature of the target object in step S20 of a target tracking method according to an embodiment of the present disclosure;
Fig. 3 shows a flowchart of step S32 in a target tracking method according to an embodiment of the present disclosure;
Fig. 4 shows a schematic structural diagram of obtaining the predicted feature according to an embodiment of the present disclosure;
Fig. 5 shows a flowchart of step S20 in a target tracking method according to an embodiment of the present disclosure;
Fig. 6 shows a schematic diagram of a target tracking process according to an embodiment of the present disclosure;
Fig. 7 shows a block diagram of a target tracking apparatus according to an embodiment of the present disclosure;
Fig. 8 shows a block diagram of an electronic device according to an embodiment of the present disclosure;
Fig. 9 shows another block diagram of an electronic device according to an embodiment of the present disclosure.
Detailed description
Various exemplary embodiments, features, and aspects of the present disclosure are described in detail below with reference to the accompanying drawings. The same reference numerals in the drawings denote elements with the same or similar functions. Although various aspects of the embodiments are shown in the drawings, the drawings are not necessarily drawn to scale unless specifically noted.
The word "exemplary" as used herein means "serving as an example, embodiment, or illustration". Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
The term "and/or" herein merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, A and/or B may indicate three cases: A alone, both A and B, and B alone. In addition, the term "at least one" herein indicates any one of multiple items or any combination of at least two of multiple items; for example, including at least one of A, B, and C may indicate including any one or more elements selected from the set consisting of A, B, and C.
In addition, numerous specific details are given in the following detailed description in order to better explain the present disclosure. Those skilled in the art will understand that the present disclosure can be practiced without certain specific details. In some instances, methods, means, elements, and circuits well known to those skilled in the art are not described in detail, so as to highlight the gist of the present disclosure.
An embodiment of the present disclosure provides a target tracking method that can be used to track a target object across consecutive image frames. The method can be applied in any image processing apparatus; for example, it may be executed by a terminal device, a server, or another processing device, where the terminal device may be user equipment (UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like. In some possible implementations, the method can be implemented by a processor calling computer-readable instructions stored in a memory.
Fig. 1 shows a flowchart of a target tracking method according to an embodiment of the present disclosure. As shown in Fig. 1, the target tracking method includes:
S10: for any current frame image after an initial frame image in a video stream, obtaining a first position of a target object in the previous frame image of the current frame image.
The embodiment of the present disclosure can be used to track a target object in a video stream. The target object can be an object of any type, such as a specific person, an animal, or any other object appearing in the images; the present disclosure does not specifically limit the type of the target object, which can be determined according to the specific application.
In some possible embodiments, a box-selection operation can be performed on the video stream to select the image frames on which target object tracking is to be performed, or all images in the video stream can be used directly as the image frames on which target object tracking is to be performed. The frame images can be ordered according to their time sequence.
In some possible embodiments, the position of the target object in the previous frame image can be used to predict the position of the target object in the next frame image. Therefore, when performing target object detection for any current frame image after the initial frame image in the video stream, the detection result of the target object in the previous frame image of the current frame image, i.e., the first position of the target object in the previous frame image, can be obtained first, and the position in the current frame image is then further predicted from that position.
Here, the position of the target object in the initial frame image of the video stream can be obtained first; this position can be obtained through target detection or through user input, which is not limited here. The position of the target object in the second frame image is then predicted from the position of the target object in the initial frame image, and so on, to obtain the positions of the target object in the remaining frame images.
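To make this propagation order concrete, the following Python sketch outlines the outer tracking loop; all helper names (detect_initial_position, extract_feature, update_template, locate_in_frame) are hypothetical placeholders for the modules detailed later, not names used by the disclosure.

```python
# Sketch of the frame-by-frame propagation; every helper is a hypothetical
# placeholder for a module described later in this disclosure.

def track(frames):
    prev_pos = detect_initial_position(frames[0])     # second position (mask, box selection, or detector)
    init_feat = extract_feature(frames[0], prev_pos)  # second feature
    positions = [prev_pos]
    for t in range(1, len(frames)):
        prev_feat = extract_feature(frames[t - 1], prev_pos)  # first feature
        template = update_template(init_feat, prev_feat)      # predicted feature
        prev_pos = locate_in_frame(frames[t], prev_pos, template)
        positions.append(prev_pos)
    return positions
```

Note that for the second frame the loop naturally reduces to the special case described later: the previous frame is the initial frame, so the first feature equals the second feature.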
S20: obtaining, based on the first position and the predicted feature of the target object in the current frame image, position information of the target object in the current frame image, wherein the predicted feature of the current frame image is obtained based on the initial frame image of the video stream and the previous frame image of the current frame.
In some possible embodiments, a search region corresponding to the first position can be determined in the current frame image based on the first position of the target object in the previous frame image of the current frame image, and the position region matching the predicted feature is then determined within the search region according to the predicted feature, as the position of the target object in the current frame image.
Based on the above configuration, the embodiment of the present disclosure can predict the position of the target object in the current frame image from the position of the target object in the previous frame image of the current frame and the obtained predicted feature of the target object in the current frame image, whereby the target object in the current frame can be tracked quickly by means of forward propagation.
The embodiments of the present disclosure are described in detail below with reference to the accompanying drawings.
The embodiment of the present disclosure can first obtain the position of the target object in the initial frame image (the second position) and the second feature corresponding to the second position. In some possible embodiments, the second position can be expressed as the coordinates of two diagonal vertices of the rectangular box corresponding to the position region of the target object, or as the coordinates of one vertex together with length and width information; the position region corresponding to the target object in the initial frame image can be determined from this information. In other embodiments, the second position can also be expressed in other forms. Likewise, the position information of the target object throughout the embodiments of the present disclosure can take the above representation or any other representation, which is not specifically limited by the present disclosure.
The manner of obtaining the second position of the target object in the initial frame image of the video stream may include at least one of the following:
a) obtaining a position mask map for the target object in the initial frame image, and determining the second position of the target object based on the mask map;
In some possible embodiments, the mask map can be expressed as a matrix whose dimensions correspond to those of the initial frame image, with each mask value in the mask map corresponding one-to-one to a pixel in the initial frame image. Each mask value is either a first code value or a second code value, where the first code value indicates the region where the target object is located; for example, the first code value can be "1" and the second code value "0", in which case the set of pixels carrying the first code value "1" is the position region of the second position where the target object is located (a minimal sketch of this conversion follows this list). In addition, the mask map can be information input by a user, or a mask map obtained through a target object detection operation.
b) receiving a box-selection operation for the initial frame image, and determining the second position of the target object based on the position region corresponding to the box-selection operation;
In some possible embodiments, the box-selection operation on the initial frame image can be received through an input component, where the input component may include a mouse, a trackpad, a keyboard, or another device capable of receiving a box-selection operation. The box-selection operation is an operation of selecting the region of the target object in the initial frame image; a selection region is obtained from the box-selection operation, and the position corresponding to the selection region is the second position.
The selection region obtained from the box-selection operation can be a regular rectangle, in which case the second position can be determined as the position of the selection region corresponding to the box-selection operation. The selection region can also be an irregular shape, in which case the smallest rectangular region enclosing the irregular shape can be determined, and the second position is determined as the position of that rectangular region.
c) performing a target detection operation on the initial frame image, and determining the second position of the target object based on the detection result of the target detection operation.
In some possible embodiments, the initial frame image can be input into a neural network capable of performing target object detection, for example Mask-RCNN (a mask-based convolutional neural network for target recognition), to obtain the mask map of the position of the target object and thereby determine the second position.
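As an illustration of option a), a binary position mask can be reduced to a rectangular second position by taking the tightest box around the pixels carrying the first code value. This NumPy sketch is one reasonable implementation assumed for illustration, not part of the disclosure:

```python
import numpy as np

def mask_to_box(mask):
    """Tightest (x1, y1, x2, y2) rectangle around pixels where mask == 1."""
    ys, xs = np.nonzero(mask)        # pixels carrying the first code value "1"
    if ys.size == 0:
        return None                  # no target pixels in this mask
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())
```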
With the second position of the target object in the initial frame image obtained, the second feature corresponding to the second position can be obtained. The image region corresponding to the second position can be cropped from the initial frame image and feature extraction processing performed on that image region to obtain the second feature; alternatively, feature extraction processing can be performed on the whole initial frame image, and the second feature corresponding to the second position is then taken from the image feature of the initial frame image. The first feature, the second feature, and the subsequent first transition feature, second transition feature, third feature, fourth feature, and fifth feature of the embodiments of the present disclosure each represent image features of the target object; by detecting these features and performing processing such as fusion and optimization on them, more accurate feature information can be obtained, so that the position of the target object in each frame image can be detected more accurately.
Further, with the second position of the target object and the corresponding second feature in the initial frame image obtained, the positions of the target object in the remaining frame images can be obtained in sequence according to the order of the image frames. The predicted feature of the target object in the current frame image can be obtained from the initial frame image and the previous frame image of the current frame image.
Fig. 2 shows a flowchart of obtaining the predicted feature of the target object in a target tracking method according to an embodiment of the present disclosure. As shown in Fig. 2, obtaining the predicted feature of the current frame image includes:
S31: obtaining the second feature corresponding to the second position in the initial frame image, and the first feature corresponding to the first position in the previous frame image of the current frame image;
In some possible embodiments, when performing feature prediction of the target object for any current frame image after the initial frame image in the video stream, the feature of the target object can be predicted from the detection result of the target object in the initial frame (i.e., the second position) and the detection result of the target object in the previous frame image preceding the current frame image (i.e., the first position).
For example, feature extraction can be performed separately on the image region corresponding to the second position in the initial frame image and on the image region corresponding to the first position in the previous frame image, obtaining the corresponding feature information, i.e., the second feature and the first feature. Alternatively, feature extraction processing can be performed on the initial frame image and the previous frame image respectively; the second feature corresponding to the second position is then taken from the image feature of the initial frame image, and the first feature corresponding to the first position from the image feature of the previous frame image. The predicted feature of the current frame image is obtained from the resulting first feature and second feature.
Feature extraction can be performed through a residual network to obtain the first feature and the second feature respectively; in other embodiments, it can also be performed through other feature extraction networks.
In response to the current frame image being the second frame image, the previous frame image of the current frame image is the initial frame image, and the first position of the target object is the second position of the target object in the initial frame image; correspondingly, the second feature of the second position is the first feature of the first position. That is, for the second frame image in the video stream, i.e., the frame image following the initial frame image, the previous frame image is the initial frame image, the first position of the target object in the previous frame image is the second position of the target object in the initial frame, and the first feature corresponding to the first position is the second feature corresponding to the second position; the predicted feature of the target object in the second frame image can thus be determined from the second position of the target object in the initial frame image. For the n-th frame image after the second frame image, the predicted feature can be obtained from the second feature corresponding to the second position of the target object in the initial frame image and the first feature corresponding to the first position of the target object in the (n-1)-th frame image, where n is an integer greater than 2 denoting the frame number of the current frame.
S32: obtaining the predicted feature of the target object in the current frame image based on the first feature of the target object in the previous frame image of the current frame image and the second feature corresponding to the second position of the target object in the initial frame image.
In some possible embodiments, the predicted feature of the target object in the current frame image can be predicted from the second feature of the target object in the initial frame image and the first feature of the target object in the previous frame image of the current frame image, for example through operations such as cross-correlation processing and convolution processing of the first feature and the second feature.
Fig. 3 shows a flowchart of step S32 in a target tracking method according to an embodiment of the present disclosure. Obtaining the predicted feature of the target object in the current frame image based on the first feature of the target object in the previous frame image of the current frame image and the second feature corresponding to the second position of the target object in the initial frame image includes:
S321: performing convolution processing on the first feature and the second feature respectively, to obtain the first transition feature corresponding to the first feature and the second transition feature corresponding to the second feature;
In some possible embodiments, convolution processing can be performed on the first feature and the second feature respectively to obtain the first transition feature corresponding to the first feature and the second transition feature corresponding to the second feature. Through this convolution processing, the feature information about the target object contained in the first transition feature can be made more accurate relative to the first feature, and the feature information contained in the second transition feature more accurate relative to the second feature. The convolution kernels used for the first feature and the second feature can be the same or different; for example, they can be 1x1 convolution kernels, or kernels of other forms.
Fig. 4 shows a schematic structural diagram of obtaining the predicted feature according to an embodiment of the present disclosure. The second feature corresponding to the second position of the target object in the initial frame image can be denoted F0, and the first feature corresponding to the first position of the target object in the previous frame image can be denoted Ft-1, where t denotes the frame number of the image frame and is a positive integer.
In the embodiment of the present disclosure, the first feature and the second feature have the same dimensions, denoted C × W × H, where C is the number of channels and W and H are the width and height of the feature; the first feature and the second feature are each in matrix form. Correspondingly, the first transition feature obtained through the convolution processing can have dimensions C1 × W × H, and the second transition feature dimensions C2 × W × H, where C1 and C2 denote the numbers of channels of the corresponding transition features and can be the same or different values, and W and H denote the width and height of the transition features.
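A minimal PyTorch sketch of step S321, assuming 1x1 convolutions and illustrative sizes for C, C1, and C2 (the disclosure fixes none of these):

```python
import torch
import torch.nn as nn

C, C1, C2, H, W = 256, 64, 64, 16, 16               # assumed sizes, for illustration only
to_first_trans  = nn.Conv2d(C, C1, kernel_size=1)   # produces the first transition feature
to_second_trans = nn.Conv2d(C, C2, kernel_size=1)   # produces the second transition feature

first_feat  = torch.randn(1, C, H, W)               # Ft-1, the first feature
second_feat = torch.randn(1, C, H, W)               # F0, the second feature
first_trans  = to_first_trans(first_feat)           # (1, C1, H, W)
second_trans = to_second_trans(second_feat)         # (1, C2, H, W)
```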
S322: performing first cross-correlation encoding processing and graph convolution processing on the first transition feature and the second transition feature, to obtain a third feature;
With the first transition feature and the second transition feature obtained, first cross-correlation encoding processing (cross correlation) and graph convolution processing (conv1d and conv2d) can be performed on them to fuse the feature information of the two, obtaining a third feature that fuses the feature information of both; the dimensions of the third feature can be expressed as C2 × C1.
In some possible embodiments, the first cross-correlation encoding processing can be expressed as a matrix multiplication operation; that is, it can be performed by executing a matrix multiplication of the first transition feature and the second transition feature, obtaining a corresponding third transition feature E with dimensions C2 × C1. The third transition feature is then input into a graph neural network to perform graph convolution processing, obtaining the third feature, whose dimensions are also C2 × C1. The graph neural network of the embodiment of the present disclosure can perform convolution processing (conv1d and conv2d) twice on the third transition feature to obtain the third feature Eref; in other embodiments, a different number of convolutions can be performed, which is not specifically limited by the present disclosure.
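Continuing the sketch above, step S322 can be read as a batched matrix multiplication followed by two convolutions standing in for the graph convolution; the kernel sizes and the exact graph-network layout are assumptions made for illustration.

```python
import torch
import torch.nn as nn

B, C1, C2, H, W = 1, 64, 64, 16, 16
first_trans  = torch.randn(B, C1, H, W)    # transition of Ft-1
second_trans = torch.randn(B, C2, H, W)    # transition of F0

# First cross-correlation encoding: a batched matrix multiplication that
# compares every channel of one transition feature with every channel of
# the other, giving a C2 x C1 affinity matrix that can be read as a graph.
E = torch.bmm(second_trans.reshape(B, C2, H * W),
              first_trans.reshape(B, C1, H * W).transpose(1, 2))   # (B, C2, C1)

# Graph convolution sketched as one conv1d and one conv2d, per the text;
# kernel sizes are illustrative assumptions.
graph_conv_1d = nn.Conv1d(C2, C2, kernel_size=1)
graph_conv_2d = nn.Conv2d(1, 1, kernel_size=3, padding=1)
E_ref = graph_conv_2d(graph_conv_1d(E).unsqueeze(1)).squeeze(1)    # third feature, (B, C2, C1)
```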
S323: obtaining the predicted feature based on feature fusion processing of the third feature, the first transition feature, and the second feature.
In some possible embodiments, feature fusion processing can first be performed on the third feature and the first transition feature; that is, cross-correlation decoding processing is performed on the third feature through the first transition feature to obtain a decoded feature, and convolution processing is then performed on the decoded feature to obtain a fourth feature M' that fuses the feature information of the first transition feature and the third feature. The dimensions of the fourth feature are identical to those of the second feature. Here, the cross-correlation decoding of the third feature through the first transition feature can be implemented by performing a convolution (matrix multiplication) of the first transition feature and the third feature to obtain the decoded feature.
Summation processing is then performed on the fourth feature and the second feature, i.e., the feature values of corresponding elements are added, to obtain the predicted feature Ffinal, which further fuses the feature information of the second feature. The predicted feature can thus characterize the feature information of the target object in the current frame image.
The dimensions of the fourth feature can be identical to those of the second feature, i.e., C × W × H. Alternatively, in some implementations, the feature obtained from the fusion of the third feature and the first transition feature, i.e., from the convolution processing of the third feature and the first transition feature, can be an intermediate feature with dimensions C2 × W × H, and the fourth feature, with dimensions C × W × H, is obtained by further performing convolution processing on the intermediate feature.
Then, summation processing is performed on the fourth feature and the second feature to obtain the predicted feature, whose dimensions are also C × W × H.
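Step S323 can then be sketched as the decoding matrix product, a channel-restoring convolution, and the element-wise sum; again, the 1x1 kernel is an illustrative assumption rather than a choice made by the disclosure.

```python
import torch
import torch.nn as nn

B, C, C1, C2, H, W = 1, 256, 64, 64, 16, 16
E_ref       = torch.randn(B, C2, C1)       # third feature from the graph convolution
first_trans = torch.randn(B, C1, H, W)     # first transition feature
second_feat = torch.randn(B, C, H, W)      # F0, the second feature

# Cross-correlation decoding: project the refined graph back onto the spatial
# layout of the first transition feature -> intermediate feature (B, C2, H, W).
decoded = torch.bmm(E_ref, first_trans.reshape(B, C1, H * W)).reshape(B, C2, H, W)

# Convolution back to C channels gives the fourth feature M'; the element-wise
# sum with the second feature is the predicted feature Ffinal.
to_template = nn.Conv2d(C2, C, kernel_size=1)
predicted = to_template(decoded) + second_feat     # (B, C, H, W)
```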
With the predicted feature of the current frame obtained, the position of the target object can be detected in the current frame image according to the predicted feature.
Fig. 5 shows a flowchart of step S20 in a target tracking method according to an embodiment of the present disclosure. Obtaining the position information of the target object in the current frame image based on the first position and the predicted feature of the target object in the current frame image includes:
S201: determining, based on the first position, a search region for the target object in the current frame image, and a fifth feature corresponding to the search region;
In some possible embodiments, the search region for the target object in the current frame image can be determined from the first position of the target object in the previous frame image of the current frame image. The position region corresponding to the first position can be enlarged by a preset multiple, and the enlarged position region can serve as the search region in the current frame image; this configuration helps ensure that the target object lies within the search region.
The preset multiple can be a preset value determined according to the type of the target object or the application scenario; for example, it can be 2, or another value in other embodiments.
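A small sketch of the search-region computation under these assumptions, with boxes as (x1, y1, x2, y2) tuples; clamping to the image bounds is an added practical detail the text does not mention.

```python
def search_region(box, scale=2.0, img_w=None, img_h=None):
    """Enlarge an (x1, y1, x2, y2) box `scale` times about its center.

    scale=2.0 mirrors the example multiple above; clamping to the image
    bounds is an added practical assumption.
    """
    cx, cy = (box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0
    w, h = (box[2] - box[0]) * scale, (box[3] - box[1]) * scale
    x1, y1, x2, y2 = cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2
    if img_w is not None:
        x1, x2 = max(0.0, x1), min(float(img_w), x2)
    if img_h is not None:
        y1, y2 = max(0.0, y1), min(float(img_h), y2)
    return x1, y1, x2, y2
```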
In some possible embodiments, after the search region is determined, the fifth feature corresponding to the search region can be obtained. A feature extraction network can be used to perform feature extraction processing on the image of the search region to obtain the fifth feature of the target object; alternatively, feature extraction processing can be performed on the current frame image, and the feature information corresponding to the search region, i.e., the fifth feature, is selected from the image feature of the current frame image.
After the feature information of the search region is obtained, matching between the predicted feature and the feature information of the search region can be performed. The feature extraction processing here can also be implemented through a residual network, which is not specifically limited by the present disclosure.
S202: performing, with the predicted feature as a convolution kernel, second cross-correlation encoding processing on the fifth feature, to obtain a second encoded feature;
With the fifth feature and the predicted feature obtained, cross-correlation encoding processing can be performed on them to obtain the second encoded feature: convolution processing is performed on the fifth feature with the predicted feature as the convolution kernel, and the result is the second encoded feature, whose dimensions are identical to those of the fifth feature. The second encoded feature of the embodiment of the present disclosure can indicate the degree of matching between the predicted feature and each pixel of the search region.
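The second cross-correlation encoding can be sketched with the predicted feature as the kernel of a convolution; the grouped (channel-wise) form and the padding that keeps the output the same size as the fifth feature are implementation assumptions consistent with the dimensions stated above.

```python
import torch
import torch.nn.functional as F

C = 256                                   # assumed channel count
fifth_feat = torch.randn(1, C, 31, 31)    # feature of the search region
predicted  = torch.randn(1, C, 7, 7)      # predicted feature, used as the kernel

# Channel-wise (grouped) convolution with the predicted feature as kernel;
# padding keeps the output the same size as the fifth feature.
weight = predicted.view(C, 1, 7, 7)
second_encoded = F.conv2d(fifth_feat, weight, padding=3, groups=C)  # (1, C, 31, 31)
```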
S203: performing target detection processing for the target object based on the second encoded feature, to obtain the position information of the target object in the current frame image.
With the second encoded feature obtained, target detection processing can be performed on it. The embodiment of the present disclosure can use a region proposal network to perform this target detection processing, obtaining the candidate boxes of the target object corresponding to the second encoded feature, i.e., the position of the target object.
In some possible embodiments, multiple candidate boxes can be obtained for the target object, and the position of the target object can be determined from the candidate box with the highest confidence. The target detection processing can be implemented through a region proposal network to obtain the position of the candidate box for the target object.
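A rough sketch of the detection head under the region-proposal reading above; the anchor count, score layout, and 1x1 heads are assumptions, and decoding the offsets against anchor boxes is omitted.

```python
import torch
import torch.nn as nn

C, A, H, W = 256, 5, 31, 31                     # assumed channels, anchors, map size
cls_head = nn.Conv2d(C, 2 * A, kernel_size=1)   # foreground/background score per anchor
reg_head = nn.Conv2d(C, 4 * A, kernel_size=1)   # box offsets per anchor

second_encoded = torch.randn(1, C, H, W)
scores  = cls_head(second_encoded)              # (1, 2A, H, W)
offsets = reg_head(second_encoded)              # (1, 4A, H, W)

# Take the candidate with the highest foreground score as the target position;
# decoding `offsets` against the anchor boxes is omitted here.
fg = scores.view(1, A, 2, H, W)[:, :, 1]        # assumed score layout
best_idx = fg.flatten(1).argmax(dim=1)          # index of the best candidate
```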
According to the embodiment of the present disclosure, the position of the target object in each image frame of the video stream can be obtained by means of forward propagation, achieving fast and accurate tracking of the target object.
In some possible embodiments, when the position of the target object in an image has been detected, the position information of the target object can be displayed prominently; for example, the position region where the target object is located can be marked with a detection box, so that the region where the target object is located can be identified easily. The present disclosure does not specifically limit the manner of highlighting.
To present the embodiments of the present disclosure more clearly, the target tracking process is illustrated below. Fig. 6 shows a schematic diagram of a target tracking process according to an embodiment of the present disclosure.
The target tracking method of the embodiment of the present disclosure can be implemented through a Siamese (twin) network; Fig. 6 shows a schematic diagram of the network architecture. The target tracking method can be applied in a Siamese neural network comprising a first branch network, a second branch network, a feature update network, and a target detection network, where the first branch network and the second branch network are identical. The first branch network is used to detect the second position of the target object in the initial frame image and the second feature corresponding to the second position; the second branch network is used to detect the first position of the target object in the previous frame image of any current frame image after the initial frame image, and the first feature corresponding to the first position; the feature update network is used to obtain the predicted feature based on the initial frame image and the previous frame image of the current frame image; the target detection network is used to obtain the position information of the target object in the current frame image based on the first position and the predicted feature of the current frame image. The network can further include a third branch network used to obtain the fifth feature corresponding to the search region of the current frame image; the third branch network can be identical to the first and second branch networks. For any frame image after the initial frame image of the video stream (hereinafter, the current frame image), the position information of the target object in the current frame image can be determined based on the position of the target object in the initial frame image and the position of the target object in the previous frame image of the current frame.
Specifically, the first branch network and the second branch network first perform feature extraction processing respectively on the image region corresponding to the second position in the initial frame image and on the image region corresponding to the first position in the previous frame image, e.g., obtaining the corresponding second feature and first feature through respective feature extraction networks. The first branch network and the second branch network can each be implemented as a network for feature extraction of the target object; a feature extraction network can include a residual module (Res) and a convolution module (T). The residual module can consist of a residual neural network, such as ResNet-18: the residual processing of the image region of the second position and of the image region of the first position is performed respectively through two residual neural networks, and the convolution module then performs a convolution operation on the result of the residual processing, thereby obtaining more accurate first and second features of the target object. Through this residual processing and convolution processing, the feature information of the target object in the image regions corresponding to the first position and the second position can be extracted more accurately. In other embodiments, feature extraction can also be implemented through a residual network alone, or through other feature extraction networks.
With the first feature and the second feature obtained, the feature update network (the Template Update Module shown in Fig. 6) processes the first feature and the second feature to obtain the predicted feature of the target object in the current frame image. Convolution processing, first cross-correlation encoding, and graph convolution processing can be performed on the first feature and the second feature, followed by feature fusion to obtain the predicted feature (see the embodiment shown in Fig. 4); the detailed process is described in the above embodiments and is not repeated here.
With the predicted feature obtained, the search region of the current frame image can be determined based on the region of the first position; the third branch network then performs feature extraction processing for the search region to obtain the feature corresponding to the search region, and the target detection network performs cross-correlation encoding and target detection based on the predicted feature and the feature information corresponding to the search region, obtaining the final position of the target object in the current frame image. The embodiment of the present disclosure takes the appearance change of the object into account through the Siamese network and performs an accurate feature update to obtain the predicted feature.
The embodiment of the present disclosure mainly includes the following parts: target feature template extraction for the initial frame (second feature extraction), target feature template extraction for the previous frame (first feature extraction), the template online-update module (obtaining the predicted feature), feature extraction of the search region of the current frame, and template matching to obtain the position of the tracked target in the current frame. The implementation of each module is described below.
Target feature template extraction for the initial frame (the first branch network):
Input: the coordinate position of the object in the initial frame, and the image of the initial frame;
Output: the target feature template of the initial frame (the second feature);
Specific steps: obtain the image block centered on the object position (the image region corresponding to the second position), and perform feature extraction through a neural network to serve as the initial-frame feature template of the object (the second feature).
Target feature template extraction for the previous frame (the second branch network):
Input: the coordinate position of the object in the previous frame, and the image of the previous frame;
Output: the target feature template of the previous frame (the first feature);
Specific steps: obtain the image block centered on the object position (the image region corresponding to the first position), and perform feature extraction through a neural network to serve as the previous-frame feature template of the object (the first feature).
Template online-update module, obtaining the predicted feature (the feature update network):
Input: the feature template of the target in the initial frame (the second feature), and the feature template of the target in the previous frame (the first feature);
Output: the feature template adapted to the current frame (the predicted feature);
Specific steps: transform the initial-frame target feature template (the second feature) and the previous-frame target feature template (the first feature) each with one convolutional layer (obtaining the first transition feature and the second transition feature), then perform first cross-correlation encoding on the two transformed feature templates using a cross-correlation operation. The resulting first encoded feature can be regarded as a graph; the feature interaction between the nodes of the graph and the feature update are then implemented with a graph neural network, whose two graph convolution steps are each realized with one convolution (obtaining the third feature). The updated third feature is then decoded through a cross-correlation operation back into the same feature space as the input of this module, serving as the update information. This information is added to the template feature of the initial-frame target, and the sum is the output of this module, i.e., the updated template feature (the predicted feature).
Search-region feature extraction for the current frame (the third branch network):
Input: the coordinate position of the object from the previous frame, and the image of the current frame;
Output: the search-region feature of the current frame;
Specific steps: obtain the image block centered on the first position, i.e., obtain the search region (the size of the search region can be, e.g., twice the template size), and perform feature extraction through a neural network to serve as the search-region feature of the current frame.
Template matching to obtain the position of the tracked target in the current frame (the target detection network):
Input: the updated feature template (the predicted feature), and the feature of the search region of the current frame;
Output: the position of the target object in the current frame image.
Specific steps: through a cross-correlation operation (convolution processing), compare the feature template against each position of the search region for similarity; taking this similarity result as input, pass it through two neural network modules for classification and regression, take the region with the highest classification score as the position of the object in the current frame, and correct that position through the regression network.
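A simplified sketch of this matching step: the predicted feature serves as a convolution kernel over the search-region feature, and the resulting similarity map feeds the classification and regression branches. Practical twin-network trackers typically use multi-channel (e.g., depth-wise) correlation and anchor-based heads; the single-channel response and the head layouts here are simplifying assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MatchHead(nn.Module):
    """Sketch of the target detection network; head shapes are assumptions."""

    def __init__(self):
        super().__init__()
        self.cls = nn.Conv2d(1, 1, 3, padding=1)   # classification branch
        self.reg = nn.Conv2d(1, 4, 3, padding=1)   # regression branch: (dx, dy, dw, dh)

    def forward(self, template: torch.Tensor, search: torch.Tensor):
        # template: (1, C, h, w) predicted feature; search: (1, C, H, W), H > h, W > w.
        # Second cross-correlation encoding: the template is the convolution kernel.
        score = F.conv2d(search, template)          # (1, 1, H-h+1, W-w+1) similarity map
        cls_map = self.cls(score)                   # per-position classification score
        reg_map = self.reg(score)                   # per-position refinement offsets
        # The highest-scoring position gives the coarse location of the target...
        idx = cls_map.flatten(2).argmax(-1).item()
        _, _, _, sw = cls_map.shape
        y, x = divmod(idx, sw)
        # ...and the regression branch corrects it.
        dx, dy, dw, dh = reg_map[0, :, y, x].tolist()
        return x + dx, y + dy, dw, dh
```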
In the embodiments of the present disclosure, the positions of the target object in subsequent images can be obtained sequentially from the position information of the target object in the initial frame image. The predicted feature of the target object in the current frame image can be obtained from the previous frame image of the current frame image and the initial frame image, and the position of the target object in the current frame image can then be determined from the first position in the previous frame image and the obtained predicted feature. In this way, the target object is tracked by means of efficient forward propagation, while drastic changes in the appearance of the object can be adapted to rapidly.
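Tying the sketches above together, the forward-propagation loop described in this paragraph might look as follows; the coordinate mapping assumes the stride-8 backbone from the first sketch, and all modules here are untrained and purely illustrative:

```python
def track(frames, init_center, template_size: int = 128):
    """Illustrative tracking loop following the module order described above."""
    feat_init = extract_template(frames[0], init_center, template_size)  # second feature
    update, head = TemplateUpdate(), MatchHead()
    center, prev_frame = init_center, frames[0]
    positions = [init_center]
    for frame in frames[1:]:
        # First feature: template at the first position in the previous frame.
        feat_prev = extract_template(prev_frame, center, template_size)
        predicted = update(feat_prev, feat_init)              # predicted feature
        # Fifth feature: search region (twice the template size) around the first position.
        search = extract_search_region(frame, center, template_size)
        x, y, _, _ = head(predicted, search)                  # response-map coordinates
        # Map the response-map peak back to image coordinates (stride-8 backbone).
        stride, half = 8, template_size // 2
        left, top = center[0] - template_size, center[1] - template_size
        center = (int(left + stride * x + half), int(top + stride * y + half))
        positions.append(center)
        prev_frame = frame
    return positions
```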
It will be understood by those skilled in the art that, in the above methods of the specific embodiments, the order in which the steps are written does not imply a strict execution order or impose any limitation on the implementation process; the specific execution order of the steps should be determined by their functions and possible internal logic.

It can be understood that the method embodiments mentioned in the present disclosure can be combined with one another to form combined embodiments without departing from their principles and logic; owing to space limitations, these are not described again in the present disclosure.

In addition, the present disclosure further provides a target tracking device, an electronic device, a computer-readable storage medium, and a program, all of which can be used to implement any of the target tracking methods provided by the present disclosure; for the corresponding technical solutions and descriptions, refer to the corresponding records in the method section, which are not repeated here.
Fig. 7 shows a block diagram of a target tracking device according to an embodiment of the present disclosure. As shown in Fig. 7, the target tracking device includes:

a detection module 10, configured to obtain, for any current frame image after the initial frame image in a video stream, the first position of the target object in the previous frame image of the current frame image; and

a tracking module 20, configured to obtain the position information of the target object in the current frame image based on the first position and the predicted feature of the target object in the current frame image, wherein the predicted feature of the target object in the current frame image is obtained based on the initial frame image of the video stream and the previous frame image of the current frame.
In some possible embodiments, the tracking module 20 includes:

a prediction unit, configured to obtain the predicted feature of the current frame image, i.e., to obtain the predicted feature of the target object in the current frame image based on the first feature corresponding to the first position of the target object in the previous frame image of the current frame image and the second feature corresponding to the second position of the target object in the initial frame image.
In some possible embodiments, the detection module 10 is further configured to obtain the second position of the target object in the initial frame image and the second feature corresponding to the second position.
In some possible embodiments, the detection module 10 obtains the second position of the target object in the initial frame image in at least one of the following manners:

obtaining a position mask map for the target object in the initial frame image, and determining the second position of the target object based on the mask map;

receiving a frame selection operation for the initial frame image, and determining the second position of the target object based on the position region corresponding to the frame selection operation;

performing a target detection operation on the initial frame image, and determining the second position of the target object based on the detection result of the target detection operation.
In some possible embodiments, the prediction unit is further configured to: perform convolution processing on the first feature and the second feature respectively, obtaining the first transition feature of the first feature and the second transition feature of the second feature;

perform the first cross-correlation encoding processing and graph convolution processing on the first transition feature and the second transition feature, obtaining the third feature; and

obtain the predicted feature based on feature fusion processing of the third feature, the first transition feature, and the second feature.
In some possible embodiments, the prediction unit is further configured to: perform the first cross-correlation encoding processing on the first transition feature and the second transition feature, obtaining the first coding feature; and

input the first coding feature into a graph neural network to perform the graph convolution processing, obtaining the third feature.
In some possible embodiments, the prediction unit is further configured to perform a matrix multiplication operation on the first transition feature and the second transition feature, obtaining the first coding feature.
In some possible embodiments, the prediction unit is further configured to: perform the cross-correlation decoding processing of the third feature based on the first transition feature, obtaining the fourth feature; and

perform summation processing on the fourth feature and the second feature, obtaining the predicted feature.
In some possible embodiments, the tracking module further includes a tracking unit, configured to: determine, based on the first position, the search region for the target object in the current frame image and the fifth feature corresponding to the search region;

perform, with the predicted feature as a convolution kernel, the second cross-correlation encoding processing on the fifth feature, obtaining the second coding feature; and

perform the target detection processing of the target object based on the second coding feature, obtaining the position information of the target object in the current frame image.
In some possible embodiments, the tracking unit is further configured to enlarge the first position by a preset multiple, centered on the first position, to obtain the search region for the target object in the current frame image.

In some possible embodiments, the tracking unit is further configured to perform convolution processing on the fifth feature with the predicted feature as the convolution kernel, obtaining the second coding feature.

In some possible embodiments, the tracking unit is further configured to input the second coding feature into a target detection network to obtain the position information for the target object in the search region.
In some possible embodiments, the target tracking device includes a twin neural network; the detection module includes the first branch network and the second branch network of the twin neural network, the tracking module includes the feature update network and the target detection network of the twin neural network, and the first branch network and the second branch network are identical;

the first branch network is configured to detect the second position of the target object in the initial frame image and the second feature corresponding to the second position;

the second branch network is configured to detect the first position of the target object in the previous frame image of any current frame image after the initial frame image and the first feature corresponding to the first position;

the feature update network is configured to obtain the predicted feature based on the initial frame image and the previous frame image of the current frame image; and

the target detection network is configured to obtain the position information of the target object in the current frame image based on the first position and the predicted feature of the current frame image.
In some possible embodiments, the device further includes: a display module, configured to highlight the position information of the target object in the image frames of the video stream.

In some embodiments, the functions or modules of the device provided in the embodiments of the present disclosure can be used to perform the methods described in the method embodiments above; for their specific implementation, refer to the descriptions of the method embodiments above, which, for brevity, are not repeated here.
The embodiments of the present disclosure further propose a computer-readable storage medium having computer program instructions stored thereon, wherein the computer program instructions, when executed by a processor, implement the above method. The computer-readable storage medium may be a non-volatile computer-readable storage medium.
The embodiments of the present disclosure further propose an electronic device, including: a processor; and a memory for storing processor-executable instructions, wherein the processor is configured to perform the above method.

The electronic device may be provided as a terminal, a server, or a device in another form.
Fig. 8 shows a block diagram of an electronic device according to an embodiment of the present disclosure. For example, the electronic device 800 may be a terminal such as a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, fitness equipment, or a personal digital assistant.

Referring to Fig. 8, the electronic device 800 may include one or more of the following components: a processing component 802, a memory 804, a power supply component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 typically controls the overall operations of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to perform all or part of the steps of the above methods. In addition, the processing component 802 may include one or more modules which facilitate the interaction between the processing component 802 and other components. For example, the processing component 802 may include a multimedia module to facilitate the interaction between the multimedia component 808 and the processing component 802.

The memory 804 is configured to store various types of data to support the operation of the electronic device 800. Examples of such data include instructions for any applications or methods operated on the electronic device 800, contact data, phonebook data, messages, pictures, videos, etc. The memory 804 may be implemented using any type of volatile or non-volatile memory devices, or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk.

The power supply component 806 provides power to the various components of the electronic device 800. The power supply component 806 may include a power management system, one or more power supplies, and other components associated with the generation, management, and distribution of power for the electronic device 800.

The multimedia component 808 includes a screen providing an output interface between the electronic device 800 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes the touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, slides, and gestures on the touch panel. The touch sensors may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. When the electronic device 800 is in an operation mode, such as a photographing mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have focus and optical zoom capability.

The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a microphone (MIC), which is configured to receive external audio signals when the electronic device 800 is in an operation mode, such as a call mode, a recording mode, or a voice recognition mode. The received audio signal may be further stored in the memory 804 or transmitted via the communication component 816. In some embodiments, the audio component 810 further includes a speaker for outputting audio signals.

The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.

The sensor component 814 includes one or more sensors for providing status assessments of various aspects of the electronic device 800. For example, the sensor component 814 may detect the open/closed status of the electronic device 800 and the relative positioning of components, e.g., the display and the keypad of the electronic device 800; the sensor component 814 may also detect a change in position of the electronic device 800 or of a component of the electronic device 800, the presence or absence of user contact with the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and a change in temperature of the electronic device 800. The sensor component 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 814 may also include an accelerometer, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 can access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In one exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 816 further includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above method.

In an exemplary embodiment, a non-volatile computer-readable storage medium is also provided, such as the memory 804 including computer program instructions, which are executable by the processor 820 of the electronic device 800 to perform the above method.
Fig. 9 shows another block diagram of an electronic device according to an embodiment of the present disclosure. For example, the electronic device 1900 may be provided as a server. Referring to Fig. 9, the electronic device 1900 includes a processing component 1922, which further includes one or more processors, and memory resources represented by a memory 1932 for storing instructions executable by the processing component 1922, such as application programs. An application program stored in the memory 1932 may include one or more modules, each corresponding to a set of instructions. In addition, the processing component 1922 is configured to execute the instructions to perform the above method.

The electronic device 1900 may further include a power supply component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 can operate based on an operating system stored in the memory 1932, such as Windows ServerTM, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM, or the like.

In an exemplary embodiment, a non-volatile computer-readable storage medium is also provided, such as the memory 1932 including computer program instructions, which are executable by the processing component 1922 of the electronic device 1900 to perform the above method.
The present disclosure may be a system, a method, and/or a computer program product. The computer program product may include a computer-readable storage medium having computer-readable program instructions thereon for causing a processor to carry out aspects of the present disclosure.

The computer-readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer-readable storage medium includes: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punch card or a raised structure in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer-readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.

Computer-readable program instructions described herein can be downloaded to respective computing/processing devices from a computer-readable storage medium, or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network, and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium within the respective computing/processing device.

Computer-readable program instructions for carrying out operations of the present disclosure may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including an object-oriented programming language such as Smalltalk or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, including, for example, programmable logic circuitry, field-programmable gate arrays (FPGAs), or programmable logic arrays (PLAs), may execute the computer-readable program instructions by utilizing state information of the computer-readable program instructions to personalize the electronic circuitry, in order to carry out aspects of the present disclosure.
Aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.

These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in one or more blocks of the flowchart and/or block diagram. These computer-readable program instructions may also be stored in a computer-readable storage medium, where the instructions direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium having the instructions stored therein comprises an article of manufacture including instructions which implement aspects of the functions/acts specified in one or more blocks of the flowchart and/or block diagram.

The computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus, or other devices to produce a computer-implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other devices implement the functions/acts specified in one or more blocks of the flowchart and/or block diagram.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to multiple embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, a program segment, or a portion of instructions, which comprises one or more executable instructions for implementing the specified logical functions. In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions.
The embodiments of the present disclosure have been described above. The above descriptions are exemplary rather than exhaustive, and are not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the illustrated embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application, or the technological improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (10)

1. A target tracking method, characterized by comprising:
for any current frame image after an initial frame image in a video stream, obtaining a first position of a target object in a previous frame image of the current frame image;
obtaining position information of the target object in the current frame image based on the first position and a predicted feature of the target object in the current frame image, wherein the predicted feature of the target object in the current frame image is obtained based on the initial frame image of the video stream and the previous frame image of the current frame.
2. The method according to claim 1, characterized in that obtaining the predicted feature of the current frame image comprises:
obtaining the predicted feature of the target object in the current frame image based on a first feature corresponding to the first position of the target object in the previous frame image of the current frame image, and a second feature corresponding to a second position of the target object in the initial frame image.
3. The method according to claim 1 or 2, characterized in that, before obtaining, for any current frame image after the initial frame image in the video stream, the first position of the target object in the previous frame image of the current frame image, the method further comprises:
obtaining the second position of the target object in the initial frame image and the second feature corresponding to the second position.
4. The method according to claim 3, characterized in that obtaining the second position of the target object in the initial frame image comprises at least one of the following manners:
obtaining a position mask map for the target object in the initial frame image, and determining the second position of the target object based on the mask map;
receiving a frame selection operation for the initial frame image, and determining the second position of the target object based on a position region corresponding to the frame selection operation;
performing a target detection operation on the initial frame image, and determining the second position of the target object based on a detection result of the target detection operation.
5. The method according to claim 2, characterized in that obtaining the predicted feature of the target object in the current frame image based on the first feature corresponding to the first position of the target object in the previous frame image of the current frame image and the second feature corresponding to the second position of the target object in the initial frame image comprises:
performing convolution processing on the first feature and the second feature respectively, obtaining a first transition feature of the first feature and a second transition feature of the second feature;
performing first cross-correlation encoding processing and graph convolution processing on the first transition feature and the second transition feature, obtaining a third feature;
obtaining the predicted feature based on feature fusion processing of the third feature, the first transition feature, and the second feature.
6. The method according to claim 5, characterized in that performing the first cross-correlation encoding processing and the graph convolution processing on the first transition feature and the second transition feature to obtain the third feature comprises:
performing the first cross-correlation encoding processing on the first transition feature and the second transition feature, obtaining a first coding feature;
inputting the first coding feature into a graph neural network to perform the graph convolution processing, obtaining the third feature.
7. The method according to claim 6, characterized in that performing the first cross-correlation encoding processing on the first transition feature and the second transition feature to obtain the first coding feature comprises:
performing a matrix multiplication operation on the first transition feature and the second transition feature, obtaining the first coding feature.
8. A target tracking device, characterized by comprising:
a detection module, configured to obtain, for any current frame image after an initial frame image in a video stream, a first position of a target object in a previous frame image of the current frame image;
a tracking module, configured to obtain position information of the target object in the current frame image based on the first position and a predicted feature of the target object in the current frame image, wherein the predicted feature of the target object in the current frame image is obtained based on the initial frame image of the video stream and the previous frame image of the current frame.
9. An electronic device, characterized by comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to call the instructions stored in the memory to perform the method according to any one of claims 1 to 7.
10. A computer-readable storage medium having computer program instructions stored thereon, characterized in that the computer program instructions, when executed by a processor, implement the method according to any one of claims 1 to 7.
CN201910555741.9A 2019-06-25 2019-06-25 Target tracking method and device, electronic equipment and storage medium Active CN110287874B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202110826777.3A CN113538519A (en) 2019-06-25 2019-06-25 Target tracking method and device, electronic equipment and storage medium
CN201910555741.9A CN110287874B (en) 2019-06-25 2019-06-25 Target tracking method and device, electronic equipment and storage medium
CN202110825287.1A CN113538517B (en) 2019-06-25 2019-06-25 Target tracking method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910555741.9A CN110287874B (en) 2019-06-25 2019-06-25 Target tracking method and device, electronic equipment and storage medium

Related Child Applications (2)

Application Number Title Priority Date Filing Date
CN202110826777.3A Division CN113538519A (en) 2019-06-25 2019-06-25 Target tracking method and device, electronic equipment and storage medium
CN202110825287.1A Division CN113538517B (en) 2019-06-25 2019-06-25 Target tracking method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110287874A true CN110287874A (en) 2019-09-27
CN110287874B CN110287874B (en) 2021-07-27

Family

ID=68005584

Family Applications (3)

Application Number Title Priority Date Filing Date
CN202110825287.1A Active CN113538517B (en) 2019-06-25 2019-06-25 Target tracking method and device, electronic equipment and storage medium
CN202110826777.3A Pending CN113538519A (en) 2019-06-25 2019-06-25 Target tracking method and device, electronic equipment and storage medium
CN201910555741.9A Active CN110287874B (en) 2019-06-25 2019-06-25 Target tracking method and device, electronic equipment and storage medium

Family Applications Before (2)

Application Number Title Priority Date Filing Date
CN202110825287.1A Active CN113538517B (en) 2019-06-25 2019-06-25 Target tracking method and device, electronic equipment and storage medium
CN202110826777.3A Pending CN113538519A (en) 2019-06-25 2019-06-25 Target tracking method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (3) CN113538517B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113538517B (en) * 2019-06-25 2024-04-12 北京市商汤科技开发有限公司 Target tracking method and device, electronic equipment and storage medium
CN115719468B (en) * 2023-01-10 2023-06-20 清华大学 Image processing method, device and equipment

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9723462B2 (en) * 2014-11-07 2017-08-01 At&T Intellectual Property I, L.P. Cloud-based device twinning
US10019631B2 (en) * 2015-11-05 2018-07-10 Qualcomm Incorporated Adapting to appearance variations when tracking a target object in video sequence
CN108073864B (en) * 2016-11-15 2021-03-09 北京市商汤科技开发有限公司 Target object detection method, device and system and neural network structure
CN107705323A (en) * 2017-10-13 2018-02-16 北京理工大学 A kind of level set target tracking method based on convolutional neural networks
CN108960090B (en) * 2018-06-20 2023-05-30 腾讯科技(深圳)有限公司 Video image processing method and device, computer readable medium and electronic equipment
CN113538517B (en) * 2019-06-25 2024-04-12 北京市商汤科技开发有限公司 Target tracking method and device, electronic equipment and storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150310656A1 (en) * 2012-11-22 2015-10-29 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Device, method and computer program for reconstructing a motion of an object
CN103413143A (en) * 2013-07-29 2013-11-27 西北工业大学 Video target tracking method based on dynamic sparse projection
US20180129934A1 (en) * 2016-11-07 2018-05-10 Qualcomm Incorporated Enhanced siamese trackers
CN106651913A (en) * 2016-11-29 2017-05-10 开易(北京)科技有限公司 Target tracking method based on correlation filtering and color histogram statistics and ADAS (Advanced Driving Assistance System)
CN109035297A (en) * 2018-07-19 2018-12-18 深圳市唯特视科技有限公司 A kind of real-time tracing method based on dual Siam's network
CN109829934A (en) * 2018-12-20 2019-05-31 北京以萨技术股份有限公司 A kind of novel image tracking algorithm based on twin convolutional network

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
JIE GUO et al.: "Generating Reliable Online Adaptive Templates for Visual Tracking", 2018 25th IEEE International Conference on Image Processing (ICIP) *
DING Xiaofeng et al.: "Multi-template Target Tracking Algorithm Based on Mean Shift", Computer Engineering and Applications *
LI Yongjia: "Research on Target Tracking Based on the Mean Shift Algorithm and Local Features", China Master's Theses Full-text Database, Information Science and Technology Series *
CHEN Xu et al.: "A Survey of Deep Learning Based Target Video Tracking Algorithms", Computer Systems & Applications *

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112166458A (en) * 2019-10-17 2021-01-01 深圳市大疆创新科技有限公司 Target detection and tracking method, system, equipment and storage medium
CN112166458B (en) * 2019-10-17 2024-04-26 深圳市大疆创新科技有限公司 Target detection and tracking method, system, equipment and storage medium
CN110751205A (en) * 2019-10-17 2020-02-04 北京百度网讯科技有限公司 Object association method, device, equipment and medium
CN110766724A (en) * 2019-10-31 2020-02-07 北京市商汤科技开发有限公司 Target tracking network training and tracking method and device, electronic equipment and medium
CN110966734A (en) * 2019-11-11 2020-04-07 珠海格力电器股份有限公司 Air conditioner air supply control method based on three-dimensional space, computer readable storage medium and air conditioner
CN111059705A (en) * 2019-12-05 2020-04-24 珠海格力电器股份有限公司 Air-conditioning air-throwing control system and method based on binocular camera 3D target positioning
CN112232311A (en) * 2019-12-24 2021-01-15 杭州宇泛智能科技有限公司 Face tracking method and device and electronic equipment
CN111260682A (en) * 2020-02-10 2020-06-09 深圳市铂岩科技有限公司 Target object tracking method and device, storage medium and electronic equipment
CN111260682B (en) * 2020-02-10 2023-11-17 深圳市铂岩科技有限公司 Target object tracking method and device, storage medium and electronic equipment
US11770617B2 (en) 2020-02-10 2023-09-26 Boyan Technologies (Shenzhen) Co., Ltd Method for tracking target object
CN111414963A (en) * 2020-03-19 2020-07-14 北京市商汤科技开发有限公司 Image processing method, device, equipment and storage medium
CN112347852A (en) * 2020-10-10 2021-02-09 上海交通大学 Target tracking and semantic segmentation method and device for sports video and plug-in
CN112347852B (en) * 2020-10-10 2022-07-29 上海交通大学 Target tracking and semantic segmentation method and device for sports video and plug-in
CN112258556A (en) * 2020-10-22 2021-01-22 北京字跳网络技术有限公司 Method and device for tracking designated area in video, readable medium and electronic equipment
CN112381858A (en) * 2020-11-13 2021-02-19 成都商汤科技有限公司 Target detection method, device, storage medium and equipment
CN112489081A (en) * 2020-11-30 2021-03-12 北京航空航天大学 Visual target tracking method and device
CN113538516A (en) * 2021-07-19 2021-10-22 中国兵器工业计算机应用技术研究所 Target object tracking method and device based on memory information and electronic equipment
CN113538516B (en) * 2021-07-19 2024-04-16 中国兵器工业计算机应用技术研究所 Target object tracking method and device based on memory information and electronic equipment

Also Published As

Publication number Publication date
CN113538517B (en) 2024-04-12
CN110287874B (en) 2021-07-27
CN113538519A (en) 2021-10-22
CN113538517A (en) 2021-10-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant