CN107452025A

CN107452025A - Method for tracking target, device and electronic equipment

Info

Publication number: CN107452025A
Application number: CN201710710392.4A
Authority: CN
Inventors: 陈志超; 马骁; 周剑
Original assignee: Chengdu Tongjia Youbo Technology Co Ltd
Current assignee: Chengdu Tongjia Youbo Technology Co Ltd
Priority date: 2017-08-18
Filing date: 2017-08-18
Publication date: 2017-12-08

Abstract

The present invention relates to technical field of image processing, there is provided a kind of method for tracking target, device and electronic equipment, methods described include：The previous frame image of video is obtained first, mark the position of the target to be tracked in previous frame image, then it is connected by the way that clarification of objective figure to be tracked is carried out into image channel with the characteristic pattern of latter two field picture, and the position for carrying out target to be tracked returns, position coordinates of the target to be tracked in latter two field picture is determined, realizes target following.Compared with the existing local search approach based on image, the present invention supports full figure search, and with it is existing based on Edge box method for tracking target compared with, the present invention realizes target following by the search of full figure end to end, it is not rely on target in itself, it can quickly be given for change after target is lost, effectively realize target following.

Description

Method for tracking target, device and electronic equipment

Technical field

The present invention relates to technical field of image processing, is set in particular to a kind of method for tracking target, device and electronics It is standby.

Background technology

In monotrack, how to be given for change again after track algorithm loses target, be always the difficulty of monotrack Point.Existing many method for tracking target are all based on the local search approach of image, such as particle filter, correlation filtering etc., The condition that this tracking is realized is that target must be present in inside regional area, when target is not present in inside regional area When, tracking will fail.In recent years, there are some track algorithms for using full figure to search for, such as 2016 CVPR (Conference on Computer Vision and Pattern Recognition, international computer vision and mould Formula identifies meeting) method for tracking target based on Edge box (marginal information) that proposes, but this method is highly dependent upon mesh Target quality, therefore, in many cases, it still can not detect lost target.

The content of the invention

It is an object of the invention to provide a kind of method for tracking target, device and electronic equipment, to improve above mentioned problem.

To achieve these goals, the technical scheme that the embodiment of the present invention uses is as follows：

In a first aspect, the invention provides a kind of method for tracking target, methods described includes：Obtain the former frame figure of video Picture, mark the position of the target to be tracked in previous frame image；According to the position of target to be tracked, default convolution god is utilized Through Network Capture fisrt feature figure, wherein, fisrt feature figure is clarification of objective figure to be tracked；According to a later frame figure of video Picture, second feature figure is obtained using default convolutional neural networks, wherein, latter two field picture is to connect in video with previous frame image Continuous image, second feature figure are the characteristic pattern of latter two field picture；Fisrt feature figure and second feature figure are subjected to image channel Connection, obtains third feature figure, wherein, the image channel number of third feature figure is the image channel number and second of fisrt feature figure The image channel number sum of characteristic pattern；In third feature figure, the position for carrying out target to be tracked returns, and obtains target to be tracked Position coordinates in latter two field picture.

Second aspect, the invention provides a kind of target tracker, described device includes the first image collection module, the One characteristic pattern acquisition module, second feature figure acquisition module, image channel link block and position regression block.Wherein, first Image collection module is used for the previous frame image for obtaining video, marks the position of the target to be tracked in previous frame image；The One characteristic pattern acquisition module is used for the position according to target to be tracked, and fisrt feature is obtained using default convolutional neural networks Figure, wherein, fisrt feature figure is clarification of objective figure to be tracked；Second feature figure acquisition module is used for a later frame according to video Image, using default convolutional neural networks obtain second feature figure, wherein, latter two field picture be video in previous frame image Continuous image, second feature figure are the characteristic pattern of latter two field picture；Image channel link block, for by fisrt feature figure and Second feature figure carries out image channel connection, obtains third feature figure, wherein, the image channel number of third feature figure is special for first Levy the image channel number of figure and the image channel number sum of second feature figure；Position regression block, in third feature figure, The position for carrying out target to be tracked returns, and obtains position coordinates of the target to be tracked in latter two field picture.

The third aspect, the invention provides a kind of electronic equipment, the electronic equipment include memory, processor and Target tracker, described device are stored in the memory and including one or more softwares by the computing device Functional module group.Described device include the first image collection module, fisrt feature figure acquisition module, second feature figure acquisition module, Image channel link block and position regression block.Wherein, the first image collection module is used for the previous frame image for obtaining video, Mark the position of the target to be tracked in previous frame image；Fisrt feature figure acquisition module is used for the position according to target to be tracked Put, fisrt feature figure is obtained using default convolutional neural networks, wherein, fisrt feature figure is clarification of objective figure to be tracked； Second feature figure acquisition module is used for the latter two field picture according to video, and second feature is obtained using default convolutional neural networks Figure, wherein, it in video with the continuous image of previous frame image, second feature figure is the feature of latter two field picture that latter two field picture, which is, Figure；Image channel link block, connected for fisrt feature figure to be carried out into image channel with second feature figure, obtain third feature Figure, wherein, the image channel number of third feature figure is the image channel number of fisrt feature figure and the image channel of second feature figure Number sum；Position regression block, the position in third feature figure, carrying out target to be tracked return, and obtain mesh to be tracked The position coordinates being marked in latter two field picture.

Compared with the prior art, the invention has the advantages that：A kind of method for tracking target provided by the invention, device And electronic equipment, the previous frame image of video is obtained first, marks the position of the target to be tracked in previous frame image, then It is connected by the way that clarification of objective figure to be tracked is carried out into image channel with the characteristic pattern of latter two field picture, and carries out target to be tracked Position return, determine position coordinates of the target to be tracked in latter two field picture, realize target following.It is based on existing The local search approach of image is compared, and method for tracking target provided by the invention supports full figure search, and is based on existing Edge box method for tracking target is compared, and the present invention realizes target following by the search of full figure end to end, is not rely on Target in itself, can quickly be given for change after target is lost, effectively realize target following.

To enable the above objects, features and advantages of the present invention to become apparent, preferred embodiment cited below particularly, and coordinate Appended accompanying drawing, is described in detail below.

Brief description of the drawings

In order to illustrate the technical solution of the embodiments of the present invention more clearly, below by embodiment it is required use it is attached Figure is briefly described, it will be appreciated that the following drawings illustrate only certain embodiments of the present invention, therefore be not construed as pair The restriction of scope, for those of ordinary skill in the art, on the premise of not paying creative work, can also be according to this A little accompanying drawings obtain other related accompanying drawings.

Fig. 1 shows the block diagram of electronic equipment provided in an embodiment of the present invention.

Fig. 2 shows method for tracking target flow chart provided in an embodiment of the present invention.

Fig. 3 is the sub-step flow chart of the step S103 shown in Fig. 2.

Fig. 4 is the sub-step flow chart of the step S107 shown in Fig. 2.

Fig. 5 shows the block diagram of target tracker provided in an embodiment of the present invention.

Fig. 6 is the block diagram that module is finely tuned in the target tracker shown in Fig. 5.

Fig. 7 be Fig. 5 shown in target tracker in position regression block block diagram.

Icon：100- electronic equipments；101- memories；102- storage controls；103- processors；104- Peripheral Interfaces； 105- display screens；200- target trackers；The image collection modules of 201- first；202- pre-training modules；203- finely tunes module； 2031- Face datection units；2032- training sample generation units；2033- fine-adjusting units；204- fisrt feature figure acquisition modules； 205- second feature figure acquisition modules；206- image channel link blocks；207- positions regression block；2071- characteristic patterns divide Unit；2072- positions return unit.

Embodiment

Below in conjunction with accompanying drawing in the embodiment of the present invention, the technical scheme in the embodiment of the present invention is carried out clear, complete Ground describes, it is clear that described embodiment is only part of the embodiment of the present invention, rather than whole embodiments.Generally exist The component of the embodiment of the present invention described and illustrated in accompanying drawing can be configured to arrange and design with a variety of herein.Cause This, the detailed description of the embodiments of the invention to providing in the accompanying drawings is not intended to limit claimed invention below Scope, but it is merely representative of the selected embodiment of the present invention.Based on embodiments of the invention, those skilled in the art are not doing The every other embodiment obtained on the premise of going out creative work, belongs to the scope of protection of the invention.

It should be noted that：Similar label and letter represents similar terms in following accompanying drawing, therefore, once a certain Xiang Yi It is defined, then it further need not be defined and explained in subsequent accompanying drawing in individual accompanying drawing.Meanwhile the present invention's In description, term " first ", " second " etc. are only used for distinguishing description, and it is not intended that instruction or hint relative importance.

Fig. 1 is refer to, Fig. 1 shows the block diagram of electronic equipment 100 provided in an embodiment of the present invention.Electronic equipment 100 may be, but not limited to, smart mobile phone, tablet personal computer, pocket computer on knee, vehicle-mounted computer, personal digital assistant (personal digital assistant, PDA), wearable mobile terminal etc..The electronic equipment 100 include target with Track device 200, memory 101, storage control 102, processor 103, Peripheral Interface 104 and display screen 105.

The memory 101, storage control 102, processor 103, Peripheral Interface 104 and each element phase of display screen 105 Directly or indirectly it is electrically connected between mutually, to realize the transmission of data or interaction.For example, these elements can pass through between each other One or more communication bus or signal wire, which are realized, to be electrically connected with.The target tracker 200 include it is at least one can be with soft The form of part or firmware (firmware) is stored in the memory 101 or is solidificated in the operation system of the electronic equipment 100 Software function module in system (operating system, OS).The processor 103 is used to perform to store in memory 101 Executable module, such as the software function module or computer program that the target tracker 200 includes.

Wherein, memory 101 may be, but not limited to, random access memory (Random Access Memory, RAM), read-only storage (Read Only Memory, ROM), programmable read only memory (Programmable Read-Only Memory, PROM), erasable read-only memory (Erasable Programmable Read-Only Memory, EPROM), Electricallyerasable ROM (EEROM) (Electric Erasable Programmable Read-Only Memory, EEPROM) etc.. Wherein, memory 101 is used for storage program, and the processor 103 performs described program, this hair after execute instruction is received Method performed by the server for the flow definition that bright any embodiment discloses can apply in processor 103, or by Reason device 103 is realized.

Processor 103 can be a kind of IC chip, have signal handling capacity.Above-mentioned processor 103 can be with It is general processor, including central processing unit (Central Processing Unit, CPU), network processing unit (Network Processor, NP), speech processor and video processor etc.；Can also be digital signal processor, application specific integrated circuit, Field programmable gate array either other PLDs, discrete gate or transistor logic, discrete hardware components. It can realize or perform disclosed each method, step and the logic diagram in the embodiment of the present invention.General processor can be Microprocessor or the processor 103 can also be any conventional processors etc..

The Peripheral Interface 104 is used to various input/output devices being coupled to processor 103 and memory 101. In some embodiments, Peripheral Interface 104, processor 103 and storage control 102 can be realized in one single chip.At it In his some examples, they can be realized by independent chip respectively.

Display screen 105 is used to realize interacting between user and electronic equipment 100, specifically may be, but not limited to, display Screen 105 is shown the video for needing progress target following or image.

First embodiment

Fig. 2 is refer to, Fig. 2 shows method for tracking target flow chart provided in an embodiment of the present invention.Method for tracking target Comprise the following steps：

Step S101, the previous frame image of video is obtained, mark the position of the target to be tracked in previous frame image.

In embodiments of the present invention, previous frame image can be removed in the starting two field picture or video of video Any one two field picture beyond beginning frame.Target to be tracked may be, but not limited to, face.

Step S102, using supervised method, pre-training is carried out to the offline human face data collection comprising multiple facial images, Obtain convolutional neural networks.

In embodiments of the present invention, offline human face data collection can be the data set for including multiple facial images, these people Face image can be downloaded from network and obtained in advance.Convolutional neural networks are used to carry out feature extraction, and its structure can be：Input Layer-convolution and sample level-output layer, wherein, input layer is the facial image of input, convolution and sample level include process of convolution and Max Pooling processing, the corresponding face characteristic of each neuron of output layer.

In embodiments of the present invention, because the feature of facial image is mainly made up of curve and straight line, and curve can also Represented by the feature of many similar circular arcs, along with the direct orthonormal feature of rectangle, therefore the shape facility of two dimensional image divides For three classes：Circular arc, rectangle, straight line, therefore this three category feature is extracted using three layers of fuzzy convolutional neural networks, because three layers Convolutional neural networks will not both increase workload, and and can preferably carries out feature recognition.

As a kind of embodiment, when establishing convolutional neural networks, the type of input data is considered, due in this hair The type of input data is image in bright embodiment, and image is not easy to be quantified as fuzzy type data, therefore introduces fuzzy set to input Facial image is handled.The method being blurred to facial image can use membership function, in the work of membership function Under, the image information of facial image is divided into high, medium and low three parts, three sections as convolutional neural networks input layer Point, image information input is trained from the input layer of convolutional neural networks.Further, since introducing fuzzy set, therefore need To each corresponding FUZZY WEIGHTED operator of input node one, FUZZY WEIGHTED operator scope between [0,1], be subordinate to by fuzzy Category degree and eigenmatrix composition.

Framework, which obscures weighted operator, can use formula W={ w₁,w₂,w₃, wherein,W can To be straight line, circular arc, the normalized fuzzy characteristics degree of membership in right angle, A is straight line, eigenmatrix corresponding to circular arc, right angle.

The computational methods of eigenmatrix can be：First, analytic representation is carried out under rectangular coordinate system；Second, in level The second differnce of discrete point is represented with vertical direction；3rd, by the way that the expression that above two steps obtain is carried out into simultaneous Determine feature decision expression formula；4th, eigenmatrix is extracted by feature decision expression formula.

Convolutional neural networks are finely adjusted by step S103, the convolutional neural networks after being finely tuned.

In embodiments of the present invention, pre-training is carried out to the offline human face data collection comprising multiple facial images and obtains convolution , it is necessary to be finely adjusted to convolutional neural networks after neutral net, the convolutional Neural net after the fine setting for feature extraction is obtained Network, as a kind of embodiment, the method being finely adjusted to convolutional neural networks can be：First, using human-face detector pair Each two field picture in video carries out Face datection, obtains training data；Secondly, each pixel in each two field picture With the position relationship of face, the pixel in each two field picture is made a distinction, generates positive and negative two classes training sample；Finally, with The positive negative training sample obtained is entered as input using Siamese networks to the convolutional neural networks of pre-training in step S102 Row fine setting, on-line study have more the face characteristic of distinction and adaptability.

In the present embodiment, two convolutional neural networks that Siamese networks are identical by structure and weights are shared form, with Two facial images are carried out micro- as input using contrast loss function to the convolutional neural networks of pre-training in step S102 Adjust.In Siamese networks, the process of face characteristic extraction can be expressed as f (x)=Conv (x, w), wherein, Conv is mapping Function, x are the facial images of input, and f (x) represents the characteristic vector extracted.

Fig. 3 is refer to, step S103 also includes following sub-step：

Sub-step S1031, Face datection is carried out to each two field picture in video, obtains training data.

Sub-step S1032, in training data, the location point of region of search is made a distinction, generate positive and negative two classes training Sample.

In embodiments of the present invention, Face datection is carried out to each two field picture in video, it is right after obtaining training data Each two field picture in video, according to the position relationship of each pixel and face in image, pixel is made a distinction, it is raw Into positive and negative two classes training sample.Pixel distinguish method can be：Centered on face, by the pixel in face preset range Point is used as Positive training sample, can be according to people using other pixels beyond this scope as negative training sample, preset range Face particular location in the picture and size determine.

Sub-step S1033, using the positive negative training sample of acquisition as input, the volume using Siamese networks to pre-training Product neutral net is finely adjusted, the convolutional neural networks after being finely tuned, wherein, Siamese networks are identical by structure and weights Shared two convolutional neural networks composition.

Step S104, according to the position of target to be tracked, fisrt feature figure is obtained using default convolutional neural networks, its In, fisrt feature figure is clarification of objective figure to be tracked.

In embodiments of the present invention, default convolutional neural networks can be the volume after the fine setting obtained using step S103 Product neutral net, according to the position of target to be tracked, obtaining the method for fisrt feature figure can be：Utilize the convolution god after fine setting Through network, feature extraction is carried out to target to be tracked, obtains fisrt feature figure.

Step S105, according to the latter two field picture of video, second feature figure is obtained using default convolutional neural networks, its In, it in video with the continuous image of previous frame image, second feature figure is the characteristic pattern of latter two field picture that latter two field picture, which is,.

In embodiments of the present invention, default convolutional neural networks can be the volume after the fine setting obtained using step S103 Product neutral net, according to the latter two field picture of video, obtaining the method for second feature figure can be：Utilize the convolution god after fine setting Through network, feature extraction is carried out to latter two field picture, obtains second feature figure.

Step S106, fisrt feature figure is subjected to image channel with second feature figure and connected, obtains third feature figure, its In, the image channel number of third feature figure for fisrt feature figure image channel number and second feature figure image channel number it With.

In embodiments of the present invention, it is after obtaining fisrt feature figure and second feature figure, fisrt feature figure and second is special Sign figure carries out image channel connection, and obtaining the method for third feature figure can be：Obtain the figure of each pixel of fisrt feature figure As passage, and the image channel of each pixel of second feature figure is obtained, then by the image of each pixel of fisrt feature figure Passage carries out pixel-level image passage with the image channel of each pixel of second feature figure and connected, with regard to that can obtain down third feature Figure, the image channel number of third feature figure for fisrt feature figure image channel number and second feature figure image channel number it With the channel attached method of Pixel-level can effectively avoid the re-detection problem in object tracking process.

Step S107, in third feature figure, the position for carrying out target to be tracked returns, and obtains target to be tracked latter Position coordinates in two field picture.

In embodiments of the present invention, using the target location homing method based on YOLO, treated in third feature figure The position for tracking target returns.Can be using the target location homing method based on YOLO：First, third feature figure is divided Into S × S (for example, 7 × 7) grids；Secondly, for each grid, 2 frames are all predicted, wherein, each frame is to be tracked The probability of the confidence level of target and each frame region in multiple classifications；Finally, 7*7*2 can be predicted according to previous step Individual target window, possibility is removed than relatively low target window, the position of target to be tracked is directly returned, further according to treating Determine the position coordinates in latter two field picture in the position of tracking target, you can complete target following.

In embodiments of the present invention, YOLO uses side and error to calculate each picture in third feature figure as LOSS functions The mistake of each pixel corresponding with real latter two field picture in the third feature figure of the loss of vegetarian refreshments, i.e. final output Difference.LOSS functions can represent according to following formula：

Wherein, x, y, c, p are predicted value；x、y、w、For mark value；Represent that target to be tracked falls into grid i In,Represent that target to be tracked is fallen into grid i j-th of bounding box；Represent that target to be tracked does not fall Enter in grid i j-th of bounding box；α_coord=5；α_noobj=0.5；B represents each foreseeable B bounding of grid box。

Fig. 4 is refer to, step S107 also includes following sub-step：

Sub-step S1071, third feature figure is divided into S × S grid.

Sub-step S1072, each grid are predicted recurrence to the position of target to be tracked, obtain target to be tracked rear Position coordinates in one two field picture.

It should be noted that if previous frame image were not starting two field picture, the target to be tracked in previous frame image Position determine also with above method.In other words, if to determine the position of target to be tracked in the 4th two field picture, Then first since start frame, using starting two field picture as previous frame image, using the second two field picture as latter two field picture, utilize Above method determines the position of the target to be tracked in the second two field picture；Next, using the second two field picture as former frame figure Picture, using the 3rd two field picture as latter two field picture, the position of the target to be tracked in the 3rd two field picture is determined using above method Put；Next, using the 3rd two field picture as previous frame image, using the 4th two field picture as latter two field picture, utilize with top Method determines the position of the target to be tracked in the 4th two field picture, so using method end to end, just can finally determine the 4th The position of target to be tracked in two field picture.

In embodiments of the present invention, first, pre-training is carried out to data set, obtains convolutional neural networks, then to convolution god It is finely adjusted through network, the spy of clarification of objective to be tracked and latter two field picture full figure is extracted with the convolutional neural networks after fine setting Sign, improve the efficiency and treatment effect of target following；Secondly, by by clarification of objective figure to be tracked and latter two field picture Characteristic pattern carries out pixel-level image passage connection, can effectively avoid the re-detection problem in object tracking process；Finally, carry out The position of target to be tracked returns, and finally determines position coordinates of the target to be tracked in latter two field picture；Utilize YOLO side The position that method carries out target to be tracked returns, and determines position coordinates of the target to be tracked in latter two field picture, realizes target Tracking, improve the speed of target following.

Second embodiment

Fig. 5 is refer to, Fig. 5 shows the block diagram of target tracker 200 provided in an embodiment of the present invention.Target Tracks of device 200 includes the first image collection module 201, pre-training module 202, fine setting module 203, fisrt feature figure and obtains mould Block 204, second feature figure acquisition module 205, image channel link block 206 and position regression block 207.

First image collection module 201, for obtaining the previous frame image of video, mark in previous frame image treat with The position of track target.

In embodiments of the present invention, the first image collection module 201 can be used for performing step S101.

Pre-training module 202, for utilizing supervised method, the offline human face data collection comprising multiple facial images is entered Row pre-training, obtains convolutional neural networks.

In embodiments of the present invention, pre-training module 202 can be used for performing step S102.

Finely tune module 203, for being finely adjusted to convolutional neural networks, the convolutional neural networks after being finely tuned.

In embodiments of the present invention, fine setting module 203 can be used for performing step S103.

Fig. 6 is refer to, Fig. 6 is the block diagram that module 203 is finely tuned in the target tracker 200 shown in Fig. 5.Fine setting Module 203 includes Face datection unit 2031, training sample generation unit 2032 and fine-adjusting unit 2033.

Face datection unit 2031, for carrying out Face datection to each two field picture in video, obtain training data.

In embodiments of the present invention, Face datection unit 2031 can be used for performing sub-step S1031.

Training sample generation unit 2032, in training data, being made a distinction to the location point of region of search, generation Positive and negative two classes training sample.

In embodiments of the present invention, training sample generation unit 2032 can be used for performing sub-step S1032.

Fine-adjusting unit 2033, for using the positive negative training sample of acquisition as input, using Siamese networks to pre-training Convolutional neural networks be finely adjusted, the convolutional neural networks after being finely tuned, wherein, Siamese networks it is identical by structure and The shared two convolutional neural networks composition of weights.

In embodiments of the present invention, fine-adjusting unit 2033 can be used for performing sub-step S1033.

Fisrt feature figure acquisition module 204, for the position according to target to be tracked, utilize default convolutional neural networks Fisrt feature figure is obtained, wherein, fisrt feature figure is clarification of objective figure to be tracked.

In embodiments of the present invention, fisrt feature figure acquisition module 204 can be used for performing step S104.

Second feature figure acquisition module 205, for the latter two field picture according to video, utilize default convolutional neural networks Second feature figure is obtained, wherein, it in video with the continuous image of previous frame image, second feature figure is latter that latter two field picture, which is, The characteristic pattern of two field picture.

In embodiments of the present invention, second feature figure acquisition module 205 can be used for performing step S105.

Image channel link block 206, connect, obtain for fisrt feature figure to be carried out into image channel with second feature figure Third feature figure, wherein, the image channel number of third feature figure is the image channel number and second feature figure of fisrt feature figure Image channel number sum.

In embodiments of the present invention, image channel link block 206 can be used for performing step S106.

Position regression block 207, the position in third feature figure, carrying out target to be tracked return, obtain treating with Position coordinates of the track target in latter two field picture.

In embodiments of the present invention, position regression block 207 can be used for performing step S107.

It refer to Fig. 7, Fig. 7 is the block diagram of position regression block 207 in target tracker 200 shown in Fig. 5. Position regression block 207 includes characteristic pattern division unit 2071 and position returns unit 2072.

Characteristic pattern division unit 2071, for third feature figure to be divided into S × S grid.

In embodiments of the present invention, characteristic pattern division unit 2071 can be used for performing sub-step S1071.

Position returns unit 2072, and recurrence is predicted to the position of target to be tracked for each grid, obtain treating with Position coordinates of the track target in latter two field picture.

In embodiments of the present invention, position returns unit 2072 and can be used for performing sub-step S1072.

In summary, a kind of method for tracking target, device and electronic equipment provided by the invention, methods described include：Obtain The previous frame image of video is taken, marks the position of the target to be tracked in previous frame image；According to the position of target to be tracked, Fisrt feature figure is obtained using default convolutional neural networks, wherein, fisrt feature figure is clarification of objective figure to be tracked；According to The latter two field picture of video, second feature figure is obtained using default convolutional neural networks, wherein, latter two field picture is in video With the continuous image of previous frame image, second feature figure is the characteristic pattern of latter two field picture；By fisrt feature figure and second feature Figure carries out image channel connection, obtains third feature figure, wherein, the image channel number of third feature figure is the figure of fisrt feature figure As the image channel number sum of port number and second feature figure；In third feature figure, the position for carrying out target to be tracked returns, Obtain position coordinates of the target to be tracked in latter two field picture.Compared with the existing local search approach based on image, this Invention provide method for tracking target support full figure search, and with it is existing based on Edge box method for tracking target compared with, The present invention realizes target following by the search of full figure end to end, is not rely on target in itself, can be after target loss Quickly give for change, effectively realize target following.

In several embodiments provided herein, it should be understood that disclosed apparatus and method, can also pass through Other modes are realized.Device embodiment described above is only schematical, for example, flow chart and block diagram in accompanying drawing Show the device of multiple embodiments according to the present invention, method and computer program product architectural framework in the cards, Function and operation.At this point, each square frame in flow chart or block diagram can represent the one of a module, program segment or code Part, a part for the module, program segment or code include one or more and are used to realize holding for defined logic function Row instruction.It should also be noted that at some as in the implementation replaced, the function that is marked in square frame can also with different from The order marked in accompanying drawing occurs.For example, two continuous square frames can essentially perform substantially in parallel, they are sometimes It can perform in the opposite order, this is depending on involved function.It is it is also noted that every in block diagram and/or flow chart The combination of individual square frame and block diagram and/or the square frame in flow chart, function or the special base of action as defined in performing can be used Realize, or can be realized with the combination of specialized hardware and computer instruction in the system of hardware.

In addition, each functional module in each embodiment of the present invention can integrate to form an independent portion Point or modules individualism, can also two or more modules be integrated to form an independent part.

If the function is realized in the form of software function module and is used as independent production marketing or in use, can be with It is stored in a computer read/write memory medium.Based on such understanding, technical scheme is substantially in other words The part to be contributed to prior art or the part of the technical scheme can be embodied in the form of software product, the meter Calculation machine software product is stored in a storage medium, including some instructions are causing a computer equipment (can be People's computer, server, or network equipment etc.) perform all or part of step of each embodiment methods described of the present invention. And foregoing storage medium includes：USB flash disk, mobile hard disk, read-only storage (ROM, Read-Only Memory), arbitrary access are deposited Reservoir (RAM, Random Access Memory), magnetic disc or CD etc. are various can be with the medium of store program codes.Need Illustrate, herein, such as first and second or the like relational terms be used merely to by an entity or operation with Another entity or operation make a distinction, and not necessarily require or imply between these entities or operation any this reality be present The relation or order on border.Moreover, term " comprising ", "comprising" or its any other variant are intended to the bag of nonexcludability Contain, so that process, method, article or equipment including a series of elements not only include those key elements, but also including The other element being not expressly set out, or also include for this process, method, article or the intrinsic key element of equipment. In the absence of more restrictions, the key element limited by sentence "including a ...", it is not excluded that including the key element Process, method, other identical element also be present in article or equipment.

The preferred embodiments of the present invention are the foregoing is only, are not intended to limit the invention, for the skill of this area For art personnel, the present invention can have various modifications and variations.Within the spirit and principles of the invention, that is made any repaiies Change, equivalent substitution, improvement etc., should be included in the scope of the protection.It should be noted that：Similar label and letter exists Similar terms is represented in following accompanying drawing, therefore, once being defined in a certain Xiang Yi accompanying drawing, is then not required in subsequent accompanying drawing It is further defined and explained.

Claims

1. a kind of method for tracking target, it is characterised in that methods described includes：

The previous frame image of video is obtained, marks the position of the target to be tracked in the previous frame image；

According to the position of the target to be tracked, fisrt feature figure is obtained using default convolutional neural networks, wherein, described the One characteristic pattern is the clarification of objective figure to be tracked；

According to the latter two field picture of the video, second feature figure is obtained using default convolutional neural networks, wherein, after described It in the video with the continuous image of previous frame image, the second feature figure is the spy of the latter two field picture that one two field picture, which is, Sign figure；

The fisrt feature figure is carried out into image channel with the second feature figure to connect, obtains third feature figure, wherein, it is described The image channel number of third feature figure is the image channel number of the fisrt feature figure and the image channel of the second feature figure Number sum；

In the third feature figure, the position for carrying out the target to be tracked returns, and obtains the target to be tracked described Position coordinates in latter two field picture.

2. the method as described in claim 1, it is characterised in that the position according to the target to be tracked, using default Convolutional neural networks obtain fisrt feature figure the step of before, in addition to：

Using supervised method, pre-training is carried out to the offline human face data collection comprising multiple facial images, obtains convolutional Neural Network；

The convolutional neural networks are finely adjusted, the convolutional neural networks after being finely tuned.

3. method as claimed in claim 2, it is characterised in that it is described that the convolutional neural networks are finely adjusted, obtain micro- The step of convolutional neural networks after tune, including：

Face datection is carried out to each two field picture in the video, obtains training data；

In the training data, the location point of region of search is made a distinction, generates positive and negative two classes training sample；

Using the positive negative training sample of acquisition as input, the convolutional neural networks of pre-training are carried out using Siamese networks micro- Adjust, the convolutional neural networks after being finely tuned, wherein, two volumes that the Siamese networks are identical by structure and weights are shared Product neutral net composition.

4. method as claimed in claim 2, it is characterised in that the position according to the target to be tracked, using default Convolutional neural networks obtain fisrt feature figure the step of, including：

Feature extraction is carried out to the target to be tracked using the convolutional neural networks after fine setting, obtains fisrt feature figure.

5. method as claimed in claim 2, it is characterised in that the latter two field picture according to the video, using default Convolutional neural networks obtain second feature figure the step of, including：

Feature extraction is carried out to the latter two field picture using the convolutional neural networks after fine setting, obtains second feature figure.

6. the method as described in claim 1, it is characterised in that the position of the target to be tracked is carried out to the third feature figure Put back into and return, obtain the target to be tracked in the rear position coordinates in a two field picture the step of, including：

The third feature figure is divided into S × S grid；

Each grid is predicted recurrence to the position of the target to be tracked, obtains the target to be tracked in the rear Position coordinates in one two field picture.

7. a kind of target tracker, it is characterised in that described device includes：

First image collection module, for obtaining the previous frame image of video, mark to be tracked in the previous frame image The position of target；

Fisrt feature figure acquisition module, for the position according to the target to be tracked, obtained using default convolutional neural networks Fisrt feature figure is taken, wherein, the fisrt feature figure is the clarification of objective figure to be tracked；

Second feature figure acquisition module, for the latter two field picture according to the video, obtained using default convolutional neural networks Second feature figure is taken, wherein, the latter two field picture is special with the continuous image of previous frame image, described second in the video Sign figure is the characteristic pattern of the latter two field picture；

Image channel link block, connect, obtain for the fisrt feature figure to be carried out into image channel with the second feature figure To third feature figure, wherein, the image channel number and institute of the image channel number of the third feature figure for the fisrt feature figure State the image channel number sum of second feature figure；

Position regression block, the position in the third feature figure, carrying out the target to be tracked return, and obtain described Target to be tracked position coordinates in a two field picture in the rear.

8. device as claimed in claim 7, it is characterised in that described device also includes：

Pre-training module, for utilizing supervised method, the offline human face data collection comprising multiple facial images is instructed in advance Practice, obtain convolutional neural networks；

Finely tune module, for being finely adjusted to the convolutional neural networks, the convolutional neural networks after being finely tuned.

9. device as claimed in claim 7, it is characterised in that the position regression block includes：

Characteristic pattern division unit, for the third feature figure to be divided into S × S grid；

Position returns unit, and recurrence is predicted to the position of the target to be tracked for each grid, obtains described Target to be tracked position coordinates in a two field picture in the rear.

10. a kind of electronic equipment, it is characterised in that the electronic equipment includes：

Memory；

Processor；And

Target tracker, described device are stored in the memory and including one or more by the computing device Software function module, it includes：