CN110189364A - Method and apparatus for generating information, and target tracking method and apparatus - Google Patents

Method and apparatus for generating information, and target tracking method and apparatus Download PDF

Info

Publication number
CN110189364A
Authority
CN
China
Prior art keywords
video frame
palm
location information
wrist
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910480692.7A
Other languages
Chinese (zh)
Other versions
CN110189364B (en)
Inventor
卢艺帆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing ByteDance Network Technology Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd
Priority to CN201910480692.7A
Publication of CN110189364A
Application granted
Publication of CN110189364B
Legal status: Active
Anticipated expiration

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the disclosure provide a method and apparatus for generating information, and a target tracking method and apparatus. One embodiment of the method for generating information includes: obtaining a target video; selecting a video frame from the target video; determining wrist location information of a wrist object in the video frame; and, based on the wrist location information, generating palm location information of a palm object in a subsequent video frame of the video frame. This embodiment determines a palm location from a wrist location, which enriches the ways in which palm location information can be determined and helps to improve the accuracy of the determined palm position.

Description

Method and apparatus for generating information, and target tracking method and apparatus
Technical field
Embodiments of the disclosure relate to the field of computer technology, and in particular to a method and apparatus for generating information, and to a target tracking method and apparatus.
Background
Target tracking is a technology for locating a photographed object in a video, and is a prominent and widely studied problem in the field of image and video processing. The technique is initialized on a first video frame, in which the target to be tracked is specified; in the subsequent frames of the video, the position of that target must then be determined in each frame.
Existing target tracking algorithms fall broadly into the following two classes:
Generative models: a model of the target is built by online learning, and the image region with the smallest reconstruction error under this model is then searched for to locate the target.
Discriminative models: target tracking is treated as a binary classification problem; target and background information is extracted and used to train a classifier that separates the target from the background of the image sequence, thereby obtaining the target position in the current frame.
Summary of the invention
The present disclosure proposes a method and apparatus for generating information, and a target tracking method and apparatus.
In a first aspect, an embodiment of the disclosure provides a method for generating information, the method comprising: obtaining a target video; selecting a video frame from the target video; determining wrist location information of a wrist object in the video frame; and, based on the wrist location information, generating palm location information of a palm object in a subsequent video frame of the video frame.
In some embodiments, determining the wrist location information of the wrist object in the video frame comprises: inputting the video frame into a pre-trained joint localization model to obtain the wrist location information of the wrist object in the video frame, where the joint localization model is used to determine the positions of human joint points, the human joint points including the wrist.
In some embodiments, generating the palm location information of the palm object in the subsequent video frame based on the wrist location information comprises: inputting the image region of the subsequent video frame that corresponds to the wrist location information into a pre-trained tracking model to obtain the palm location information, where the tracking model is used to determine the position of a palm object in an input image region.
In some embodiments, generating the palm location information of the palm object in the subsequent video frame based on the wrist location information comprises: enlarging, in the subsequent video frame, the image region corresponding to the wrist location information to obtain an enlarged image region; inputting the enlarged image region into a pre-trained tracking model to obtain palm location information, where the tracking model is used to determine the position of a palm object in an input image region; in response to the palm location information indicating that the enlarged image region contains no palm object, inputting the subsequent video frame into the joint localization model to obtain wrist location information of the wrist object; and inputting the image region of the subsequent video frame corresponding to that wrist location information into the tracking model to obtain the palm location information of the palm object.
In some embodiments, the target video is a video that is currently being shot and presented.
In a second aspect, an embodiment of the disclosure provides a target tracking method, the method comprising: obtaining a target video that is currently being shot and presented; selecting a video frame from the target video as a first target video frame, and performing the following tracking step: inputting the first target video frame into a pre-trained joint localization model to obtain wrist location information of a wrist object in the first target video frame, where the joint localization model is used to determine the positions of human joint points, the human joint points including the wrist, and performing the following tracking sub-step: selecting a subsequent video frame of the first target video frame from the target video as a second target video frame; determining, in the second target video frame, the image region corresponding to the wrist location information of the wrist object in the first target video frame; inputting the image region of the second target video frame into a pre-trained tracking model to obtain palm location information, where the tracking model is used to determine the position of a palm object in an input image region; and, in response to the palm location information of the palm object in the second target video frame indicating that the image region contains a palm object, determining, based on the palm location information of the palm object in the image region, the palm location information of the palm object in the second target video frame.
In some embodiments, the method further comprises: in response to the second target video frame not being the last frame of the target video, taking the second target video frame as the first target video frame and continuing to perform the tracking sub-step.
In some embodiments, the method further comprises: in response to the palm location information of the palm object in the second target video frame indicating that the image region contains no palm object, taking the second target video frame as the first target video frame and continuing to perform the tracking step.
In a third aspect, an embodiment of the disclosure provides an apparatus for generating information, the apparatus comprising: a first obtaining unit configured to obtain a target video; a first selection unit configured to select a video frame from the target video; a determination unit configured to determine wrist location information of a wrist object in the video frame; and a generation unit configured to generate, based on the wrist location information, palm location information of a palm object in a subsequent video frame of the video frame.
In some embodiments, the determination unit comprises: a first input module configured to input the video frame into a pre-trained joint localization model to obtain the wrist location information of the wrist object in the video frame, where the joint localization model is used to determine the positions of human joint points, the human joint points including the wrist.
In some embodiments, the generation unit comprises: a second input module configured to input the image region of the subsequent video frame that corresponds to the wrist location information into a pre-trained tracking model to obtain the palm location information, where the tracking model is used to determine the position of a palm object in an input image region.
In some embodiments, the generation unit comprises: an enlargement module configured to enlarge, in the subsequent video frame, the image region corresponding to the wrist location information to obtain an enlarged image region; a third input module configured to input the enlarged image region into a pre-trained tracking model to obtain palm location information, where the tracking model is used to determine the position of a palm object in an input image region; a fourth input module configured to, in response to the palm location information indicating that the enlarged image region contains no palm object, input the subsequent video frame into the joint localization model to obtain wrist location information of the wrist object; and a fifth input module configured to input the image region of the subsequent video frame corresponding to that wrist location information into the tracking model to obtain the palm location information of the palm object.
In some embodiments, the target video is a video that is currently being shot and presented.
In a fourth aspect, an embodiment of the disclosure provides a target tracking apparatus, the apparatus comprising: a second obtaining unit configured to obtain a target video that is currently being shot and presented; and a second selection unit configured to select a video frame from the target video as a first target video frame and perform the following tracking step: inputting the first target video frame into a pre-trained joint localization model to obtain wrist location information of a wrist object in the first target video frame, where the joint localization model is used to determine the positions of human joint points, the human joint points including the wrist, and performing the following tracking sub-step: selecting a subsequent video frame of the first target video frame from the target video as a second target video frame; determining, in the second target video frame, the image region corresponding to the wrist location information of the wrist object in the first target video frame; inputting the image region of the second target video frame into a pre-trained tracking model to obtain palm location information, where the tracking model is used to determine the position of a palm object in an input image region; and, in response to the palm location information of the palm object in the second target video frame indicating that the image region contains a palm object, determining, based on the palm location information of the palm object in the image region, the palm location information of the palm object in the second target video frame.
In some embodiments, the apparatus further comprises: a first execution unit configured to, in response to the second target video frame not being the last frame of the target video, take the second target video frame as the first target video frame and continue to perform the tracking sub-step.
In some embodiments, the apparatus further comprises: a second execution unit configured to, in response to the palm location information of the palm object in the second target video frame indicating that the image region contains no palm object, take the second target video frame as the first target video frame and continue to perform the tracking step.
In a fifth aspect, an embodiment of the disclosure provides an electronic device, comprising: one or more processors; and a storage apparatus on which one or more programs are stored, the one or more programs, when executed by the one or more processors, causing the one or more processors to implement the method of any embodiment of the method for generating information in the first aspect or of the target tracking method in the second aspect.
In a sixth aspect, an embodiment of the disclosure provides a computer-readable medium on which a computer program is stored, the program, when executed by a processor, implementing the method of any embodiment of the method for generating information in the first aspect or of the target tracking method in the second aspect.
The method and apparatus for generating information and the target tracking method and apparatus provided by embodiments of the disclosure obtain a target video, select a video frame from the target video, determine the wrist location information of a wrist object in the video frame, and finally generate, based on the wrist location information, the palm location information of a palm object in a subsequent video frame of the video frame. Palm location information is thus determined from wrist location information, which enriches the ways in which palm location information can be determined and helps to improve the accuracy of the determined palm position.
Brief description of the drawings
Other features, objects and advantages of the disclosure will become more apparent upon reading the following detailed description of non-limiting embodiments with reference to the accompanying drawings:
Fig. 1 is an exemplary system architecture diagram to which an embodiment of the disclosure may be applied;
Fig. 2 is a flowchart of one embodiment of the method for generating information according to the disclosure;
Fig. 3A-Fig. 3C are schematic diagrams of an application scenario of the method for generating information according to the disclosure;
Fig. 4 is a flowchart of one embodiment of the target tracking method according to the disclosure;
Fig. 5 is a structural schematic diagram of one embodiment of the apparatus for generating information according to the disclosure;
Fig. 6 is a structural schematic diagram of one embodiment of the target tracking apparatus according to the disclosure;
Fig. 7 is a structural schematic diagram of a computer system suitable for implementing an electronic device of an embodiment of the disclosure.
Detailed description
The disclosure is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the relevant invention, not to limit it. It should also be noted that, for ease of description, only the parts relevant to the invention are shown in the drawings.
It should be noted that, unless they conflict, the embodiments of the disclosure and the features of the embodiments may be combined with one another. The disclosure is described in detail below with reference to the drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary system architecture 100 to which embodiments of the method for generating information or of the apparatus for generating information, or of the target tracking method or of the target tracking apparatus, of the disclosure may be applied.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102 and 103, a network 104 and a server 105. The network 104 serves as a medium providing communication links between the terminal devices 101, 102 and 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber-optic cables.
A user may use the terminal devices 101, 102 and 103 to interact with the server 105 through the network 104 to receive or send data and the like. Various client applications may be installed on the terminal devices 101, 102 and 103, such as video playback software, news applications, image processing applications, web browsers, shopping applications, search applications, instant messaging tools, email clients and social platform software.
The terminal devices 101, 102 and 103 may be hardware or software. As an example, when the terminal devices 101, 102 and 103 are hardware, they may be various electronic devices that have a display screen and support video playback, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop portable computers and desktop computers. When the terminal devices 101, 102 and 103 are software, they may be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules (for example, software or software modules for providing distributed services), or as a single piece of software or software module. No specific limitation is made here.
The server 105 may be a server that provides various services, such as a background server that processes the videos sent by the terminal devices 101, 102 and 103. The background server may analyze and otherwise process a received video and obtain a processing result (for example, the palm location information of a palm object in a video frame). As an example, the server 105 may be a virtual server or a physical server.
It should be noted that the server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster composed of multiple servers, or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (for example, software or software modules for providing distributed services), or as a single piece of software or software module. No specific limitation is made here.
It should also be noted that the method for generating information provided by embodiments of the disclosure may be performed by a server, by a terminal device, or by a server and a terminal device cooperating with each other. Correspondingly, the parts included in the apparatus for generating information (for example, the units, sub-units, modules and sub-modules) may all be located in the server, may all be located in the terminal device, or may be distributed between the server and the terminal device. Likewise, the target tracking method provided by embodiments of the disclosure may be performed by a server, by a terminal device, or by a server and a terminal device cooperating with each other; the parts included in the target tracking apparatus (for example, the units, sub-units, modules and sub-modules) may all be located in the server, may all be located in the terminal device, or may be distributed between the server and the terminal device.
It should be understood that the numbers of terminal devices, networks and servers in Fig. 1 are merely illustrative. Any number of terminal devices, networks and servers may be provided according to implementation needs. For example, when the electronic device on which the method runs does not need to transmit data to other electronic devices, the system architecture may include only the electronic device (for example, a server or a terminal device) on which the method runs.
With continued reference to Fig. 2, a flow 200 of one embodiment of the method for generating information according to the disclosure is shown. The method for generating information comprises the following steps:
Step 201: obtain a target video.
In this embodiment, the executing body of the method for generating information (for example, the terminal device or the server shown in Fig. 1) may obtain the target video from another electronic device or locally, through a wired or wireless connection.
The target video may be any video. As an example, the target video may be a video obtained by shooting a person's hand (including the wrist and the palm). It will be appreciated that, when the target video is a video obtained by shooting a hand, all or some of the video frames of the target video may contain a palm object and/or a wrist object. Here, a palm object is the image of a palm presented in a video frame, and a wrist object is the image of a wrist presented in a video frame.
Step 202: select a video frame from the target video.
In this embodiment, the executing body may select a video frame from the target video obtained in step 201.
As an example, the executing body may select a video frame from the target video at random, or may select a video frame that satisfies a preset condition. Illustratively, the preset condition may include: the selected video frame is the first video frame of the target video, or the selected video frame is the video frame of the target video currently presented by a target terminal. When the executing body is a terminal device, the target terminal may be the executing body itself; when the executing body is a server, the target terminal may be a terminal device communicatively connected to the executing body.
Step 203: determine the wrist location information of the wrist object in the video frame.
In this embodiment, the executing body may determine the wrist location information of the wrist object in the video frame selected in step 202.
The wrist location information may be used to indicate the position of the wrist object in the video frame. It may be characterized by a rectangular box containing the wrist object in the video frame (for example, the minimum bounding rectangle of the wrist object), by a circle, or by the contour of the wrist object. It may also be characterized by coordinates. As an example, the coordinates may be the coordinates of the center point or centroid of the wrist object in the video frame, or the coordinates of a rectangular box containing the wrist object. For example, the coordinates may be "(x, y, w, h)", where x is the abscissa of the top-left corner of the rectangular box containing the wrist object in the coordinate system determined for the video frame, y is the ordinate of that corner in the same coordinate system, w is the width of the box, and h is its height.
Illustratively, the coordinate system determined for the video frame may take the top-left pixel of the video frame as the origin, with the two perpendicular edges of the video frame as the x-axis and y-axis respectively.
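As a concrete illustration of this "(x, y, w, h)" representation, the following is a minimal sketch only; the Box and crop names are illustrative and not part of the disclosure:

```python
from dataclasses import dataclass

import numpy as np

@dataclass
class Box:
    """An (x, y, w, h) location in the pixel coordinate system whose
    origin is the frame's top-left pixel, x rightward, y downward."""
    x: int  # abscissa of the top-left corner
    y: int  # ordinate of the top-left corner
    w: int  # width of the box
    h: int  # height of the box

def crop(frame: np.ndarray, box: Box) -> np.ndarray:
    """Cut the image region that a location-information box indicates,
    clipped to the frame bounds. `frame` is an H x W x C array."""
    height, width = frame.shape[:2]
    x0, y0 = max(box.x, 0), max(box.y, 0)
    x1, y1 = min(box.x + box.w, width), min(box.y + box.h, height)
    return frame[y0:y1, x0:x1]
```

The same representation serves for wrist location information and for the palm location information generated in step 204.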
Optionally, the wrist location information may also indicate that "the video frame contains no wrist object". As an example, in this case the wrist location information may be "null".
As an example, the executing body may perform step 203 as follows:
The video frame selected in step 202 is input into a pre-trained wrist localization model, and the wrist location information of the wrist object in that video frame is obtained. The wrist localization model is used to determine the wrist location information of a wrist object in a video frame.
As an example, the wrist localization model may be a deep neural network trained on a set of training samples using a deep learning algorithm, where each training sample in the set may include a sample video frame and the wrist location information of the wrist object in that sample video frame. It will be appreciated that the wrist location information of the wrist object in a sample video frame may be annotated in advance by annotators or by a device with an annotation function.
Optionally, the wrist localization model may also be a two-dimensional table or a database, obtained by a technician through extensive statistics, that associates video frames with the wrist location information of the wrist objects in those video frames.
In some optional implementations of this embodiment, the executing body may also perform step 203 as follows:
The video frame is input into a pre-trained joint localization model, and the wrist location information of the wrist object in the video frame is obtained. The joint localization model is used to determine the positions of human joint points, the human joint points including the wrist.
Here, the joint localization model may be used solely to determine the position of the wrist (rather than the positions of joint points other than the wrist). In this case, the joint localization model may be a deep learning model trained, using a machine learning algorithm, on a set of training samples each comprising a video frame and pre-annotated wrist location information indicating the position of the wrist in that video frame.
Optionally, the joint localization model may also be used to determine the positions of multiple joint points of the human body. For example, it may determine the positions of the following joint points: shoulder, elbow, wrist, hip, knee and ankle. In this case, the joint localization model may be a deep learning model trained, using a machine learning algorithm, on a set of training samples each comprising a video frame and pre-annotated joint location information indicating the positions of those joint points in that video frame.
It will be appreciated that the joint localization model may be a human pose estimation model, such as DensePose, OpenPose or Realtime Multi-Person Pose Estimation.
It should be understood that, when the joint localization model is used to determine the positions of multiple joint points, the relative positions of the joints of the human body follow certain regularities; the wrist location information determined in this case is therefore more accurate than wrist location information determined in other ways.
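A sketch of this step, assuming a generic pose estimator that returns named keypoints (the run_pose_model interface and keypoint names are illustrative stand-ins, not the API of any of the models named above; Box is reused from the earlier sketch):

```python
from typing import Optional

import numpy as np

def run_pose_model(frame: np.ndarray) -> dict[str, tuple[float, float, float]]:
    """Hypothetical pose-estimation backend: maps a frame to a dict of
    keypoint name -> (x, y, confidence)."""
    raise NotImplementedError("plug in a trained joint localization model")

def wrist_location(frame: np.ndarray, box_size: int = 100,
                   min_conf: float = 0.5) -> Optional[Box]:
    """Return an (x, y, w, h) box centered on the detected wrist
    keypoint, or None if no wrist is found ("null" in the patent)."""
    keypoints = run_pose_model(frame)
    wrist = keypoints.get("right_wrist") or keypoints.get("left_wrist")
    if wrist is None or wrist[2] < min_conf:
        return None
    cx, cy, _ = wrist
    half = box_size // 2
    return Box(int(cx) - half, int(cy) - half, box_size, box_size)
```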
Step 204: based on the wrist location information, generate the palm location information of the palm object in the subsequent video frame of the video frame.
In this embodiment, the executing body may generate, based on the wrist location information, the palm location information of the palm object in the subsequent video frame of the video frame.
Here, the subsequent video frame may be the video frame of the target video that is adjacent to and follows the video frame selected in step 202 (hereinafter the "video frame selected in step 202" is called the reference video frame), or it may be the video frame of the target video that follows the reference video frame and is separated from it by a preset number (for example, 5 or 1) of video frames.
The palm location information may be used to indicate the position of the palm object in the video frame. It may be characterized by a rectangular box containing the palm object in the video frame (for example, the minimum bounding rectangle of the palm object), by a circle, or by the contour of the palm object. It may also be characterized by coordinates. As an example, the coordinates may be the coordinates of the center point or centroid of the palm object in the video frame, or the coordinates of a rectangular box containing the palm object. For example, the coordinates may be "(x, y, w, h)", where x is the abscissa of the top-left corner of the rectangular box containing the palm object in the coordinate system determined for the video frame, y is the ordinate of that corner in the same coordinate system, w is the width of the box, and h is its height.
Illustratively, the coordinate system determined for the video frame may take the top-left pixel of the video frame as the origin, with the two perpendicular edges of the video frame as the x-axis and y-axis respectively.
Optionally, the palm location information may also indicate that "the video frame contains no palm object". As an example, in this case the palm location information may be "null".
In some optional implementations of this embodiment, the executing body may perform step 204 as follows:
The image region of the subsequent video frame that corresponds to the wrist location information is input into a pre-trained tracking model, and the palm location information is obtained. The tracking model is used to determine the position of a palm object in an input image region.
Here, the position, in the subsequent video frame, of the image region corresponding to the wrist location information may be the same as the position, in the reference video frame, of the image region indicated by that wrist location information. It will be appreciated that, if the wrist location information is "(100, 100, 100, 100)", it may indicate that the abscissa and ordinate of the top-left corner of the rectangular box containing the wrist object, in the coordinate system determined for the video frame, are both 100 pixels, and that the width and height of the box are both 100 pixels. In this case, the image region corresponding to the wrist location information may be the image region located at (100, 100, 100, 100) in the subsequent video frame of the reference video frame.
It will be appreciated that the size of the image region indicated by the wrist location information may be determined in advance. For example, it may be "100 pixels x 100 pixels", or it may be determined based on the size of the wrist object in the video frame. The image region indicated by the wrist location information may or may not contain a palm object.
It should be understood that, since the tracking model determines the location of the palm object within an image region rather than from an entire video frame, the technical solution of this optional implementation can reduce the computation load of the executing body and increase the speed at which the palm location information is generated.
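A sketch of this implementation, reusing Box and crop from the earlier sketch; tracking_model is an assumed interface mapping an image region to a palm box relative to that region, or None if the region contains no palm object:

```python
from typing import Optional

import numpy as np

def tracking_model(region: np.ndarray) -> Optional[Box]:
    """Hypothetical tracking-model backend: palm Box within a region."""
    raise NotImplementedError("plug in a trained tracking model")

def palm_from_wrist_region(subsequent_frame: np.ndarray,
                           wrist_box: Box) -> Optional[Box]:
    """Run the tracking model on the same-position region of the
    subsequent frame and map the result back to frame coordinates."""
    region = crop(subsequent_frame, wrist_box)
    palm_in_region = tracking_model(region)
    if palm_in_region is None:
        return None  # region contains no palm object
    return Box(wrist_box.x + palm_in_region.x,
               wrist_box.y + palm_in_region.y,
               palm_in_region.w, palm_in_region.h)
```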
In some optional implementations of this embodiment, the executing body may also perform step 204 as follows:
First, in the subsequent video frame, the image region corresponding to the wrist location information is enlarged to obtain an enlarged image region.
Here, the position, in the subsequent video frame, of the image region corresponding to the wrist location information may be the same as the position, in the reference video frame, of the image region indicated by that wrist location information. It will be appreciated that, if the wrist location information is "(100, 100, 100, 100)", it may indicate that the abscissa and ordinate of the top-left corner of the rectangular box containing the wrist object, in the coordinate system determined for the video frame, are both 100 pixels, and that the width and height of the box are both 100 pixels. In this case, the image region corresponding to the wrist location information may be the image region located at (100, 100, 100, 100) in the subsequent video frame of the reference video frame.
It should be noted that the enlarged region obtained by the enlargement may contain the region indicated by the pre-enlargement location information. As an example, the area of the enlarged region, or the number of pixels it contains, may be 1.2 times, 1.5 times, or the like, of the area of the region indicated by the pre-enlargement location information.
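One plausible way to compute such an enlarged region (a sketch reusing the Box type from above; growing the box about its center is an assumption, since the patent only constrains the area ratio and containment):

```python
import math

def enlarge(box: Box, frame_shape: tuple[int, ...],
            area_scale: float = 1.5) -> Box:
    """Grow a box about its center so its area is `area_scale` times
    the original (e.g. 1.2x or 1.5x), clipped to the frame bounds, so
    that the enlarged region contains the original region."""
    side_scale = math.sqrt(area_scale)  # per-side scale for an area scale
    new_w, new_h = int(box.w * side_scale), int(box.h * side_scale)
    cx, cy = box.x + box.w // 2, box.y + box.h // 2
    height, width = frame_shape[:2]
    x0 = max(cx - new_w // 2, 0)
    y0 = max(cy - new_h // 2, 0)
    return Box(x0, y0, min(new_w, width - x0), min(new_h, height - y0))
```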
Second, the enlarged image region is input into a pre-trained tracking model, and the palm location information is obtained. The tracking model is used to determine the position of a palm object in an input image region.
As an example, the tracking model may be obtained by training as follows:
First, a set of training samples is obtained, where each training sample includes an image region and the pre-determined location information of the palm object in that image region.
Then, using a machine learning algorithm, with the image region included in a training sample as the input data of an initial model and the location information corresponding to the input image region as the expected output data, the initial model is trained. The initial model obtained when the training is completed is determined as the trained tracking model.
Here, a training completion condition may be set in advance to decide whether the initial model has finished training. The training completion condition may include, but is not limited to, at least one of the following: the number of training iterations exceeds a preset number; the training time exceeds a preset duration; the value of a predetermined loss function falls below a preset threshold.
The initial model may be an untrained model, or a model (for example, a convolutional neural network) that has been trained but does not yet satisfy the training completion condition.
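A minimal training-loop sketch under these assumptions (PyTorch is an illustrative choice, as are the dataset format, the smooth-L1 box loss and the hyperparameters; the patent specifies none of these):

```python
import torch
from torch import nn
from torch.utils.data import DataLoader

def train_tracking_model(model: nn.Module, dataset, *,
                         max_epochs: int = 10,
                         loss_threshold: float = 1e-3) -> nn.Module:
    """Train an initial model to regress palm (x, y, w, h) boxes from
    image regions; stop when a completion condition is met."""
    loader = DataLoader(dataset, batch_size=32, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    criterion = nn.SmoothL1Loss()  # loss between predicted and labeled boxes
    for epoch in range(max_epochs):       # condition: preset iteration count
        for regions, boxes in loader:     # sample: region + palm location
            optimizer.zero_grad()
            loss = criterion(model(regions), boxes)
            loss.backward()
            optimizer.step()
        if loss.item() < loss_threshold:  # condition: loss below threshold
            break
    return model
```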
Optionally, the tracking model may also be a two-dimensional table or a database, obtained by a technician through extensive statistics, that associates image regions with the pre-determined location information of the palm objects in those image regions.
Third, in response to the palm location information indicating that the enlarged image region contains no palm object, the subsequent video frame is input into the joint localization model to obtain the wrist location information of the wrist object.
Fourth, the image region of the subsequent video frame corresponding to that wrist location information is input into the tracking model, and the palm location information of the palm object is obtained.
It will be appreciated that the joint localization model determines the location of a wrist object in a whole video frame, while the tracking model determines the location of a palm object within an image region of a video frame; the joint localization model therefore has a larger computation load per input than the tracking model. Accordingly, this optional implementation falls back on the joint localization model to obtain the wrist location information, and thence the palm location information, only when the palm location information indicates that the enlarged image region contains no palm object; when the palm location information indicates that the enlarged image region contains a palm object, the palm location information is obtained directly. This both ensures the accuracy of locating the palm object in the video frame and increases the locating speed.
In some cases, when the palm location information obtained from the tracking model indicates that "the image region contains no palm object", other image regions of the video frame may still contain a palm object. In this case, the location information of the wrist object in the subsequent video frame can be obtained by inputting the subsequent video frame into the joint localization model, and the palm location information can then be obtained. This optional implementation can therefore improve the accuracy of palm localization compared with a solution in which, "once the location information obtained from the tracking model indicates that the image region contains no palm object, it is determined that the subsequent video frame contains no palm object". On the other hand, compared with a solution that "inputs every video frame into a joint localization model for determining multiple joint points", this optional implementation can reduce the computation load of the executing body and increase the computing speed.
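Putting the four sub-steps together (a sketch reusing the illustrative enlarge, wrist_location and palm_from_wrist_region helpers above; this is one plausible arrangement, not the patent's reference code):

```python
from typing import Optional

import numpy as np

def palm_in_subsequent_frame(subsequent_frame: np.ndarray,
                             wrist_box: Box) -> Optional[Box]:
    """Try the cheap tracking model on an enlarged wrist region first;
    fall back to whole-frame joint localization only when no palm
    object is found there."""
    big = enlarge(wrist_box, subsequent_frame.shape, area_scale=1.5)
    palm = palm_from_wrist_region(subsequent_frame, big)
    if palm is not None:
        return palm                  # fast path: palm found in the region
    # Fallback: re-localize the wrist on the whole subsequent frame,
    # then run the tracking model on the region that wrist indicates.
    new_wrist = wrist_location(subsequent_frame)
    if new_wrist is None:
        return None                  # no wrist found: report no palm object
    return palm_from_wrist_region(subsequent_frame, new_wrist)
```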
In some optional implementations of this embodiment, the target video is a video that is currently being shot and presented.
Here, the executing body may be a terminal device. It will be appreciated that this optional implementation can determine, in real time, the position of the palm object in the video currently presented by the terminal device. After determining the position of the palm object, the executing body may render a preset image, or add a preset special effect, at a target position, thereby enriching the way the footage is presented. The target position may be a position determined in advance relative to the position of the palm.
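For illustration, a minimal rendering sketch under the assumption that the target position is "centered just above the palm box" (the patent leaves the target position and compositing method open; Box is reused from above):

```python
import numpy as np

def render_effect(frame: np.ndarray, palm: Box,
                  effect: np.ndarray) -> np.ndarray:
    """Paste a preset effect image at a target position determined in
    advance relative to the palm position (here: just above it)."""
    eh, ew = effect.shape[:2]
    x = max(palm.x + palm.w // 2 - ew // 2, 0)  # centered on the palm
    y = max(palm.y - eh, 0)                     # directly above the box
    out = frame.copy()
    region = out[y:y + eh, x:x + ew]
    out[y:y + eh, x:x + ew] = effect[:region.shape[0], :region.shape[1]]
    return out
```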
With continued reference to Fig. 3A-Fig. 3C, which are schematic diagrams of an application scenario of the method for generating information according to this embodiment: in Fig. 3A, a mobile phone first obtains a target video (for example, a video shot in real time by the image acquisition device of the phone). The phone then selects a video frame 301 from the target video (for example, the video frame the phone is currently presenting). Next, referring to Fig. 3B, the phone determines the wrist location information 302 of the wrist object in the video frame 301 (here, the wrist location information 302 is characterized by a rectangular box containing the wrist object in the video frame 301). Finally, based on the wrist location information 302, the phone generates the palm location information 304 of the palm object in the subsequent video frame 303 of the video frame 301 (here, the palm location information 304 is characterized by a rectangular box containing the palm object in the video frame 303).
At present, existing solutions for locating a palm object in a video frame often use a model trained by a machine learning algorithm to determine the position of the palm object in a reference video frame and in the subsequent video frames of that reference frame. Such solutions usually have to determine the palm position from the entire video frame, which takes a long time and consumes considerable computation. Moreover, because the human palm is highly flexible and moves quickly, the palm position may differ greatly between the video frames of a video containing a palm object.
The method provided by the above embodiment of the disclosure, by contrast, obtains a target video, selects a video frame from it, determines the wrist location information of the wrist object in the video frame, and finally generates, based on the wrist location information, the palm location information of the palm object in the subsequent video frame of the video frame. The palm position is thus determined from the position of the wrist, which is less flexible than the palm, and the palm object can be located based on the image region corresponding to the wrist location information rather than the entire video frame. This reduces the computation spent on generating the palm location information, enriches the ways in which palm location information can be determined, and helps to improve the accuracy of the determined palm position.
With further reference to Fig. 4, a flow 400 of one embodiment of the target tracking method is shown. The flow 400 of the target tracking method comprises the following steps:
Step 401: obtain a target video that is currently being shot and presented.
In this embodiment, the executing body of the target tracking method (for example, the terminal device shown in Fig. 1) may obtain the target video that is currently being shot and presented.
Afterwards, the executing body may perform step 402.
Here, the executing body may be a terminal device with a video shooting function. The target video may thus be the video the executing body is currently shooting; while shooting the video, the executing body may present it (i.e. the target video) in real time.
As an example, the target video may be a video obtained by shooting a hand (for example, a palm and a wrist). It will be appreciated that, when the target video is a video obtained by shooting a hand, all or some of the video frames of the target video may contain a palm object and a wrist object. Here, a palm object is the image of a palm presented in a video frame, and a wrist object is the image of a wrist presented in a video frame.
Step 402: select a video frame from the target video as a first target video frame.
In this embodiment, the executing body may select any video frame from the target video as the first target video frame.
Afterwards, the executing body may perform the tracking step, which comprises steps 403 to 408.
Step 403: input the first target video frame into a pre-trained joint localization model to obtain the wrist location information of the wrist object in the first target video frame.
In this embodiment, the executing body may input the first target video frame into a pre-trained joint localization model to obtain the wrist location information of the wrist object in the first target video frame, where the joint localization model is used to determine the positions of human joint points, the human joint points including the wrist.
Afterwards, the executing body may perform the tracking sub-step, which comprises steps 404 to 408.
Step 404: select a subsequent video frame of the first target video frame from the target video as a second target video frame.
In this embodiment, the executing body may select, from the target video, a subsequent video frame of the first target video frame as the second target video frame.
Afterwards, the executing body may perform step 405.
Step 405: determine, in the second target video frame, the image region corresponding to the wrist location information of the wrist object in the first target video frame.
In this embodiment, the executing body may determine, in the second target video frame, the image region corresponding to the wrist location information of the wrist object in the first target video frame.
Afterwards, the executing body may perform step 406.
Step 406: input the image region of the second target video frame into a pre-trained tracking model to obtain palm location information.
In this embodiment, the executing body may input the image region of the second target video frame into a pre-trained tracking model to obtain the palm location information, where the tracking model is used to determine the position of a palm object in an input image region.
Afterwards, the executing body may perform step 407.
Step 407: determine whether the palm location information of the palm object in the second target video frame indicates that the image region contains a palm object.
In this embodiment, the executing body may determine whether the palm location information of the palm object in the second target video frame indicates that the image region contains a palm object.
Afterwards, if the palm location information of the palm object in the second target video frame indicates that the image region contains a palm object, the executing body may perform step 408.
In some optional implementations of this embodiment, if the palm location information of the palm object in the second target video frame indicates that the image region contains no palm object, the executing body may also perform step 410, "take the second target video frame as the first target video frame", and then perform step 403.
Here, after performing step 410, the executing body may take the second target video frame as the new first target video frame. It will be appreciated that, after step 410 is performed, the first target video frame in the subsequent steps and the second target video frame before step 410 was performed refer to the same video frame.
Step 408: determine, based on the palm location information of the palm object in the image region, the palm location information of the palm object in the second target video frame.
In this embodiment, the executing body may determine, based on the palm location information of the palm object in the image region, the palm location information of the palm object in the second target video frame.
In some optional implementations of this embodiment, after performing step 408, the executing body may also perform step 409: determine whether the second target video frame is the last frame of the target video. Afterwards, if the second target video frame is not the last frame of the target video, the executing body may perform step 410, "take the second target video frame as the first target video frame", and then perform step 404.
Here, after performing step 410, the executing body may take the second target video frame as the new first target video frame. It will be appreciated that, after step 410 is performed, the first target video frame in the subsequent steps and the second target video frame before step 410 was performed refer to the same video frame.
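The control flow of steps 401 to 410 can be condensed into the following loop sketch, reusing the illustrative Box, wrist_location and palm_from_wrist_region helpers from the Fig. 2 discussion; treating the tracked palm box as the region for the next sub-step iteration is one reading of step 410, not an explicit statement of the patent:

```python
from typing import Iterable, Iterator, Optional

import numpy as np

def track_palm(frames: Iterable[np.ndarray]) -> Iterator[Optional[Box]]:
    """Yield a palm Box (or None) for each tracked frame of a video
    that is currently being shot and presented."""
    it = iter(frames)
    first = next(it)                 # step 402: first target video frame
    wrist = wrist_location(first)    # step 403: joint localization model
    for second in it:                # step 404: second target video frame
        palm = None
        if wrist is not None:
            # steps 405-406: same-position region + tracking model
            palm = palm_from_wrist_region(second, wrist)
        if palm is not None:         # steps 407-408: palm object found
            yield palm
            wrist = palm             # continue sub-step from this region
        else:                        # step 410 then 403: re-localize wrist
            wrist = wrist_location(second)
            yield None
```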
It should be noted that, in addition to what is recorded above, this embodiment may also include features the same as or similar to those of the embodiment corresponding to Fig. 2, and produce the same beneficial effects as that embodiment, which are not repeated here.
As can be seen from Fig. 4, compared with the embodiment corresponding to Fig. 2, the flow 400 of the target tracking method in this embodiment locates the palm object for each video frame of the target video that is currently being shot and presented, and can therefore determine, in real time, the position of the palm object in the video currently presented.
With further reference to Fig. 5, as an implementation of the method shown in Fig. 2, the disclosure provides one embodiment of an apparatus for generating information. This apparatus embodiment corresponds to the method embodiment shown in Fig. 2; in addition to the features recorded below, it may also include features the same as or corresponding to those of the method embodiment shown in Fig. 2, and produces effects the same as or corresponding to those of that embodiment. The apparatus may be applied to various electronic devices.
As shown in Fig. 5, the apparatus 500 for generating information of this embodiment includes: a first obtaining unit 501 configured to obtain a target video; a first selection unit 502 configured to select a video frame from the target video; a determination unit 503 configured to determine the wrist location information of the wrist object in the video frame; and a generation unit 504 configured to generate, based on the wrist location information, the palm location information of the palm object in the subsequent video frame of the video frame.
In this embodiment, the first obtaining unit 501 of the apparatus 500 for generating information may obtain the target video from another electronic device or locally, through a wired or wireless connection. The target video may be any video; as an example, it may be a video obtained by shooting a person's palm.
In this embodiment, the first selection unit 502 may select a video frame from the target video obtained by the first obtaining unit 501.
In this embodiment, the determination unit 503 may determine the wrist location information of the wrist object in the video frame selected by the first selection unit 502, where the wrist location information may be used to indicate the position of the wrist object in the video frame.
In this embodiment, the generation unit 504 may generate, based on the wrist location information obtained by the determination unit 503, the palm location information of the palm object in the subsequent video frame of the video frame; the palm location information may be used to indicate the position of the palm object in the video frame.
In some optional implementations of this embodiment, the determination unit 503 may include: a first input module (not shown) configured to input the video frame into a pre-trained joint localization model to obtain the wrist location information of the wrist object in the video frame, where the joint localization model is used to determine the positions of human joint points, the human joint points including the wrist.
In some optional implementations of this embodiment, the generation unit 504 may include: a second input module (not shown) configured to input the image region of the subsequent video frame that corresponds to the wrist location information into a pre-trained tracking model to obtain the palm location information, where the tracking model is used to determine the position of a palm object in an input image region.
In some optional implementations of this embodiment, the generation unit 504 may also include: an enlargement module (not shown) configured to enlarge, in the subsequent video frame, the image region corresponding to the wrist location information to obtain an enlarged image region; a third input module (not shown) configured to input the enlarged image region into a pre-trained tracking model to obtain palm location information, where the tracking model is used to determine the position of a palm object in an input image region; a fourth input module (not shown) configured to, in response to the palm location information indicating that the enlarged image region contains no palm object, input the subsequent video frame into the joint localization model to obtain the wrist location information of the wrist object; and a fifth input module (not shown) configured to input the image region of the subsequent video frame corresponding to that wrist location information into the tracking model to obtain the palm location information of the palm object.
In some optional implementations of this embodiment, the target video is a video that is currently being shot and presented.
The apparatus provided by the above embodiment of the disclosure obtains a target video through the first obtaining unit 501; the first selection unit 502 then selects a video frame from the target video; the determination unit 503 then determines the wrist location information of the wrist object in the video frame; and finally the generation unit 504 generates, based on the wrist location information, the palm location information of the palm object in the subsequent video frame of the video frame. Palm location information is thus determined from wrist location information, which enriches the ways in which palm location information can be determined and helps to improve the accuracy of the determined palm position.
Referring next to Fig. 6, as the realization to method shown in above-mentioned Fig. 4, present disclose provides a kind of target followings One embodiment of device, the Installation practice is corresponding with embodiment of the method shown in Fig. 4, except following documented feature Outside, which can also include feature identical or corresponding with embodiment of the method shown in Fig. 4, and generation and Fig. 4 Shown in embodiment of the method is identical or corresponding effect.The device specifically can be applied in various electronic equipments.
As shown in fig. 6, the target tracker 600 of the present embodiment includes: second acquisition unit 601, it is configured to obtain Current shooting and the target video of presentation;Second selection unit 602 is configured to from target video selecting video frame as One target video frame, and execute following tracking step: first object video frame is input to joint orientation mould trained in advance Type obtains the wrist location information of wrist object in first object video frame, wherein joint orientation model is for determining that human body closes The position of node, human joint points include wrist, and execute following tracking sub-step: first object is chosen from target video The subsequent video frame of video frame is as the second target video frame: in the second target video frame in determining and first object video frame Wrist object the corresponding image-region of wrist location information;Image-region in second target video frame is input in advance Trained trace model obtains palm location information, wherein trace model is for determining palm pair in inputted image-region The position of elephant;In response to including palm in the palm location information instruction image-region of the palm object in the second target video frame Object, the palm location information based on palm object in the image area determine palm object in the second target video frame Palm location information.
In the present embodiment, the second acquisition unit 601 of the target tracking apparatus 600 may acquire a target video that is currently being shot and presented.
In the present embodiment, the second selection unit 602 may select a video frame from the target video acquired by the second acquisition unit 601 as the first target video frame, and then execute the tracking step and tracking sub-step described above: detecting the wrist with the joint positioning model, determining the corresponding image region in the second target video frame, inputting that region into the tracking model, and, when the palm location information indicates that a palm object is present, determining the palm location information of the palm object in the second target video frame.
In some optional implementations of the present embodiment, the apparatus 600 further includes a first execution unit (not shown in the figure), configured to, in response to the second target video frame not being the last frame of the target video, take the second target video frame as the first target video frame and continue to execute the tracking sub-step.
In some optional implementations of the present embodiment, the apparatus 600 further includes a second execution unit (not shown in the figure), configured to, in response to the palm location information of the palm object in the second target video frame indicating that the image region contains no palm object, take the second target video frame as the first target video frame and continue to execute the tracking step. A sketch of the resulting loop is given below.
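Taken together, the two execution units turn the tracking step and tracking sub-step into a loop over the video: the tracking model follows the palm from frame to frame, and the joint positioning model re-detects the wrist whenever the palm is lost. The sketch below shows one plausible reading of that loop; the frame iteration, the carried-forward search region, and the joint_model and trace_model callables are illustrative assumptions rather than details fixed by the patent.

```python
# A sketch of the loop formed by the tracking step, the tracking sub-step,
# and the two optional execution units. Carrying the found palm box forward
# as the next search region is one plausible reading, not a fixed detail.
def track_palm(frames, joint_model, trace_model):
    """Yield (frame_index, palm_box) for every frame in which a palm is found."""
    i = 0
    while i + 1 < len(frames):
        # Tracking step: locate the wrist in the first target video frame.
        region = joint_model(frames[i])
        j = i + 1
        lost_at = None
        # Tracking sub-step: follow the palm through the subsequent frames.
        while j < len(frames):
            palm_box = trace_model(frames[j], region)
            if palm_box is None:
                lost_at = j      # second execution unit: palm object lost
                break
            yield j, palm_box
            region = palm_box    # first execution unit: continue the sub-step
            j += 1
        if lost_at is None:
            break                # reached the last frame of the target video
        i = lost_at              # restart the tracking step from this frame
```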
In the apparatus provided by the above embodiment of the present disclosure, the second acquisition unit 601 acquires the target video that is currently being shot and presented, and the second selection unit 602 selects a video frame from the target video as the first target video frame and executes the tracking step and tracking sub-step described above, determining the palm location information of the palm object in the second target video frame whenever the image region contains a palm object. As a result, the position of the palm object in the currently presented video can be determined in real time.
Referring now to Fig. 7, it shows a schematic structural diagram of an electronic device 700 (for example, the server or terminal device shown in Fig. 1) suitable for implementing embodiments of the present disclosure. Terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, laptops, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and vehicle-mounted terminals (such as in-vehicle navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. The terminal device/server shown in Fig. 7 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present disclosure.
As shown in Fig. 7, the electronic device 700 may include a processing apparatus (such as a central processing unit or graphics processor) 701, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 702 or a program loaded from a storage apparatus 708 into a random access memory (RAM) 703. The RAM 703 also stores various programs and data required for the operation of the electronic device 700. The processing apparatus 701, the ROM 702, and the RAM 703 are connected to one another via a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
In general, the following apparatuses may be connected to the I/O interface 705: input apparatuses 706 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, and gyroscope; output apparatuses 707 including, for example, a liquid crystal display (LCD), speaker, and vibrator; storage apparatuses 708 including, for example, a magnetic tape and hard disk; and a communication apparatus 709. The communication apparatus 709 may allow the electronic device 700 to communicate wirelessly or by wire with other devices to exchange data. Although Fig. 7 shows an electronic device 700 with various apparatuses, it should be understood that it is not required to implement or provide all of the apparatuses shown; more or fewer apparatuses may alternatively be implemented or provided. Each block shown in Fig. 7 may represent one apparatus or, as needed, multiple apparatuses.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, an embodiment of the present disclosure includes a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication apparatus 709, installed from the storage apparatus 708, or installed from the ROM 702. When the computer program is executed by the processing apparatus 701, the above-described functions defined in the methods of the embodiments of the present disclosure are executed.
It should be noted that the computer-readable medium described in the embodiments of the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the embodiments of the present disclosure, a computer-readable storage medium may be any tangible medium containing or storing a program that can be used by, or in combination with, an instruction execution system, apparatus, or device. In the embodiments of the present disclosure, a computer-readable signal medium may include a data signal propagated in a baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; it can send, propagate, or transmit a program for use by, or in combination with, an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted by any suitable medium, including but not limited to an electric wire, an optical cable, RF (radio frequency), or any suitable combination of the above.
The above computer-readable medium may be included in the above electronic device, or it may exist separately without being assembled into the electronic device. The above computer-readable medium carries one or more programs that, when executed by the electronic device, cause the electronic device to: acquire a target video; select a video frame from the target video; determine the wrist location information of the wrist object in the video frame; and, based on the wrist location information, generate the palm location information of the palm object in the subsequent video frame of the video frame. Alternatively, the programs cause the electronic device to: acquire a target video that is currently being shot and presented; select a video frame from the target video as a first target video frame; and execute the tracking step and tracking sub-step described above, determining the palm location information of the palm object in the second target video frame whenever the palm location information indicates that the image region contains a palm object.
The computer program code for executing the operations of the embodiments of the present disclosure may be written in one or more programming languages or combinations thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar languages. The program code may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architecture, functions, and operations of the systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or part of code that contains one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented in software or in hardware. The described units may also be provided in a processor; for example, a processor may be described as including a first acquisition unit, a first selection unit, a determination unit, and a generation unit. The names of these units do not, in some cases, limit the units themselves; for example, the first acquisition unit may also be described as "a unit for acquiring a target video".
The above description is only a preferred embodiment of the present disclosure and an explanation of the applied technical principles. Those skilled in the art should understand that the scope of the invention involved in the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover, without departing from the above inventive concept, other technical solutions formed by any combination of the above technical features or their equivalent features, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure.

Claims (12)

1. A method for generating information, comprising:
acquiring a target video;
selecting a video frame from the target video;
determining wrist location information of a wrist object in the video frame;
generating, based on the wrist location information, palm location information of a palm object in a subsequent video frame of the video frame.
2. The method according to claim 1, wherein the determining wrist location information of a wrist object in the video frame comprises:
inputting the video frame into a pre-trained joint positioning model to obtain the wrist location information of the wrist object in the video frame, wherein the joint positioning model is used to determine positions of human joint points, and the human joint points include a wrist.
3. The method according to claim 1, wherein the generating, based on the wrist location information, palm location information of a palm object in a subsequent video frame of the video frame comprises:
inputting an image region in the subsequent video frame of the video frame corresponding to the wrist location information into a pre-trained tracking model to obtain the palm location information, wherein the tracking model is used to determine a position of a palm object in an input image region.
4. The method according to claim 2, wherein the generating, based on the wrist location information, palm location information of a palm object in a subsequent video frame of the video frame comprises:
enlarging the image region in the subsequent video frame of the video frame corresponding to the wrist location information, to obtain an enlarged image region;
inputting the enlarged image region into a pre-trained tracking model to obtain palm location information, wherein the tracking model is used to determine a position of a palm object in an input image region;
in response to the palm location information indicating that the enlarged image region contains no palm object, inputting the subsequent video frame into the joint positioning model to obtain wrist location information of the wrist object;
inputting the image region in the subsequent video frame corresponding to the wrist location information into the tracking model, to obtain the palm location information of the palm object.
5. The method according to any one of claims 1-4, wherein the target video is a video that is currently being shot and presented.
6. A target tracking method, comprising:
acquiring a target video that is currently being shot and presented;
selecting a video frame from the target video as a first target video frame, and executing the following tracking step:
inputting the first target video frame into a pre-trained joint positioning model to obtain wrist location information of a wrist object in the first target video frame, wherein the joint positioning model is used to determine positions of human joint points, the human joint points including a wrist, and executing the following tracking sub-step:
selecting a subsequent video frame of the first target video frame from the target video as a second target video frame;
determining, in the second target video frame, an image region corresponding to the wrist location information of the wrist object in the first target video frame;
inputting the image region in the second target video frame into a pre-trained tracking model to obtain palm location information, wherein the tracking model is used to determine a position of a palm object in an input image region;
in response to the palm location information of the palm object in the second target video frame indicating that the image region contains a palm object, determining, based on the palm location information of the palm object within the image region, palm location information of the palm object in the second target video frame.
7. The method according to claim 6, wherein the method further comprises:
in response to the second target video frame not being a last frame of the target video, taking the second target video frame as the first target video frame and continuing to execute the tracking sub-step.
8. method according to claim 6 or 7, wherein the method also includes:
In response to not including palm pair in the palm location information instruction image-region of the palm object in the second target video frame As continuing to execute the tracking step using the second target video frame as first object video frame.
9. An apparatus for generating information, comprising:
a first acquisition unit, configured to acquire a target video;
a first selection unit, configured to select a video frame from the target video;
a determination unit, configured to determine wrist location information of a wrist object in the video frame;
a generation unit, configured to generate, based on the wrist location information, palm location information of a palm object in a subsequent video frame of the video frame.
10. A target tracking apparatus, comprising:
a second acquisition unit, configured to acquire a target video that is currently being shot and presented;
a second selection unit, configured to select a video frame from the target video as a first target video frame and execute the following tracking step:
inputting the first target video frame into a pre-trained joint positioning model to obtain wrist location information of a wrist object in the first target video frame, wherein the joint positioning model is used to determine positions of human joint points, the human joint points including a wrist, and executing the following tracking sub-step:
selecting a subsequent video frame of the first target video frame from the target video as a second target video frame;
determining, in the second target video frame, an image region corresponding to the wrist location information of the wrist object in the first target video frame;
inputting the image region in the second target video frame into a pre-trained tracking model to obtain palm location information, wherein the tracking model is used to determine a position of a palm object in an input image region;
in response to the palm location information of the palm object in the second target video frame indicating that the image region contains a palm object, determining, based on the palm location information of the palm object within the image region, palm location information of the palm object in the second target video frame.
11. An electronic device, comprising:
one or more processors; and
a storage apparatus on which one or more programs are stored,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-8.
12. A computer-readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method according to any one of claims 1-8.
CN201910480692.7A 2019-06-04 2019-06-04 Method and device for generating information, and target tracking method and device Active CN110189364B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910480692.7A CN110189364B (en) 2019-06-04 2019-06-04 Method and device for generating information, and target tracking method and device

Publications (2)

Publication Number Publication Date
CN110189364A true CN110189364A (en) 2019-08-30
CN110189364B CN110189364B (en) 2022-04-01

Family

ID=67720133

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910480692.7A Active CN110189364B (en) 2019-06-04 2019-06-04 Method and device for generating information, and target tracking method and device

Country Status (1)

Country Link
CN (1) CN110189364B (en)

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101447082A (en) * 2008-12-05 2009-06-03 华中科技大学 Detection method of moving target on a real-time basis
WO2010086866A1 (en) * 2009-02-02 2010-08-05 Eyesight Mobile Technologies Ltd. System and method for object recognition and tracking in a video stream
CN103065312A (en) * 2012-12-26 2013-04-24 四川虹微技术有限公司 Foreground extraction method in gesture tracking process
US20150199824A1 (en) * 2014-01-10 2015-07-16 Electronics And Telecommunications Research Institute Apparatus and method for detecting multiple arms and hands by using three-dimensional image
CN107274433A (en) * 2017-06-21 2017-10-20 吉林大学 Method for tracking target, device and storage medium based on deep learning
CN109447996A (en) * 2017-08-28 2019-03-08 英特尔公司 Hand Segmentation in 3-D image
CN108399367A (en) * 2018-01-31 2018-08-14 深圳市阿西莫夫科技有限公司 Hand motion recognition method, apparatus, computer equipment and readable storage medium storing program for executing
CN108564596A (en) * 2018-03-01 2018-09-21 南京邮电大学 A kind of the intelligence comparison analysis system and method for golf video
CN108961315A (en) * 2018-08-01 2018-12-07 腾讯科技(深圳)有限公司 Method for tracking target, device, computer equipment and storage medium
CN109636828A (en) * 2018-11-20 2019-04-16 北京京东尚科信息技术有限公司 Object tracking methods and device based on video image
CN109525891A (en) * 2018-11-29 2019-03-26 北京字节跳动网络技术有限公司 Multi-user's special video effect adding method, device, terminal device and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JING-MING GUO ET AL: "Hybrid hand tracking system", 《2011 18TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING》 *
LI, ZHIXIAN et al.: "A fingertip detection and tracking algorithm based on Kinect depth images", Jiangsu Agricultural Sciences *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110688992A (en) * 2019-12-09 2020-01-14 中智行科技有限公司 Traffic signal identification method and device, vehicle navigation equipment and unmanned vehicle
CN113849687A (en) * 2020-11-23 2021-12-28 阿里巴巴集团控股有限公司 Video processing method and device
CN113849687B (en) * 2020-11-23 2022-10-28 阿里巴巴集团控股有限公司 Video processing method and device

Also Published As

Publication number Publication date
CN110189364B (en) 2022-04-01

Similar Documents

Publication Publication Date Title
CN109462776B (en) Video special effect adding method and device, terminal equipment and storage medium
US20210029305A1 (en) Method and apparatus for adding a video special effect, terminal device and storage medium
CN110188719A (en) Method for tracking target and device
CN109525891B (en) Multi-user video special effect adding method and device, terminal equipment and storage medium
CN109584276B (en) Key point detection method, device, equipment and readable medium
CN109599113A (en) Method and apparatus for handling information
CN109858445A (en) Method and apparatus for generating model
CN109600559B (en) Video special effect adding method and device, terminal equipment and storage medium
CN108345387A (en) Method and apparatus for output information
CN110162670A (en) Method and apparatus for generating expression packet
CN111050271B (en) Method and apparatus for processing audio signal
CN109348277B (en) Motion pixel video special effect adding method and device, terminal equipment and storage medium
CN109829432A (en) Method and apparatus for generating information
CN110210501B (en) Virtual object generation method, electronic device and computer-readable storage medium
CN110009059A (en) Method and apparatus for generating model
CN109754464A (en) Method and apparatus for generating information
CN113467603A (en) Audio processing method and device, readable medium and electronic equipment
CN109800730A (en) The method and apparatus for generating model for generating head portrait
CN108446658A (en) The method and apparatus of facial image for identification
CN110264539A (en) Image generating method and device
CN114972591A (en) Animation generation model training method, animation generation method and device
CN110189364A (en) For generating the method and apparatus and method for tracking target and device of information
CN111652675A (en) Display method and device and electronic equipment
CN109829431A (en) Method and apparatus for generating information
CN111447379B (en) Method and device for generating information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant