CN110189364A - Method and apparatus for generating information, and method and apparatus for target tracking - Google Patents
Method and apparatus for generating information, and method and apparatus for target tracking
- Publication number
- CN110189364A (application CN201910480692.7A / CN201910480692A)
- Authority
- CN
- China
- Prior art keywords
- video frame
- palm
- location information
- wrist
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
Landscapes
- Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
Embodiments of the present disclosure disclose a method and apparatus for generating information, and a method and apparatus for target tracking. One embodiment of the method for generating information includes: obtaining a target video; selecting a video frame from the target video; determining wrist location information of a wrist object in the video frame; and, based on the wrist location information, generating palm location information of a palm object in a subsequent video frame of the video frame. This embodiment determines a palm location from a wrist location, enriching the ways in which palm locations can be determined and helping to improve the accuracy of the determined palm position.
Description
Technical field
Embodiments of the present disclosure relate to the field of computer technology, and in particular to a method and apparatus for generating information, and a method and apparatus for target tracking.
Background
Target tracking is a technique for locating a moving object in a video. It is a popular, intensively studied problem in the field of image and video processing. The technique is initialized on the first video frame of the video, in which the target to be tracked is specified; in each subsequent frame of the video, the position of the target to be tracked must then be determined.

Existing target tracking algorithms fall broadly into the following two classes:

Generative models: an appearance model of the object is built by online learning, and the image region with the smallest reconstruction error under this model is then searched for to localize the target.

Discriminative models: tracking is treated as a binary classification problem; target and background information are extracted to train a classifier that separates the target from the background of the image sequence, yielding the target position in the current frame.
Summary of the invention
The present disclosure proposes a method and apparatus for generating information, and a method and apparatus for target tracking.

In a first aspect, embodiments of the present disclosure provide a method for generating information, the method comprising: obtaining a target video; selecting a video frame from the target video; determining wrist location information of a wrist object in the video frame; and, based on the wrist location information, generating palm location information of a palm object in a subsequent video frame of the video frame.
In some embodiments, determining the wrist location information of the wrist object in the video frame comprises: inputting the video frame into a pre-trained joint localization model to obtain the wrist location information of the wrist object in the video frame, wherein the joint localization model is used to determine the positions of human joint points, and the human joint points include the wrist.
In some embodiments, generating the palm location information of the palm object in the subsequent video frame based on the wrist location information comprises: inputting the image region of the subsequent video frame corresponding to the wrist location information into a pre-trained tracking model to obtain the palm location information, wherein the tracking model is used to determine the position of a palm object within an input image region.
In some embodiments, generating the palm location information of the palm object in the subsequent video frame based on the wrist location information comprises: enlarging, in the subsequent video frame, the image region corresponding to the wrist location information to obtain an enlarged image region; inputting the enlarged image region into a pre-trained tracking model to obtain the palm location information, wherein the tracking model is used to determine the position of a palm object within an input image region; in response to the palm location information indicating that the enlarged image region contains no palm object, inputting the subsequent video frame into the joint localization model to obtain wrist location information of the wrist object; and inputting the image region of the subsequent video frame corresponding to that wrist location information into the tracking model to obtain the palm location information of the palm object.
In some embodiments, the target video is a video that is currently being shot and presented.
In a second aspect, embodiments of the present disclosure provide a target tracking method, the method comprising: obtaining a target video that is currently being shot and presented; selecting a video frame from the target video as a first target video frame, and performing the following tracking step: inputting the first target video frame into a pre-trained joint localization model to obtain wrist location information of a wrist object in the first target video frame, wherein the joint localization model is used to determine the positions of human joint points, the human joint points including the wrist, and performing the following tracking sub-step: selecting a subsequent video frame of the first target video frame from the target video as a second target video frame; determining, in the second target video frame, the image region corresponding to the wrist location information of the wrist object in the first target video frame; inputting the image region of the second target video frame into a pre-trained tracking model to obtain palm location information, wherein the tracking model is used to determine the position of a palm object within an input image region; and, in response to the palm location information of the palm object in the second target video frame indicating that the image region contains a palm object, determining the palm location of the palm object in the second target video frame based on the palm location of the palm object within the image region.
In some embodiments, the method further comprises: in response to the second target video frame not being the last frame of the target video, taking the second target video frame as the first target video frame and continuing to perform the tracking sub-step.
In some embodiments, the method further comprises: in response to the palm location information of the palm object in the second target video frame indicating that the image region contains no palm object, taking the second target video frame as the first target video frame and continuing to perform the tracking step.
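The tracking step and sub-step described above combine into a per-frame loop: locate the wrist in a reference frame, track the palm near that position in each following frame, and fall back to wrist re-localization when the palm is lost. A minimal runnable sketch, with the pre-trained joint localization and tracking models replaced by caller-supplied stand-ins (the function names and frame representation here are illustrative, not from the patent):

```python
def track_palms(frames, locate_wrist, track_palm):
    """Return per-frame palm boxes for frames[1:].

    locate_wrist(frame) -> (x, y, w, h) or None  (stand-in for the
        joint localization model)
    track_palm(frame, region) -> (x, y, w, h) or None  (stand-in for
        the tracking model applied to the region near the wrist)
    """
    results = []
    wrist = locate_wrist(frames[0]) if frames else None
    for frame in frames[1:]:
        palm = track_palm(frame, wrist) if wrist else None
        if palm is None:
            # Palm lost in the region: re-run joint localization on
            # this frame, then try the tracking model again.
            wrist = locate_wrist(frame)
            palm = track_palm(frame, wrist) if wrist else None
        results.append(palm)
    return results

# Toy stubs: each "frame" carries its own annotation.
frames = [{"wrist": (10, 10, 4, 4)},
          {"wrist": (12, 10, 4, 4)},
          {"wrist": None}]
palms = track_palms(frames,
                    locate_wrist=lambda f: f["wrist"],
                    track_palm=lambda f, r: f["wrist"])
assert palms == [(12, 10, 4, 4), None]
```

The fallback branch mirrors the embodiment above in which a frame whose region contains no palm object becomes the new first target video frame.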
In a third aspect, embodiments of the present disclosure provide an apparatus for generating information, the apparatus comprising: a first obtaining unit configured to obtain a target video; a first selection unit configured to select a video frame from the target video; a determining unit configured to determine wrist location information of a wrist object in the video frame; and a generating unit configured to generate, based on the wrist location information, palm location information of a palm object in a subsequent video frame of the video frame.
In some embodiments, the determining unit comprises: a first input module configured to input the video frame into a pre-trained joint localization model to obtain the wrist location information of the wrist object in the video frame, wherein the joint localization model is used to determine the positions of human joint points, and the human joint points include the wrist.
In some embodiments, the generating unit comprises: a second input module configured to input the image region of the subsequent video frame corresponding to the wrist location information into a pre-trained tracking model to obtain the palm location information, wherein the tracking model is used to determine the position of a palm object within an input image region.
In some embodiments, the generating unit comprises: an enlargement module configured to enlarge, in the subsequent video frame, the image region corresponding to the wrist location information to obtain an enlarged image region; a third input module configured to input the enlarged image region into a pre-trained tracking model to obtain the palm location information, wherein the tracking model is used to determine the position of a palm object within an input image region; a fourth input module configured to, in response to the palm location information indicating that the enlarged image region contains no palm object, input the subsequent video frame into the joint localization model to obtain wrist location information of the wrist object; and a fifth input module configured to input the image region of the subsequent video frame corresponding to that wrist location information into the tracking model to obtain the palm location information of the palm object.
In some embodiments, the target video is a video that is currently being shot and presented.
In a fourth aspect, embodiments of the present disclosure provide a target tracking apparatus, the apparatus comprising: a second obtaining unit configured to obtain a target video that is currently being shot and presented; and a second selection unit configured to select a video frame from the target video as a first target video frame and perform the following tracking step: inputting the first target video frame into a pre-trained joint localization model to obtain wrist location information of a wrist object in the first target video frame, wherein the joint localization model is used to determine the positions of human joint points, the human joint points including the wrist, and performing the following tracking sub-step: selecting a subsequent video frame of the first target video frame from the target video as a second target video frame; determining, in the second target video frame, the image region corresponding to the wrist location information of the wrist object in the first target video frame; inputting the image region of the second target video frame into a pre-trained tracking model to obtain palm location information, wherein the tracking model is used to determine the position of a palm object within an input image region; and, in response to the palm location information of the palm object in the second target video frame indicating that the image region contains a palm object, determining the palm location of the palm object in the second target video frame based on the palm location of the palm object within the image region.
In some embodiments, the apparatus further comprises: a first execution unit configured to, in response to the second target video frame not being the last frame of the target video, take the second target video frame as the first target video frame and continue to perform the tracking sub-step.
In some embodiments, the apparatus further comprises: a second execution unit configured to, in response to the palm location information of the palm object in the second target video frame indicating that the image region contains no palm object, take the second target video frame as the first target video frame and continue to perform the tracking step.
In a fifth aspect, embodiments of the present disclosure provide an electronic device, comprising: one or more processors; and a storage device on which one or more programs are stored, the one or more programs, when executed by the one or more processors, causing the one or more processors to implement the method of any embodiment of the method for generating information in the first aspect or of the target tracking method in the second aspect.
In a sixth aspect, embodiments of the present disclosure provide a computer-readable medium on which a computer program is stored, the program, when executed by a processor, implementing the method of any embodiment of the method for generating information in the first aspect or of the target tracking method in the second aspect.
The method and apparatus for generating information and the target tracking method and apparatus provided by embodiments of the present disclosure obtain a target video, select a video frame from the target video, determine the wrist location information of a wrist object in the video frame, and finally generate, based on the wrist location information, the palm location information of a palm object in a subsequent video frame of the video frame. The palm location is thus determined from the wrist location, enriching the ways in which palm locations can be determined and helping to improve the accuracy of the determined palm position.
Brief description of the drawings
Other features, objects, and advantages of the present disclosure will become more apparent upon reading the following detailed description of non-limiting embodiments with reference to the accompanying drawings:
Fig. 1 is an exemplary system architecture diagram to which an embodiment of the present disclosure may be applied;
Fig. 2 is a flowchart of one embodiment of the method for generating information according to the present disclosure;
Figs. 3A-3C are schematic diagrams of an application scenario of the method for generating information according to the present disclosure;
Fig. 4 is a flowchart of one embodiment of the target tracking method according to the present disclosure;
Fig. 5 is a structural schematic diagram of one embodiment of the apparatus for generating information according to the present disclosure;
Fig. 6 is a structural schematic diagram of one embodiment of the target tracking apparatus according to the present disclosure;
Fig. 7 is a structural schematic diagram of a computer system suitable for implementing an electronic device of an embodiment of the present disclosure.
Detailed description of embodiments
The present disclosure is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the relevant invention, not to limit it. It should also be noted that, for ease of description, only the parts relevant to the invention are shown in the drawings.

It should be noted that, in the absence of conflict, the embodiments of the present disclosure and the features of the embodiments may be combined with one another. The present disclosure is described in detail below with reference to the drawings and in conjunction with embodiments.
Fig. 1 shows an exemplary system architecture 100 in which embodiments of the method or apparatus for generating information, or of the target tracking method or apparatus, of the present disclosure may be applied.

As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105. The network 104 serves as the medium providing communication links between the terminal devices 101, 102, 103 and the server 105, and may include various connection types, such as wired links, wireless communication links, or fiber-optic cables.

A user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send data and so on. Various client applications may be installed on the terminal devices 101, 102, 103, such as video playback software, news applications, image processing applications, web browsers, shopping applications, search applications, instant messaging tools, e-mail clients, and social platform software.

The terminal devices 101, 102, 103 may be hardware or software. As an example, when the terminal devices 101, 102, 103 are hardware, they may be any of various electronic devices that have a display screen and support video playback, including but not limited to smartphones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), laptop computers, and desktop computers. When the terminal devices 101, 102, 103 are software, they may be installed in any of the electronic devices listed above, and may be implemented as multiple pieces of software or software modules (for example, software or modules for providing distributed services) or as a single piece of software or a single module. No specific limitation is made here.
The server 105 may be a server providing various services, for example a back-end server that processes video sent by the terminal devices 101, 102, 103. The back-end server may analyze and otherwise process the received video to obtain a processing result (for example, the palm location information of a palm object in a video frame). As an example, the server 105 may be a virtual server or a physical server.

It should be noted that the server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster composed of multiple servers, or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (for example, software or modules for providing distributed services) or as a single piece of software or a single module. No specific limitation is made here.
It should also be noted that the method for generating information provided by embodiments of the present disclosure may be performed by the server, by a terminal device, or by the server and a terminal device cooperating with each other. Correspondingly, the parts included in the apparatus for generating information (for example, its units, sub-units, modules, and sub-modules) may all be located in the server, may all be located in the terminal device, or may be distributed between the server and the terminal device. Likewise, the target tracking method provided by embodiments of the present disclosure may be performed by the server, by a terminal device, or by the two cooperating with each other, and the parts included in the target tracking apparatus (for example, its units, sub-units, modules, and sub-modules) may be placed in the server, in the terminal device, or distributed between them accordingly.
It should be understood that the numbers of terminal devices, networks, and servers in Fig. 1 are merely illustrative; there may be any number of terminal devices, networks, and servers, as required by the implementation. For example, when the electronic device on which the method for generating information runs does not need to transmit data to other electronic devices, the system architecture may include only the electronic device (such as a server or terminal device) on which the target tracking method runs.
With continued reference to Fig. 2, a flow 200 of one embodiment of the method for generating information according to the present disclosure is shown. The method for generating information comprises the following steps:
Step 201: obtain a target video.
In the present embodiment, the executing body of the method for generating information (for example, the terminal device or server shown in Fig. 1) may obtain the target video from another electronic device, or locally, via a wired or wireless connection.

The target video may be any video. As an example, it may be a video obtained by shooting a person's hand (including the wrist and palm). It will be appreciated that, when the target video is a video obtained by shooting a hand, all or some of the video frames of the target video may contain a palm object and/or a wrist object. Here, a palm object is the image of a palm presented in a video frame, and a wrist object is the image of a wrist presented in a video frame.
Step 202: select a video frame from the target video.
In the present embodiment, the executing body may select a video frame from the target video obtained in step 201.

As an example, the executing body may select a video frame from the target video at random, or may select a video frame that satisfies a preset condition. Illustratively, the preset condition may be that the selected video frame is the first frame of the target video, or that the selected video frame is the frame of the target video currently being presented by a target terminal. Here, when the executing body is a terminal device, the target terminal may be the executing body itself; when the executing body is a server, the target terminal may be a terminal device in communication with the executing body.
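The two preset conditions mentioned here (first frame, or currently presented frame) can be sketched as a small selection helper; the function name and frame representation are illustrative, not from the patent:

```python
def select_frame(frames, current_index=None):
    """Select a video frame from the target video.

    With no current_index, return the first frame (preset
    condition 1); otherwise return the frame currently being
    presented at that index (preset condition 2)."""
    if not frames:
        raise ValueError("target video contains no frames")
    if current_index is not None:
        return frames[current_index]
    return frames[0]

frames = ["frame0", "frame1", "frame2"]
assert select_frame(frames) == "frame0"
assert select_frame(frames, current_index=2) == "frame2"
```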
Step 203: determine the wrist location information of a wrist object in the video frame.
In the present embodiment, the executing body may determine the wrist location information of the wrist object in the video frame selected in step 202.

The wrist location information may be used to indicate the position of the wrist object in the video frame. It may be characterized by a rectangle containing the wrist object in the video frame (for example, the minimum bounding rectangle of the wrist object), by a circle, or by the contour of the wrist object. It may also be characterized by coordinates. As an example, the coordinates may be those of the center point or centroid of the wrist object in the video frame, or those of a rectangle containing the wrist object in the video frame. For example, the coordinates may be "(x, y, w, h)", where x is the abscissa of the upper-left corner of the rectangle containing the wrist object, in a coordinate system determined for the video frame; y is the ordinate of that corner in the same coordinate system; w is the width of the rectangle; and h is its height.

Illustratively, the coordinate system determined for the video frame may take the upper-left pixel of the video frame as its origin, with the two perpendicular edges of the video frame as the x-axis and y-axis respectively.
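The "(x, y, w, h)" convention in this coordinate system (origin at the top-left pixel, x to the right, y downward) can be made concrete with a small helper; the function name is illustrative:

```python
def bbox_center(x, y, w, h):
    """Center point of a rectangle given as (x, y, w, h), where
    (x, y) is the upper-left corner in image coordinates (origin
    at the top-left pixel, x rightward, y downward)."""
    return (x + w / 2.0, y + h / 2.0)

# The wrist box "(100, 100, 100, 100)" used as an example later
# has its center at (150.0, 150.0).
assert bbox_center(100, 100, 100, 100) == (150.0, 150.0)
```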
Optionally, the wrist location information may also indicate that "the video frame contains no wrist object". As an example, in that case the wrist location information may be "null".
As an example, the executing body may perform step 203 as follows:

Input the video frame selected in step 202 into a pre-trained wrist localization model to obtain the wrist location information of the wrist object in that video frame. The wrist localization model is used to determine the wrist location information of the wrist object in a video frame.

As an example, the wrist localization model may be a deep neural network trained on a set of training samples using a deep learning algorithm. A training sample in this set may include a sample video frame and the wrist location information of the wrist object in that frame. It will be appreciated that the wrist location information in the sample video frames may be annotated in advance by annotators or by a device with an annotation function.
Optionally, the wrist localization model may also be a two-dimensional table or a database, compiled by a technician through extensive statistics, that stores associations between video frames and the wrist location information of the wrist objects in those frames.
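This non-learned alternative, an association table rather than a trained network, can be as simple as a mapping from a frame identifier to its annotated wrist box; the entries below are invented purely for illustration:

```python
# Hypothetical pre-annotated association table: frame id -> wrist
# box (x, y, w, h). A real table or database would be built from
# the technician's statistics described in the text.
WRIST_TABLE = {
    "frame_0001": (100, 100, 100, 100),
    "frame_0002": (104, 98, 100, 100),
}

def lookup_wrist(frame_id):
    """Return the stored wrist box, or None when the frame has no
    annotation (the 'null' case)."""
    return WRIST_TABLE.get(frame_id)

assert lookup_wrist("frame_0001") == (100, 100, 100, 100)
assert lookup_wrist("frame_9999") is None
```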
In some optional implementations of the present embodiment, the executing body may also perform step 203 as follows:

Input the video frame into a pre-trained joint localization model to obtain the wrist location information of the wrist object in the video frame. The joint localization model is used to determine the positions of human joint points, and the human joint points include the wrist.
Here, the joint localization model may be used only to determine the position of the wrist (rather than the positions of joint points other than the wrist). In this scenario, the joint localization model may be a deep learning model trained, using a machine learning algorithm, on a set of training samples each including a video frame and pre-annotated wrist location information indicating the position of the wrist in that frame.

Optionally, the joint localization model may also be used to determine the positions of multiple joint points of the human body. For example, the joint localization model may determine the positions of the following joint points: shoulder, elbow, wrist, hip, knee, and ankle. In this scenario, the joint localization model may be a deep learning model trained, using a machine learning algorithm, on a set of training samples each including a video frame and pre-annotated joint location information indicating the positions of these joint points in the frame.
It will be appreciated that the joint localization model may be a human pose estimation model, such as DensePose, OpenPose, or Realtime Multi-Person Pose Estimation.
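Whatever estimator is used, its wrist keypoint must be turned into the "(x, y, w, h)" wrist location information used by the later steps. A hedged sketch, assuming the pose model's output has been converted to a mapping from joint names to pixel coordinates (the exact output format varies by model, and the fixed box size is an assumption, matching the predetermined-size option mentioned later):

```python
def wrist_location(keypoints, box_size=100):
    """Turn a pose estimator's wrist keypoint into an (x, y, w, h)
    box centered on the keypoint.

    `keypoints` is assumed to map joint names to (x, y) pixel
    coordinates. Returns None when no wrist was detected (the
    'null' case in the text)."""
    pt = keypoints.get("wrist")
    if pt is None:
        return None
    cx, cy = pt
    half = box_size // 2
    return (cx - half, cy - half, box_size, box_size)

assert wrist_location({"wrist": (150, 150)}) == (100, 100, 100, 100)
assert wrist_location({"elbow": (80, 90)}) is None
```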
It should be understood that when the joint localization model is used to determine the positions of multiple joint points of the human body, the relative positions between those joint points follow certain regularities; in this scenario, the determined wrist location information is therefore more accurate than wrist location information determined in other ways.
Step 204: based on the wrist location information, generate the palm location information of a palm object in a subsequent video frame of the video frame.
In the present embodiment, the executing body may generate, based on the wrist location information, the palm location information of the palm object in a subsequent video frame of the video frame.

The subsequent video frame may be the video frame of the target video that is adjacent to and follows the video frame selected in step 202 (hereinafter, the "reference video frame"), or it may be a video frame of the target video that is located after the reference video frame and separated from it by a preset number of frames (for example, 5 or 1).
The palm location information may be used to indicate the position of the palm object in the video frame. It may be characterized by a rectangle containing the palm object in the video frame (for example, the minimum bounding rectangle of the palm object), by a circle, or by the contour of the palm object. It may also be characterized by coordinates. As an example, the coordinates may be those of the center point or centroid of the palm object in the video frame, or those of a rectangle containing the palm object in the video frame. For example, the coordinates may be "(x, y, w, h)", where x is the abscissa of the upper-left corner of the rectangle containing the palm object, in a coordinate system determined for the video frame; y is the ordinate of that corner in the same coordinate system; w is the width of the rectangle; and h is its height.

Illustratively, the coordinate system determined for the video frame may take the upper-left pixel of the video frame as its origin, with the two perpendicular edges of the video frame as the x-axis and y-axis respectively.
Optionally, the palm location information may also indicate that "the video frame contains no palm object". As an example, in that case the palm location information may be "null".
In some optional implementations of the present embodiment, the executing body may perform step 204 as follows:

Input the image region of the subsequent video frame corresponding to the wrist location information into a pre-trained tracking model to obtain the palm location information. The tracking model is used to determine the position of a palm object within an input image region.
Here, the position of the image region corresponding to the wrist location information within the subsequent video frame may be the same as the position, within the reference video frame, of the image region indicated by the wrist location information. It will be appreciated that if the wrist location information is "(100, 100, 100, 100)", it indicates that the abscissa and ordinate of the upper-left corner of the rectangle containing the wrist object, in the coordinate system determined for the video frame, are both 100 pixels, and that the width and height of that rectangle are both 100 pixels. In this scenario, the image region corresponding to the wrist location information may be the image region located at (100, 100, 100, 100) in the subsequent video frame of the reference video frame.
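Taking the region at the same position in the subsequent frame amounts to a simple crop. A minimal sketch, with the frame represented as a nested list of rows (frame[y][x]); the function name is illustrative:

```python
def crop_region(frame, box):
    """Crop an (x, y, w, h) region out of a frame stored as a
    nested list of rows. The same box is applied to the subsequent
    frame as was determined on the reference frame."""
    x, y, w, h = box
    return [row[x:x + w] for row in frame[y:y + h]]

# Toy 8x8 "frame" whose pixels record their own (row, col).
frame = [[(r, c) for c in range(8)] for r in range(8)]
patch = crop_region(frame, (2, 3, 4, 2))
assert len(patch) == 2 and len(patch[0]) == 4
assert patch[0][0] == (3, 2)  # top-left pixel of the region
```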
It is appreciated that the size of the image region indicated by the wrist location information may be predetermined. For example, it may be "100 pixels × 100 pixels", or it may be determined based on the size of the wrist object in the video frame. The image region indicated by the wrist location information may or may not contain a palm object.
It should be appreciated that since the trace model determines the location of the palm object within an image region rather than within the entire video frame, the technical solution of this optional implementation can reduce the computation load of the above executing subject and increase the speed at which the palm location information is generated, relative to determining the location of the palm object directly from the whole video frame.
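As a minimal sketch of this optional implementation (not the claimed implementation itself), assuming frames are row-major arrays of pixel rows and `trace_model` stands in for the pre-trained trace model:

```python
def crop_region(frame, box):
    """Extract the image region of `frame` at (x, y, w, h).

    The region's position in the subsequent frame is the same as the
    position the wrist location information indicated in the original
    video frame."""
    x, y, w, h = box
    return [row[x:x + w] for row in frame[y:y + h]]

def locate_palm(subsequent_frame, wrist_box, trace_model):
    """Step 204, optional implementation: run the trace model only on
    the wrist-sized region instead of the whole frame."""
    region = crop_region(subsequent_frame, wrist_box)
    return trace_model(region)  # palm location info, or None ("null")
```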
In some optional implementations of the present embodiment, the above executing subject may also perform step 204 in the following way:
First step: in the subsequent video frame of the video frame, enlarging the image region corresponding to the wrist location information to obtain an enlarged image region.
Here, the position, in the subsequent video frame, of the image region corresponding to the wrist location information may be identical to the position, in the video frame, of the image region indicated by the wrist location information. It is appreciated that if the wrist location information is "(100, 100, 100, 100)", it may characterize that the abscissa and the ordinate of the upper-left corner point of the rectangle frame containing the wrist object are both 100 pixels under the coordinate system determined for the video frame, and that the length and the width of that rectangle frame are both 100 pixels. In this scenario, the image region corresponding to the wrist location information may be the image region located at (100, 100, 100, 100) in the subsequent video frame of the video frame.
It should be noted that the enlarged position region obtained after the enlarging processing may contain the position region indicated by the pre-enlargement location information. As an example, the area of the enlarged position region, or the number of pixels it includes, may be 1.2 times, 1.5 times, etc. the area of the position region indicated by the pre-enlargement location information.
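The enlarging processing can be sketched as follows, under the assumption (not stated in the text) that the region is enlarged about its center by an area factor such as 1.2 or 1.5 and clamped to the frame boundary:

```python
def enlarge_box(box, area_factor=1.5, frame_w=None, frame_h=None):
    """Enlarge a rectangle (x, y, w, h) about its center so that its
    area is `area_factor` times the original, optionally clamped to
    the frame; the enlarged region then contains the region indicated
    by the pre-enlargement location information."""
    x, y, w, h = box
    scale = area_factor ** 0.5            # per-side scale for an area factor
    new_w, new_h = round(w * scale), round(h * scale)
    new_x = x - (new_w - w) // 2
    new_y = y - (new_h - h) // 2
    if frame_w is not None:
        new_x = max(0, min(new_x, frame_w - new_w))
    if frame_h is not None:
        new_y = max(0, min(new_y, frame_h - new_h))
    return (new_x, new_y, new_w, new_h)
```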
Second step: inputting the enlarged image region into a pre-trained trace model to obtain palm location information, where the trace model is used to determine the position of a palm object in an input image region.
As an example, the trace model may be obtained by training in the following way:
First, a training sample set is obtained, where a training sample includes an image region and a predetermined location of a palm object in that image region.
Then, using a machine learning algorithm, the image regions included in the training samples of the training sample set are taken as the input data of an initial model, the location information corresponding to the input image regions is taken as the desired output data of the initial model, and the initial model is trained. The initial model obtained upon completion of the training is determined as the trained trace model.
Here, a training completion condition may be preset to determine whether the initial model has finished training. The training completion condition may include, but is not limited to, at least one of the following: the number of training iterations exceeds a preset number, the training time exceeds a preset duration, or the value calculated from a predetermined loss function is less than a preset threshold.
The initial model may be an untrained model, or a model (e.g., a convolutional neural network) that has been trained but does not yet satisfy the training completion condition.
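The training procedure just described can be sketched as follows. This is a generic loop, not the patented implementation; `update` stands in for one step of whatever machine learning algorithm is used, and the completion-condition values are illustrative defaults:

```python
import time

def train_trace_model(initial_model, samples, update,
                      max_iters=10_000, max_seconds=3600.0,
                      loss_threshold=1e-3):
    """Train on (image_region, palm_location) samples until a preset
    completion condition is met: iteration count exceeded, training
    time exceeded, or loss below a preset threshold.

    `update(model, region, location)` performs one learning step and
    returns (model, loss)."""
    start = time.monotonic()
    iters, loss = 0, float("inf")
    while True:
        for region, location in samples:      # location = desired output
            initial_model, loss = update(initial_model, region, location)
            iters += 1
            if (iters >= max_iters
                    or time.monotonic() - start >= max_seconds
                    or loss < loss_threshold):
                return initial_model, iters, loss
```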
Optionally, the trace model may also be a two-dimensional table or a database obtained by a technician through extensive statistics, in which image regions are stored in association with predetermined locations of palm objects in those image regions.
Third step: in response to the palm location information indicating that the enlarged image region does not include a palm object, inputting the subsequent video frame into the above joint orientation model to obtain wrist location information of the wrist object.
Fourth step: inputting the image region in the subsequent video frame that corresponds to this wrist location information into the trace model to obtain palm location information of the palm object.
It is appreciated that the joint orientation model determines the location of the wrist object in an entire video frame, whereas the trace model determines the location of the palm object within an image region included in a video frame. Consequently, for a single piece of input data, the joint orientation model involves a larger amount of computation than the trace model. This optional implementation therefore resorts to the joint orientation model to obtain the wrist location information of the wrist object, and in turn the palm location information, only when the palm location information indicates that the enlarged image region does not include a palm object; when the palm location information indicates that the enlarged image region does include a palm object, the palm location information is obtained directly. As a result, both the accuracy and the speed of locating the palm object in the video frame are ensured.
In some cases, when the palm location information obtained from the trace model indicates that "the image region does not include a palm object", other image regions of the video frame may still include a palm object. In this scenario, the location information of the wrist object in the subsequent video frame can be obtained by inputting the subsequent video frame into the above joint orientation model, and the palm location information is then obtained from it. Hence, compared with a technical solution that "determines that the subsequent video frame does not include a palm object once the location information obtained from the trace model indicates that the image region does not include a palm object", this optional implementation can improve the accuracy of palm locating. On the other hand, compared with a technical solution that "inputs every video frame into a joint orientation model for determining multiple joint points", this optional implementation can reduce the computation load of the above executing subject and increase the computation speed.
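The decision logic of the third and fourth steps, falling back to the joint orientation model only when the trace model reports no palm in the enlarged region, can be sketched with stand-in callables as:

```python
def generate_palm_location(subsequent_frame, wrist_box,
                           trace_model, joint_model,
                           crop, enlarge):
    """Generate palm location info for the subsequent video frame.

    The (enlarged) region corresponding to the wrist location
    information is first handed to the cheap trace model; only if it
    reports "no palm" is the more expensive joint orientation model
    run on the whole frame to re-locate the wrist, after which the
    trace model is applied to the new wrist region."""
    palm = trace_model(crop(subsequent_frame, enlarge(wrist_box)))
    if palm is not None:                      # palm found in the region
        return palm, wrist_box
    new_wrist_box = joint_model(subsequent_frame)  # full-frame re-detection
    palm = trace_model(crop(subsequent_frame, new_wrist_box))
    return palm, new_wrist_box
```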
In some optional implementations of the present embodiment, the target video is a video that is currently being shot and presented.
Here, the executing subject may be a terminal device. It is appreciated that this optional implementation can determine, in real time, the position of the palm object in the video currently presented by the terminal device. After determining the position of the palm object, the executing subject may render a preset image, or add a preset special effect, at a target position of the palm object, thereby enriching the presentation of the video. The target position may be a position determined in advance relative to the position of the palm.
With continued reference to Fig. 3A-Fig. 3C, Fig. 3A-Fig. 3C are schematic diagrams of an application scenario of the method according to the present embodiment. In Fig. 3A, a mobile phone first obtains a target video (e.g., a video shot in real time by the image acquisition device of the mobile phone). Then, the mobile phone selects a video frame 301 from the target video (e.g., the video frame currently presented by the mobile phone). Afterwards, referring to Fig. 3B, the mobile phone determines wrist location information 302 of the wrist object in video frame 301 (here, wrist location information 302 is characterized by the rectangle frame containing the wrist object in video frame 301). Finally, based on wrist location information 302, the mobile phone generates palm location information 304 of the palm object in the subsequent video frame 303 of video frame 301 (here, palm location information 304 is characterized by the rectangle frame containing the palm object in video frame 303).
At present, existing schemes for locating a palm object in video frames often use a model trained with a machine learning algorithm to determine the position of the palm object in a reference video frame and in the subsequent video frames of the reference video frame. Such schemes usually need to determine the position of the palm object based on the entire video frame, which makes locating slow and computationally expensive. Moreover, since the human palm is flexible and moves quickly, the position of the palm may differ greatly between the video frames of a video that includes a palm object.
By contrast, the method provided by the above embodiment of the disclosure obtains a target video, selects a video frame from the target video, then determines the wrist location information of the wrist object in the video frame, and finally generates, based on the wrist location information, the palm location information of the palm object in the subsequent video frame of the video frame. The position of the palm is thus determined based on the position of the wrist, which is less flexible than the palm, so that the palm object in a video frame can be located based on the image region corresponding to the wrist location information rather than the entire video frame. This reduces the computation spent on generating the palm location information, enriches the ways of determining the palm location information, and helps improve the accuracy of the determined palm position.
With further reference to Fig. 4, a process 400 of one embodiment of a target tracking method is illustrated. The process 400 of the target tracking method comprises the following steps:
Step 401: obtaining a target video that is currently being shot and presented.
In the present embodiment, the executing subject of the target tracking method (e.g., the terminal device shown in Fig. 1) may obtain the target video currently being shot and presented.
Afterwards, the above executing subject may execute step 402.
Here, the above executing subject may be a terminal device with a video shooting function. The target video may thus be the video currently being shot by the executing subject, which may present the video (i.e., the target video) in real time while shooting it.
As an example, the target video may be a video obtained by shooting a hand (e.g., a palm and a wrist). It is appreciated that when the target video is obtained by shooting a hand, all or some of the video frames included in the target video may contain a palm object and a wrist object. Here, a palm object may be the image of a palm presented in a video frame, and a wrist object may be the image of a wrist presented in a video frame.
Step 402: selecting a video frame from the target video as a first target video frame.
In the present embodiment, the above executing subject may select any video frame from the target video as the first target video frame.
Afterwards, the above executing subject may execute a tracking step, where the tracking step includes steps 403-408.
Step 403: inputting the first target video frame into a pre-trained joint orientation model to obtain wrist location information of the wrist object in the first target video frame.
In the present embodiment, the above executing subject may input the first target video frame into a pre-trained joint orientation model to obtain the wrist location information of the wrist object in the first target video frame, where the joint orientation model is used to determine the positions of human joint points, and the human joint points include the wrist.
Afterwards, the above executing subject may execute a tracking sub-step, where the tracking sub-step includes steps 404-408.
Step 404: selecting a subsequent video frame of the first target video frame from the target video as a second target video frame.
In the present embodiment, the above executing subject may select a subsequent video frame of the first target video frame from the target video as the second target video frame.
Afterwards, the above executing subject may execute step 405.
Step 405: determining, in the second target video frame, the image region corresponding to the wrist location information of the wrist object in the first target video frame.
In the present embodiment, the above executing subject may determine, in the second target video frame, the image region corresponding to the wrist location information of the wrist object in the first target video frame.
Afterwards, the above executing subject may execute step 406.
Step 406: inputting the image region in the second target video frame into a pre-trained trace model to obtain palm location information.
In the present embodiment, the above executing subject may input the image region in the second target video frame into a pre-trained trace model to obtain the palm location information, where the trace model is used to determine the position of a palm object in an input image region.
Afterwards, the above executing subject may execute step 407.
Step 407: determining whether the palm location information of the palm object in the second target video frame indicates that the image region includes a palm object.
In the present embodiment, the above executing subject may determine whether the palm location information of the palm object in the second target video frame indicates that the image region includes a palm object.
Afterwards, if the palm location information of the palm object in the second target video frame indicates that the image region includes a palm object, the above executing subject may execute step 408.
In some optional implementations of the present embodiment, if the palm location information of the palm object in the second target video frame indicates that the image region does not include a palm object, the above executing subject may also execute step 410, "taking the second target video frame as the first target video frame", and then execute step 403.
Here, after executing step 410, the above executing subject may take the second target video frame as the new first target video frame. It is appreciated that, after step 410 is executed, the first target video frame in the subsequent steps and the second target video frame before step 410 was executed refer to the same video frame.
Step 408: determining, based on the palm location information of the palm object in the image region, the palm location information of the palm object in the second target video frame.
In the present embodiment, the above executing subject may determine, based on the palm location information of the palm object in the image region, the palm location information of the palm object in the second target video frame.
In some optional implementations of the present embodiment, after executing step 408, the above executing subject may also execute step 409: determining whether the second target video frame is the last frame of the target video. Afterwards, if the second target video frame is not the last frame of the target video, the above executing subject may execute step 410, "taking the second target video frame as the first target video frame", and then execute step 404.
Here, after executing step 410, the above executing subject may take the second target video frame as the new first target video frame. It is appreciated that, after step 410 is executed, the first target video frame in the subsequent steps and the second target video frame before step 410 was executed refer to the same video frame.
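Under the assumption that the models are available as callables (`joint_model` returning a wrist rectangle for a whole frame, `trace_model` returning a palm location or `None` for an image region), the control flow of process 400 can be sketched as a single loop:

```python
def track_target(frames, joint_model, trace_model, crop):
    """Process 400 as a loop over the frames of the target video.

    Step 403 runs the joint orientation model on the first target
    video frame; steps 404-408 track the palm through each subsequent
    (second target) video frame; when the trace model loses the palm
    (step 407 negative), the current frame becomes the new first
    target video frame (step 410) and detection restarts (step 403)."""
    results = []
    wrist_box = joint_model(frames[0])                 # step 403
    for j in range(1, len(frames)):                    # step 404
        second = frames[j]
        palm = trace_model(crop(second, wrist_box))    # steps 405-406
        if palm is None:                               # step 407: lost
            wrist_box = joint_model(second)            # steps 410, 403
            palm = trace_model(crop(second, wrist_box))
        results.append(palm)                           # step 408
    return results
```

Note that after a successful tracking sub-step the wrist location information is not re-detected; only a miss triggers the joint orientation model again, mirroring the step 407/410 branch.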
It should be noted that, in addition to the contents described above, this embodiment of the present application may also include features that are the same as or similar to those of the embodiment corresponding to Fig. 2, and produce the same beneficial effects as that embodiment, which are not repeated here.
As can be seen from Fig. 4, compared with the embodiment corresponding to Fig. 2, the process 400 of the target tracking method in the present embodiment locates the palm object in each video frame of the target video currently being shot and presented, and can therefore determine the position of the palm object in the currently presented video in real time.
With further reference to Fig. 5, as an implementation of the method shown in Fig. 2 above, the present disclosure provides one embodiment of an apparatus for generating information. This apparatus embodiment corresponds to the method embodiment shown in Fig. 2; in addition to the features described below, the apparatus embodiment may also include features that are the same as or corresponding to those of the method embodiment shown in Fig. 2, and produce the same or corresponding effects. The apparatus may specifically be applied in various electronic devices.
As shown in Fig. 5, the apparatus 500 for generating information of the present embodiment includes: a first acquisition unit 501 configured to obtain a target video; a first selection unit 502 configured to select a video frame from the target video; a determination unit 503 configured to determine wrist location information of a wrist object in the video frame; and a generation unit 504 configured to generate, based on the wrist location information, palm location information of a palm object in a subsequent video frame of the video frame.
In the present embodiment, the first acquisition unit 501 of the apparatus 500 for generating information may obtain the target video from another electronic device or locally, through a wired or wireless connection. The target video may be any video; as an example, it may be a video obtained by shooting a person's palm.
In the present embodiment, the first selection unit 502 may select a video frame from the target video obtained by the first acquisition unit 501.
In the present embodiment, the determination unit 503 may determine the wrist location information of the wrist object in the video frame selected by the first selection unit 502, where the wrist location information may be used to indicate the position of the wrist object in the video frame.
In the present embodiment, the generation unit 504 may generate, based on the wrist location information obtained by the determination unit 503, the palm location information of the palm object in the subsequent video frame of the video frame, where the palm location information may be used to indicate the position of the palm object in the video frame.
In some optional implementations of the present embodiment, the determination unit 503 may include a first input module (not shown in the figure) configured to input the video frame into a pre-trained joint orientation model to obtain the wrist location information of the wrist object in the video frame, where the joint orientation model is used to determine the positions of human joint points, and the human joint points include the wrist.
In some optional implementations of the present embodiment, the generation unit 504 may include a second input module (not shown in the figure) configured to input the image region in the subsequent video frame of the video frame that corresponds to the wrist location information into a pre-trained trace model to obtain the palm location information, where the trace model is used to determine the position of a palm object in an input image region.
In some optional implementations of the present embodiment, the generation unit 504 may also include: an enlarging processing module (not shown in the figure) configured to enlarge, in the subsequent video frame of the video frame, the image region corresponding to the wrist location information to obtain an enlarged image region; a third input module (not shown in the figure) configured to input the enlarged image region into a pre-trained trace model to obtain palm location information, where the trace model is used to determine the position of a palm object in an input image region; a fourth input module (not shown in the figure) configured to input, in response to the palm location information indicating that the enlarged image region does not include a palm object, the subsequent video frame into the joint orientation model to obtain wrist location information of the wrist object; and a fifth input module (not shown in the figure) configured to input the image region in the subsequent video frame corresponding to this wrist location information into the trace model to obtain the palm location information of the palm object.
In some optional implementations of the present embodiment, the target video is a video that is currently being shot and presented.
In the apparatus provided by the above embodiment of the disclosure, the first acquisition unit 501 obtains a target video; the first selection unit 502 then selects a video frame from the target video; the determination unit 503 then determines the wrist location information of the wrist object in the video frame; and finally the generation unit 504 generates, based on the wrist location information, the palm location information of the palm object in the subsequent video frame of the video frame. The palm location information is thus determined based on the wrist location information, which enriches the ways of determining the palm location information and helps improve the accuracy of the determined palm position.
Referring next to Fig. 6, as an implementation of the method shown in Fig. 4 above, the present disclosure provides one embodiment of a target tracking apparatus. This apparatus embodiment corresponds to the method embodiment shown in Fig. 4; in addition to the features described below, the apparatus embodiment may also include features that are the same as or corresponding to those of the method embodiment shown in Fig. 4, and produce the same or corresponding effects. The apparatus may specifically be applied in various electronic devices.
As shown in Fig. 6, the target tracking apparatus 600 of the present embodiment includes: a second acquisition unit 601 configured to obtain a target video that is currently being shot and presented; and a second selection unit 602 configured to select a video frame from the target video as a first target video frame and execute the following tracking step: inputting the first target video frame into a pre-trained joint orientation model to obtain wrist location information of the wrist object in the first target video frame, where the joint orientation model is used to determine the positions of human joint points and the human joint points include the wrist, and executing the following tracking sub-step: selecting a subsequent video frame of the first target video frame from the target video as a second target video frame; determining, in the second target video frame, the image region corresponding to the wrist location information of the wrist object in the first target video frame; inputting the image region in the second target video frame into a pre-trained trace model to obtain palm location information, where the trace model is used to determine the position of a palm object in an input image region; and, in response to the palm location information of the palm object in the second target video frame indicating that the image region includes a palm object, determining, based on the palm location information of the palm object in the image region, the palm location information of the palm object in the second target video frame.
In the present embodiment, the second acquisition unit 601 of the target tracking apparatus 600 may obtain the target video currently being shot and presented.
In the present embodiment, the second selection unit 602 may select a video frame from the target video obtained by the second acquisition unit 601 as the first target video frame and execute the following tracking step: inputting the first target video frame into a pre-trained joint orientation model to obtain wrist location information of the wrist object in the first target video frame, where the joint orientation model is used to determine the positions of human joint points and the human joint points include the wrist, and executing the following tracking sub-step: selecting a subsequent video frame of the first target video frame from the target video as a second target video frame; determining, in the second target video frame, the image region corresponding to the wrist location information of the wrist object in the first target video frame; inputting the image region in the second target video frame into a pre-trained trace model to obtain palm location information, where the trace model is used to determine the position of a palm object in an input image region; and, in response to the palm location information of the palm object in the second target video frame indicating that the image region includes a palm object, determining, based on the palm location information of the palm object in the image region, the palm location information of the palm object in the second target video frame.
In some optional implementations of the present embodiment, the apparatus 600 further includes: a first execution unit (not shown in the figure) configured to, in response to the second target video frame not being the last frame of the target video, take the second target video frame as the first target video frame and continue to execute the tracking sub-step.
In some optional implementations of the present embodiment, the apparatus 600 further includes: a second execution unit (not shown in the figure) configured to, in response to the palm location information of the palm object in the second target video frame indicating that the image region does not include a palm object, take the second target video frame as the first target video frame and continue to execute the tracking step.
In the apparatus provided by the above embodiment of the disclosure, the second acquisition unit 601 obtains the target video currently being shot and presented; the second selection unit 602 then selects a video frame from the target video as the first target video frame and executes the tracking step: inputting the first target video frame into a pre-trained joint orientation model to obtain wrist location information of the wrist object in the first target video frame, where the joint orientation model is used to determine the positions of human joint points and the human joint points include the wrist, and executing the tracking sub-step: selecting a subsequent video frame of the first target video frame from the target video as a second target video frame; determining, in the second target video frame, the image region corresponding to the wrist location information of the wrist object in the first target video frame; inputting the image region in the second target video frame into a pre-trained trace model to obtain palm location information, where the trace model is used to determine the position of a palm object in an input image region; and, in response to the palm location information of the palm object in the second target video frame indicating that the image region includes a palm object, determining, based on the palm location information of the palm object in the image region, the palm location information of the palm object in the second target video frame. The position of the palm object in the currently presented video can thereby be determined in real time.
Referring now to Fig. 7, it illustrates a schematic structural diagram of an electronic device 700 (e.g., the server or terminal device shown in Fig. 1) suitable for implementing embodiments of the present disclosure. Terminal devices in embodiments of the disclosure may include, but are not limited to, mobile terminals such as mobile phones, laptop computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players) and vehicle-mounted terminals (e.g., vehicle-mounted navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. The terminal device/server shown in Fig. 7 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the disclosure.
As shown in Fig. 7, the electronic device 700 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 701, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 702 or a program loaded from a storage device 708 into a random access memory (RAM) 703. The RAM 703 also stores various programs and data required for the operation of the electronic device 700. The processing device 701, the ROM 702 and the RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
In general, the following devices may be connected to the I/O interface 705: an input device 706 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, etc.; an output device 707 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, etc.; a storage device 708 including, for example, a magnetic tape, a hard disk, etc.; and a communication device 709. The communication device 709 may allow the electronic device 700 to communicate wirelessly or by wire with other devices to exchange data. Although Fig. 7 shows the electronic device 700 with various devices, it should be understood that it is not required to implement or possess all of the devices shown; more or fewer devices may alternatively be implemented or possessed. Each box shown in Fig. 7 may represent one device, or may represent multiple devices as needed.
In particular, according to embodiments of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, an embodiment of the disclosure includes a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded from a network and installed through the communication device 709, or installed from the storage device 708, or installed from the ROM 702. When the computer program is executed by the processing device 701, the above functions defined in the methods of the embodiments of the disclosure are executed.
It should be noted that the computer-readable medium described in the embodiments of the disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electric, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the embodiments of the disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program which can be used by or in combination with an instruction execution system, apparatus or device. In the embodiments of the disclosure, a computer-readable signal medium may include a data signal propagated in a baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and can send, propagate or transmit a program for use by or in combination with an instruction execution system, apparatus or device. The program code contained on the computer-readable medium may be transmitted by any suitable medium, including but not limited to: an electric wire, an optical cable, RF (radio frequency), etc., or any suitable combination of the above.
The above computer-readable medium may be included in the above electronic device, or may exist separately without being assembled into the electronic device. The above computer-readable medium carries one or more programs. The one or more programs, when executed by the electronic device, cause the electronic device to: obtain a target video; select a video frame from the target video; determine wrist location information of a wrist object in the video frame; and generate, based on the wrist location information, palm location information of a palm object in a subsequent video frame of the video frame. Alternatively, the one or more programs cause the electronic device to: obtain a target video currently being shot and presented; select a video frame from the target video as a first target video frame, and execute the following tracking step: input the first target video frame into a pre-trained joint localization model to obtain wrist location information of a wrist object in the first target video frame, wherein the joint localization model is used to determine positions of human joint points, the human joint points including a wrist, and execute the following tracking sub-step: select a subsequent video frame of the first target video frame from the target video as a second target video frame; determine, in the second target video frame, an image region corresponding to the wrist location information of the wrist object in the first target video frame; input the image region in the second target video frame into a pre-trained tracking model to obtain palm location information, wherein the tracking model is used to determine a position of a palm object in an input image region; and, in response to the palm location information of the palm object in the second target video frame indicating that the image region includes a palm object, determine, based on the palm location information of the palm object in the image region, the palm location information of the palm object in the second target video frame.
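As an illustration only (this sketch is not part of the disclosure), the basic flow above — detect the wrist in one frame, then look for the palm inside a wrist-anchored region of the next frame — might be written as follows, where `detect_wrist` and `track_palm` are hypothetical stand-ins for the pre-trained joint localization model and tracking model:

```python
import numpy as np

def wrist_region(wrist_xy, frame_shape, half=40):
    """Axis-aligned search window centred on the wrist, clamped to the frame."""
    h, w = frame_shape[:2]
    x, y = wrist_xy
    return (max(0, y - half), min(h, y + half),
            max(0, x - half), min(w, x + half))

def palm_in_next_frame(frame, next_frame, detect_wrist, track_palm):
    """Generate the palm location in `next_frame` from the wrist in `frame`."""
    wrist = detect_wrist(frame)                 # joint localization model
    top, bottom, left, right = wrist_region(wrist, frame.shape)
    palm = track_palm(next_frame[top:bottom, left:right])  # tracking model
    if palm is None:                            # no palm object in the region
        return None
    px, py = palm                               # region-relative coordinates
    return (px + left, py + top)                # map back to full-frame coords
```

The region half-size of 40 pixels is an arbitrary assumption; the disclosure leaves the size of the wrist-corresponding image region to the implementation.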
The computer program code for performing the operations of the embodiments of the present disclosure may be written in one or more programming languages or a combination thereof. The programming languages include object-oriented programming languages such as Java, Smalltalk, and C++, and also include conventional procedural programming languages such as the "C" language or similar languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions, and operations of the systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each box in a flowchart or block diagram may represent a module, a program segment, or a portion of code, and the module, program segment, or portion of code contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the boxes may occur in an order different from that noted in the drawings. For example, two boxes shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should further be noted that each box in a block diagram and/or flowchart, and combinations of boxes in a block diagram and/or flowchart, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or by hardware. The described units may also be provided in a processor. For example, a processor may be described as including a first acquisition unit, a first selection unit, a determination unit, and a generation unit. The names of these units do not in some cases constitute a limitation on the units themselves. For example, the first acquisition unit may also be described as "a unit for obtaining a target video".
The above description is merely a description of the preferred embodiments of the present disclosure and an explanation of the applied technical principles. Those skilled in the art should understand that the scope of the invention involved in the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover, without departing from the above inventive concept, other technical solutions formed by any combination of the above technical features or their equivalent features — for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the present disclosure.
Claims (12)
1. A method for generating information, comprising:
obtaining a target video;
selecting a video frame from the target video;
determining wrist location information of a wrist object in the video frame; and
generating, based on the wrist location information, palm location information of a palm object in a subsequent video frame of the video frame.
2. The method according to claim 1, wherein the determining wrist location information of a wrist object in the video frame comprises:
inputting the video frame into a pre-trained joint localization model to obtain the wrist location information of the wrist object in the video frame, wherein the joint localization model is used to determine positions of human joint points, the human joint points including a wrist.
3. The method according to claim 1, wherein the generating, based on the wrist location information, palm location information of a palm object in a subsequent video frame of the video frame comprises:
inputting an image region in the subsequent video frame of the video frame corresponding to the wrist location information into a pre-trained tracking model to obtain the palm location information, wherein the tracking model is used to determine a position of a palm object in an input image region.
4. The method according to claim 2, wherein the generating, based on the wrist location information, palm location information of a palm object in a subsequent video frame of the video frame comprises:
enlarging the image region in the subsequent video frame of the video frame corresponding to the wrist location information, to obtain an enlarged image region;
inputting the enlarged image region into a pre-trained tracking model to obtain palm location information, wherein the tracking model is used to determine a position of a palm object in an input image region;
in response to the palm location information indicating that the enlarged image region does not include a palm object, inputting the subsequent video frame into the joint localization model to obtain wrist location information of the wrist object; and
inputting the image region in the subsequent video frame corresponding to the wrist location information into the tracking model, to obtain the palm location information of the palm object.
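The enlargement step in claim 4 can be illustrated with a short, hypothetical sketch (not taken from the disclosure): the wrist-anchored image region is cropped and upscaled before being passed to the tracking model, which can help when the palm occupies only a few pixels of the frame. Nearest-neighbour repetition via NumPy is used here purely as an assumed, minimal resizing backend:

```python
import numpy as np

def enlarged_region(frame, box, scale=2):
    """Crop box=(top, bottom, left, right) from an H x W x C frame array and
    upscale it by nearest-neighbour pixel repetition."""
    top, bottom, left, right = box
    region = frame[top:bottom, left:right]
    # Repeat each pixel `scale` times along both spatial axes; a real system
    # might instead use bilinear interpolation from an image library.
    return np.repeat(np.repeat(region, scale, axis=0), scale, axis=1)
```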
5. The method according to any one of claims 1-4, wherein the target video is a video currently being shot and presented.
6. A target tracking method, comprising:
obtaining a target video currently being shot and presented;
selecting a video frame from the target video as a first target video frame, and executing the following tracking step:
inputting the first target video frame into a pre-trained joint localization model to obtain wrist location information of a wrist object in the first target video frame, wherein the joint localization model is used to determine positions of human joint points, the human joint points including a wrist, and executing the following tracking sub-step:
selecting a subsequent video frame of the first target video frame from the target video as a second target video frame;
determining, in the second target video frame, an image region corresponding to the wrist location information of the wrist object in the first target video frame;
inputting the image region in the second target video frame into a pre-trained tracking model to obtain palm location information, wherein the tracking model is used to determine a position of a palm object in an input image region; and
in response to the palm location information of the palm object in the second target video frame indicating that the image region includes a palm object, determining, based on the palm location information of the palm object in the image region, the palm location information of the palm object in the second target video frame.
7. The method according to claim 6, wherein the method further comprises:
in response to the second target video frame not being the last frame of the target video, taking the second target video frame as the first target video frame and continuing to execute the tracking sub-step.
8. The method according to claim 6 or 7, wherein the method further comprises:
in response to the palm location information of the palm object in the second target video frame indicating that the image region does not include a palm object, taking the second target video frame as the first target video frame and continuing to execute the tracking step.
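The control flow of claims 6-8 — track the palm frame by frame until it is lost, then fall back to wrist re-detection — can be summarized in a hedged pseudo-implementation. `detect_wrist` and `track_palm_in_region` are assumed stand-ins for the two pre-trained models, and whether the search anchor is refreshed between sub-step iterations is an implementation choice the claims leave open:

```python
def run_tracking(frames, detect_wrist, track_palm_in_region):
    """Return {frame_index: palm_location}, re-detecting the wrist whenever
    the tracking model loses the palm (claims 6-8)."""
    out = {}
    first = 0                                    # first target video frame
    while first < len(frames) - 1:
        anchor = detect_wrist(frames[first])     # tracking step (claim 6)
        second = first + 1                       # second target video frame
        while second < len(frames):
            palm = track_palm_in_region(frames[second], anchor)
            if palm is None:                     # claim 8: palm lost, so the
                break                            # tracking step is re-executed
            out[second] = palm                   # palm position in this frame
            second += 1                          # claim 7: continue sub-step
        first = second                           # new first target video frame
    return out
```

Running this with stub models shows the expected behaviour: frames where the tracker succeeds are assigned palm locations, and a lost frame triggers one fresh wrist detection before tracking resumes.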
9. An apparatus for generating information, comprising:
a first acquisition unit, configured to obtain a target video;
a first selection unit, configured to select a video frame from the target video;
a determination unit, configured to determine wrist location information of a wrist object in the video frame; and
a generation unit, configured to generate, based on the wrist location information, palm location information of a palm object in a subsequent video frame of the video frame.
10. A target tracking apparatus, comprising:
a second acquisition unit, configured to obtain a target video currently being shot and presented; and
a second selection unit, configured to select a video frame from the target video as a first target video frame, and to execute the following tracking step:
inputting the first target video frame into a pre-trained joint localization model to obtain wrist location information of a wrist object in the first target video frame, wherein the joint localization model is used to determine positions of human joint points, the human joint points including a wrist, and executing the following tracking sub-step:
selecting a subsequent video frame of the first target video frame from the target video as a second target video frame;
determining, in the second target video frame, an image region corresponding to the wrist location information of the wrist object in the first target video frame;
inputting the image region in the second target video frame into a pre-trained tracking model to obtain palm location information, wherein the tracking model is used to determine a position of a palm object in an input image region; and
in response to the palm location information of the palm object in the second target video frame indicating that the image region includes a palm object, determining, based on the palm location information of the palm object in the image region, the palm location information of the palm object in the second target video frame.
11. An electronic device, comprising:
one or more processors; and
a storage device on which one or more programs are stored,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-8.
12. A computer-readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method according to any one of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910480692.7A CN110189364B (en) | 2019-06-04 | 2019-06-04 | Method and device for generating information, and target tracking method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910480692.7A CN110189364B (en) | 2019-06-04 | 2019-06-04 | Method and device for generating information, and target tracking method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110189364A true CN110189364A (en) | 2019-08-30 |
CN110189364B CN110189364B (en) | 2022-04-01 |
Family
ID=67720133
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910480692.7A Active CN110189364B (en) | 2019-06-04 | 2019-06-04 | Method and device for generating information, and target tracking method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110189364B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110688992A (en) * | 2019-12-09 | 2020-01-14 | 中智行科技有限公司 | Traffic signal identification method and device, vehicle navigation equipment and unmanned vehicle |
CN113849687A (en) * | 2020-11-23 | 2021-12-28 | 阿里巴巴集团控股有限公司 | Video processing method and device |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101447082A (en) * | 2008-12-05 | 2009-06-03 | 华中科技大学 | Detection method of moving target on a real-time basis |
WO2010086866A1 (en) * | 2009-02-02 | 2010-08-05 | Eyesight Mobile Technologies Ltd. | System and method for object recognition and tracking in a video stream |
CN103065312A (en) * | 2012-12-26 | 2013-04-24 | 四川虹微技术有限公司 | Foreground extraction method in gesture tracking process |
US20150199824A1 (en) * | 2014-01-10 | 2015-07-16 | Electronics And Telecommunications Research Institute | Apparatus and method for detecting multiple arms and hands by using three-dimensional image |
CN107274433A (en) * | 2017-06-21 | 2017-10-20 | 吉林大学 | Method for tracking target, device and storage medium based on deep learning |
CN108399367A (en) * | 2018-01-31 | 2018-08-14 | 深圳市阿西莫夫科技有限公司 | Hand motion recognition method, apparatus, computer equipment and readable storage medium storing program for executing |
CN108564596A (en) * | 2018-03-01 | 2018-09-21 | 南京邮电大学 | A kind of the intelligence comparison analysis system and method for golf video |
CN108961315A (en) * | 2018-08-01 | 2018-12-07 | 腾讯科技(深圳)有限公司 | Method for tracking target, device, computer equipment and storage medium |
CN109447996A (en) * | 2017-08-28 | 2019-03-08 | 英特尔公司 | Hand Segmentation in 3-D image |
CN109525891A (en) * | 2018-11-29 | 2019-03-26 | 北京字节跳动网络技术有限公司 | Multi-user's special video effect adding method, device, terminal device and storage medium |
CN109636828A (en) * | 2018-11-20 | 2019-04-16 | 北京京东尚科信息技术有限公司 | Object tracking methods and device based on video image |
Non-Patent Citations (2)
Title |
---|
JING-MING GUO ET AL: "Hybrid hand tracking system", 2011 18th IEEE International Conference on Image Processing * 
LI Zhixian et al.: "A fingertip detection and tracking algorithm based on Kinect depth images", Jiangsu Agricultural Sciences * 
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110688992A (en) * | 2019-12-09 | 2020-01-14 | 中智行科技有限公司 | Traffic signal identification method and device, vehicle navigation equipment and unmanned vehicle |
CN113849687A (en) * | 2020-11-23 | 2021-12-28 | 阿里巴巴集团控股有限公司 | Video processing method and device |
CN113849687B (en) * | 2020-11-23 | 2022-10-28 | 阿里巴巴集团控股有限公司 | Video processing method and device |
Also Published As
Publication number | Publication date |
---|---|
CN110189364B (en) | 2022-04-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109462776B (en) | Video special effect adding method and device, terminal equipment and storage medium | |
US20210029305A1 (en) | Method and apparatus for adding a video special effect, terminal device and storage medium | |
CN110188719A (en) | Method for tracking target and device | |
CN109525891B (en) | Multi-user video special effect adding method and device, terminal equipment and storage medium | |
CN109584276B (en) | Key point detection method, device, equipment and readable medium | |
CN109599113A (en) | Method and apparatus for handling information | |
CN109858445A (en) | Method and apparatus for generating model | |
CN109600559B (en) | Video special effect adding method and device, terminal equipment and storage medium | |
CN108345387A (en) | Method and apparatus for output information | |
CN110162670A (en) | Method and apparatus for generating expression packet | |
CN111050271B (en) | Method and apparatus for processing audio signal | |
CN109348277B (en) | Motion pixel video special effect adding method and device, terminal equipment and storage medium | |
CN109829432A (en) | Method and apparatus for generating information | |
CN110210501B (en) | Virtual object generation method, electronic device and computer-readable storage medium | |
CN110009059A (en) | Method and apparatus for generating model | |
CN109754464A (en) | Method and apparatus for generating information | |
CN113467603A (en) | Audio processing method and device, readable medium and electronic equipment | |
CN109800730A (en) | The method and apparatus for generating model for generating head portrait | |
CN108446658A (en) | The method and apparatus of facial image for identification | |
CN110264539A (en) | Image generating method and device | |
CN114972591A (en) | Animation generation model training method, animation generation method and device | |
CN110189364A (en) | For generating the method and apparatus and method for tracking target and device of information | |
CN111652675A (en) | Display method and device and electronic equipment | |
CN109829431A (en) | Method and apparatus for generating information | |
CN111447379B (en) | Method and device for generating information |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |