CN110263743A - Method and apparatus for recognizing images - Google Patents
- Publication number
- CN110263743A CN110263743A CN201910558852.5A CN201910558852A CN110263743A CN 110263743 A CN110263743 A CN 110263743A CN 201910558852 A CN201910558852 A CN 201910558852A CN 110263743 A CN110263743 A CN 110263743A
- Authority
- CN
- China
- Prior art keywords
- video
- movement
- palm
- frame
- frames
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/28—Recognition of hand or arm movements, e.g. recognition of deaf sign language
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Social Psychology (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Psychiatry (AREA)
- General Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Human Computer Interaction (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Image Analysis (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
Embodiments of the present disclosure disclose a method and apparatus for recognizing images. A specific embodiment of the method includes: acquiring a target video; extracting two video frames containing a palm object from the target video; identifying, based on the two video frames, whether the movement of the palm corresponding to the target video is a waving movement; and, in response to determining that the movement of the palm corresponding to the target video is a waving movement, generating a recognition result indicating that the movement of the palm corresponding to the target video is a waving movement. This embodiment enables an electronic device to determine whether a video contains a palm object performing a waving movement, and thus to recognize richer human posture information.
Description
Technical field
Embodiments of the present disclosure relate to the field of computer technology, and in particular to a method and apparatus for recognizing images.
Background art
Image recognition refers to the technology of processing, analyzing, and understanding images with a computer in order to identify targets and objects of various kinds.
In human-computer interaction scenarios, more and more researchers are devoting themselves to research on hand-based interaction techniques. Compared with other parts of the human body, the hand is free and flexible; it is responsible for a great deal of interaction in users' daily lives, and the operations completed by hand are innumerable. It can be seen that there is a demand in the prior art for recognizing the semantic information of images or videos containing hand objects.
Summary of the invention
The present disclosure proposes a method and apparatus for recognizing images.
In a first aspect, an embodiment of the present disclosure provides a method for recognizing images, the method comprising: acquiring a target video; extracting two video frames containing a palm object from the target video; identifying, based on the two video frames, whether the movement of the palm corresponding to the target video is a waving movement; and, in response to determining that the movement of the palm corresponding to the target video is a waving movement, generating a recognition result indicating that the movement of the palm corresponding to the target video is a waving movement.
In some embodiments, identifying, based on the two video frames, whether the movement of the palm corresponding to the target video is a waving movement comprises: inputting the two video frames into a pre-trained identification model to determine whether the movement of the palm corresponding to the target video is a waving movement, wherein the identification model is used to identify whether the movement of the palm corresponding to a video containing the two input video frames is a waving movement.
In some embodiments, identifying, based on the two video frames, whether the movement of the palm corresponding to the target video is a waving movement comprises: determining the projection coordinates, in a predetermined projection plane, of the normal vectors of the palm corresponding to the two video frames; and identifying, based on the two determined projection coordinates, whether the movement of the palm corresponding to the target video is a waving movement.
In some embodiments, the projection plane is the imaging plane of the video frames; and identifying, based on the two determined projection coordinates, whether the movement of the palm corresponding to the target video is a waving movement comprises: in response to the absolute value of the difference between the coordinate values, in a preset first direction, of the projection coordinates of the two video frames being less than or equal to a preset first threshold, and the absolute value of the difference between the coordinate values, in a preset second direction, of the projection coordinates corresponding to the two video frames being greater than or equal to a preset second threshold, recognizing the movement of the palm corresponding to the target video as a waving movement.
In some embodiments, the projection plane is the imaging plane of the video frames; and identifying, based on the two determined projection coordinates, whether the movement of the palm corresponding to the target video is a waving movement comprises: in response to the absolute value of the difference between the coordinate values, in the preset first direction, of the projection coordinates of the two video frames being greater than the preset first threshold, or the absolute value of the difference between the coordinate values, in the preset second direction, of the projection coordinates corresponding to the two video frames being less than the preset second threshold, recognizing the movement of the palm corresponding to the target video as a non-waving movement.
In some embodiments, extracting two video frames containing a palm object from the target video comprises: extracting two adjacent video frames containing a palm object from the target video.
In some embodiments, the method further comprises: in response to generating the recognition result, performing, for each video frame in the target video that contains a palm object, image fusion of an image containing a target object with that video frame to obtain a fused image corresponding to that video frame; and generating a fused video and presenting the fused video in place of the target video.
In a second aspect, an embodiment of the present disclosure provides an apparatus for recognizing images, the apparatus comprising: an acquisition unit configured to acquire a target video; an extraction unit configured to extract two video frames containing a palm object from the target video; a recognition unit configured to identify, based on the two video frames, whether the movement of the palm corresponding to the target video is a waving movement; and a first generation unit configured to, in response to determining that the movement of the palm corresponding to the target video is a waving movement, generate a recognition result indicating that the movement of the palm corresponding to the target video is a waving movement.
In some embodiments, the recognition unit comprises: an input module configured to input the two video frames into a pre-trained identification model to determine whether the movement of the palm corresponding to the target video is a waving movement, wherein the identification model is used to identify whether the movement of the palm corresponding to a video containing the two input video frames is a waving movement.
In some embodiments, the recognition unit comprises: a determining module configured to determine the projection coordinates, in a predetermined projection plane, of the normal vectors of the palm corresponding to the two video frames; and an identification module configured to identify, based on the two determined projection coordinates, whether the movement of the palm corresponding to the target video is a waving movement.
In some embodiments, the projection plane is the imaging plane of the video frames; and the identification module comprises: a first identification module configured to, in response to the absolute value of the difference between the coordinate values, in the preset first direction, of the projection coordinates of the two video frames being less than or equal to the preset first threshold, and the absolute value of the difference between the coordinate values, in the preset second direction, of the projection coordinates corresponding to the two video frames being greater than or equal to the preset second threshold, recognize the movement of the palm corresponding to the target video as a waving movement.
In some embodiments, the projection plane is the imaging plane of the video frames; and the identification module comprises: a second identification module configured to, in response to the absolute value of the difference between the coordinate values, in the preset first direction, of the projection coordinates of the two video frames being greater than the preset first threshold, or the absolute value of the difference between the coordinate values, in the preset second direction, of the projection coordinates corresponding to the two video frames being less than the preset second threshold, recognize the movement of the palm corresponding to the target video as a non-waving movement.
In some embodiments, the extraction unit comprises: an extraction module configured to extract two adjacent video frames containing a palm object from the target video.
In some embodiments, the apparatus further comprises: a fusion unit configured to, in response to generating the recognition result, perform, for each video frame in the target video that contains a palm object, image fusion of an image containing a target object with that video frame to obtain a fused image corresponding to that video frame; and a second generation unit configured to generate a fused video and present the fused video in place of the target video.
In a third aspect, an embodiment of the present disclosure provides an electronic device for recognizing images, comprising: one or more processors; and a storage device on which one or more programs are stored, which, when executed by the one or more processors, cause the one or more processors to implement the method of any embodiment of the method for recognizing images described above.
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable medium for recognizing images, on which a computer program is stored which, when executed by a processor, implements the method of any embodiment of the method for recognizing images described above.
The method and apparatus for recognizing images provided by embodiments of the present disclosure acquire a target video; then extract two video frames containing a palm object from the target video; then identify, based on the two video frames, whether the movement of the palm corresponding to the target video is a waving movement; and finally, in response to determining that the movement of the palm corresponding to the target video is a waving movement, generate a recognition result indicating that the movement of the palm corresponding to the target video is a waving movement. This enables an electronic device to determine whether a video contains a palm object performing a waving movement and thus to recognize richer palm posture information, which helps realize human-computer interaction based on gesture recognition.
Brief description of the drawings
Other features, objects, and advantages of the present disclosure will become more apparent upon reading the detailed description of non-restrictive embodiments made with reference to the following drawings:
Fig. 1 is an exemplary system architecture diagram to which an embodiment of the present disclosure may be applied;
Fig. 2 is a flowchart of one embodiment of the method for recognizing images according to the present disclosure;
Fig. 3 is a schematic diagram of an application scenario of the method for recognizing images according to the present disclosure;
Fig. 4 is a flowchart of another embodiment of the method for recognizing images according to the present disclosure;
Fig. 5 is a structural schematic diagram of one embodiment of the apparatus for recognizing images according to the present disclosure;
Fig. 6 is a structural schematic diagram of a computer system adapted to implement an electronic device of an embodiment of the present disclosure.
Detailed description of the embodiments
The present disclosure is described in further detail below with reference to the drawings and embodiments. It is to be understood that the specific embodiments described here are used only to explain the related invention, rather than to limit that invention. It should also be noted that, for ease of description, only the parts relevant to the related invention are shown in the drawings.
It should be noted that, in the absence of conflict, the embodiments in the present disclosure and the features in the embodiments may be combined with one another. The present disclosure is described in detail below with reference to the drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary system architecture 100 of an embodiment to which the method for recognizing images or the apparatus for recognizing images of embodiments of the present disclosure may be applied.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105. The network 104 serves as the medium providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired links, wireless communication links, or fiber-optic cables.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send data (such as videos). Various client applications may be installed on the terminal devices 101, 102, 103, such as video playback software, news applications, image processing applications, web browser applications, shopping applications, search applications, instant messaging tools, email clients, and social platform software.
The terminal devices 101, 102, 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices with a video capture function, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop portable computers, desktop computers, and the like. When the terminal devices 101, 102, 103 are software, they may be installed in the electronic devices listed above. They may be implemented as multiple pieces of software or software modules (for example, software or software modules for providing distributed services), or as a single piece of software or software module. No specific limitation is made here.
The server 105 may be a server providing various services, for example a background server that processes the videos (such as the target video) sent by the terminal devices 101, 102, 103. The background server may perform recognition and other processing on data such as the received videos, and generate processing results (such as recognition results). Optionally, the background server may also feed the processing results back to the terminal devices. As an example, the server 105 may be a cloud server or a physical server.
It should be noted that the server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster composed of multiple servers, or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (for example, software or software modules for providing distributed services), or as a single piece of software or software module. No specific limitation is made here.
It should also be noted that the method for recognizing images provided by embodiments of the present disclosure may be executed by a server, by a terminal device, or by a server and a terminal device cooperating with each other. Correspondingly, the various parts (such as units, sub-units, modules, and sub-modules) included in the apparatus for recognizing images may all be arranged in the server, all in the terminal device, or distributed between the server and the terminal device.
It should be understood that the numbers of terminal devices, networks, and servers in Fig. 1 are merely illustrative. Any number of terminal devices, networks, and servers may be provided according to implementation needs. When the electronic device on which the method for recognizing images runs does not need to transmit data to other electronic devices while executing the method, the system architecture may include only the electronic device (such as a server or a terminal device) on which the method for recognizing images runs.
With continued reference to Fig. 2, a flow 200 of one embodiment of the method for recognizing images according to the present disclosure is shown. The method for recognizing images comprises the following steps:
Step 201: acquiring a target video.
In this embodiment, the executing body of the method for recognizing images (such as the server or terminal device shown in Fig. 1) may acquire the target video from other electronic devices, or locally, through a wired or wireless connection. The target video may be a video on which palm movement recognition is to be performed.
As an example, the target video may be a video obtained by the executing body, or by an electronic device communicatively connected to the executing body, by shooting the palm of a user.
It can be appreciated that when the target video is a video obtained by shooting the palm of a user, some or all of the video frames in the target video may contain a palm object. A palm object may be the image of a palm presented in a video frame (i.e., an image).
Step 202: extracting two video frames containing a palm object from the target video.
In this embodiment, the executing body may extract two video frames containing a palm object from the target video acquired in step 201.
It should be noted that in step 202, the number of video frames extracted from the target video by the executing body may be 2, or may be any natural number greater than 2; embodiments of the present disclosure do not limit this. It can be appreciated that when the number of video frames extracted from the target video by the executing body is any natural number greater than 2, two video frames containing a palm object are still necessarily extracted from the target video; thus, schemes in which the number of video frames extracted from the target video is any natural number greater than 2 are also contained within the scope of the technical solutions claimed by embodiments of the present disclosure.
As an example, the executing body may execute step 202 in the following manner:
First, each video frame in the target video is recognized to determine whether it contains a palm object; if the video frame contains a palm object, the video frame is marked.
Then, two video frames are randomly selected from the marked video frames; alternatively, one video frame is randomly selected from the marked video frames, and a video frame located after it and separated from it by a predetermined natural number of frames (such as 0, 1, 2, 3, etc.) is then selected from the video, so that two video frames containing a palm object are extracted from the target video.
In some optional implementations of this embodiment, the executing body may also execute step 202 in the following manner: extracting two adjacent video frames containing a palm object from the target video. Here, two adjacent video frames may be two video frames with no video frame containing a palm object between them (although video frames not containing a palm object may lie between them), or two video frames with no video frames between them at all.
As an example, the executing body may randomly extract two adjacent video frames containing a palm object from the target video, or may extract the first-occurring two adjacent video frames containing a palm object from the target video, i.e., the first and second video frames in the target video that contain a palm object.
It can be appreciated that since hand movements are flexible and fast, when the two extracted video frames are adjacent, relative to the case in which the two extracted video frames are not necessarily adjacent, this optional implementation can pre-estimate the relative movement distance of the palm corresponding to the palm objects in the two video frames, so that subsequent steps can more accurately identify whether the movement of the palm is a waving movement.
Step 203: identifying, based on the two video frames, whether the movement of the palm corresponding to the target video is a waving movement.
In this embodiment, the executing body may identify, based on the two video frames extracted in step 202, whether the movement of the palm corresponding to the target video acquired in step 201 is a waving movement. The palm corresponding to the target video may be the palm shot during acquisition of the target video.
In some optional implementations of this embodiment, the executing body may execute step 203 in the following manner:
The two video frames are input into a pre-trained identification model to determine whether the movement of the palm corresponding to the target video is a waving movement. The identification model is used to identify whether the movement of the palm corresponding to a video containing the two input video frames is a waving movement.
As an example, the identification model may be a two-dimensional table or database that stores, in association, multiple pairs of video frames and, for each pair, information indicating whether the movement of the palm corresponding to a video containing that pair of video frames is a waving movement.
As another example, the identification model may also be a convolutional neural network model trained on a training sample set using a machine learning algorithm, where each training sample in the training sample set may include two video frames and information indicating whether the movement of the palm corresponding to a video containing those two video frames is a waving movement.
In some optional implementations of this embodiment, the executing body may also execute step 203 in the following manner:
First, the projection coordinates, in a predetermined projection plane, of the normal vectors of the palm corresponding to the two video frames are determined.
Here, the projection plane may be any predetermined plane in three-dimensional space; for example, the projection plane may be any plane parallel to the imaging plane of the video frames (i.e., the image plane of an image acquisition device such as a video camera). A projection coordinate may be characterized by a first element and a second element, where the first element may be the coordinate value of the projection coordinate in a preset first direction and the second element may be the coordinate value of the projection coordinate in a preset second direction. As an example, a projection coordinate may be expressed as "(x, y)", where x may be the first element and y may be the second element. The direction of the normal vector of the palm may be perpendicular to the plane in which the palm lies and point toward the palm side of the hand. The magnitude of the normal vector of the palm may be a predetermined value.
It can be appreciated that since the projection plane is a two-dimensional plane, a projection coordinate can be characterized by just two elements, the first element and the second element; in some cases, however, a projection coordinate may also be characterized by more elements.
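Assuming the projection plane is the imaging plane (or any plane given by two orthonormal basis vectors), the projection of a palm normal vector to a two-element coordinate can be sketched as follows; the function name and the default choice of the x-y plane are assumptions made for illustration.

```python
import numpy as np

def project_normal(normal, plane_basis=None):
    """Project a 3-D palm normal vector to a 2-D projection coordinate (x, y).

    With no plane_basis, the projection plane is taken to be the imaging
    (x-y) plane, so the projection simply keeps the first two components.
    An arbitrary plane can be supplied as two orthonormal 3-D basis vectors.
    """
    n = np.asarray(normal, dtype=float)
    if plane_basis is None:
        return n[:2].copy()                 # first element x, second element y
    u, v = (np.asarray(b, dtype=float) for b in plane_basis)
    return np.array([n @ u, n @ v])         # coordinates in the plane's basis

# A palm roughly facing the camera: the normal's x and y components survive.
coord = project_normal([0.1, -0.2, 0.97])
```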
Second, based on the two determined projection coordinates, whether the movement of the palm corresponding to the target video is a waving movement is identified.
Specifically, the executing body may execute this second step in the following manner: the two determined projection coordinates are input into a pre-trained classification model, and discriminant information is generated indicating whether the movement of the palm corresponding to the video corresponding to the two projection coordinates is a waving movement.
Here, the classification model may be obtained by training as follows:
First, a training sample set is acquired. Each training sample in the set includes two projection coordinates and discriminant information indicating whether the movement of the palm corresponding to the video corresponding to the two projection coordinates is a waving movement. Between the "two projection coordinates" and the "video" in "the video corresponding to the two projection coordinates" there is the following relationship: determining the projection coordinates, in a predetermined projection plane, of the normal vectors of the palm corresponding to two video frames containing a palm object in the "video" yields the "two projection coordinates".
Then, using a machine learning algorithm, the two projection coordinates included in each training sample of the training sample set are used as input data of an initial model, and the discriminant information corresponding to the two input projection coordinates is used as the expected output data of the initial model. Whether the initial model satisfies a predetermined training termination condition is determined, and if it does, the initial model satisfying the training termination condition is determined to be the classification model.
Here, the initial model may include convolutional layers, a classifier, pooling layers, and the like. The training termination condition may include, but is not limited to, at least one of the following: the training time exceeds a preset duration; the number of training iterations exceeds a preset number; the value of a predetermined loss function computed from the actual output data and the expected output data is less than a preset threshold. The actual output data here is the data output by the initial model, through computation, after the input data has been input to the initial model.
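A minimal runnable stand-in for this training procedure is sketched below. Logistic regression replaces the convolutional initial model purely for illustration, and all names and hyperparameters are assumptions; the three stopping rules mirror the termination conditions listed above (training time, iteration count, loss threshold).

```python
import time
import numpy as np

def train_wave_classifier(samples, labels, lr=0.5, max_iters=5000,
                          max_seconds=10.0, loss_threshold=1e-3):
    """Train a toy classification model on pairs of projection coordinates.

    samples -- (N, 4) array-like, each row the two (x, y) projection coordinates
    labels  -- 1 for "waving movement", 0 for "non-waving movement"
    """
    X = np.asarray(samples, dtype=float)
    y = np.asarray(labels, dtype=float)
    w, b = np.zeros(X.shape[1]), 0.0
    start = time.time()
    for _ in range(max_iters):                      # iteration-count condition
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))      # actual output data
        loss = -np.mean(y * np.log(p + 1e-12)
                        + (1 - y) * np.log(1 - p + 1e-12))
        if loss < loss_threshold:                   # loss-threshold condition
            break
        if time.time() - start > max_seconds:       # training-time condition
            break
        grad = p - y                                # gradient of the log loss
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def predict_wave(w, b, coord_pair):
    """Discriminant information: True if the coordinate pair looks like a wave."""
    x = np.asarray(coord_pair, dtype=float).ravel()
    return bool(x @ w + b > 0.0)
```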
In some optional implementations of this embodiment, the projection plane is the imaging plane of the video frames. Accordingly, the executing body may also execute the second step above in the following manner:
In response to the absolute value of the difference between the coordinate values, in the preset first direction, of the projection coordinates of the two video frames being less than or equal to the preset first threshold, and the absolute value of the difference between the coordinate values, in the preset second direction, of the projection coordinates corresponding to the two video frames being greater than or equal to the preset second threshold, the movement of the palm corresponding to the target video is recognized as a waving movement.
In some optional implementations of this embodiment, the projection plane is the imaging plane of the video frames. Accordingly, the executing body may also execute the second step above in the following manner:
In response to the absolute value of the difference between the coordinate values, in the preset first direction, of the projection coordinates of the two video frames being greater than the preset first threshold, or the absolute value of the difference between the coordinate values, in the preset second direction, of the projection coordinates corresponding to the two video frames being less than the preset second threshold, the movement of the palm corresponding to the target video is recognized as a non-waving movement.
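The two threshold rules above amount to a simple decision function. The following sketch assumes the first direction is x and the second is y; the names are illustrative only:

```python
def classify_palm_movement(coord_a, coord_b, first_threshold, second_threshold):
    """Apply the imaging-plane threshold rule to two projection coordinates.

    coord_a, coord_b -- (x, y) projection coordinates of the palm normal
                        vector in the two video frames
    A waving movement keeps the first-direction component nearly constant
    while the second-direction component changes substantially.
    """
    dx = abs(coord_a[0] - coord_b[0])   # difference along the first direction
    dy = abs(coord_a[1] - coord_b[1])   # difference along the second direction
    if dx <= first_threshold and dy >= second_threshold:
        return "waving"
    return "non-waving"                 # dx above first or dy below second threshold
```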
In some optional implementations of this embodiment, the executing body may generate the projection coordinate corresponding to a video frame (such as one of the two extracted video frames) in the following manner:
The video frame is input into a pre-trained projection coordinate determination model to obtain the projection coordinate corresponding to the video frame. The projection coordinate determination model may be used to determine the projection coordinate corresponding to an input video frame. The projection coordinate corresponding to a video frame is the projection coordinate, in the predetermined projection plane, of the normal vector of the palm corresponding to the palm object in the video frame.
As an example, the projection coordinate determination model may be a model trained on a training sample set using a machine learning algorithm, or it may be a formula that, after computing with a geometric algorithm the normal vector of the plane in which the palm corresponding to the palm object in the video frame lies, projects the obtained normal vector to obtain the projection coordinate.
Illustratively, a training sample in the above-mentioned training sample set includes a video frame and the projection coordinate corresponding to the video frame. Here, the projection coordinate corresponding to each video frame in the training sample set may be obtained by manual annotation, or may be obtained in the following manner:
First, a video of a rotating palm is obtained. In each video frame of the video, the coordinate value along the first direction in the projection coordinate of the palm normal vector is the same, while the coordinate value along the second direction changes continuously. The projection coordinate is the projection coordinate of the palm normal vector on a preset plane (for example, the imaging plane). The content of the palm rotation video reflects the changing trend of the palm normal vector. Here, the palm rotation video may be shot by various devices (for example, the above-mentioned executing subject, or another electronic device in communication connection with the executing subject). During the rotation, the rotation direction is kept fixed, and the palm is not displaced along the direction of the rotation axis. The direction of the rotation axis may be the direction of the arm. For example, the palm and fingers are spread out and kept in the same plane, with the fingers pointing in the vertical direction. The initial position of the palm is where the plane of the palm is perpendicular to the screen; the palm then rotates with the palm center turning toward the screen until the plane of the palm is parallel to the screen, and continues rotating in the original rotation direction until the plane of the palm is again perpendicular to the screen. During the rotation, the palm is not displaced in the vertical direction. The palm rotation video contains multiple video frames. In each video frame of the video, the coordinate value along the first direction in the projection coordinate of the palm normal vector is the same, while the coordinate value along the second direction changes continuously; here, continuous change may be understood as continuous increase or continuous decrease. The projection coordinate of the palm normal vector is the coordinate of the palm normal vector on the preset plane, which may be the imaging plane. In a similar manner, another video may be obtained in which, in each video frame, the coordinate value along the second direction in the projection coordinate of the palm normal vector is the same, while the coordinate value along the first direction changes continuously. Here, the first direction and the second direction may be any directions. Optionally, the line along the first direction and the line along the second direction may be perpendicular. As an example, the first direction may be the direction parallel to the ground in the projection plane, and the second direction may be the direction perpendicular to the ground in the projection plane.
Then, each video frame included in the above videos and the projection coordinate corresponding to that video frame may be combined into a training sample, thereby obtaining the training sample set.
It can be understood that, since manual annotation can hardly mark the projection coordinate of a video frame accurately, the accuracy of a model trained on manually annotated samples is low; moreover, manual annotation is laborious, so the training efficiency of the model is low. Therefore, training the projection coordinate determination model with a training sample set obtained in the above non-manual manner can improve the accuracy of the projection coordinates generated by the model and improve the training efficiency of the model.
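The non-manual labelling procedure above can be sketched in a few lines, under simplifying assumptions (a constant rotation rate, orthographic projection of the unit normal, and a rotation from perpendicular-to-screen through parallel and back to perpendicular); the function name and the frame representation are illustrative and not taken from the patent:

```python
import math

def make_training_samples(frames, fixed_axis="x"):
    """Label each frame of a palm-rotation clip with a projection
    coordinate of the palm normal vector, with no manual annotation.
    One coordinate stays constant (the rotation-axis direction) and
    the other decreases monotonically with the rotation angle."""
    n = len(frames)
    samples = []
    for i, frame in enumerate(frames):
        # The rotation angle sweeps 0 .. pi over the whole clip.
        theta = math.pi * i / max(n - 1, 1)
        varying = math.cos(theta)  # continuously decreasing on [0, pi]
        fixed = 0.0                # unchanged along the rotation axis
        coord = (fixed, varying) if fixed_axis == "x" else (varying, fixed)
        samples.append((frame, coord))
    return samples
```

A second clip, with the roles of the two directions swapped (`fixed_axis="y"`), yields the complementary samples the passage describes.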
Step 204: in response to determining that the action of the palm corresponding to the target video is a waving action, generating a recognition result indicating that the action of the palm corresponding to the target video is a waving action.
In this embodiment, in the case where it is determined that the action of the palm corresponding to the target video is a waving action, the above-mentioned executing subject may generate a recognition result indicating that the action of the palm corresponding to the target video is a waving action.
In some optional implementations of this embodiment, after performing step 204, the above-mentioned executing subject may further perform the following steps:
First, in response to the recognition result being generated, for each video frame in the target video that contains the palm object, an image containing a target object is fused with the video frame to obtain a fused image corresponding to the video frame. Here, the target object may be any of various objects, for example text indicating a greeting.
Then, a fused video is generated, and the fused video is presented in place of the target video.
It should be noted that the above-mentioned executing subject may or may not present the target video before presenting the fused video; this is not limited here.
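One possible reading of this fusion step is simple alpha blending of a target-object image (for example, a rendered greeting graphic) into each palm-containing frame. The function names, the fixed blend weight, and the per-frame palm flags below are assumptions for illustration, not details specified by the patent:

```python
import numpy as np

def fuse(frame, overlay, alpha=0.5):
    """Blend an overlay image into a video frame.  Both are
    H x W x 3 uint8 arrays of the same size; a real implementation
    would also handle resizing and placement of the overlay."""
    blended = (alpha * overlay.astype(np.float32)
               + (1.0 - alpha) * frame.astype(np.float32))
    return blended.astype(np.uint8)

def fuse_video(frames, overlay, has_palm):
    """Fuse only the frames flagged as containing a palm object,
    leaving the remaining frames unchanged."""
    return [fuse(f, overlay) if p else f for f, p in zip(frames, has_palm)]
```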
In some optional implementations of this embodiment, after performing step 204, the above-mentioned executing subject may further perform the following step: controlling a target device to move toward the location of the executing subject. Here, the target device may be any of various electronic devices in communication connection with the executing subject, for example a mobile robot.
It can be understood that, in this optional implementation, when the action of the palm corresponding to the target video is recognized as a waving action, the target device is controlled to approach the executing subject, thereby achieving human-computer interaction based on waving-action recognition.
Continuing to refer to Fig. 3, Fig. 3 is a schematic diagram of an application scenario of the method for recognizing an image according to this embodiment. In the application scenario of Fig. 3, the mobile phone 301 first obtains the target video 3011. Then, the mobile phone 301 extracts two video frames containing a palm object (video frame 30111 and video frame 30112 in the figure) from the target video 3011. After that, based on the two video frames, the mobile phone 301 identifies whether the action of the palm corresponding to the target video 3011 is a waving action. Finally, in response to determining that the action of the palm corresponding to the target video 3011 is a waving action, the mobile phone 301 generates a recognition result indicating that the action of the palm corresponding to the target video 3011 is a waving action. As an example, in the figure, the mobile phone 301 generates the recognition result "is a waving action".
In the prior art, there is usually no technical solution for identifying whether a video contains a palm object performing a waving action. However, since the hand is more free and flexible than other parts of the human body, it is responsible for a large amount of interaction in the daily life of the user, and the operations completed by the hand are innumerable. It can be seen that there is a demand in the prior art for recognizing the semantic information of images or videos containing hand objects, for example, the demand for identifying whether the palm corresponding to an image or a video is performing a waving action.
The method provided by the above embodiment of the present disclosure obtains a target video, extracts from the target video two video frames containing a palm object, then identifies, based on the two video frames, whether the action of the palm corresponding to the target video is a waving action, and finally, in response to determining that the action of the palm corresponding to the target video is a waving action, generates a recognition result indicating that the action of the palm corresponding to the target video is a waving action. In this way, an electronic device can determine whether a video contains a palm object performing a waving action, and thereby recognize richer palm posture information, which helps to realize human-computer interaction based on gesture recognition.
With further reference to Fig. 4, it shows a flow 400 of another embodiment of the method for recognizing an image. The flow 400 of the method for recognizing an image includes the following steps:
Step 401: obtaining a target video. Then, step 402 is performed.
In this embodiment, step 401 is substantially the same as step 201 in the embodiment corresponding to Fig. 2, and is not described again here.
Step 402: extracting two video frames containing a palm object from the target video. Then, step 403 is performed.
In this embodiment, step 402 is substantially the same as step 202 in the embodiment corresponding to Fig. 2, and is not described again here.
Step 403: determining the projection coordinates, in a predetermined projection plane, of the normal vectors of the palms corresponding to the two video frames. Then, step 404 is performed.
In this embodiment, the above-mentioned executing subject may determine the projection coordinates, in the predetermined projection plane, of the normal vectors of the palms corresponding to the two video frames. The projection plane is the imaging plane of the video frames.
Here, the above-mentioned executing subject, or an electronic device in communication connection with the executing subject, may perform step 403 in a variety of ways.
As an example, the executing subject may sequentially input the two video frames extracted in step 402 into a pre-trained palm normal vector determination model to obtain the normal vector of the palm corresponding to each of the two video frames, and then determine the projection coordinate of each obtained normal vector on the imaging plane.
Here, the palm normal vector determination model is used to determine the normal vector of the palm corresponding to a video frame. As an example, the palm normal vector determination model may be a two-dimensional table or a database that stores video frames in association with the normal vectors of the corresponding palms, or it may be a convolutional neural network model obtained by training on a training sample set using a machine learning algorithm. Here, a training sample in the training sample set includes a video frame and the normal vector of the palm corresponding to the video frame.
As another example, the executing subject may sequentially input the two video frames extracted in step 402 into a pre-trained projection coordinate determination model to obtain the projection coordinate corresponding to each video frame. Here, the projection coordinate determination model is used to determine the projection coordinate corresponding to an input video frame. The projection coordinate corresponding to a video frame is the projection coordinate, on the imaging plane, of the normal vector of the palm corresponding to the palm object in the video frame.
As an example, the projection coordinate determination model may be a model trained on a training sample set using a machine learning algorithm, or it may be a formula that uses a geometric algorithm to compute the normal vector of the plane in which the palm corresponding to the palm object in the video frame lies, and then projects the obtained normal vector to obtain the projection coordinate.
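The geometric alternative mentioned above can be illustrated as follows: estimate the normal of the plane in which the palm lies from three palm points, then project it onto the imaging plane. The three-landmark input and the orthographic projection (simply dropping the depth component) are simplifying assumptions for this sketch; a real pipeline would obtain palm points from a detector and use the camera model:

```python
import math

def palm_normal(p0, p1, p2):
    """Unit normal of the plane through three non-collinear palm
    points, from the cross product of two in-plane edge vectors."""
    u = [b - a for a, b in zip(p0, p1)]
    v = [b - a for a, b in zip(p0, p2)]
    n = [u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0]]
    length = math.sqrt(sum(c * c for c in n))
    return [c / length for c in n]

def project_to_imaging_plane(normal):
    """Orthographic projection onto the imaging plane: keep the two
    in-plane components and drop the depth component."""
    return (normal[0], normal[1])
```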
Illustratively, a training sample in the above-mentioned training sample set includes a video frame and the projection coordinate corresponding to the video frame. Here, the projection coordinate corresponding to each video frame in the training sample set may be obtained by manual annotation, or may be obtained in the following manner:
First, a video of a rotating palm is obtained. In each video frame of the video, the coordinate value along the first direction in the projection coordinate of the palm normal vector is the same, while the coordinate value along the second direction changes continuously. The projection coordinate is the projection coordinate of the palm normal vector on a preset plane (for example, the imaging plane). The content of the palm rotation video reflects the changing trend of the palm normal vector. Here, the palm rotation video may be shot by various devices (for example, the above-mentioned executing subject, or another electronic device in communication connection with the executing subject). During the rotation, the rotation direction is kept fixed, and the palm is not displaced along the direction of the rotation axis. The direction of the rotation axis may be the direction of the arm. For example, the palm and fingers are spread out and kept in the same plane, with the fingers pointing in the vertical direction. The initial position of the palm is where the plane of the palm is perpendicular to the screen; the palm then rotates with the palm center turning toward the screen until the plane of the palm is parallel to the screen, and continues rotating in the original rotation direction until the plane of the palm is again perpendicular to the screen. During the rotation, the palm is not displaced in the vertical direction. The palm rotation video contains multiple video frames. In each video frame of the video, the coordinate value along the first direction in the projection coordinate of the palm normal vector is the same, while the coordinate value along the second direction changes continuously; here, continuous change may be understood as continuous increase or continuous decrease. The projection coordinate of the palm normal vector is the coordinate of the palm normal vector on the preset plane, which may be the imaging plane. In a similar manner, another video may be obtained in which, in each video frame, the coordinate value along the second direction in the projection coordinate of the palm normal vector is the same, while the coordinate value along the first direction changes continuously.
Then, each video frame included in the above videos and the projection coordinate corresponding to that video frame may be combined into a training sample, thereby obtaining the training sample set.
It can be understood that, since manual annotation can hardly mark the projection coordinate of a video frame accurately, the accuracy of a model trained on manually annotated samples is low; moreover, manual annotation is laborious, so the training efficiency of the model is low. Therefore, training the projection coordinate determination model with a training sample set obtained in the above non-manual manner can improve the accuracy of the projection coordinates generated by the model and improve the training efficiency of the model.
Step 404: judging whether the projection coordinates of the two video frames satisfy a predetermined waving recognition condition. Then, if yes, step 405 is performed; if not, step 406 is performed.
In this embodiment, the above-mentioned executing subject may judge whether the projection coordinates of the two video frames satisfy the predetermined waving recognition condition. Here, the waving recognition condition is: "the absolute value of the difference between the coordinate values, in the preset first direction, of the projection coordinates of the two video frames is less than or equal to a preset first threshold, and the absolute value of the difference between the coordinate values, in the preset second direction, of the projection coordinates corresponding to the two video frames is greater than or equal to a preset second threshold". The first threshold and the second threshold may be numerical values preset by a technician, and the first threshold and the second threshold may be equal or unequal.
Here, in the case where the waving recognition condition is satisfied, the executing subject may continue with step 405; if the waving recognition condition is not satisfied, the executing subject may perform step 406.
Step 405: recognizing the action of the palm corresponding to the target video as a waving action. Then, step 407 is performed.
In this embodiment, in the case where "the absolute value of the difference between the coordinate values of the projection coordinates of the two video frames in the preset first direction is less than or equal to the preset first threshold, and the absolute value of the difference between the coordinate values of the projection coordinates corresponding to the two video frames in the preset second direction is greater than or equal to the preset second threshold" is satisfied, the executing subject may recognize the action of the palm corresponding to the target video as a waving action.
Step 406: recognizing the action of the palm corresponding to the target video as a non-waving action.
In this embodiment, in the case where "the absolute value of the difference between the coordinate values of the projection coordinates of the two video frames in the preset first direction is greater than the preset first threshold, or the absolute value of the difference between the coordinate values of the projection coordinates corresponding to the two video frames in the preset second direction is less than the preset second threshold" is satisfied, the executing subject may recognize the action of the palm corresponding to the target video as a non-waving action.
In this embodiment, a projection coordinate may be characterized by a first element and a second element. The first element may be the coordinate value of the projection coordinate in the preset first direction, and the second element may be the coordinate value of the projection coordinate in the preset second direction. Thus, when the following formula (the waving recognition condition) is satisfied, the action of the palm corresponding to the target video may be recognized as a waving action:

|d_x| ≤ T1, and |d_y| ≥ T2,

where d_x characterizes the difference between the first elements of the projection coordinates of the two video frames: when the projection coordinates of the two video frames are (x_t, y_t) and (x_{t-1}, y_{t-1}) respectively, the value of d_x is the difference between x_t and x_{t-1}, where x_t is the first element of the projection coordinate of one video frame and x_{t-1} is the first element of the projection coordinate of the other video frame. Likewise, d_y characterizes the difference between the second elements of the projection coordinates of the two video frames: when the projection coordinates of the two video frames are (x_t, y_t) and (x_{t-1}, y_{t-1}) respectively, the value of d_y is the difference between y_t and y_{t-1}, where y_t is the second element of the projection coordinate of one video frame and y_{t-1} is the second element of the projection coordinate of the other video frame.
In addition, when the above formula is not satisfied, the action of the palm corresponding to the target video may be recognized as a non-waving action.
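The waving recognition condition translates directly into code. The following is a minimal sketch; the function name is illustrative, and the thresholds T1 and T2 would be values preset by a technician, as stated above:

```python
def is_wave(coord_prev, coord_curr, t1, t2):
    """Apply the waving recognition condition |dx| <= T1 and |dy| >= T2
    to the projection coordinates (x, y) of two video frames."""
    dx = coord_curr[0] - coord_prev[0]  # first-element difference
    dy = coord_curr[1] - coord_prev[1]  # second-element difference
    return abs(dx) <= t1 and abs(dy) >= t2
```

Intuitively, a wave leaves the first-direction coordinate nearly unchanged while the second-direction coordinate moves substantially; any other combination falls through to the non-waving branch.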
Step 407: generating a recognition result indicating that the action of the palm corresponding to the target video is a waving action.
In this embodiment, the above-mentioned executing subject may generate a recognition result indicating that the action of the palm corresponding to the target video is a waving action.
It should be noted that, in addition to the contents described above, this embodiment may also include features identical or similar to those of the embodiment corresponding to Fig. 2 and produce the same effects, which are not described again here.
As can be seen from Fig. 4, compared with the embodiment corresponding to Fig. 2, the flow 400 of the method for recognizing an image in this embodiment highlights the step of identifying, based on the projection coordinates of the two video frames, whether the action of the palm corresponding to the target video is a waving action. Thus, the solution described in this embodiment can identify more accurately and more quickly whether the action of the palm corresponding to a video is a waving action.
With further reference to Fig. 5, as an implementation of the method shown in Fig. 2, the present disclosure provides an embodiment of an apparatus for recognizing an image. This apparatus embodiment corresponds to the method embodiment shown in Fig. 2; in addition to the features described below, the apparatus embodiment may also include features identical or corresponding to those of the method embodiment shown in Fig. 2, and produce effects identical or corresponding to those of the method embodiment shown in Fig. 2. The apparatus may be applied to various electronic devices.
As shown in Fig. 5, the apparatus 500 for recognizing an image of this embodiment includes: an acquiring unit 501, an extraction unit 502, a recognition unit 503 and a first generation unit 504. The acquiring unit 501 is configured to obtain a target video; the extraction unit 502 is configured to extract two video frames containing a palm object from the target video; the recognition unit 503 is configured to identify, based on the two video frames, whether the action of the palm corresponding to the target video is a waving action; the first generation unit 504 is configured to, in response to determining that the action of the palm corresponding to the target video is a waving action, generate a recognition result indicating that the action of the palm corresponding to the target video is a waving action.
In this embodiment, the acquiring unit 501 of the apparatus 500 for recognizing an image may obtain the target video from another electronic device, or locally, through a wired connection or a wireless connection. The target video may be a video on which palm action recognition is to be performed.
In this embodiment, the extraction unit 502 may extract two video frames containing a palm object from the target video obtained by the acquiring unit 501.
In this embodiment, the recognition unit 503 may identify, based on the two video frames extracted by the extraction unit 502, whether the action of the palm corresponding to the target video obtained by the acquiring unit 501 is a waving action. The palm corresponding to the target video may be a palm photographed during the acquisition of the target video.
In this embodiment, in response to determining that the action of the palm corresponding to the target video obtained by the acquiring unit 501 is a waving action, the first generation unit 504 may generate a recognition result indicating that the action of the palm corresponding to the target video is a waving action.
In some optional implementations of this embodiment, the recognition unit 503 includes: an input module (not shown in the figure) configured to input the two video frames into a pre-trained recognition model to determine whether the action of the palm corresponding to the target video is a waving action, where the recognition model is used to identify whether the action of the palm corresponding to the video to which the two input video frames belong is a waving action.
In some optional implementations of this embodiment, the recognition unit 503 includes: a determination module (not shown in the figure) configured to determine the projection coordinates, in a predetermined projection plane, of the normal vectors of the palms corresponding to the two video frames; and an identification module (not shown in the figure) configured to identify, based on the two determined projection coordinates, whether the action of the palm corresponding to the target video is a waving action.
In some optional implementations of this embodiment, the projection plane is the imaging plane of the video frames; and the identification module includes: a first identification submodule (not shown in the figure) configured to, in response to the absolute value of the difference between the coordinate values of the projection coordinates of the two video frames in the preset first direction being less than or equal to a preset first threshold, and the absolute value of the difference between the coordinate values of the projection coordinates corresponding to the two video frames in the preset second direction being greater than or equal to a preset second threshold, recognize the action of the palm corresponding to the target video as a waving action.
In some optional implementations of this embodiment, the projection plane is the imaging plane of the video frames; and the identification module includes: a second identification submodule (not shown in the figure) configured to, in response to the absolute value of the difference between the coordinate values of the projection coordinates of the two video frames in the preset first direction being greater than the preset first threshold, or the absolute value of the difference between the coordinate values of the projection coordinates corresponding to the two video frames in the preset second direction being less than the preset second threshold, recognize the action of the palm corresponding to the target video as a non-waving action.
In some optional implementations of this embodiment, the extraction unit 502 includes: an extraction module (not shown in the figure) configured to extract two adjacent video frames containing a palm object from the target video.
In some optional implementations of this embodiment, the apparatus 500 further includes: a fusion unit (not shown in the figure) configured to, in response to the recognition result being generated, for each video frame containing the palm object in the target video, fuse an image containing a target object with the video frame to obtain a fused image corresponding to the video frame; and a second generation unit (not shown in the figure) configured to generate a fused video and present the fused video in place of the target video.
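As an informal sketch of how the units of apparatus 500 compose, the class below models each unit as a callable supplied by the caller, mirroring the acquiring unit 501, the extraction unit 502, the recognition unit 503 and the first generation unit 504; this structure is an assumption for illustration and is not code from the patent:

```python
class ImageRecognitionDevice:
    """Composition of the units of apparatus 500, each modelled as a
    callable injected by the caller."""

    def __init__(self, acquire, extract, recognize):
        self.acquire = acquire      # 501: obtain the target video
        self.extract = extract      # 502: pick two palm-containing frames
        self.recognize = recognize  # 503: waving / non-waving decision

    def run(self):
        video = self.acquire()
        frame_a, frame_b = self.extract(video)
        if self.recognize(frame_a, frame_b):
            # 504: generate the recognition result
            return "waving action"
        return None
```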
In the apparatus for recognizing an image provided by the above embodiment of the present disclosure, the acquiring unit 501 obtains a target video; the extraction unit 502 then extracts two video frames containing a palm object from the target video; after that, the recognition unit 503 identifies, based on the two video frames, whether the action of the palm corresponding to the target video is a waving action; finally, in response to determining that the action of the palm corresponding to the target video is a waving action, the first generation unit 504 generates a recognition result indicating that the action of the palm corresponding to the target video is a waving action. In this way, an electronic device can determine whether a video contains a palm object performing a waving action, and thereby recognize richer palm posture information, which helps to realize human-computer interaction based on gesture recognition.
Referring now to Fig. 6, it shows a schematic structural diagram of an electronic device 600 (for example the server or the terminal device shown in Fig. 1) suitable for implementing the embodiments of the present disclosure. The terminal device in the embodiments of the present disclosure may include, but is not limited to, mobile terminals such as a mobile phone, a laptop, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player) and a vehicle-mounted terminal (for example a vehicle-mounted navigation terminal), and fixed terminals such as a digital TV and a desktop computer. The terminal device/server shown in Fig. 6 is merely an example and should not impose any restriction on the function and scope of use of the embodiments of the present disclosure.
As shown in Fig. 6, the electronic device 600 may include a processing apparatus (for example a central processing unit, a graphics processor, etc.) 601, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage apparatus 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the electronic device 600. The processing apparatus 601, the ROM 602 and the RAM 603 are connected to one another through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
In general, the following apparatuses may be connected to the I/O interface 605: input apparatuses 606 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer and a gyroscope; output apparatuses 607 including, for example, a liquid crystal display (LCD), a loudspeaker and a vibrator; storage apparatuses 608 including, for example, a magnetic tape and a hard disk; and a communication apparatus 609. The communication apparatus 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. Although Fig. 6 shows an electronic device 600 with various apparatuses, it should be understood that it is not required to implement or possess all of the apparatuses shown; more or fewer apparatuses may alternatively be implemented or possessed. Each block shown in Fig. 6 may represent one apparatus, or may represent multiple apparatuses as needed.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, an embodiment of the present disclosure includes a computer program product, which comprises a computer program carried on a computer-readable medium, the computer program containing program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded from a network and installed through the communication apparatus 609, or installed from the storage apparatus 608, or installed from the ROM 602. When the computer program is executed by the processing apparatus 601, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are performed.
It should be noted that the computer-readable medium described in the embodiments of the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but not limited to, an electric, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the embodiments of the present disclosure, the computer-readable storage medium may be any tangible medium containing or storing a program that can be used by, or in connection with, an instruction execution system, apparatus or device. In the embodiments of the present disclosure, the computer-readable signal medium may include a data signal propagated in a baseband or as part of a carrier wave, which carries computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. The computer-readable signal medium may also be any computer-readable medium other than the computer-readable storage medium; the computer-readable signal medium can send, propagate or transmit a program for use by, or in connection with, an instruction execution system, apparatus or device. The program code contained on the computer-readable medium may be transmitted by any suitable medium, including but not limited to a wire, an optical cable, RF (radio frequency), etc., or any suitable combination of the above.
The above computer-readable medium may be included in the above electronic device, or may exist alone without being assembled into the electronic device. The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: obtain a target video; extract two video frames containing a palm object from the target video; identify, based on the two video frames, whether the palm motion corresponding to the target video is a waving motion; and, in response to determining that the palm motion corresponding to the target video is a waving motion, generate a recognition result indicating that the palm motion corresponding to the target video is a waving motion.
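The flow just described can be sketched as a small pipeline. The following is an illustrative Python sketch, not the patented implementation: `contains_palm` and `classify` are hypothetical stand-ins for the palm detector and for the waving classifier (which the disclosure realizes either as a pre-trained recognition model or as the projection-coordinate test of the claims).

```python
def recognize(video_frames, contains_palm, classify):
    """Sketch of the claimed flow: take two palm frames, classify the motion,
    and emit a recognition result only for a waving motion."""
    # Step 1: extract the frames in which a palm object appears.
    palm_frames = [f for f in video_frames if contains_palm(f)]
    if len(palm_frames) < 2:
        return None  # not enough palm frames to compare
    # Step 2: identify the motion from two (here: the first two) palm frames.
    action = classify(palm_frames[0], palm_frames[1])
    # Step 3: generate the recognition result only for a waving motion.
    return {"video_action": "wave"} if action == "wave" else None
```

With stub detector and classifier functions, the pipeline returns the result dictionary only when at least two palm frames are found and the classifier reports a wave.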
The computer program code for performing the operations of the embodiments of the present disclosure may be written in one or more programming languages or a combination thereof. The programming languages include object-oriented programming languages such as Java, Smalltalk, and C++, and also include conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architecture, functions, and operations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a part of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in an order different from that shown in the drawings. For example, two blocks shown in succession may actually be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented in software or in hardware. The described units may also be provided in a processor, which may, for example, be described as: a processor including an acquisition unit, an extraction unit, a recognition unit, and a first generation unit. The names of these units do not, in some cases, constitute a limitation on the units themselves; for example, the acquisition unit may also be described as "a unit for obtaining a target video".
The above description is merely a preferred embodiment of the present disclosure and an explanation of the applied technical principles. Those skilled in the art should understand that the scope of the invention involved in the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the above inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features with similar functions disclosed in the present disclosure.
Claims (16)
1. A method for recognizing an image, comprising:
obtaining a target video;
extracting two video frames containing a palm object from the target video;
identifying, based on the two video frames, whether the palm motion corresponding to the target video is a waving motion; and
in response to determining that the palm motion corresponding to the target video is a waving motion, generating a recognition result indicating that the palm motion corresponding to the target video is a waving motion.
2. The method according to claim 1, wherein the identifying, based on the two video frames, whether the palm motion corresponding to the target video is a waving motion comprises:
inputting the two video frames into a pre-trained recognition model to determine whether the palm motion corresponding to the target video is a waving motion, wherein the recognition model is used to identify whether the motion of the palm corresponding to the video from which the two input video frames were taken is a waving motion.
3. The method according to claim 1, wherein the identifying, based on the two video frames, whether the palm motion corresponding to the target video is a waving motion comprises:
determining the projection coordinates, on a predetermined projection plane, of the normal vectors of the palm in the two video frames; and
identifying, based on the two determined projection coordinates, whether the palm motion corresponding to the target video is a waving motion.
4. The method according to claim 3, wherein the projection plane is the imaging plane of the video frames; and
the identifying, based on the two determined projection coordinates, whether the palm motion corresponding to the target video is a waving motion comprises:
in response to the absolute value of the difference between the coordinate values, in a preset first direction, of the projection coordinates of the two video frames being less than or equal to a preset first threshold, and the absolute value of the difference between the coordinate values, in a preset second direction, of the projection coordinates of the two video frames being greater than or equal to a preset second threshold, recognizing the palm motion corresponding to the target video as a waving motion.
5. The method according to claim 3, wherein the projection plane is the imaging plane of the video frames; and
the identifying, based on the two determined projection coordinates, whether the palm motion corresponding to the target video is a waving motion comprises:
in response to the absolute value of the difference between the coordinate values, in a preset first direction, of the projection coordinates of the two video frames being greater than a preset first threshold, or the absolute value of the difference between the coordinate values, in a preset second direction, of the projection coordinates of the two video frames being less than a preset second threshold, recognizing the palm motion corresponding to the target video as a non-waving motion.
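Claims 4 and 5 together partition all cases with a two-threshold test on the projected palm-normal coordinates: a wave is a motion with little displacement in the first direction but a large swing in the second. A minimal Python sketch of that decision rule follows; the threshold values are illustrative, since the claims leave the preset thresholds unspecified.

```python
def classify_motion(p1, p2, t1=0.1, t2=0.2):
    """Two-threshold waving test over two projected palm-normal coordinates.

    p1, p2: (x, y) projections of the palm normal onto the imaging plane
    for the two video frames; t1, t2 are illustrative preset thresholds.
    """
    dx = abs(p1[0] - p2[0])  # displacement in the preset first direction
    dy = abs(p1[1] - p2[1])  # displacement in the preset second direction
    # Claim 4: dx small enough AND dy large enough -> waving motion.
    if dx <= t1 and dy >= t2:
        return "wave"
    # Claim 5: the complementary condition (dx > t1 OR dy < t2) -> non-waving.
    return "non-wave"
```

Because claim 5's condition is the logical complement of claim 4's, the two claims cover every pair of projection coordinates with no gap or overlap.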
6. The method according to claim 1, wherein the extracting two video frames containing a palm object from the target video comprises:
extracting two adjacent video frames containing a palm object from the target video.
7. The method according to any one of claims 1-6, wherein the method further comprises:
in response to generating the recognition result, for each video frame containing a palm object in the target video, fusing an image containing a target object with the video frame to obtain a fused image corresponding to the video frame; and
generating a fused video, and presenting the fused video in place of the target video.
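The fusion step of claim 7 amounts to blending a target-object image into each palm frame and leaving the other frames untouched. A toy Python sketch, assuming grayscale frames stored as flat pixel lists; a real implementation would blend RGB arrays (e.g. with OpenCV's `cv2.addWeighted`), and `contains_palm` is again a hypothetical detector stub.

```python
def fuse_frame(frame, target_image, alpha=0.5):
    """Per-pixel alpha blend of a target-object image into one video frame."""
    return [round((1 - alpha) * f + alpha * t)
            for f, t in zip(frame, target_image)]

def fuse_video(frames, contains_palm, target_image, alpha=0.5):
    """Replace every palm frame with its fused image; leave the rest intact."""
    return [fuse_frame(f, target_image, alpha) if contains_palm(f) else f
            for f in frames]
```

The fused frame sequence is what the method presents in place of the original target video.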
8. An apparatus for recognizing an image, comprising:
an acquisition unit configured to obtain a target video;
an extraction unit configured to extract two video frames containing a palm object from the target video;
a recognition unit configured to identify, based on the two video frames, whether the palm motion corresponding to the target video is a waving motion; and
a first generation unit configured to, in response to determining that the palm motion corresponding to the target video is a waving motion, generate a recognition result indicating that the palm motion corresponding to the target video is a waving motion.
9. The apparatus according to claim 8, wherein the recognition unit comprises:
an input module configured to input the two video frames into a pre-trained recognition model to determine whether the palm motion corresponding to the target video is a waving motion, wherein the recognition model is used to identify whether the motion of the palm corresponding to the video from which the two input video frames were taken is a waving motion.
10. The apparatus according to claim 8, wherein the recognition unit comprises:
a determination module configured to determine the projection coordinates, on a predetermined projection plane, of the normal vectors of the palm in the two video frames; and
an identification module configured to identify, based on the two determined projection coordinates, whether the palm motion corresponding to the target video is a waving motion.
11. The apparatus according to claim 10, wherein the projection plane is the imaging plane of the video frames; and
the identification module comprises:
a first identification sub-module configured to, in response to the absolute value of the difference between the coordinate values, in a preset first direction, of the projection coordinates of the two video frames being less than or equal to a preset first threshold, and the absolute value of the difference between the coordinate values, in a preset second direction, of the projection coordinates of the two video frames being greater than or equal to a preset second threshold, recognize the palm motion corresponding to the target video as a waving motion.
12. The apparatus according to claim 10, wherein the projection plane is the imaging plane of the video frames; and
the identification module comprises:
a second identification sub-module configured to, in response to the absolute value of the difference between the coordinate values, in a preset first direction, of the projection coordinates of the two video frames being greater than a preset first threshold, or the absolute value of the difference between the coordinate values, in a preset second direction, of the projection coordinates of the two video frames being less than a preset second threshold, recognize the palm motion corresponding to the target video as a non-waving motion.
13. The apparatus according to claim 8, wherein the extraction unit comprises:
an extraction module configured to extract two adjacent video frames containing a palm object from the target video.
14. The apparatus according to any one of claims 8-13, wherein the apparatus further comprises:
a fusion unit configured to, in response to generating the recognition result, for each video frame containing a palm object in the target video, fuse an image containing a target object with the video frame to obtain a fused image corresponding to the video frame; and
a second generation unit configured to generate a fused video and present the fused video in place of the target video.
15. An electronic device, comprising:
one or more processors; and
a storage device on which one or more programs are stored,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-7.
16. A computer-readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910558852.5A CN110263743B (en) | 2019-06-26 | 2019-06-26 | Method and device for recognizing images |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110263743A true CN110263743A (en) | 2019-09-20 |
CN110263743B CN110263743B (en) | 2023-10-13 |
Family
ID=67921630
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910558852.5A Active CN110263743B (en) | 2019-06-26 | 2019-06-26 | Method and device for recognizing images |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110263743B (en) |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6944315B1 (en) * | 2000-10-31 | 2005-09-13 | Intel Corporation | Method and apparatus for performing scale-invariant gesture recognition |
CN105794191A (en) * | 2013-12-17 | 2016-07-20 | 夏普株式会社 | Recognition data transmission device |
CN105809144A (en) * | 2016-03-24 | 2016-07-27 | 重庆邮电大学 | Gesture recognition system and method adopting action segmentation |
CN107438804A (en) * | 2016-10-19 | 2017-12-05 | 深圳市大疆创新科技有限公司 | A kind of Wearable and UAS for being used to control unmanned plane |
CN108345387A (en) * | 2018-03-14 | 2018-07-31 | 百度在线网络技术(北京)有限公司 | Method and apparatus for output information |
CN108549490A (en) * | 2018-05-03 | 2018-09-18 | 林潼 | A kind of gesture identification interactive approach based on Leap Motion equipment |
CN109455180A (en) * | 2018-11-09 | 2019-03-12 | 百度在线网络技术(北京)有限公司 | Method and apparatus for controlling unmanned vehicle |
CN109597485A (en) * | 2018-12-04 | 2019-04-09 | 山东大学 | A kind of gesture interaction system and its working method based on two fingers angular domain feature |
CN109635767A (en) * | 2018-12-20 | 2019-04-16 | 北京字节跳动网络技术有限公司 | A kind of training method, device, equipment and the storage medium of palm normal module |
US20190113980A1 (en) * | 2013-10-16 | 2019-04-18 | Leap Motion, Inc. | Velocity field interaction for free space gesture interface and control |
CN109889892A (en) * | 2019-04-16 | 2019-06-14 | 北京字节跳动网络技术有限公司 | Video effect adding method, device, equipment and storage medium |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111143613A (en) * | 2019-12-30 | 2020-05-12 | 携程计算机技术(上海)有限公司 | Method, system, electronic device and storage medium for selecting video cover |
CN111143613B (en) * | 2019-12-30 | 2024-02-06 | 携程计算机技术(上海)有限公司 | Method, system, electronic device and storage medium for selecting video cover |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||