CN110263743A - Method and apparatus for recognizing an image - Google Patents

Method and apparatus for recognizing an image Download PDF

Info

Publication number
CN110263743A
CN110263743A
Authority
CN
China
Prior art keywords
video
movement
palm
frame
frames
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910558852.5A
Other languages
Chinese (zh)
Other versions
CN110263743B (en)
Inventor
卢艺帆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing ByteDance Network Technology Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd filed Critical Beijing ByteDance Network Technology Co Ltd
Priority to CN201910558852.5A priority Critical patent/CN110263743B/en
Publication of CN110263743A publication Critical patent/CN110263743A/en
Application granted granted Critical
Publication of CN110263743B publication Critical patent/CN110263743B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/28Recognition of hand or arm movements, e.g. recognition of deaf sign language

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Social Psychology (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Psychiatry (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

Embodiments of the present disclosure disclose a method and apparatus for recognizing an image. A specific embodiment of the method includes: acquiring a target video; extracting two video frames containing a palm object from the target video; identifying, based on the two video frames, whether the palm motion corresponding to the target video is a waving motion; and, in response to determining that the palm motion corresponding to the target video is a waving motion, generating a recognition result indicating that the palm motion corresponding to the target video is a waving motion. This embodiment enables an electronic device to determine whether a video contains a palm object performing a waving motion, and thereby to recognize richer human posture information.

Description

Method and apparatus for recognizing an image
Technical field
Embodiments of the present disclosure relate to the field of computer technology, and in particular to a method and apparatus for recognizing an image.
Background
Image recognition refers to the technology of processing, analyzing, and understanding images with a computer in order to identify targets and objects of various kinds.
In human-computer interaction scenarios, more and more researchers are devoted to research on hand-based interaction techniques. Compared with other parts of the human body, the hand is free and flexible, and it is responsible for a large amount of interaction in a user's daily life; the operations completed by the hand are innumerable. Hence there is a need in the prior art to recognize the semantic information of images or videos that contain hand objects.
Summary of the invention
The present disclosure proposes a method and apparatus for recognizing an image.
In a first aspect, embodiments of the present disclosure provide a method for recognizing an image, the method comprising: acquiring a target video; extracting two video frames containing a palm object from the target video; identifying, based on the two video frames, whether the palm motion corresponding to the target video is a waving motion; and, in response to determining that the palm motion corresponding to the target video is a waving motion, generating a recognition result indicating that the palm motion corresponding to the target video is a waving motion.
In some embodiments, identifying, based on the two video frames, whether the palm motion corresponding to the target video is a waving motion comprises: inputting the two video frames into a pre-trained recognition model to determine whether the palm motion corresponding to the target video is a waving motion, wherein the recognition model is used to identify whether the palm motion corresponding to a video containing the two input video frames is a waving motion.
In some embodiments, identifying, based on the two video frames, whether the palm motion corresponding to the target video is a waving motion comprises: determining the projection coordinates, in a predetermined projection plane, of the normal vectors of the palms corresponding to the two video frames; and identifying, based on the two determined projection coordinates, whether the palm motion corresponding to the target video is a waving motion.
In some embodiments, the projection plane is the imaging plane of the video frames; and identifying, based on the two determined projection coordinates, whether the palm motion corresponding to the target video is a waving motion comprises: in response to the absolute value of the difference between the coordinate values of the two video frames' projection coordinates in a preset first direction being less than or equal to a preset first threshold, and the absolute value of the difference between the coordinate values of the two video frames' projection coordinates in a preset second direction being greater than or equal to a preset second threshold, recognizing the palm motion corresponding to the target video as a waving motion.
In some embodiments, the projection plane is the imaging plane of the video frames; and identifying, based on the two determined projection coordinates, whether the palm motion corresponding to the target video is a waving motion comprises: in response to the absolute value of the difference between the coordinate values of the two video frames' projection coordinates in the preset first direction being greater than the preset first threshold, or the absolute value of the difference between the coordinate values of the two video frames' projection coordinates in the preset second direction being less than the preset second threshold, recognizing the palm motion corresponding to the target video as a non-waving motion.
In some embodiments, extracting two video frames containing a palm object from the target video comprises: extracting two adjacent video frames containing a palm object from the target video.
In some embodiments, the method further comprises: in response to generating the recognition result, for each video frame in the target video that contains a palm object, performing image fusion on an image containing a target object and the video frame to obtain a fused image corresponding to the video frame; and generating a fused video and presenting the fused video in place of the target video.
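The fusion operation above is not specified further in this summary; as a hedged illustration only, a per-pixel alpha blend is one plausible form of image fusion (the function name `fuse` and the blend weight `alpha` are assumptions, not the patent's method):

```python
def fuse(frame_px, overlay_px, alpha=0.5):
    """Blend one pixel of a target-object image into a video-frame pixel.

    A minimal sketch: each channel of the fused pixel is a weighted
    average of the frame channel and the overlay channel.
    """
    return tuple(round(alpha * o + (1 - alpha) * f)
                 for f, o in zip(frame_px, overlay_px))

# Illustrative usage on a single RGB pixel:
fused = fuse((100, 0, 0), (0, 200, 0))
```

Any fusion that produces a fused image per palm-containing frame would fit the claim language equally well; the alpha blend is chosen here only for concreteness.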
In a second aspect, embodiments of the present disclosure provide an apparatus for recognizing an image, the apparatus comprising: an acquisition unit configured to acquire a target video; an extraction unit configured to extract two video frames containing a palm object from the target video; a recognition unit configured to identify, based on the two video frames, whether the palm motion corresponding to the target video is a waving motion; and a first generation unit configured to, in response to determining that the palm motion corresponding to the target video is a waving motion, generate a recognition result indicating that the palm motion corresponding to the target video is a waving motion.
In some embodiments, the recognition unit comprises: an input module configured to input the two video frames into a pre-trained recognition model to determine whether the palm motion corresponding to the target video is a waving motion, wherein the recognition model is used to identify whether the palm motion corresponding to a video containing the two input video frames is a waving motion.
In some embodiments, the recognition unit comprises: a determination module configured to determine the projection coordinates, in a predetermined projection plane, of the normal vectors of the palms corresponding to the two video frames; and an identification module configured to identify, based on the two determined projection coordinates, whether the palm motion corresponding to the target video is a waving motion.
In some embodiments, the projection plane is the imaging plane of the video frames; and the identification module comprises: a first identification module configured to, in response to the absolute value of the difference between the coordinate values of the two video frames' projection coordinates in a preset first direction being less than or equal to a preset first threshold, and the absolute value of the difference between the coordinate values of the two video frames' projection coordinates in a preset second direction being greater than or equal to a preset second threshold, recognize the palm motion corresponding to the target video as a waving motion.
In some embodiments, the projection plane is the imaging plane of the video frames; and the identification module comprises: a second identification module configured to, in response to the absolute value of the difference between the coordinate values of the two video frames' projection coordinates in the preset first direction being greater than the preset first threshold, or the absolute value of the difference between the coordinate values of the two video frames' projection coordinates in the preset second direction being less than the preset second threshold, recognize the palm motion corresponding to the target video as a non-waving motion.
In some embodiments, the extraction unit comprises: an extraction module configured to extract two adjacent video frames containing a palm object from the target video.
In some embodiments, the apparatus further comprises: a fusion unit configured to, in response to generating the recognition result, for each video frame in the target video that contains a palm object, perform image fusion on an image containing a target object and the video frame to obtain a fused image corresponding to the video frame; and a second generation unit configured to generate a fused video and present the fused video in place of the target video.
In a third aspect, embodiments of the present disclosure provide an electronic device for recognizing an image, comprising: one or more processors; and a storage device on which one or more programs are stored, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any embodiment of the above method for recognizing an image.
In a fourth aspect, embodiments of the present disclosure provide a computer-readable medium for recognizing an image, on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any embodiment of the above method for recognizing an image.
The method and apparatus for recognizing an image provided by embodiments of the present disclosure first acquire a target video, then extract two video frames containing a palm object from the target video, then identify, based on the two video frames, whether the palm motion corresponding to the target video is a waving motion, and finally, in response to determining that the palm motion corresponding to the target video is a waving motion, generate a recognition result indicating that the palm motion corresponding to the target video is a waving motion. An electronic device can thereby determine whether a video contains a palm object performing a waving motion and, in turn, recognize richer palm posture information, which helps implement human-computer interaction based on gesture recognition.
Brief description of the drawings
Other features, objects, and advantages of the present disclosure will become more apparent upon reading the following detailed description of non-restrictive embodiments made with reference to the accompanying drawings:
Fig. 1 is an exemplary system architecture diagram in which an embodiment of the present disclosure may be applied;
Fig. 2 is a flowchart of one embodiment of the method for recognizing an image according to the present disclosure;
Fig. 3 is a schematic diagram of an application scenario of the method for recognizing an image according to the present disclosure;
Fig. 4 is a flowchart of another embodiment of the method for recognizing an image according to the present disclosure;
Fig. 5 is a structural schematic diagram of one embodiment of the apparatus for recognizing an image according to the present disclosure;
Fig. 6 is a structural schematic diagram of a computer system adapted for implementing an electronic device of an embodiment of the present disclosure.
Detailed description of embodiments
The present disclosure is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are used only to explain the related invention, not to limit the invention. It should also be noted that, for convenience of description, only the parts relevant to the related invention are illustrated in the drawings.
It should be noted that, in the absence of conflict, the embodiments in the present disclosure and the features in the embodiments may be combined with each other. The present disclosure is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary system architecture 100 in which embodiments of the method for recognizing an image or the apparatus for recognizing an image of the present disclosure may be applied.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as the medium providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links or fiber optic cables.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send data (such as videos). Various client applications may be installed on the terminal devices 101, 102, 103, such as video playback software, news applications, image processing applications, web browser applications, shopping applications, search applications, instant messaging tools, email clients, and social platform software.
The terminal devices 101, 102, 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices with a video capture function, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop portable computers, desktop computers, and the like. When the terminal devices 101, 102, 103 are software, they may be installed in the electronic devices listed above; they may be implemented as multiple pieces of software or software modules (for example, software or software modules for providing distributed services) or as a single piece of software or software module. No specific limitation is made here.
The server 105 may be a server providing various services, for example, a background server that processes videos (such as the target video) sent by the terminal devices 101, 102, 103. The background server may perform processing such as recognition on received data such as videos and generate processing results (such as recognition results). Optionally, the background server may also feed the processing results back to the terminal devices. As an example, the server 105 may be a cloud server or a physical server.
It should be noted that the server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster composed of multiple servers, or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (for example, software or software modules for providing distributed services), or as a single piece of software or software module. No specific limitation is made here.
It should also be noted that the method for recognizing an image provided by embodiments of the present disclosure may be executed by the server, by the terminal device, or by the server and the terminal device cooperating with each other. Correspondingly, the parts (such as the units, sub-units, modules, and sub-modules) included in the apparatus for recognizing an image may all be arranged in the server, all in the terminal device, or distributed between the server and the terminal device.
It should be understood that the numbers of terminal devices, networks, and servers in Fig. 1 are merely illustrative; any number of terminal devices, networks, and servers may be provided according to implementation needs. When the electronic device on which the method for recognizing an image runs does not need to transmit data with other electronic devices while executing the method, the system architecture may include only the electronic device (such as the server or the terminal device) on which the method for recognizing an image runs.
With continued reference to Fig. 2, a flow 200 of one embodiment of the method for recognizing an image according to the present disclosure is shown. The method for recognizing an image comprises the following steps:
Step 201: acquire a target video.
In this embodiment, the executing body of the method for recognizing an image (such as the server or terminal device shown in Fig. 1) may acquire the target video from another electronic device, or locally, via a wired or wireless connection.
Here, the target video may be a video on which palm motion recognition is to be performed.
As an example, the target video may be a video obtained by the executing body, or by an electronic device communicatively connected to the executing body, by filming a user's palm.
It will be appreciated that, when the target video is a video obtained by filming a user's palm, some or all of the video frames in the target video may contain a palm object, where a palm object is the image of a palm presented in a video frame (i.e., an image).
Step 202: extract two video frames containing a palm object from the target video.
In this embodiment, the executing body may extract two video frames containing a palm object from the target video acquired in step 201.
It should be noted that, in step 202, the number of video frames the executing body extracts from the target video may be 2, or any natural number greater than 2; embodiments of the present disclosure do not limit this.
It will be appreciated that, when the number of video frames the executing body extracts from the target video is a natural number greater than 2, two video frames containing a palm object are necessarily extracted among them; a scheme in which the executing body extracts more than two video frames from the target video therefore also falls within the scope claimed by the technical solutions of the embodiments of the present disclosure.
As an example, the executing body may perform step 202 as follows:
First, for each video frame in the target video, recognize the video frame to determine whether it contains a palm object; if it does, mark the video frame.
Then, randomly select two video frames from the marked video frames; alternatively, randomly select one video frame from the marked video frames and then select the video frame located after it and separated from it by a predetermined natural number of frames (such as 0, 1, 2, 3, etc.), thereby extracting two video frames containing a palm object from the target video.
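The mark-then-select strategy above can be sketched as follows. `contains_palm` is a hypothetical placeholder for any palm detector, and for determinism the sketch takes the earliest usable pair rather than sampling at random:

```python
def extract_palm_frames(frames, contains_palm, gap=0):
    """Sketch of step 202: mark every frame that contains a palm object,
    then pair the earliest marked frame with the frame located gap + 1
    positions after it, provided that frame also contains a palm.
    gap=0 selects the immediately following frame."""
    marked = [i for i, f in enumerate(frames) if contains_palm(f)]
    for i in marked:
        j = i + gap + 1
        if j < len(frames) and contains_palm(frames[j]):
            return frames[i], frames[j]
    return None  # no qualifying pair in this video
```

A random-sampling variant would draw from `marked` instead of scanning it in order; the described method allows either.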
In some optional implementations of this embodiment, the executing body may also perform step 202 as follows: extract two adjacent video frames containing a palm object from the target video. Here, two adjacent video frames may be two frames with no other palm-containing frame between them (frames not containing a palm object may still lie between them), or two frames with no frame at all between them.
As an example, the executing body may randomly extract two adjacent palm-containing video frames from the target video, or may extract the earliest such pair, i.e., the first and second video frames in the target video that contain a palm object.
It will be appreciated that, because hand movement is flexible and fast, extracting adjacent frames, as opposed to frames that are not necessarily adjacent, bounds in advance the relative movement distance of the palm corresponding to the palm objects in the two frames, so that the subsequent steps can identify more accurately whether the palm motion is a waving motion.
Step 203: identify, based on the two video frames, whether the palm motion corresponding to the target video is a waving motion.
In this embodiment, the executing body may identify, based on the two video frames extracted in step 202, whether the motion of the palm corresponding to the target video acquired in step 201 is a waving motion, where the palm corresponding to the target video may be the palm filmed while the target video was being acquired.
In some optional implementations of this embodiment, the executing body may perform step 203 as follows:
Input the two video frames into a pre-trained recognition model to determine whether the palm motion corresponding to the target video is a waving motion. Here, the recognition model is used to identify whether the palm motion corresponding to a video containing the two input video frames is a waving motion.
As an example, the recognition model may be a two-dimensional table or database that stores, in association, multiple pairs of video frames together with information indicating, for each pair, whether the palm motion corresponding to a video containing that pair is a waving motion.
As another example, the recognition model may also be a convolutional neural network model obtained by training on a training sample set using a machine learning algorithm, where a training sample in the set includes two video frames and information indicating whether the palm motion corresponding to a video containing the two video frames is a waving motion.
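The first example, the associative-table form of the recognition model, can be sketched minimally as follows (the class and method names are assumptions for illustration, and frame pairs are reduced to hashable keys):

```python
class LookupRecognitionModel:
    """Two-dimensional-table recognition model: stores pairs of video
    frames in association with a flag indicating whether the palm motion
    of a video containing that pair is a waving motion."""

    def __init__(self):
        self._table = {}

    def add(self, frame_pair, is_wave):
        """Store one associated (frame pair, label) row."""
        self._table[frame_pair] = is_wave

    def predict(self, frame_pair):
        """Return True/False for a stored pair, None for an unseen pair."""
        return self._table.get(frame_pair)


# Illustrative usage with placeholder frame identifiers:
model = LookupRecognitionModel()
model.add(("frame_a", "frame_b"), True)
```

The convolutional-network variant in the second example would replace the table lookup with a learned mapping but expose the same pair-in, label-out interface.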
In some optional implementations of this embodiment, the executing body may also perform step 203 as follows:
First step: determine the projection coordinates, in a predetermined projection plane, of the normal vectors of the palms corresponding to the two video frames.
Here, the projection plane may be any predetermined plane in three-dimensional space; for example, it may be any plane parallel to the imaging plane of the video frames (i.e., the image plane of an image acquisition device such as a camera). A projection coordinate may be characterized by a first element and a second element, where the first element is the coordinate value of the projection coordinate in a preset first direction and the second element is the coordinate value of the projection coordinate in a preset second direction. As an example, a projection coordinate may be expressed as "(x, y)", where x is the first element and y the second element. The direction of the palm's normal vector may be perpendicular to the plane of the palm, pointing toward the palm side of the hand, and the magnitude of the normal vector may be a predetermined value.
It will be appreciated that, since the projection plane is a two-dimensional plane, a projection coordinate need only be characterized by two elements, the first element and the second element. In some cases, however, a projection coordinate may also be characterized by more elements.
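Under the stated assumption that the projection plane is two-dimensional, the projection reduces to two dot products against an orthonormal basis of the plane. A sketch, where the choice of the image x- and y-axes as basis vectors is illustrative:

```python
def project_onto_plane(v, u1, u2):
    """Project a 3-D palm normal vector v onto the plane spanned by the
    orthonormal basis vectors u1 and u2; the result is the projection
    coordinate (first element, second element)."""
    dot = lambda a, b: sum(ai * bi for ai, bi in zip(a, b))
    return (dot(v, u1), dot(v, u2))


# Imaging plane assumed spanned by the image x- and y-axes:
coord = project_onto_plane((0.2, -0.5, 0.8), (1, 0, 0), (0, 1, 0))
```

With this basis the z-component of the normal (the depth component) is simply dropped, which is consistent with the projection coordinate needing only two elements.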
Second step: identify, based on the two determined projection coordinates, whether the palm motion corresponding to the target video is a waving motion.
Specifically, the executing body may perform this second step as follows:
Input the two determined projection coordinates into a pre-trained classification model to generate discrimination information indicating whether the palm motion corresponding to the video corresponding to the two projection coordinates is a waving motion.
Here, the classification model may be obtained by training as follows:
First, acquire a training sample set, where a training sample in the set includes two projection coordinates and discrimination information indicating whether the palm motion corresponding to the video corresponding to the two projection coordinates is a waving motion. The relationship between the "two projection coordinates" and the "video corresponding to the two projection coordinates" is as follows: determining the projection coordinates, in the predetermined projection plane, of the normal vectors of the palms corresponding to two palm-containing video frames in the "video" yields the "two projection coordinates".
Then, using a machine learning algorithm, take the two projection coordinates included in each training sample as the input data of an initial model and the discrimination information corresponding to the two input projection coordinates as the expected output data of the initial model, and determine whether the initial model satisfies a predetermined training termination condition; if it does, determine the initial model satisfying the termination condition as the classification model.
Here, the initial model may include convolutional layers, a classifier, pooling layers, and so on. The training termination condition may include, but is not limited to, at least one of the following: the training time exceeds a preset duration; the number of training iterations exceeds a preset number; the value of a predetermined loss function computed from the actual output data and the expected output data is less than a preset threshold. Here, the actual output data is the data the initial model outputs by computation after the input data is fed into it.
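The termination test above is a disjunction of the listed conditions; a sketch, with threshold values that are assumptions for illustration:

```python
def should_stop(elapsed_s, iterations, loss,
                max_s=3600, max_iters=10000, loss_threshold=0.01):
    """Training termination condition: stop when training time exceeds a
    preset duration, the iteration count exceeds a preset number, or the
    loss computed from actual vs. expected output falls below a preset
    threshold. Any single condition suffices."""
    return (elapsed_s > max_s
            or iterations > max_iters
            or loss < loss_threshold)
```

A training loop would call this after each iteration and, once it returns True, freeze the initial model as the classification model.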
In some optional implementations of this embodiment, the projection plane is the imaging plane of the video frames. Accordingly, the executing body may also perform the above second step as follows:
In response to the absolute value of the difference between the coordinate values of the two video frames' projection coordinates in the preset first direction being less than or equal to the preset first threshold, and the absolute value of the difference between the coordinate values of the two video frames' projection coordinates in the preset second direction being greater than or equal to the preset second threshold, recognize the palm motion corresponding to the target video as a waving motion.
In some optional implementations of this embodiment, the projection plane is the imaging plane of the video frames. Accordingly, the executing body may also perform the above second step as follows:
In response to the absolute value of the difference between the coordinate values of the two video frames' projection coordinates in the preset first direction being greater than the preset first threshold, or the absolute value of the difference between the coordinate values of the two video frames' projection coordinates in the preset second direction being less than the preset second threshold, recognize the palm motion corresponding to the target video as a non-waving motion.
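The two optional implementations above are complementary branches of one threshold rule, which can be sketched in a few lines (the tuple layout `(first, second)` and the parameter names are assumptions):

```python
def is_wave(p1, p2, t1, t2):
    """Threshold rule over two projection coordinates p1, p2:
    wave when the first-direction coordinates barely move (|dx| <= t1)
    while the second-direction coordinates move enough (|dy| >= t2);
    non-wave in every other case."""
    dx = abs(p1[0] - p2[0])
    dy = abs(p1[1] - p2[1])
    return dx <= t1 and dy >= t2
```

Intuitively, a waving palm rotates its normal mostly along one image direction, so the rule demands a large swing in the second direction and near-stillness in the first.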
In some optional implementations of the present embodiment, above-mentioned executing subject can in the following way, Lai Shengcheng The corresponding projection coordinate of video frame (such as video frame in extracted two frames video frame):
Video frame is input to projection coordinate trained in advance and determines model, obtains the corresponding projection coordinate of the video frame. Wherein, above-mentioned projection coordinate determines that model is determined for the corresponding projection coordinate of video frame of input.Video frame is corresponding Projection coordinate of the normal vector of the corresponding palm of palm object in projection coordinate, that is, video frame in predetermined projection plane.
As an example, the above-mentioned projection coordinate determination model may be a model obtained by training on a training sample set using a machine learning algorithm, or may be a formula that, using a geometric algorithm, computes the normal vector of the plane in which the palm corresponding to the palm object in the video frame lies, and then projects the obtained normal vector to obtain the projection coordinates.
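A minimal sketch of the geometric alternative just described: estimate the palm-plane normal from three 3D palm keypoints via a cross product, then project it onto the imaging plane. The keypoint names and the orthographic (drop-z) projection are illustrative assumptions, not details given in the patent.

```python
def palm_normal(wrist, index_base, pinky_base):
    """Unit normal of the plane through three (hypothetical) 3D palm keypoints."""
    ux, uy, uz = (index_base[i] - wrist[i] for i in range(3))
    vx, vy, vz = (pinky_base[i] - wrist[i] for i in range(3))
    # Cross product u x v is perpendicular to the palm plane.
    nx, ny, nz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    norm = (nx * nx + ny * ny + nz * nz) ** 0.5
    return (nx / norm, ny / norm, nz / norm)

def project_to_imaging_plane(normal):
    """Orthographic projection of the normal onto the imaging (x-y) plane."""
    return (normal[0], normal[1])
```

For a palm lying flat in the x-y plane, the sketch yields the normal (0, 0, 1), whose imaging-plane projection is the origin.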
Illustratively, a training sample in the above-mentioned training sample set includes a video frame and the projection coordinates corresponding to the video frame. Here, the projection coordinates corresponding to each video frame in the training sample set may be obtained by manual annotation, or may be obtained in the following way:
First, a video of a palm rotating is acquired, in which the coordinate value in the first direction of the projection coordinates of the palm normal vector is the same in every video frame, while the coordinate value in the second direction changes continuously. The projection coordinates are the projection coordinates of the palm normal vector on a preset plane (e.g., the imaging plane), and the content of the rotation video reflects the changing trend of the palm normal vector. The video of the rotating palm may be shot by various devices (e.g., the above-mentioned execution body, or another electronic device communicatively connected with the above-mentioned execution body). During the rotation, the rotation direction may be kept fixed, and the palm is not displaced along the direction of the rotation axis; the direction of the rotation axis may be the direction of the arm. For example, the palm and fingers are spread out so that they lie in the same plane, with the fingers pointing vertically. The initial position of the palm is the one in which the palm plane is perpendicular to the screen; the palm then rotates toward the screen until the palm plane is parallel to the screen, and continues rotating in the same direction until the palm plane is again perpendicular to the screen, without any vertical displacement of the palm during the rotation. The rotation video contains a plurality of video frames, in each of which the coordinate value in the first direction of the projection coordinates of the palm normal vector is the same, while the coordinate value in the second direction changes continuously, where "changes continuously" may be understood as continuously increasing or continuously decreasing. The projection coordinates of the palm normal vector are the coordinates of the palm normal vector in the preset plane, which may be the imaging plane. In a similar way, another video is obtained in which the coordinate value in the second direction of the projection coordinates of the palm normal vector is the same in every video frame, while the coordinate value in the first direction changes continuously. Here, the first direction and the second direction may be any directions; optionally, the line along the first direction and the line along the second direction may be perpendicular. As an example, the first direction may be the direction in the projection plane parallel to the ground, and the second direction may be the direction in the projection plane perpendicular to the ground.
Then, a training sample may be composed of each video frame included in the above-mentioned videos and the projection coordinates corresponding to that video frame, thereby obtaining the training sample set.
It can be understood that, since manual annotation can hardly label the projection coordinates of video frames accurately, a model trained on manually annotated samples has lower accuracy; moreover, manual annotation is laborious, so the training efficiency of the model is lower. Therefore, training the projection coordinate determination model with a training sample set obtained in the above non-manual way can improve the accuracy of the projection coordinates generated by the model and improve the training efficiency of the model.
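The non-manual labeling above can be sketched by generating the labels analytically: assuming the palm makes a half turn at constant angular speed about a horizontal axis, the projected normal is (0, sin θ), so the first-direction value stays fixed while the second-direction value increases continuously, as the description requires. The constant-speed assumption and the half-turn range are illustrative choices.

```python
import math

def rotation_labels(num_frames, axis="horizontal"):
    """Projection-coordinate labels for the frames of a half-turn rotation
    video shot at constant angular speed (an assumed capture setup)."""
    labels = []
    for i in range(num_frames):
        # theta sweeps from -pi/2 (palm plane perpendicular to the screen)
        # through 0 (parallel) back to +pi/2 (perpendicular again).
        theta = -math.pi / 2 + math.pi * i / (num_frames - 1)
        if axis == "horizontal":
            labels.append((0.0, math.sin(theta)))   # x fixed, y changes
        else:
            labels.append((math.sin(theta), 0.0))   # y fixed, x changes
    return labels
```

Each frame of the rotation video would then be paired with `rotation_labels(...)[i]` to form a training sample without manual annotation.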
Step 204: in response to determining that the action of the palm corresponding to the target video is a waving action, generating a recognition result indicating that the action of the palm corresponding to the target video is a waving action.
In the present embodiment, in the case where it is determined that the action of the palm corresponding to the target video is a waving action, the above-mentioned execution body may generate a recognition result indicating that the action of the palm corresponding to the target video is a waving action.
In some optional implementations of the present embodiment, after performing step 204, the above-mentioned execution body may further perform the following steps:
First, in response to generating the recognition result, for each video frame in the target video that includes the palm object, an image including a target object is fused with the video frame to obtain a fused image corresponding to the video frame. Here, the target object may be any of various objects, for example text indicating a greeting.
Then, a fused video is generated, and the fused video is presented in place of the target video.
It should be noted that the above-mentioned execution body may or may not present the target video before presenting the fused video; no limitation is imposed here.
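A minimal sketch of the fusion step above, with frames as nested lists of grayscale pixel values. The alpha-blending formula is an assumption for illustration; the patent only states that the overlay image and the frame are fused.

```python
def fuse(frame, overlay, alpha=0.5):
    """Alpha-blend an overlay (e.g., greeting text rendered as an image)
    onto a frame, pixel by pixel."""
    return [
        [round((1 - alpha) * f + alpha * o) for f, o in zip(frow, orow)]
        for frow, orow in zip(frame, overlay)
    ]

def fuse_video(frames, has_palm, overlay, alpha=0.5):
    """Fuse the overlay only into the frames that contain a palm object;
    other frames are kept unchanged."""
    return [
        fuse(frame, overlay, alpha) if palm else frame
        for frame, palm in zip(frames, has_palm)
    ]
```

The fused frame list would then be encoded as the fused video and presented in place of the target video.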
In some optional implementations of the present embodiment, after performing step 204, the above-mentioned execution body may further perform the following step: controlling a target device to move toward the position of the above-mentioned execution body. Here, the target device may be any of various electronic devices communicatively connected with the above-mentioned execution body, such as a mobile robot.
It can be understood that, in this optional implementation, in the case where the action of the palm corresponding to the target video is recognized as a waving action, the target device is controlled to approach the above-mentioned execution body, thereby realizing human-computer interaction based on waving-action recognition.
With continued reference to Fig. 3, Fig. 3 is a schematic diagram of an application scenario of the method for recognizing an image according to the present embodiment. In the application scenario of Fig. 3, a mobile phone 301 first acquires a target video 3011. Then, the mobile phone 301 extracts two video frames including a palm object (video frame 30111 and video frame 30112 in the figure) from the target video 3011. Afterwards, based on the two video frames, the mobile phone 301 recognizes whether the action of the palm corresponding to the target video 3011 is a waving action. Finally, in response to determining that the action of the palm corresponding to the target video 3011 is a waving action, a recognition result indicating that the action of the palm corresponding to the target video 3011 is a waving action is generated. As an example, in the figure, the mobile phone 301 generates the recognition result "is a waving action".
In the prior art, there is usually no technical solution for recognizing whether a video includes a palm object performing a waving action. However, since the hand is more free and flexible than other parts of the human body, it carries out a great deal of interaction in users' daily life, and the operations completed by the hand are innumerable. It can be seen that the prior art lacks means of recognizing the semantic information of images or videos that include a hand object, for example, recognizing whether the palm corresponding to an image or a video is performing a waving action.
In the method provided by the above embodiment of the disclosure, a target video is acquired; two video frames including a palm object are extracted from the target video; based on the two video frames, whether the action of the palm corresponding to the target video is a waving action is recognized; and finally, in response to determining that the action of the palm corresponding to the target video is a waving action, a recognition result indicating that the action is a waving action is generated. An electronic device can thus determine whether a video includes a palm object performing a waving action, and in turn recognize richer palm posture information, which helps realize human-computer interaction based on gesture recognition.
With further reference to Fig. 4, a flow 400 of another embodiment of the method for recognizing an image is illustrated. The flow 400 of the method for recognizing an image includes the following steps:
Step 401: acquiring a target video. Then step 402 is performed.
In the present embodiment, step 401 is substantially the same as step 201 in the embodiment corresponding to Fig. 2, and is not repeated here.
Step 402: extracting two video frames including a palm object from the target video. Then step 403 is performed.
In the present embodiment, step 402 is substantially the same as step 202 in the embodiment corresponding to Fig. 2, and is not repeated here.
Step 403: determining the projection coordinates, in a predetermined projection plane, of the normal vectors of the palms corresponding to the two video frames. Then step 404 is performed.
In the present embodiment, the above-mentioned execution body may determine the projection coordinates, in the predetermined projection plane, of the normal vectors of the palms corresponding to the two video frames. The projection plane is the imaging plane of the video frames.
Here, the above-mentioned execution body, or an electronic device communicatively connected with the above-mentioned execution body, may perform step 403 in a variety of ways.
As an example, the above-mentioned execution body may input the two video frames extracted in step 402 in sequence into a pre-trained palm normal vector determination model, successively obtain the normal vector of the palm corresponding to each of the two video frames, and then successively determine the projection coordinates of the obtained normal vectors in the imaging plane.
Here, the palm normal vector determination model may be used to determine the normal vector of the palm corresponding to a video frame. As an example, the palm normal vector determination model may be a two-dimensional table or a database in which video frames are stored in association with the normal vectors of the corresponding palms, or may be a convolutional neural network model obtained by training on a training sample set using a machine learning algorithm, where a training sample in the training sample set includes a video frame and the normal vector of the palm corresponding to the video frame.
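A minimal sketch of the lookup-table variant of the palm normal vector determination model: frames are stored in association with their palm normal vectors and retrieved by a frame fingerprint. The fingerprinting scheme (hashing the raw frame bytes) is an assumption for illustration, not a detail given in the patent.

```python
class PalmNormalTable:
    """Table associating video frames with palm normal vectors."""

    def __init__(self):
        self._table = {}

    def add(self, frame_bytes, normal):
        # Keyed by a hash of the raw frame bytes (illustrative choice).
        self._table[hash(frame_bytes)] = normal

    def normal_for(self, frame_bytes):
        """Return the stored palm normal vector for a known frame."""
        return self._table[hash(frame_bytes)]

def imaging_plane_projection(normal):
    # Orthographic projection onto the imaging plane (assumption).
    return (normal[0], normal[1])
```

Given two extracted frames, the execution body would look up each frame's normal vector and then project it into the imaging plane, yielding the two projection coordinates used in step 404.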
As another example, the above-mentioned execution body may also input the two video frames extracted in step 402 in sequence into a pre-trained projection coordinate determination model to obtain the projection coordinates corresponding to each video frame. Here, the projection coordinate determination model may be used to determine the projection coordinates corresponding to an input video frame. The projection coordinates corresponding to a video frame are the projection coordinates, in the imaging plane, of the normal vector of the palm corresponding to the palm object in the video frame.
As an example, the above-mentioned projection coordinate determination model may be a model obtained by training on a training sample set using a machine learning algorithm, or may be a formula that, using a geometric algorithm, computes the normal vector of the plane in which the palm corresponding to the palm object in the video frame lies, and then projects the obtained normal vector to obtain the projection coordinates.
Illustratively, a training sample in the above-mentioned training sample set includes a video frame and the projection coordinates corresponding to the video frame. Here, the projection coordinates corresponding to each video frame in the training sample set may be obtained by manual annotation, or may be obtained in the following way:
First, a video of a palm rotating is acquired, in which the coordinate value in the first direction of the projection coordinates of the palm normal vector is the same in every video frame, while the coordinate value in the second direction changes continuously. The projection coordinates are the projection coordinates of the palm normal vector on a preset plane (e.g., the imaging plane), and the content of the rotation video reflects the changing trend of the palm normal vector. The video of the rotating palm may be shot by various devices (e.g., the above-mentioned execution body, or another electronic device communicatively connected with the above-mentioned execution body). During the rotation, the rotation direction may be kept fixed, and the palm is not displaced along the direction of the rotation axis; the direction of the rotation axis may be the direction of the arm. For example, the palm and fingers are spread out so that they lie in the same plane, with the fingers pointing vertically. The initial position of the palm is the one in which the palm plane is perpendicular to the screen; the palm then rotates toward the screen until the palm plane is parallel to the screen, and continues rotating in the same direction until the palm plane is again perpendicular to the screen, without any vertical displacement of the palm during the rotation. The rotation video contains a plurality of video frames, in each of which the coordinate value in the first direction of the projection coordinates of the palm normal vector is the same, while the coordinate value in the second direction changes continuously, where "changes continuously" may be understood as continuously increasing or continuously decreasing. The projection coordinates of the palm normal vector are the coordinates of the palm normal vector in the preset plane, which may be the imaging plane. In a similar way, another video is obtained in which the coordinate value in the second direction of the projection coordinates of the palm normal vector is the same in every video frame, while the coordinate value in the first direction changes continuously.
Then, a training sample may be composed of each video frame included in the above-mentioned videos and the projection coordinates corresponding to that video frame, thereby obtaining the training sample set.
It can be understood that, since manual annotation can hardly label the projection coordinates of video frames accurately, a model trained on manually annotated samples has lower accuracy; moreover, manual annotation is laborious, so the training efficiency of the model is lower. Therefore, training the projection coordinate determination model with a training sample set obtained in the above non-manual way can improve the accuracy of the projection coordinates generated by the model and improve the training efficiency of the model.
Step 404: judging whether the projection coordinates of the two video frames satisfy a predetermined waving recognition condition. If yes, step 405 is performed; if not, step 406 is performed.
In the present embodiment, the above-mentioned execution body may judge whether the projection coordinates of the two video frames satisfy the predetermined waving recognition condition. Here, the waving recognition condition is: "the absolute value of the difference between the coordinate values of the projection coordinates of the two video frames in the preset first direction is less than or equal to the preset first threshold, and the absolute value of the difference between the coordinate values of the projection coordinates of the two video frames in the preset second direction is greater than or equal to the preset second threshold". The first threshold and the second threshold may be numerical values set in advance by a technician, and may be equal or unequal.
Here, in the case where the waving recognition condition is satisfied, the above-mentioned execution body may continue to perform step 405; if the waving recognition condition is not satisfied, the above-mentioned execution body may perform step 406.
Step 405: recognizing the action of the palm corresponding to the target video as a waving action. Then step 407 is performed.
In the present embodiment, in the case where "the absolute value of the difference between the coordinate values of the projection coordinates of the two video frames in the preset first direction is less than or equal to the preset first threshold, and the absolute value of the difference between the coordinate values of the projection coordinates of the two video frames in the preset second direction is greater than or equal to the preset second threshold" is satisfied, the above-mentioned execution body may recognize the action of the palm corresponding to the target video as a waving action.
Step 406: recognizing the action of the palm corresponding to the target video as a non-waving action.
In the present embodiment, in the case where "the absolute value of the difference between the coordinate values of the projection coordinates of the two video frames in the preset first direction is greater than the preset first threshold, or the absolute value of the difference between the coordinate values of the projection coordinates of the two video frames in the preset second direction is less than the preset second threshold" is satisfied, the above-mentioned execution body may recognize the action of the palm corresponding to the target video as a non-waving action.
In the present embodiment, the projection coordinates may be characterized by a first element and a second element, where the first element is the coordinate value of the projection coordinates in the above-mentioned preset first direction, and the second element is the coordinate value of the projection coordinates in the above-mentioned preset second direction. Accordingly, when the following formula (the waving recognition condition) is satisfied, the action of the palm corresponding to the target video may be recognized as a waving action:

|d_x| ≤ T_1 and |d_y| ≥ T_2,

where d_x characterizes the difference between the first elements of the projection coordinates of the two video frames: when the projection coordinates of the two video frames are (x_t, y_t) and (x_{t-1}, y_{t-1}) respectively, the value of d_x is the difference between x_t and x_{t-1}, x_t being the first element of the projection coordinates of one video frame and x_{t-1} being the first element of the projection coordinates of the other video frame. Likewise, d_y characterizes the difference between the second elements of the projection coordinates of the two video frames: its value is the difference between y_t and y_{t-1}, y_t being the second element of the projection coordinates of one video frame and y_{t-1} being the second element of the projection coordinates of the other video frame.
When the above formula is not satisfied, the action of the palm corresponding to the target video may be recognized as a non-waving action.
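The waving recognition condition above can be sketched directly: the palm normal's projection barely moves in the first direction (|d_x| ≤ T_1) but moves markedly in the second (|d_y| ≥ T_2). The threshold values below are placeholders; the patent leaves them to the technician.

```python
def is_waving(coord_prev, coord_curr, t1=0.1, t2=0.5):
    """Waving recognition condition on two projection coordinates.

    coord_prev = (x_{t-1}, y_{t-1}), coord_curr = (x_t, y_t);
    t1 and t2 are the preset first and second thresholds (placeholders).
    """
    dx = coord_curr[0] - coord_prev[0]
    dy = coord_curr[1] - coord_prev[1]
    return abs(dx) <= t1 and abs(dy) >= t2
```

A large swing in the second direction with little drift in the first is classified as waving; any other combination falls through to the non-waving branch (step 406).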
Step 407: generating a recognition result indicating that the action of the palm corresponding to the target video is a waving action.
In the present embodiment, the above-mentioned execution body may generate a recognition result indicating that the action of the palm corresponding to the target video is a waving action.
It should be noted that, in addition to the contents recorded above, the present embodiment may also include features and effects that are the same as or similar to those of the embodiment corresponding to Fig. 2, which are not repeated here.
As can be seen from Fig. 4, compared with the embodiment corresponding to Fig. 2, the flow 400 of the method for recognizing an image in the present embodiment highlights the step of recognizing, based on the projection coordinates of the two video frames, whether the action of the palm corresponding to the target video is a waving action. The solution described in the present embodiment can thus recognize more accurately and quickly whether the action of the palm corresponding to a video is a waving action.
With further reference to Fig. 5, as an implementation of the method shown in Fig. 2, the present disclosure provides an embodiment of an apparatus for recognizing an image. The apparatus embodiment corresponds to the method embodiment shown in Fig. 2; in addition to the features recorded below, the apparatus embodiment may also include features that are the same as or corresponding to those of the method embodiment shown in Fig. 2, and produces effects that are the same as or corresponding to those of the method embodiment shown in Fig. 2. The apparatus may specifically be applied to various electronic devices.
As shown in Fig. 5, the apparatus 500 for recognizing an image of the present embodiment includes: an acquiring unit 501, an extraction unit 502, a recognition unit 503 and a first generation unit 504. The acquiring unit 501 is configured to acquire a target video; the extraction unit 502 is configured to extract two video frames including a palm object from the target video; the recognition unit 503 is configured to recognize, based on the two video frames, whether the action of the palm corresponding to the target video is a waving action; and the first generation unit 504 is configured to, in response to determining that the action of the palm corresponding to the target video is a waving action, generate a recognition result indicating that the action of the palm corresponding to the target video is a waving action.
In the present embodiment, the acquiring unit 501 of the apparatus 500 for recognizing an image may acquire the target video from another electronic device or locally, by a wired or wireless connection. Here, the target video may be a video on which palm action recognition is to be performed.
In the present embodiment, the extraction unit 502 may extract two video frames including a palm object from the target video acquired by the acquiring unit 501.
In the present embodiment, the recognition unit 503 may recognize, based on the two video frames extracted by the extraction unit 502, whether the action of the palm corresponding to the target video acquired by the acquiring unit 501 is a waving action. Here, the palm corresponding to the target video may be a palm captured during the acquisition of the target video.
In the present embodiment, in response to determining that the action of the palm corresponding to the target video acquired by the acquiring unit 501 is a waving action, the first generation unit 504 may generate a recognition result indicating that the action of the palm corresponding to the target video is a waving action.
In some optional implementations of the present embodiment, the recognition unit 503 includes: an input module (not shown in the figure) configured to input the two video frames into a pre-trained recognition model to determine whether the action of the palm corresponding to the target video is a waving action, where the recognition model is used to recognize whether the action of the palm corresponding to the video to which the two input video frames belong is a waving action.
In some optional implementations of the present embodiment, the recognition unit 503 includes: a determination module (not shown in the figure) configured to determine the projection coordinates, in a predetermined projection plane, of the normal vectors of the palms corresponding to the two video frames; and a recognition module (not shown in the figure) configured to recognize, based on the two determined projection coordinates, whether the action of the palm corresponding to the target video is a waving action.
In some optional implementations of the present embodiment, the projection plane is the imaging plane of the video frames, and the recognition module includes: a first recognition submodule (not shown in the figure) configured to, in response to the absolute value of the difference between the coordinate values of the projection coordinates of the two video frames in the preset first direction being less than or equal to the preset first threshold, and the absolute value of the difference between the coordinate values of the projection coordinates of the two video frames in the preset second direction being greater than or equal to the preset second threshold, recognize the action of the palm corresponding to the target video as a waving action.
In some optional implementations of the present embodiment, the projection plane is the imaging plane of the video frames, and the recognition module includes: a second recognition submodule (not shown in the figure) configured to, in response to the absolute value of the difference between the coordinate values of the projection coordinates of the two video frames in the preset first direction being greater than the preset first threshold, or the absolute value of the difference between the coordinate values of the projection coordinates of the two video frames in the preset second direction being less than the preset second threshold, recognize the action of the palm corresponding to the target video as a non-waving action.
In some optional implementations of the present embodiment, the extraction unit 502 includes: an extraction module (not shown in the figure) configured to extract two adjacent video frames including a palm object from the target video.
In some optional implementations of the present embodiment, the apparatus 500 further includes: a fusion unit (not shown in the figure) configured to, in response to generating the recognition result, for each video frame in the target video that includes the palm object, fuse an image including a target object with the video frame to obtain a fused image corresponding to the video frame; and a second generation unit (not shown in the figure) configured to generate a fused video and present the fused video in place of the target video.
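The modular structure of the apparatus 500 can be sketched with each unit reduced to a method. The frame representation (dicts with a `has_palm` flag) and the injected recognizer callable are illustrative assumptions; any of the recognition variants described above could be plugged in.

```python
class WavingRecognitionApparatus:
    def __init__(self, recognizer):
        # recognizer: callable taking two frames, returning True if waving
        # (e.g., a thresholded projection-coordinate check or a trained model).
        self._recognizer = recognizer

    def acquire(self, source):                      # acquiring unit 501
        return list(source)

    def extract_palm_frames(self, frames):          # extraction unit 502
        palm_frames = [f for f in frames if f.get("has_palm")]
        return palm_frames[:2]

    def recognize(self, two_frames):                # recognition unit 503
        return self._recognizer(*two_frames)

    def generate_result(self, waving):              # first generation unit 504
        return "is a waving action" if waving else None

    def run(self, source):
        frames = self.acquire(source)
        pair = self.extract_palm_frames(frames)
        return self.generate_result(self.recognize(pair))
```

The design choice mirrors the patent's decomposition: each unit is independently replaceable, so the recognition unit can be swapped between the model-based and projection-coordinate-based variants without touching acquisition or result generation.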
In the apparatus for recognizing an image provided by the above embodiment of the disclosure, the acquiring unit 501 acquires a target video; the extraction unit 502 then extracts two video frames including a palm object from the target video; next, the recognition unit 503 recognizes, based on the two video frames, whether the action of the palm corresponding to the target video is a waving action; and finally, in response to determining that the action of the palm corresponding to the target video is a waving action, the first generation unit 504 generates a recognition result indicating that the action is a waving action. An electronic device can thus determine whether a video includes a palm object performing a waving action, and in turn recognize richer palm posture information, which helps realize human-computer interaction based on gesture recognition.
Referring now to Fig. 6, a structural schematic diagram of an electronic device 600 (e.g., the server or terminal device in Fig. 1) suitable for implementing embodiments of the present disclosure is shown. Terminal devices in embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, laptops, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players) and in-vehicle terminals (e.g., in-vehicle navigation terminals), and fixed terminals such as digital TVs and desktop computers. The terminal device/server shown in Fig. 6 is merely an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in Fig. 6, the electronic device 600 may include a processing apparatus (e.g., a central processing unit, a graphics processor) 601 that can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage apparatus 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the electronic device 600. The processing apparatus 601, the ROM 602 and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
In general, the following apparatuses may be connected to the I/O interface 605: an input apparatus 606 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer and a gyroscope; an output apparatus 607 including, for example, a liquid crystal display (LCD), a loudspeaker and a vibrator; a storage apparatus 608 including, for example, a magnetic tape and a hard disk; and a communication apparatus 609. The communication apparatus 609 may allow the electronic device 600 to communicate wirelessly or by wire with other devices to exchange data. Although Fig. 6 shows the electronic device 600 with various apparatuses, it should be understood that it is not required to implement or have all of the apparatuses shown; more or fewer apparatuses may alternatively be implemented or provided. Each box shown in Fig. 6 may represent one apparatus, or may represent a plurality of apparatuses as needed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, an embodiment of the present disclosure includes a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication apparatus 609, or installed from the storage apparatus 608, or installed from the ROM 602. When the computer program is executed by the processing apparatus 601, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are executed.
It should be noted that the computer-readable medium described in the embodiments of the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the embodiments of the present disclosure, a computer-readable storage medium may be any tangible medium containing or storing a program that can be used by or in combination with an instruction execution system, apparatus or device. In the embodiments of the present disclosure, a computer-readable signal medium may include a data signal propagated in a baseband or as a part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and may send, propagate or transmit a program for use by or in combination with an instruction execution system, apparatus or device. The program code contained on the computer-readable medium may be transmitted by any suitable medium, including but not limited to: an electric wire, an optical cable, RF (radio frequency), or any suitable combination of the above.
The above computer-readable medium may be included in the above electronic device, or may exist alone without being assembled into the electronic device. The above computer-readable medium carries one or more programs. When the one or more programs are executed by the electronic device, the electronic device is caused to: obtain a target video; extract two video frames containing a palm object from the target video; identify, based on the two video frames, whether an action of a palm corresponding to the target video is a waving action; and in response to determining that the action of the palm corresponding to the target video is a waving action, generate a recognition result indicating that the action of the palm corresponding to the target video is a waving action.
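The sequence of operations just enumerated can be sketched in Python. This is only an illustrative sketch: `detect_palm` and `is_waving` are hypothetical placeholder callables standing in for the palm-detection and wave-classification steps, which the disclosure leaves to specific embodiments.

```python
def recognize_wave(frames, detect_palm, is_waving):
    """Return a recognition result if the video's palm action is a wave.

    frames: sequence of video frames (any frame representation).
    detect_palm: callable returning True if a frame contains a palm object.
    is_waving: callable taking two frames, returning True for a waving action.
    """
    # Step 1: keep only the frames that contain a palm object.
    palm_frames = [f for f in frames if detect_palm(f)]
    if len(palm_frames) < 2:
        return None  # fewer than two palm frames: nothing to compare

    # Step 2: take two palm frames and classify the palm action.
    first, second = palm_frames[0], palm_frames[1]
    if is_waving(first, second):
        # Step 3: generate a recognition result indicating a waving action.
        return {"action": "wave"}
    return None
```

In an embodiment following claim 6, `palm_frames[0]` and `palm_frames[1]` would additionally be required to be adjacent frames of the target video.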
The computer program code for executing the operations of the embodiments of the present disclosure may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may be executed entirely on a user's computer, partly on a user's computer, as a stand-alone software package, partly on a user's computer and partly on a remote computer, or entirely on a remote computer or server. In cases involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
Flow chart and block diagram in attached drawing are illustrated according to the system of the various embodiments of the disclosure, method and computer journey The architecture, function and operation in the cards of sequence product.In this regard, each box in flowchart or block diagram can generation A part of one module, program segment or code of table, a part of the module, program segment or code include one or more use The executable instruction of the logic function as defined in realizing.It should also be noted that in some implementations as replacements, being marked in box The function of note can also occur in a different order than that indicated in the drawings.For example, two boxes succeedingly indicated are actually It can be basically executed in parallel, they can also be executed in the opposite order sometimes, and this depends on the function involved.Also it to infuse Meaning, the combination of each box in block diagram and or flow chart and the box in block diagram and or flow chart can be with holding The dedicated hardware based system of functions or operations as defined in row is realized, or can use specialized hardware and computer instruction Combination realize.
The units described in the embodiments of the present disclosure may be implemented by software or by hardware. The described units may also be provided in a processor, which may, for example, be described as: a processor including an acquiring unit, an extracting unit, a recognizing unit and a first generating unit. The names of these units do not in some cases constitute a limitation on the units themselves; for example, the acquiring unit may also be described as "a unit for obtaining a target video".
The above description is only a preferred embodiment of the present disclosure and an explanation of the applied technical principles. Those skilled in the art should understand that the scope of the invention involved in the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, but should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the above inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the present disclosure.

Claims (16)

1. A method for recognizing an image, comprising:
obtaining a target video;
extracting two video frames containing a palm object from the target video;
identifying, based on the two video frames, whether an action of a palm corresponding to the target video is a waving action; and
in response to determining that the action of the palm corresponding to the target video is a waving action, generating a recognition result indicating that the action of the palm corresponding to the target video is a waving action.
2. The method according to claim 1, wherein the identifying, based on the two video frames, whether the action of the palm corresponding to the target video is a waving action comprises:
inputting the two video frames into a pre-trained recognition model to determine whether the action of the palm corresponding to the target video is a waving action, wherein the recognition model is used to identify whether the action of the palm corresponding to a video containing the two input video frames is a waving action.
3. The method according to claim 1, wherein the identifying, based on the two video frames, whether the action of the palm corresponding to the target video is a waving action comprises:
determining projection coordinates of normal vectors of the palm corresponding to the two video frames in a predetermined projection plane; and
identifying, based on the two determined projection coordinates, whether the action of the palm corresponding to the target video is a waving action.
4. The method according to claim 3, wherein the projection plane is an imaging plane of the video frames; and
the identifying, based on the two determined projection coordinates, whether the action of the palm corresponding to the target video is a waving action comprises:
in response to an absolute value of a difference between coordinate values of the projection coordinates of the two video frames in a preset first direction being less than or equal to a preset first threshold, and an absolute value of a difference between coordinate values of the projection coordinates corresponding to the two video frames in a preset second direction being greater than or equal to a preset second threshold, recognizing the action of the palm corresponding to the target video as a waving action.
5. The method according to claim 3, wherein the projection plane is an imaging plane of the video frames; and
the identifying, based on the two determined projection coordinates, whether the action of the palm corresponding to the target video is a waving action comprises:
in response to the absolute value of the difference between the coordinate values of the projection coordinates of the two video frames in the preset first direction being greater than the preset first threshold, or the absolute value of the difference between the coordinate values of the projection coordinates corresponding to the two video frames in the preset second direction being less than the preset second threshold, recognizing the action of the palm corresponding to the target video as a non-waving action.
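The threshold tests of claims 4 and 5 can be combined into one small function, since the non-waving condition of claim 5 is exactly the negation of the waving condition of claim 4. The function below is an illustrative sketch; the coordinates are assumed to be (first-direction, second-direction) pairs obtained by projecting the palm normal vectors onto the imaging plane, and the threshold values are placeholders:

```python
def classify_wave(p1, p2, t1, t2):
    """Classify a palm action from two projection coordinates.

    p1, p2: projection coordinates (first-direction value, second-direction value)
            of the palm normal vectors of the two video frames.
    t1: preset first threshold (maximum allowed first-direction change).
    t2: preset second threshold (minimum required second-direction change).
    Returns True for a waving action (claim 4), False for a non-waving
    action (claim 5).
    """
    dx = abs(p1[0] - p2[0])  # change along the preset first direction
    dy = abs(p1[1] - p2[1])  # change along the preset second direction
    # Claim 4: wave iff dx <= t1 and dy >= t2; claim 5 is the complement.
    return dx <= t1 and dy >= t2
```

Intuitively, a wave flips the palm so that its normal vector swings mostly along one axis of the imaging plane (the second direction) while staying nearly fixed along the other (the first direction).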
6. The method according to claim 1, wherein the extracting two video frames containing a palm object from the target video comprises:
extracting two adjacent video frames containing a palm object from the target video.
7. The method according to any one of claims 1-6, wherein the method further comprises:
in response to generating the recognition result, performing, for a video frame containing a palm object in the target video, image fusion between an image containing a target object and the video frame, to obtain a fused image corresponding to the video frame; and
generating a fused video, and presenting the fused video in place of the target video.
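The image-fusion step of claim 7 is left unspecified by the claims; a minimal sketch, assuming plain per-pixel alpha blending of a target-object image (e.g. a sticker the same size as the frame) into the video frame, could look as follows. The `alpha` parameter and the blending formula are illustrative assumptions, not part of the disclosure:

```python
import numpy as np

def fuse(frame, target_image, alpha=0.5):
    """Blend a target-object image into a video frame.

    frame, target_image: arrays of the same shape.
    alpha: weight of the target-object image (assumed blending scheme;
           the patent does not specify the fusion algorithm).
    Returns the fused image corresponding to the video frame.
    """
    blended = alpha * target_image + (1.0 - alpha) * frame
    return blended.astype(frame.dtype)
```

Applying `fuse` to every palm-containing frame and re-assembling the frames would yield the fused video that is presented in place of the target video.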
8. An apparatus for recognizing an image, comprising:
an acquiring unit, configured to obtain a target video;
an extracting unit, configured to extract two video frames containing a palm object from the target video;
a recognizing unit, configured to identify, based on the two video frames, whether an action of a palm corresponding to the target video is a waving action; and
a first generating unit, configured to, in response to determining that the action of the palm corresponding to the target video is a waving action, generate a recognition result indicating that the action of the palm corresponding to the target video is a waving action.
9. The apparatus according to claim 8, wherein the recognizing unit comprises:
an input module, configured to input the two video frames into a pre-trained recognition model to determine whether the action of the palm corresponding to the target video is a waving action, wherein the recognition model is used to identify whether the action of the palm corresponding to a video containing the two input video frames is a waving action.
10. The apparatus according to claim 8, wherein the recognizing unit comprises:
a determining module, configured to determine projection coordinates of normal vectors of the palm corresponding to the two video frames in a predetermined projection plane; and
an identifying module, configured to identify, based on the two determined projection coordinates, whether the action of the palm corresponding to the target video is a waving action.
11. The apparatus according to claim 10, wherein the projection plane is an imaging plane of the video frames; and
the identifying module comprises:
a first identifying submodule, configured to, in response to an absolute value of a difference between coordinate values of the projection coordinates of the two video frames in a preset first direction being less than or equal to a preset first threshold, and an absolute value of a difference between coordinate values of the projection coordinates corresponding to the two video frames in a preset second direction being greater than or equal to a preset second threshold, recognize the action of the palm corresponding to the target video as a waving action.
12. The apparatus according to claim 10, wherein the projection plane is an imaging plane of the video frames; and
the identifying module comprises:
a second identifying submodule, configured to, in response to the absolute value of the difference between the coordinate values of the projection coordinates of the two video frames in the preset first direction being greater than the preset first threshold, or the absolute value of the difference between the coordinate values of the projection coordinates corresponding to the two video frames in the preset second direction being less than the preset second threshold, recognize the action of the palm corresponding to the target video as a non-waving action.
13. The apparatus according to claim 8, wherein the extracting unit comprises:
an extracting module, configured to extract two adjacent video frames containing a palm object from the target video.
14. The apparatus according to any one of claims 8-13, wherein the apparatus further comprises:
a fusing unit, configured to, in response to generating the recognition result, perform, for a video frame containing a palm object in the target video, image fusion between an image containing a target object and the video frame, to obtain a fused image corresponding to the video frame; and
a second generating unit, configured to generate a fused video, and present the fused video in place of the target video.
15. An electronic device, comprising:
one or more processors; and
a storage apparatus, storing one or more programs thereon,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-7.
16. A computer-readable medium, storing a computer program thereon, wherein the program, when executed by a processor, implements the method according to any one of claims 1-7.
CN201910558852.5A 2019-06-26 2019-06-26 Method and device for recognizing images Active CN110263743B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910558852.5A CN110263743B (en) 2019-06-26 2019-06-26 Method and device for recognizing images


Publications (2)

Publication Number Publication Date
CN110263743A true CN110263743A (en) 2019-09-20
CN110263743B CN110263743B (en) 2023-10-13

Family

ID=67921630

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910558852.5A Active CN110263743B (en) 2019-06-26 2019-06-26 Method and device for recognizing images

Country Status (1)

Country Link
CN (1) CN110263743B (en)


Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6944315B1 (en) * 2000-10-31 2005-09-13 Intel Corporation Method and apparatus for performing scale-invariant gesture recognition
CN105794191A (en) * 2013-12-17 2016-07-20 夏普株式会社 Recognition data transmission device
CN105809144A (en) * 2016-03-24 2016-07-27 重庆邮电大学 Gesture recognition system and method adopting action segmentation
CN107438804A (en) * 2016-10-19 2017-12-05 深圳市大疆创新科技有限公司 A kind of Wearable and UAS for being used to control unmanned plane
CN108345387A (en) * 2018-03-14 2018-07-31 百度在线网络技术(北京)有限公司 Method and apparatus for output information
CN108549490A (en) * 2018-05-03 2018-09-18 林潼 A kind of gesture identification interactive approach based on Leap Motion equipment
CN109455180A (en) * 2018-11-09 2019-03-12 百度在线网络技术(北京)有限公司 Method and apparatus for controlling unmanned vehicle
CN109597485A (en) * 2018-12-04 2019-04-09 山东大学 A kind of gesture interaction system and its working method based on two fingers angular domain feature
CN109635767A (en) * 2018-12-20 2019-04-16 北京字节跳动网络技术有限公司 A kind of training method, device, equipment and the storage medium of palm normal module
US20190113980A1 (en) * 2013-10-16 2019-04-18 Leap Motion, Inc. Velocity field interaction for free space gesture interface and control
CN109889892A (en) * 2019-04-16 2019-06-14 北京字节跳动网络技术有限公司 Video effect adding method, device, equipment and storage medium


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111143613A (en) * 2019-12-30 2020-05-12 携程计算机技术(上海)有限公司 Method, system, electronic device and storage medium for selecting video cover
CN111143613B (en) * 2019-12-30 2024-02-06 携程计算机技术(上海)有限公司 Method, system, electronic device and storage medium for selecting video cover

Also Published As

Publication number Publication date
CN110263743B (en) 2023-10-13

Similar Documents

Publication Publication Date Title
CN110288049A (en) Method and apparatus for generating image recognition model
EP3972239A1 (en) Method and apparatus for virtual fitting
CN109599113A (en) Method and apparatus for handling information
CN110110811A (en) Method and apparatus for training pattern, the method and apparatus for predictive information
CN109858445A (en) Method and apparatus for generating model
CN109902659A (en) Method and apparatus for handling human body image
CN110009059A (en) Method and apparatus for generating model
CN110298906A (en) Method and apparatus for generating information
CN110348419B (en) Method and device for photographing
CN110298319B (en) Image synthesis method and device
CN110162670A (en) Method and apparatus for generating expression packet
CN109829432A (en) Method and apparatus for generating information
CN109993150A (en) The method and apparatus at age for identification
CN110188719A (en) Method for tracking target and device
CN109086719A (en) Method and apparatus for output data
CN108345387A (en) Method and apparatus for output information
CN108510454A (en) Method and apparatus for generating depth image
CN109600559B (en) Video special effect adding method and device, terminal equipment and storage medium
CN109754464A (en) Method and apparatus for generating information
CN110059623A (en) Method and apparatus for generating information
CN110264539A (en) Image generating method and device
CN110111241A (en) Method and apparatus for generating dynamic image
CN109934142A (en) Method and apparatus for generating the feature vector of video
CN110097004B (en) Facial expression recognition method and device
CN109829431A (en) Method and apparatus for generating information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant