CN107995442A - Video data processing method, apparatus, and computing device - Google Patents
Video data processing method, apparatus, and computing device
- Publication number
- CN107995442A CN107995442A CN201711395657.2A CN201711395657A CN107995442A CN 107995442 A CN107995442 A CN 107995442A CN 201711395657 A CN201711395657 A CN 201711395657A CN 107995442 A CN107995442 A CN 107995442A
- Authority
- CN
- China
- Prior art keywords
- data
- human region
- combined action
- data set
- video data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/222—Studio circuitry; Studio devices; Studio equipment
- H04N5/262—Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/011—Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
Landscapes
- Engineering & Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Signal Processing (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Acoustics & Sound (AREA)
- General Physics & Mathematics (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a video data processing method, apparatus, and computing device. The method includes: performing human-body segmentation on multiple image frames in the video data to obtain multiple human-region data; comparing the multiple human-region data with multiple motion-sensing action data contained in a preset combined-action data set; when the comparison result is determined to satisfy a preset matching rule, determining an audio instruction from the audio data corresponding to the image frames, and judging whether the combined-action data set matched by the multiple human-region data matches the audio instruction; if so, processing the video data according to the combined-action processing rule and displaying the processed video data. By driving the processing of video data with both motion-sensing actions and audio data, the scheme improves processing accuracy, reduces the false-detection rate, and enhances the display effect of the video data; it is applicable to any mobile terminal with a camera, has strong resistance to infrared interference, and is low in cost.
Description
Technical field
The present invention relates to the field of image processing, and in particular to a video data processing method, apparatus, and computing device.
Background art
With the development of science and technology, advanced human-computer interaction theory places ever higher demands on interaction modes. In motion-sensing interaction, for example, people can interact with surrounding devices or the environment directly through body movements, without any complex control device, giving an immersive interactive experience.
However, the inventors found during the implementation of the present invention that motion-sensing interaction in the prior art usually needs to capture the user's movements precisely, for example by locating the joints of the human body to determine the user's motion-sensing actions. Moreover, prior-art motion-sensing interaction often relies on high-precision, high-depth cameras to predict the user's movements; such cameras are costly and can only be used in the absence of strong infrared interference, so interaction modes based on them are difficult to popularize on mobile terminals. In addition, motion-sensing action capture based on RGB images generally requires a very large amount of computation. Finally, the prior art often drives human-computer interaction by motion-sensing actions alone, which cannot guarantee processing accuracy and suffers a certain false-detection rate. It can thus be seen that the prior art lacks a solution that can solve the above problems well.
Summary of the invention
In view of the above problems, the present invention is proposed in order to provide a video data processing method, apparatus, and computing device that overcome, or at least partially solve, the above problems.
According to one aspect of the invention, a video data processing method is provided, including: performing human-body segmentation on multiple image frames in the video data to obtain multiple human-region data corresponding to the image frames; comparing the multiple human-region data with multiple motion-sensing action data contained in a preset combined-action data set; when the comparison result is determined to satisfy a preset matching rule, determining an audio instruction from the audio data corresponding to the image frames, and judging whether the combined-action data set matched by the multiple human-region data matches the audio instruction; if so, obtaining the combined-action processing rule corresponding to that combined-action data set, processing the video data according to the combined-action processing rule, and displaying the processed video data.
Optionally, the step of determining an audio instruction from the audio data corresponding to the multiple image frames specifically includes:
performing speech recognition on the audio data corresponding to the multiple image frames to obtain a speech recognition result;
determining the audio instruction corresponding to the speech recognition result according to a preset audio instruction library, wherein the audio instruction library is used to store the audio instructions.
Optionally, the audio instruction library is further used to store the mapping relations between the audio instructions and their corresponding combined-action data sets;
the step of judging whether the combined-action data set matched by the multiple human-region data matches the audio instruction then specifically includes:
determining, according to the audio instruction library, whether the combined-action data set matched by the multiple human-region data matches the audio instruction.
Optionally, the preset combined-action data set includes: multiple combined-action data sets stored in a preset motion-sensing action library, each combined-action data set containing at least two motion-sensing action data;
the step of comparing the multiple human-region data with the multiple motion-sensing action data contained in the preset combined-action data set then specifically includes:
comparing the multiple human-region data with the multiple motion-sensing action data contained in each combined-action data set stored in the motion-sensing action library.
Optionally, the preset matching rule includes:
when M human-region data among the multiple human-region data respectively match M motion-sensing action data contained in a combined-action data set to be compared, determining that the multiple human-region data and that combined-action data set satisfy the matching rule;
wherein the total number of the multiple human-region data is greater than or equal to M, the total number of motion-sensing action data contained in the combined-action data set to be compared is greater than or equal to M, and M is a natural number greater than 1.
Optionally, each motion-sensing action data contained in the combined-action data set to be compared has a time sequence number; the step of matching the M human-region data respectively with the M motion-sensing action data then specifically includes:
judging whether the order of appearance, in the video data, of the M human-region data matches the time sequence numbers of the M motion-sensing action data contained in the combined-action data set to be compared;
if so, determining that the M human-region data respectively match the M motion-sensing action data contained in the combined-action data set to be compared.
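The time-ordered matching just described can be sketched as follows. This is an illustrative sketch, not part of the original disclosure: the action identifiers and function names are assumptions, and a real implementation would compare per-frame segmentation results rather than plain strings.

```python
def matches_in_order(matched_ids, expected_sequence):
    """Check that the actions matched across frames appear in the same
    time-sequence order as the actions in the combined-action data set.

    matched_ids: action identifiers matched per frame, in frame order.
    expected_sequence: the M actions of the set, ordered by time sequence number.
    """
    # Keep only the matches that belong to the set, preserving frame order.
    relevant = [a for a in matched_ids if a in expected_sequence]
    # The frame-order appearance must equal the set's time-sequence order.
    return relevant == list(expected_sequence)
```

For instance, a set ordered "raise" then "lower" matches frames in which "raise" appears before "lower", but not the reverse order.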
Optionally, the step of performing human-body segmentation on the multiple image frames in the video data to obtain multiple human-region data corresponding to the image frames specifically includes:
according to the order in which the image frames appear in the video data, obtaining in real time the currently pending image frame contained in the video data, performing human-body segmentation on the currently pending image frame, and obtaining the human-region data corresponding to the currently pending image frame.
Optionally, the step of comparing the multiple human-region data with the multiple motion-sensing action data contained in the preset combined-action data set specifically includes:
comparing the human-region data corresponding to the currently pending image frame with the multiple motion-sensing action data contained in each combined-action data set;
determining a motion-sensing action data whose comparison is successful as a first action data, and determining the combined-action data set to which the first action data belongs as a first action data set;
comparing the human-region data corresponding to the N image frames following the currently pending image frame with each motion-sensing action data contained in the first action data set, wherein N is a natural number greater than or equal to 1.
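The narrowing strategy above — compare the first frame against every set, then restrict later frames to the set that produced the first hit — can be sketched as follows. This is an illustrative sketch under assumed names; the disclosure does not prescribe this exact code.

```python
def narrowed_match(frames, action_sets, match):
    """Compare each frame against candidate combined-action data sets.

    The first frame is compared against every set; once one of its actions
    matches, subsequent frames are compared only against that set (the
    'first action data set'), saving comparisons.
    """
    first_set = None
    matched = []
    for frame in frames:
        candidates = action_sets if first_set is None else [first_set]
        for action_set in candidates:
            hit = next((a for a in action_set if match(frame, a)), None)
            if hit is not None:
                matched.append(hit)
                first_set = action_set  # later frames only check this set
                break
    return first_set, matched
```

With toy string "actions" and exact-equality matching, two frames that hit the right-hand set select that set and match both of its actions.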
Optionally, the step of obtaining the combined-action processing rule corresponding to the combined-action data set matched by the multiple human-region data specifically includes:
determining, according to a preset combined-action processing library, the combined-action processing rule corresponding to the combined-action data set matched by the multiple human-region data;
wherein the combined-action processing library is used to store the combined-action processing rules corresponding to the combined-action data sets.
Optionally, the combined-action processing rule includes: processing the video data according to the effect texture corresponding to the combined-action data set.
Optionally, the step of processing the video data according to the combined-action processing rule specifically includes:
processing the currently pending image frame and/or the L image frames following the currently pending image frame, wherein L is a natural number greater than 1.
Optionally, the video data includes: video data captured in real time by an image capture device, and/or video data contained in a human-computer interaction game.
According to another aspect of the present invention, a video data processing apparatus is provided, including: a segmentation module, adapted to perform human-body segmentation on multiple image frames in the video data to obtain multiple human-region data corresponding to the image frames; a comparison module, adapted to compare the multiple human-region data with multiple motion-sensing action data contained in a preset combined-action data set; an audio instruction determining module, adapted to determine an audio instruction from the audio data corresponding to the image frames when the comparison result is determined to satisfy a preset matching rule; a judgment module, adapted to judge whether the combined-action data set matched by the multiple human-region data matches the audio instruction; a processing rule obtaining module, adapted to obtain the combined-action processing rule corresponding to the combined-action data set matched by the multiple human-region data if that combined-action data set is judged to match the audio instruction; a processing module, adapted to process the video data according to the combined-action processing rule; and a display module, adapted to display the processed video data.
Optionally, the audio instruction determining module is further adapted to:
perform speech recognition on the audio data corresponding to the multiple image frames to obtain a speech recognition result;
determine the audio instruction corresponding to the speech recognition result according to a preset audio instruction library, wherein the audio instruction library is used to store the audio instructions.
Optionally, the audio instruction library is further used to store the mapping relations between the audio instructions and their corresponding combined-action data sets;
the judgment module is then further adapted to:
determine, according to the audio instruction library, whether the combined-action data set matched by the multiple human-region data matches the audio instruction.
Optionally, the preset combined-action data set includes: multiple combined-action data sets stored in a preset motion-sensing action library, each combined-action data set containing at least two motion-sensing action data;
the comparison module is further adapted to:
compare the multiple human-region data with the multiple motion-sensing action data contained in each combined-action data set stored in the motion-sensing action library.
Optionally, the preset matching rule includes:
when M human-region data among the multiple human-region data respectively match M motion-sensing action data contained in a combined-action data set to be compared, determining that the multiple human-region data and that combined-action data set satisfy the matching rule;
wherein the total number of the multiple human-region data is greater than or equal to M, the total number of motion-sensing action data contained in the combined-action data set to be compared is greater than or equal to M, and M is a natural number greater than 1.
Optionally, each motion-sensing action data contained in the combined-action data set to be compared has a time sequence number; the comparison module is then further adapted to:
judge whether the order of appearance, in the video data, of the M human-region data matches the time sequence numbers of the M motion-sensing action data contained in the combined-action data set to be compared;
if so, determine that the M human-region data respectively match the M motion-sensing action data contained in the combined-action data set to be compared.
Optionally, the segmentation module is further adapted to:
according to the order in which the image frames appear in the video data, obtain in real time the currently pending image frame contained in the video data, perform human-body segmentation on the currently pending image frame, and obtain the human-region data corresponding to the currently pending image frame.
Optionally, the comparison module is further adapted to:
compare the human-region data corresponding to the currently pending image frame with the multiple motion-sensing action data contained in each combined-action data set;
determine a motion-sensing action data whose comparison is successful as a first action data, and determine the combined-action data set to which the first action data belongs as a first action data set;
compare the human-region data corresponding to the N image frames following the currently pending image frame with each motion-sensing action data contained in the first action data set, wherein N is a natural number greater than or equal to 1.
Optionally, the processing rule obtaining module is further adapted to:
determine, according to a preset combined-action processing library, the combined-action processing rule corresponding to the combined-action data set matched by the multiple human-region data;
wherein the combined-action processing library is used to store the combined-action processing rules corresponding to the combined-action data sets.
Optionally, the combined-action processing rule includes: processing the video data according to the effect texture corresponding to the combined-action data set.
Optionally, the processing module is further adapted to:
process the currently pending image frame and/or the L image frames following the currently pending image frame, wherein L is a natural number greater than 1.
Optionally, the video data includes: video data captured in real time by an image capture device, and/or video data contained in a human-computer interaction game.
According to yet another aspect of the invention, a computing device is provided, including: a processor, a memory, a communication interface, and a communication bus, through which the processor, the memory, and the communication interface communicate with each other; the memory is used to store at least one executable instruction, which causes the processor to perform operations corresponding to the above video data processing method.
According to a further aspect of the present invention, a computer storage medium is provided, in which at least one executable instruction is stored; the executable instruction causes a processor to perform operations corresponding to the above video data processing method.
According to the video data processing method, apparatus, and computing device provided by the present invention, the method can capture human motion-sensing actions quickly and accurately, and drives the processing of video data with both motion-sensing actions and audio data. Capturing motion-sensing actions does not depend on video shot by a high-precision, high-depth camera, so the method is applicable to any mobile terminal with a camera, has strong resistance to infrared interference, and is low in cost. It provides a human-computer interaction mode, based on human-region segmentation, that is driven by motion-sensing actions and audio data: the processing rule to be applied to the video data can be determined quickly from the motion-sensing actions and the audio data, and the processing step is performed only when both the image frames and the audio data match successfully. The accuracy of processing is therefore improved, the false-detection rate is reduced, and displaying the processed video data enhances the display effect and makes the human-computer interaction more entertaining.
The above description is only an overview of the technical solution of the present invention. In order that the technical means of the present invention may be understood more clearly and practiced according to the contents of the specification, and in order that the above and other objects, features, and advantages of the present invention may become more apparent, embodiments of the present invention are set forth below.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The accompanying drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered limiting of the present invention. Throughout the drawings, identical components are denoted by the same reference numerals. In the drawings:
Fig. 1 shows a flow chart of the video data processing method according to an embodiment of the present invention;
Fig. 2 shows a flow chart of the video data processing method according to another embodiment of the present invention;
Fig. 3 shows a schematic flow diagram of the sub-steps included in step S220;
Fig. 4 shows a schematic structural diagram of the video data processing apparatus according to a further embodiment of the present invention;
Fig. 5 shows a schematic structural diagram of a computing device according to an embodiment of the present invention.
Detailed description of the embodiments
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be embodied in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and so that its scope can be fully conveyed to those skilled in the art.
Fig. 1 shows a flow chart of the video data processing method according to an embodiment of the present invention. As shown in Fig. 1, the method comprises the following steps:
Step S110: perform human-body segmentation on multiple image frames in the video data to obtain multiple human-region data corresponding to the image frames.
The video data may be real-time video captured by a camera, video previously recorded by a camera and stored locally or in the cloud, or video composed from multiple pictures. The multiple image frames may be consecutive image frames, or image frames sampled from the video data at a preset time interval; the present invention does not limit the specific form or source of the video data.
Human-body segmentation of the multiple image frames may specifically be accomplished as follows. First, the human region in each image frame is detected; specifically, the pixels contained in each image frame can be classified to determine the human region in the frame. Then, the human region is segmented out of the corresponding image frame, specifically by splitting off the pixels corresponding to the human region, yielding the human-region data corresponding to each image frame.
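The per-pixel segmentation just described can be sketched as follows. This is a minimal illustration, not the patented implementation: the `is_person_pixel` classifier is a stand-in (a real system would use a trained segmentation model, such as the neural networks mentioned elsewhere in this patent).

```python
import numpy as np

def segment_human_region(frame, is_person_pixel):
    """Sketch of human-body segmentation: classify every pixel of the frame,
    then keep the pixel values and coordinate positions that belong to the
    person, i.e. the human-region data."""
    mask = np.zeros(frame.shape[:2], dtype=bool)
    for y in range(frame.shape[0]):
        for x in range(frame.shape[1]):
            mask[y, x] = is_person_pixel(frame[y, x])
    coords = np.argwhere(mask)   # coordinate positions of the person pixels
    pixels = frame[mask]         # the pixel values themselves
    return {"coords": coords, "pixels": pixels}
```

The returned dictionary mirrors the patent's description of human-region data as the region's pixels together with their coordinate positions.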
Step S120: compare the multiple human-region data with the multiple motion-sensing action data contained in a preset combined-action data set.
In the method of this embodiment, the operation of processing the video data is triggered by a combination of motion-sensing actions, so it must be judged whether the multiple human-region data satisfy the trigger condition; the multiple motion-sensing action data contained in the preset combined-action data set are the basis for this judgment. The human-region data may include the pixels contained in the human region and the coordinate positions of those pixels. This step may specifically judge whether the multiple human-region data are respectively consistent with the multiple motion-sensing action data, or whether the matching degree between the human-region data and the motion-sensing action data exceeds a preset matching-degree threshold. For example, a preset "Dragon-Subduing Eighteen Palms" combined-action data set contains multiple motion-sensing action data, and the multiple human-region data are compared with those action data respectively.
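The matching-degree comparison can be sketched as follows. This is an illustrative sketch only: the patent does not fix a similarity measure, so the overlap ratio (intersection over union) and the 0.8 threshold are assumptions.

```python
import numpy as np

def matching_degree(region_mask, action_mask):
    """Overlap ratio between a segmented human-region mask and a stored
    motion-sensing action mask, used here as the matching degree."""
    inter = np.logical_and(region_mask, action_mask).sum()
    union = np.logical_or(region_mask, action_mask).sum()
    return inter / union if union else 0.0

def region_matches(region_mask, action_mask, threshold=0.8):
    """The comparison step: a human region matches an action when the
    matching degree exceeds a preset matching-degree threshold."""
    return bool(matching_degree(region_mask, action_mask) >= threshold)
```

Identical masks score 1.0 and match; disjoint masks score 0.0 and do not.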
Step S130: when the comparison result is determined to satisfy a preset matching rule, determine an audio instruction from the audio data corresponding to the multiple image frames, and judge whether the combined-action data set matched by the multiple human-region data matches the audio instruction.
The preset matching rule can be configured for the specific application scenario. For example, in game or live-streaming scenarios with strong real-time and interactivity requirements, a relatively low matching degree between the multiple human-region data and the multiple motion-sensing action data may be considered to satisfy the preset matching rule, whereas in scenarios where the video data is post-processed, the rule may be considered satisfied only at a relatively high matching degree. In concrete applications, those skilled in the art can configure this according to actual needs.
In practical applications, the multiple motion-sensing action data contained in different combined-action data sets may be quite similar. For example, a "ground-pound" combined-action data set contains two motion-sensing action data, first raising the right hand and then lowering it, while a "flower-scattering" combined-action data set also contains two motion-sensing action data, first raising the right hand from the lower-left corner toward the upper-right corner and then lowering it. Determining the corresponding combined-action data set from the multiple human-region data alone may then go wrong; that is, the motion-sensing action data contained in the determined combined-action data set may be inconsistent with the user's actual motion-sensing actions.
Therefore, the method of this embodiment further determines an audio instruction from the audio data corresponding to the multiple image frames; specifically, the audio instruction can be obtained by performing speech recognition on the audio data, and it is then judged whether the combined-action data set matched by the multiple human-region data matches the audio instruction. Audio instructions can be set in advance for the combined-action data sets respectively. When the comparison result of the multiple human-region data against the multiple motion-sensing action data satisfies the preset matching rule, the combined-action data set corresponding to the multiple human-region data is determined, the audio instruction corresponding to the image frames of the multiple human-region data is further determined, and it is judged whether that audio instruction matches the audio instruction corresponding to the combined-action data set.
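The audio confirmation step can be sketched as follows. This is an illustrative sketch under assumed names: the instruction library is modeled as a plain dictionary from recognized speech text to a combined-action data set identifier, and the example phrases are invented.

```python
def audio_confirms_action_set(speech_text, action_set_id, instruction_map):
    """Look up the recognized speech in the audio instruction library and
    accept the combined-action data set matched by the human-region data
    only when the mapped instruction corresponds to that same set."""
    instruction = instruction_map.get(speech_text)  # speech -> instruction
    return instruction is not None and instruction == action_set_id
```

A matching utterance confirms the action set; an unrelated or unrecognized utterance rejects it, which is what keeps similar gestures (such as the ground-pound and flower-scattering examples above) from being confused.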
Step S140: if it is judged that the combinative movement data set matched with the multiple human region data matches the audio instruction, obtain the combinative movement processing rule corresponding to that combinative movement data set, process the video data according to the combinative movement processing rule, and display the processed video data.
Only when the combinative movement data set matched with the multiple human region data is judged to match the audio instruction is the combinative movement processing rule corresponding to that data set obtained. In other words, the method of the present embodiment determines the combinative movement data set corresponding to the multiple human region data according to both the human region data and the corresponding audio instruction, which improves the accuracy of the processing while making the human-computer interaction more entertaining. For example, the combinative movement processing rule corresponding to the 'dragon-subduing eighteen palms' action data set is obtained only when the multiple human region data match that action data set and the corresponding audio instruction also matches the audio instruction preset for it.
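The double check described above, where a processing rule is released only when both the gesture match and the audio instruction agree, can be sketched as follows. The set names, instruction strings, and rule table are illustrative assumptions, not part of the patent.

```python
# Hypothetical tables: the audio instruction and processing rule preset
# for each combinative movement data set (all names are made up).
SET_TO_AUDIO = {
    "dragon_subduing_18_palms": "dragon_subduing_18_palms",
    "ground_beating": "ground_beating",
}
SET_TO_RULE = {
    "dragon_subduing_18_palms": "add_palm_wave_effect",
    "ground_beating": "add_shockwave_effect",
}

def get_processing_rule(matched_set, audio_instruction):
    """Return the combinative movement processing rule only when the audio
    instruction preset for the gesture-matched set equals the recognized
    audio instruction; otherwise no processing is triggered."""
    if matched_set is None:
        return None
    if SET_TO_AUDIO.get(matched_set) != audio_instruction:
        return None
    return SET_TO_RULE[matched_set]
```

A mismatch in either factor, gesture or audio, yields no rule, which is exactly how the embodiment reduces false triggers.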
Processing the video data specifically means processing the image frames it contains, and the combinative movement processing rule may be any kind of processing rule, such as a special-effect addition rule. For example, each image frame contained in the video data is processed according to the combinative movement processing rule corresponding to the 'dragon-subduing eighteen palms' action data set, and the processed video data is displayed, so that the displayed video data contains the 'dragon-subduing eighteen palms' special effect. The present invention does not limit the specific rules of video processing, as long as the display effect of the video is enhanced.
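As a toy illustration of applying one processing rule frame by frame, the sketch below treats a frame as a plain dict and a special effect as a function that annotates it; real frames and effects would of course be pixel operations, and the names are illustrative.

```python
def apply_rule(frames, effect):
    """Apply one special-effect function to every image frame the video
    contains and return the processed frames for display."""
    return [effect(frame) for frame in frames]

def palm_effect(frame):
    """Stand-in effect: tag the frame with the overlay it received."""
    return {**frame, "overlay": "dragon_subduing_18_palms"}
```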
According to the processing method of video data provided in this embodiment, human body segmentation processing is performed on multiple image frames in the video data to obtain multiple human region data corresponding to the multiple image frames; the multiple human region data are respectively compared with the multiple body-sensing action datas contained in a preset combinative movement data set; when the comparison results are determined to meet the preset matching rule, the combinative movement processing rule corresponding to the combinative movement data set matched with the multiple human region data is obtained; the video data is processed according to the combinative movement processing rule, and the processed video data is displayed. This approach captures the body-sensing actions of the human body quickly and accurately, and determines the combinative movement data set corresponding to the multiple image frames on the basis of two factors, the action combination and the audio data, before processing the video data. It therefore improves the accuracy of the processing, reduces the false-trigger rate, and makes the human-computer interaction more entertaining. Moreover, capturing the actions does not depend on video data shot by a high-precision, high-depth camera, so the method is suitable for any mobile terminal equipped with a camera, has strong resistance to infrared interference, and is low in cost.
Fig. 2 shows a flow chart of a processing method of video data in accordance with another embodiment of the present invention. As shown in Fig. 2, the method includes:
Step S210: perform human body segmentation processing on multiple image frames in the video data to obtain multiple human region data corresponding to the multiple image frames.

Specifically, according to the order in which the image frames appear in the video data, the currently pending image frame contained in the video data is acquired in real time, human body segmentation processing is performed on the currently pending image frame, and the human region data corresponding to the currently pending image frame is obtained.
The video data may be video data captured by a camera, in which case the currently pending image frame contained in the video data is acquired in real time according to the order in which the image frames appear; since the method of the present embodiment triggers the processing of the video data according to multiple body-sensing actions, multiple image frames contained in the video data need to be acquired and processed. The video data may also be video data recorded in advance, in which case the method performs post-processing on the video data: each image frame contained in a specified time period of the video data may be determined in turn as the currently pending image frame in chronological order, or the currently pending image frame may be determined by a detection algorithm; specifically, an image frame containing a human region is detected by the detection algorithm, and that frame and the subsequent image frames containing human regions are determined in turn as the currently pending image frame. The video data may further include video data captured in real time by an image capture device, such as the video data of a live-streaming scene and/or the video data captured in real time in a human-computer-interaction game such as a somatosensory game interaction scenario; in this case each frame contained in the video data is determined in turn as the currently pending image frame in time order. The present invention does not limit this.
Performing human body segmentation processing on the currently pending image frame may specifically be accomplished as follows. First, the human region in the currently pending image frame is detected; specifically, the human region contained in the currently pending image frame can be detected by a neural network algorithm. The neural network algorithm can continuously learn the features of human regions by means such as deep learning, and detects the human region contained in the currently pending image frame according to the learning result. Then, the detected human region is segmented from the currently pending image frame; specifically, the pixels corresponding to the human region can be segmented out to obtain the multiple human region data corresponding to the respective image frames, where the human region data includes the pixels corresponding to the human region together with information such as the position information and colour information of those pixels.
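Given a segmentation mask produced by such a network, collecting the human region data (pixel positions plus colour) might look like the following sketch; the 2-D list representation of the image and mask is an assumption made for illustration only.

```python
def extract_human_region(image, mask):
    """image: 2-D list of colour values; mask: same-shape 2-D list where 1
    marks a human pixel. Returns the human region data: one record per
    human pixel with its position information and colour information."""
    region = []
    for y, row in enumerate(mask):
        for x, is_human in enumerate(row):
            if is_human:
                region.append({"x": x, "y": y, "color": image[y][x]})
    return region
```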
Detecting the human region contained in the currently pending image frame by a neural network algorithm, as described above, is a detection-based approach. In addition to detection, this step can also be combined with a tracking-based approach realized by a tracking algorithm to perform human body segmentation processing on the currently pending image frame. Specifically, after the human region in the currently pending image frame is detected by the detection approach, the position information of the human region is supplied to a tracker, and the tracker tracks the human region in subsequent image frames according to its position in the currently pending image frame. Since the same region is usually correlated across consecutive frames of the video data, the tracking approach can speed up the detection of subsequent image frames. Moreover, the tracker can also supply its tracking result to the detector, so that the detector determines a local region of the whole frame as the detection range and detects only within that range, thereby improving detection efficiency. In short, the combined use of the detection approach and the tracking approach can improve both the efficiency and the precision of detection.
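The interplay of detector and tracker can be sketched as a loop that detects periodically and tracks in between, feeding the last known position back to the detector as a restricted search window. The callback signatures and the re-detection interval are assumptions for illustration.

```python
def locate_human(frames, detect, track, redetect_every=5):
    """Run full detection on the first frame and on every
    `redetect_every`-th frame, and cheaper tracking on the frames in
    between; the last known position is handed to the detector as a
    local search range so it need not scan the whole frame."""
    positions, last = [], None
    for i, frame in enumerate(frames):
        if last is None or i % redetect_every == 0:
            last = detect(frame, search_range=last)
        else:
            last = track(frame, last)
        positions.append(last)
    return positions
```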
Step S220: compare the multiple human region data respectively with the multiple body-sensing action datas contained in the preset combinative movement data set.

The preset combinative movement data set includes multiple combinative movement data sets stored in a preset body-sensing action library, and each combinative movement data set contains at least two body-sensing action datas. The step of comparing the multiple human region data respectively with the multiple body-sensing action datas contained in the preset combinative movement data set then specifically includes: comparing the multiple human region data respectively with the multiple body-sensing action datas contained in each combinative movement data set stored in the body-sensing action library.
The body-sensing action library is preset. Since the method of the present embodiment triggers the processing of the video data according to a detected series of consecutive body-sensing actions, a single body-sensing action alone cannot trigger the processing. Therefore, in the present embodiment at least two body-sensing action datas are determined as one combinative movement data set, and each combinative movement data set is stored in the body-sensing action library in association with its corresponding at least two body-sensing action datas. After the multiple human region data are segmented from the respective image frames, they are respectively compared with the multiple body-sensing action datas to determine the combinative movement data set corresponding to the multiple human region data.
The multiple body-sensing action datas contained in the preset combinative movement data set each carry a time sequence number identifier. For example, a combinative movement data set may contain the two body-sensing action datas of raising the right hand and lowering the right hand: first raising and then lowering the right hand corresponds to the ground-beating combinative movement data set, while first lowering and then raising the right hand corresponds to the whip-cracking combinative movement data set. It follows that different combinative movement data sets may contain the same body-sensing action data, and the combinative movement data sets can be distinguished by setting the time sequence number identifier of each body-sensing action data.
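A minimal way to store such a library, with each combinative movement data set holding its body-sensing action datas under time sequence numbers, could be the following; the action and set names are illustrative.

```python
# Two sets share the same actions but differ in time sequence order,
# so the sequence numbers keep them distinguishable.
BODY_SENSING_ACTION_LIBRARY = {
    "ground_beating": {1: "raise_right_hand", 2: "lower_right_hand"},
    "whip_cracking":  {1: "lower_right_hand", 2: "raise_right_hand"},
}

def ordered_actions(set_name):
    """Return a set's body-sensing action datas sorted by their
    time sequence number identifier."""
    seq = BODY_SENSING_ACTION_LIBRARY[set_name]
    return [seq[k] for k in sorted(seq)]
```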
Specifically, in the present embodiment this step is further divided into several sub-steps. Fig. 3 shows a flow diagram of the sub-steps included in step S220; as shown in Fig. 3, step S220 specifically includes:
Sub-step S221: compare the human region data corresponding to the currently pending image frame respectively with the multiple body-sensing action datas contained in each combinative movement data set.

The human region data segmented from the currently pending image frame is compared with each body-sensing action data. Specifically, the contour and/or area of the human region can be determined from the pixel information contained in the human region data, and compared with the contour and/or area of the human region corresponding to each body-sensing action data contained in each combinative movement data set. In addition, to improve matching efficiency, the human region data corresponding to the currently pending image frame may be compared only with the first body-sensing action data of each combinative movement data set, or only with the first several body-sensing action datas of each combinative movement data set in order.
Sub-step S222: determine a body-sensing action data whose comparison result is successful as the first action data, and determine the combinative movement data set to which the first action data belongs as the first action data set.

According to the comparison of sub-step S221, in the present embodiment, if the contour of the human region corresponding to the currently pending image frame is consistent with the contour of the human region corresponding to a body-sensing action data, or their matching degree exceeds a preset contour matching threshold, and/or the area of the human region corresponding to the currently pending image frame is consistent with the area corresponding to a body-sensing action data, or the difference between the two is less than a preset difference threshold, the comparison result of the human region data of the currently pending image frame against that body-sensing action data is considered successful. The body-sensing action data whose comparison result is successful is determined as the first action data, and the combinative movement data set to which the first action data belongs is determined as the first action data set.
Sub-step S223: compare the human region data corresponding to the N image frames following the currently pending image frame with each body-sensing action data contained in the first action data set, where N is a natural number greater than or equal to 1.

The human region data corresponding to the N image frames following the currently pending image frame in the video data are respectively compared with each body-sensing action data contained in the first action data set; the comparison can be carried out as in sub-step S221 and is not repeated here. For example, segmentation processing is performed on the currently pending image frame to obtain the corresponding human region data, which is compared with each body-sensing action data contained in each combinative movement data set; if there is a body-sensing action data whose comparison result with the human region data is successful, the combinative movement data set to which that body-sensing action data belongs is determined as the first action data set, and the human region data corresponding to each subsequent image frame is then compared only with the body-sensing action datas contained in the first action data set. This narrows the scope of the comparison and speeds up the lookup of the action data set corresponding to each image frame.
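The narrowing strategy of sub-steps S221 to S223 can be sketched as follows: once the first frame's action matches the leading action of some set, later frames are checked only against that set. Actions are abstracted to strings, and string equality stands in for the contour/area comparison; both are simplifying assumptions.

```python
def find_combo(frame_actions, library):
    """frame_actions: actions recognized per frame, in order of appearance.
    library: set name -> ordered list of body-sensing action datas.
    S221: compare the first frame against every set; S222: a matching set
    becomes the first action data set; S223: the following frames are
    compared only against that set's remaining actions."""
    for name, actions in library.items():
        if frame_actions[:1] == actions[:1]:             # S221 / S222
            if frame_actions[:len(actions)] == actions:  # S223
                return name
    return None
```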
Step S230: when the comparison results are determined to meet the preset matching rule, perform speech recognition on the audio data corresponding to the multiple image frames to obtain a speech recognition result; determine the audio instruction corresponding to the speech recognition result according to a preset audio instruction library, where the audio instruction library is used to store the audio instructions; and judge whether the combinative movement data set matched with the multiple human region data matches the audio instruction.

The preset matching rule includes: when M human region data contained in the multiple human region data respectively match M body-sensing action datas contained in the combinative movement data set to be compared, determining that the multiple human region data and the combinative movement data set to be compared meet the matching rule. Here the total number of the multiple human region data is greater than or equal to M, the total number of the multiple body-sensing action datas contained in the combinative movement data set to be compared is greater than or equal to M, and M is a natural number greater than 1.
The combinative movement data set to be compared refers to the preset combinative movement data set, and a human region data matching a body-sensing action data means that the comparison result of that human region data against that body-sensing action data is successful. In practical applications, the user's multiple body-sensing actions may not be fully consistent with the body-sensing actions corresponding to a combinative movement data set; for example, relative to the multiple body-sensing actions corresponding to the combinative movement data set, the detected actions of the user may contain a wrong body-sensing action or omit one. If the processing of the video data were triggered only when the user's multiple body-sensing actions were strictly identical to the multiple body-sensing actions corresponding to the combinative movement data set, this would inconvenience the user and impair the user's body-sensing interaction experience.
Therefore, those skilled in the art can set the preset matching rule according to the specific application scenario, for example by setting a matching ratio threshold. The matching ratio refers to the ratio of the number of human region data whose comparison results against the multiple body-sensing action datas contained in a combinative movement data set are successful to the number of those body-sensing action datas; if the matching ratio is not less than the matching ratio threshold, the comparison results are determined to meet the preset matching rule. For example, if a combinative movement data set contains five body-sensing action datas and the above steps determine that the comparison results of four human region data against four of those five body-sensing action datas are each successful, the matching ratio is 80%, and the four human region data are considered to match the combinative movement data set. Alternatively, a priority sequence number may be set for each body-sensing action data in a combinative movement data set; if the comparison results of the multiple human region data against the body-sensing action datas with the highest priority sequence numbers in the combinative movement data set are each successful, the multiple human region data are considered to match the combinative movement data set.
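The matching-ratio rule can be written down directly. Here `results` holds one success/failure comparison outcome per body-sensing action data in the set to be compared, and the 0.8 default mirrors the 80% example above; the threshold value itself is a choice left to the implementer.

```python
def meets_matching_rule(results, threshold=0.8):
    """results: per-action comparison outcomes against one combinative
    movement data set (True = some human region data matched that
    action). The rule is met when successes / total actions is at
    least the matching ratio threshold."""
    return sum(results) / len(results) >= threshold
```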
Each body-sensing action data contained in the combinative movement data set to be compared carries a time sequence number identifier. The step in which the M human region data contained in the multiple human region data respectively match the M body-sensing action datas contained in the combinative movement data set to be compared then specifically includes: judging whether the order in which the M human region data appear in the video data matches the time sequence number identifiers of the M body-sensing action datas contained in the combinative movement data set to be compared; if so, determining that the M human region data contained in the multiple human region data respectively match the M body-sensing action datas contained in the combinative movement data set to be compared.
Since the multiple body-sensing action datas contained in the preset combinative movement data set each carry a time sequence number identifier, the order of appearance of the human region data must be compared against the multiple body-sensing action datas. For example, a combinative movement data set may contain the two body-sensing action datas of raising the right hand and lowering the right hand: first raising and then lowering the right hand corresponds to the ground-beating combinative movement data set, while first lowering and then raising the right hand corresponds to the whip-cracking combinative movement data set. It follows that different combinative movement data sets may contain the same multiple body-sensing action datas, and the combinative movement data sets can be distinguished by setting the time sequence number identifier of each body-sensing action data. Correspondingly, when querying the combinative movement data set matching the multiple human region data, it is necessary not only to determine the comparison results of the multiple human region data against the multiple body-sensing action datas, but also to determine whether the order in which the multiple human region data appear in the video data matches the time sequence number identifiers of the multiple body-sensing action datas contained in the combinative movement data set to be compared. Only when the comparison results of the multiple human region data against the multiple body-sensing action datas contained in a combinative movement data set meet the preset matching rule, and the order in which the multiple human region data appear in the video data matches the time sequence number identifiers of the multiple body-sensing action datas contained in that combinative movement data set, is it determined that the multiple human region data match that combinative movement data set.
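Checking the appearance order against the time sequence number identifiers amounts to a subsequence test: the matched actions, sorted by sequence number, must occur in the video in that same relative order. A sketch under that reading:

```python
def order_matches(observed, expected):
    """observed: actions in their order of appearance in the video data.
    expected: the set's matched actions sorted by time sequence number.
    True iff the expected actions occur in `observed` in the same
    relative order (other observations in between are tolerated)."""
    remaining = iter(observed)
    # `action in remaining` consumes the iterator, enforcing order.
    return all(action in remaining for action in expected)
```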
The video data contains image frames and audio data. The method of the present embodiment further performs speech recognition on the audio data corresponding to the multiple image frames in the video data to obtain a speech recognition result, and determines the audio instruction corresponding to the speech recognition result according to the preset audio instruction library, in which multiple audio instructions are stored. In practice, keywords can be set according to the audio characters contained in an audio instruction; when the corresponding audio instruction is determined from the speech recognition result, it is judged whether the audio characters contained in the speech recognition result include the keywords, and if so, the audio instruction corresponding to the speech recognition result can be determined. For example, the audio instruction library contains the 'dragon-subduing eighteen palms' audio instruction with the corresponding keyword set to 'dragon-subduing'; if the speech recognition result corresponding to the multiple image frames is 'dragon-subduing palm', the audio instruction corresponding to that speech recognition result can be determined to be 'dragon-subduing eighteen palms'.
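The keyword lookup from a speech recognition result to an audio instruction might be sketched as follows; the instruction names and keywords are illustrative stand-ins for the example in the text.

```python
# Hypothetical keyword table: audio instruction -> preset keywords.
INSTRUCTION_KEYWORDS = {
    "dragon_subduing_18_palms": ("dragon-subduing",),
    "ground_beating": ("beat the ground",),
}

def lookup_audio_instruction(recognized_text):
    """Return the first audio instruction whose preset keyword appears
    among the audio characters of the speech recognition result,
    or None when no keyword is found."""
    for instruction, keywords in INSTRUCTION_KEYWORDS.items():
        if any(k in recognized_text for k in keywords):
            return instruction
    return None
```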
The audio instruction library is further used to store the mapping relations between the audio instructions and their corresponding combinative movement data sets. Specifically, a data set identifier can be set in advance for each combinative movement data set, and the multiple data set identifiers are stored in the audio instruction library in association with the audio instructions corresponding to the respective combinative movement data sets; the combinative movement data set corresponding to the audio data can then be determined according to the speech recognition result of the audio data.
The step of judging whether the combinative movement data set matched with the multiple human region data matches the audio instruction then specifically includes: determining, according to the audio instruction library, whether the combinative movement data set matched with the multiple human region data matches the audio instruction. Since the audio instruction library stores the mapping relations between the audio instructions and their corresponding combinative movement data sets, the combinative movement data set matching the audio data can be determined according to the audio data corresponding to the multiple image frames; the above steps determine the combinative movement data set matched with the multiple human region data, and it is then further determined whether the combinative movement data set corresponding to the audio data is consistent with the combinative movement data set matched with the multiple human region data.
Step S240: if it is judged that the combinative movement data set matched with the multiple human region data matches the audio instruction, obtain the combinative movement processing rule corresponding to that combinative movement data set, and process the currently pending image frame and/or the L image frames following the currently pending image frame, where L is a natural number greater than 1.
According to the above steps, if it is judged that the combinative movement data set matched with the multiple human region data matches the audio instruction, that is, the combinative movement data set corresponding to the audio data is consistent with the combinative movement data set corresponding to the multiple human region data, the combinative movement processing rule corresponding to that combinative movement data set is obtained. The combinative movement processing rule may be a special-effect addition rule, an effect-sticker addition rule, an animation display rule, or the like. For example, in a live-streaming scene, the user makes the body-sensing action of first raising and then lowering the right hand, the corresponding combinative movement processing rule is determined to be a special-effect addition rule, and a special effect is added to the video data; for another example, in a somatosensory game, the user makes the body-sensing action of hitting a tennis ball with the right hand, the corresponding combinative movement processing rule is determined to be an animation display rule, and animation addition and display processing are performed on the video data. The present invention does not limit the content of the combinative movement processing rule.
After the combinative movement data set matched with the multiple human region data is determined according to the above steps, the video data is further processed according to the effect textures corresponding to that combinative movement data set, and the processed video data is displayed. Processing the video data means processing the image frames it contains; the video data is processed according to the combinative movement processing rule, for example the special-effect addition rule mentioned above, a special effect is added to the video data, and the processed video data is displayed. For example, each corresponding image frame in the video data is processed according to the combinative movement processing rule corresponding to the 'dragon-subduing eighteen palms' combinative movement data set, and the processed video data is displayed, so that the displayed video data contains the 'dragon-subduing eighteen palms' special effect.
Specifically, the currently pending image frame and/or the L image frames following the currently pending image frame are processed, where L is a natural number greater than 1. In practical applications, the corresponding combinative movement processing rule may be determined from the currently pending image frame alone, in which case the currently pending image frame and its L following image frames are processed accordingly; alternatively, the corresponding combinative movement processing rule may be determined jointly from the currently pending image frame and several frames after it, in which case the L image frames following the currently pending image frame are processed.
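The two cases, processing the current frame plus its L followers or only the L followers, reduce to a slice selection; the boolean flag distinguishing them is an illustrative assumption.

```python
def frames_to_process(frames, current, L, include_current=True):
    """Select the frames a combinative movement processing rule applies
    to: the currently pending frame and/or the L frames after it."""
    start = current if include_current else current + 1
    return frames[start:current + L + 1]
```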
According to the processing method of video data provided in this embodiment, the body-sensing actions of the human body can be captured quickly and accurately on the basis of human body segmentation, and the video data is processed with body-sensing actions as the driver. Moreover, the approach of detecting the human region by a neural network and segmenting it from the image places no special requirements on the capture equipment and does not depend on video data shot by a high-precision, high-depth camera; it is therefore suitable for any mobile terminal equipped with a camera, has strong resistance to infrared interference, and is low in cost. Furthermore, since the corresponding special effect is triggered by the combination of the actions corresponding to the human regions in multiple image frames together with the corresponding audio data, the subsequent steps are performed only on the premise that both the multiple image frames and the audio data are successfully matched, which improves the accuracy of the processing and reduces the false-trigger rate. A human-computer interaction mode driven by body-sensing actions and based on human region segmentation is thus provided, which can quickly determine the processing rule for the video data according to the body-sensing actions and the audio data and display the processed video data, improving the display effect of the video data.
Fig. 4 shows a structural diagram of a processing apparatus of video data according to a further embodiment of the present invention. As shown in Fig. 4, the apparatus includes:
a segmentation module 41, adapted to perform human body segmentation processing on multiple image frames in the video data to obtain multiple human region data corresponding to the multiple image frames;
a comparison module 42, adapted to compare the multiple human region data respectively with the multiple body-sensing action datas contained in a preset combinative movement data set;
an audio instruction determining module 43, adapted to determine an audio instruction according to the audio data corresponding to the multiple image frames when the comparison results are determined to meet a preset matching rule;
a judgment module 44, adapted to judge whether the combinative movement data set matched with the multiple human region data matches the audio instruction;
a processing rule acquisition module 45, adapted to obtain the combinative movement processing rule corresponding to the combinative movement data set matched with the multiple human region data if it is judged that the combinative movement data set matched with the multiple human region data matches the audio instruction;
a processing module 46, adapted to process the video data according to the combinative movement processing rule; and
a display module 47, adapted to display the processed video data.
Optionally, the audio instruction determining module 43 is further adapted to:
perform speech recognition on the audio data corresponding to the multiple image frames to obtain a speech recognition result; and
determine the audio instruction corresponding to the speech recognition result according to a preset audio instruction library, where the audio instruction library is used to store the audio instructions.
Optionally, the audio instruction library is further used to store the mapping relations between the audio instructions and their corresponding combinative movement data sets;
the judgment module 44 is then further adapted to:
determine, according to the audio instruction library, whether the combinative movement data set matched with the multiple human region data matches the audio instruction.
Optionally, the preset combinative movement data set includes multiple combinative movement data sets stored in a preset body-sensing action library, and each combinative movement data set contains at least two body-sensing action datas;
the comparison module 42 is then further adapted to:
compare the multiple human region data respectively with the multiple body-sensing action datas contained in each combinative movement data set stored in the body-sensing action library.
Optionally, the preset matching rule includes:
when M human region data among the multiple human region data respectively match M body-sensing action data contained in a combined action data set to be compared, determining that the multiple human region data meet the matching rule with respect to that combined action data set;
where the total number of the multiple human region data is greater than or equal to M, the total number of body-sensing action data contained in the combined action data set to be compared is greater than or equal to M, and M is a natural number greater than 1.
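The matching rule above can be sketched as follows. The `matches` predicate is an assumption standing in for whatever pose or region comparison the implementation actually performs; the rule itself only requires that at least M region data each match a distinct action data in the set being compared.

```python
def meets_matching_rule(region_data, action_set, m, matches):
    """Return True if at least m region data each match a distinct
    body-sensing action data in the set to be compared (m > 1)."""
    if m <= 1 or len(region_data) < m or len(action_set) < m:
        return False
    unmatched = list(action_set)
    hits = 0
    for region in region_data:
        for action in unmatched:
            if matches(region, action):
                unmatched.remove(action)  # each action data matched at most once
                hits += 1
                break
        if hits >= m:
            return True
    return False
```

Both collections may be larger than M; only M pairwise matches are required.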
Optionally, each body-sensing action data contained in the combined action data set to be compared carries a time sequence identifier, and the comparison module 42 is further adapted to:
judge whether the order in which the M human region data appear in the video data matches the time sequence identifiers of the M body-sensing action data contained in the combined action data set to be compared; and
if so, determine that the M human region data respectively match the M body-sensing action data contained in the combined action data set to be compared.
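One plausible reading of the time sequence check is that the sequence identifiers of the matched action data must be non-decreasing when the matched regions are listed in their order of appearance in the video. The pairing representation below is an assumption for illustration.

```python
def order_matches(matched_pairs):
    """matched_pairs: (frame_index, time_sequence_id) for each of the M
    matches. The appearance order in the video matches the time sequence
    identifiers when the ids are non-decreasing over appearance order."""
    ordered = sorted(matched_pairs, key=lambda p: p[0])  # appearance order
    seq_ids = [seq for _, seq in ordered]
    return seq_ids == sorted(seq_ids)
```

For example, a "raise arm then wave" combination only matches if the frame containing the raised arm precedes the frame containing the wave.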
Optionally, the segmentation module 41 is further adapted to:
acquire, in real time and according to the order in which the image frames appear in the video data, the currently pending image frame contained in the video data, and perform human body segmentation on the currently pending image frame to obtain the human region data corresponding to the currently pending image frame.
Optionally, the comparison module 42 is further adapted to:
compare the human region data corresponding to the currently pending image frame respectively with the multiple body-sensing action data contained in each combined action data set;
determine the body-sensing action data whose comparison result is successful as the first action data, and determine the combined action data set containing the first action data as the first action data set; and
compare the human region data corresponding to the N image frames following the currently pending image frame with each body-sensing action data contained in the first action data set, where N is a natural number greater than or equal to 1.
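This narrowing strategy can be sketched as follows: the first frame is compared against every set, and once a first action data is hit, only its containing set (the first action data set) is compared against the next N frames. The container shapes and the `matches` predicate are assumptions.

```python
def narrow_comparison(frame_regions, action_sets, matches, n=3):
    """frame_regions: human region data per frame, in appearance order.
    action_sets: dict of set_id -> list of body-sensing action data.
    Returns (first_action_set_id, hits_in_next_n_frames), or (None, 0)
    when no first action data is found."""
    first = frame_regions[0]
    for set_id, actions in action_sets.items():
        for action in actions:
            if matches(first, action):           # first action data found
                window = frame_regions[1:1 + n]  # only the next N frames
                hits = sum(
                    1 for region in window
                    if any(matches(region, a) for a in actions)
                )
                return set_id, hits              # restricted to this one set
    return None, 0
```

The payoff is that the per-frame cost drops from "all sets" to "one set" as soon as a combination starts.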
Optionally, the processing rule acquisition module 45 is further adapted to:
determine, according to a preset combined action processing library, the combined action processing rule corresponding to the combined action data set matching the multiple human region data;
where the combined action processing library is used to store the combined action processing rule corresponding to each combined action data set.
Optionally, the combined action processing rule includes: processing the video data according to an effect texture corresponding to the combined action data set.
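One possible shape for such a rule library is shown below: each combined action data set id points at an effect texture to be composited onto the matched frames. The texture file names and the compositing callable are invented for illustration only.

```python
# Hypothetical combined action processing library: set id -> processing rule.
PROCESSING_LIBRARY = {
    "punch_combo": {"effect_texture": "flame_overlay.png"},
    "jump_combo": {"effect_texture": "sparkle_overlay.png"},
}

def apply_rule(frame, action_set_id, composite):
    """Fetch the rule for the matched set and composite its effect texture."""
    rule = PROCESSING_LIBRARY.get(action_set_id)
    if rule is None:
        return frame  # no rule stored: the frame passes through unchanged
    return composite(frame, rule["effect_texture"])
```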
Optionally, the processing module 46 is further adapted to:
process the currently pending image frame and/or the L image frames following the currently pending image frame, where L is a natural number greater than 1.
Optionally, the video data includes video data captured in real time by an image capture device, and/or video data contained in a human-computer interaction game.
For the specific structure and working principle of the above modules, reference may be made to the description of the corresponding steps in the method embodiment, which is not repeated here.
Another embodiment of the present application provides a non-volatile computer storage medium storing at least one executable instruction, which can execute the processing method of video data in any of the above method embodiments.
Fig. 5 shows a schematic structural diagram of a computing device according to an embodiment of the present invention; the specific embodiments of the present invention do not limit the specific implementation of the computing device.
As shown in Fig. 5, the computing device may include a processor (processor) 502, a communications interface (Communications Interface) 504, a memory (memory) 506, and a communication bus 508, wherein the processor 502, the communication interface 504, and the memory 506 communicate with one another via the communication bus 508.
The communication interface 504 is used to communicate with network elements of other devices, such as clients or other servers.
The processor 502 is used to execute the program 510, and may specifically perform the relevant steps in the above embodiment of the processing method of video data.
Specifically, the program 510 may include program code, and the program code includes computer operation instructions.
The processor 502 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC, Application Specific Integrated Circuit), or one or more integrated circuits configured to implement the embodiments of the present invention. The one or more processors included in the computing device may be processors of the same type, such as one or more CPUs, or processors of different types, such as one or more CPUs and one or more ASICs.
The memory 506 is used to store the program 510. The memory 506 may include a high-speed RAM memory, and may further include a non-volatile memory (non-volatile memory), for example, at least one disk memory.
The program 510 may specifically be used to cause the processor 502 to perform the following operations: performing human body segmentation on multiple image frames in the video data to obtain multiple human region data corresponding to the multiple image frames; comparing the multiple human region data respectively with the multiple body-sensing action data contained in a preset combined action data set; when it is determined that the comparison result meets a preset matching rule, determining an audio instruction according to the audio data corresponding to the multiple image frames, and judging whether the combined action data set matching the multiple human region data matches the audio instruction; and if so, obtaining the combined action processing rule corresponding to the combined action data set matching the multiple human region data, processing the video data according to the combined action processing rule, and displaying the processed video data.
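The end-to-end control flow listed for program 510 can be sketched with every component (segmenter, comparator, recognizer, rule library, display) injected as a callable, so only the ordering of the operations is asserted; all concrete behavior is an assumption.

```python
def process_video(frames, audio, deps):
    """Sketch of program 510's operations, with components injected via deps."""
    regions = [deps["segment"](f) for f in frames]        # human body segmentation
    set_id = deps["compare"](regions)                     # match against action sets
    if set_id is None:                                    # matching rule not met
        return None
    instruction = deps["recognize"](audio)                # audio data -> instruction
    if not deps["instruction_matches"](instruction, set_id):
        return None                                       # instruction mismatch
    rule = deps["get_rule"](set_id)                       # combined action rule
    processed = [deps["apply"](f, rule) for f in frames]  # process the video data
    deps["display"](processed)                            # display the result
    return processed
```

Note the two gates: the effect is applied only when both the body-sensing match and the audio instruction match succeed.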
In an optional manner, the program 510 may specifically further cause the processor 502 to perform the following operations: performing speech recognition on the audio data corresponding to the multiple image frames to obtain a speech recognition result; and determining the audio instruction corresponding to the speech recognition result according to a preset audio instruction library, where the audio instruction library is used to store the audio instructions.
In an optional manner, the audio instruction library is further used to store mapping relations between each audio instruction and its corresponding combined action data set; the program 510 may specifically further cause the processor 502 to determine, according to the audio instruction library, whether the combined action data set matching the multiple human region data matches the audio instruction.
In an optional manner, the preset combined action data sets include multiple combined action data sets stored in a preset body-sensing action library, each combined action data set containing at least two body-sensing action data; the program 510 may specifically further cause the processor 502 to compare the multiple human region data respectively with the multiple body-sensing action data contained in each combined action data set stored in the body-sensing action library.
In an optional manner, the preset matching rule includes: when M human region data among the multiple human region data respectively match M body-sensing action data contained in a combined action data set to be compared, determining that the multiple human region data meet the matching rule with respect to that combined action data set; where the total number of the multiple human region data is greater than or equal to M, the total number of body-sensing action data contained in the combined action data set to be compared is greater than or equal to M, and M is a natural number greater than 1.
In an optional manner, each body-sensing action data contained in the combined action data set to be compared carries a time sequence identifier, and the program 510 may specifically further cause the processor 502 to: judge whether the order in which the M human region data appear in the video data matches the time sequence identifiers of the M body-sensing action data contained in the combined action data set to be compared; and if so, determine that the M human region data respectively match the M body-sensing action data contained in the combined action data set to be compared.
In an optional manner, the program 510 may specifically further cause the processor 502 to: acquire, in real time and according to the order in which the image frames appear in the video data, the currently pending image frame contained in the video data, and perform human body segmentation on the currently pending image frame to obtain the corresponding human region data.
In an optional manner, the program 510 may specifically further cause the processor 502 to: compare the human region data corresponding to the currently pending image frame respectively with the multiple body-sensing action data contained in each combined action data set; determine the body-sensing action data whose comparison result is successful as the first action data, and determine the combined action data set containing the first action data as the first action data set; and compare the human region data corresponding to the N image frames following the currently pending image frame with each body-sensing action data contained in the first action data set, where N is a natural number greater than or equal to 1.
In an optional manner, the program 510 may specifically further cause the processor 502 to: determine, according to a preset combined action processing library, the combined action processing rule corresponding to the combined action data set matching the multiple human region data; where the combined action processing library is used to store the combined action processing rule corresponding to each combined action data set.
In an optional manner, the combined action processing rule includes: processing the video data according to an effect texture corresponding to the combined action data set.
In an optional manner, the program 510 may specifically further cause the processor 502 to: process the currently pending image frame and/or the L image frames following the currently pending image frame, where L is a natural number greater than 1.
In an optional manner, the video data includes video data captured in real time by an image capture device, and/or video data contained in a human-computer interaction game.
The algorithms and displays provided herein are not inherently related to any particular computer, virtual system, or other apparatus. Various general-purpose systems may also be used with the teachings herein. The structure required to construct such a system is apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It should be understood that the content of the invention described herein may be implemented in various programming languages, and that the above description of a specific language is given to disclose the best mode of carrying out the invention.
In the specification provided here, numerous specific details are set forth. It is to be understood, however, that embodiments of the present invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it should be appreciated that, in order to streamline the disclosure and aid in the understanding of one or more of the various inventive aspects, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in the above description of exemplary embodiments of the invention. However, this method of disclosure is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following this detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in a device of an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units, or components in an embodiment may be combined into one module, unit, or component, and may furthermore be divided into multiple sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings), and all processes or units of any method or device so disclosed, may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by alternative features serving the same, equivalent, or similar purpose.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any one of the claimed embodiments may be used in any combination.
The various component embodiments of the present invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the video data processing computing device according to embodiments of the present invention. The present invention may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing some or all of the methods described herein. Such a program implementing the present invention may be stored on a computer-readable medium, or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art may design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a device claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, and third does not indicate any ordering; these words may be interpreted as names.
Claims (10)
1. A processing method of video data, comprising:
performing human body segmentation on multiple image frames in the video data to obtain multiple human region data corresponding to said multiple image frames;
comparing said multiple human region data respectively with multiple body-sensing action data contained in a preset combined action data set;
when it is determined that the comparison result meets a preset matching rule, determining an audio instruction according to audio data corresponding to said multiple image frames, and judging whether the combined action data set matching said multiple human region data matches said audio instruction; and
if so, obtaining the combined action processing rule corresponding to the combined action data set matching said multiple human region data, processing the video data according to said combined action processing rule, and displaying the processed video data.
2. The method according to claim 1, wherein the step of determining an audio instruction according to the audio data corresponding to said multiple image frames specifically comprises:
performing speech recognition on the audio data corresponding to said multiple image frames to obtain a speech recognition result; and
determining the audio instruction corresponding to said speech recognition result according to a preset audio instruction library, wherein said audio instruction library is used to store the audio instructions.
3. The method according to claim 2, wherein said audio instruction library is further used to store mapping relations between each audio instruction and its corresponding combined action data set; and
the step of judging whether the combined action data set matching said multiple human region data matches said audio instruction specifically comprises:
determining, according to said audio instruction library, whether the combined action data set matching said multiple human region data matches said audio instruction.
4. The method according to any one of claims 1-3, wherein said preset combined action data sets comprise multiple combined action data sets stored in a preset body-sensing action library, each combined action data set containing at least two body-sensing action data; and
the step of comparing said multiple human region data respectively with the multiple body-sensing action data contained in the preset combined action data set specifically comprises:
comparing said multiple human region data respectively with the multiple body-sensing action data contained in each combined action data set stored in said body-sensing action library.
5. The method according to any one of claims 1-4, wherein said preset matching rule comprises:
when M human region data among said multiple human region data respectively match M body-sensing action data contained in a combined action data set to be compared, determining that said multiple human region data meet said matching rule with respect to said combined action data set to be compared;
wherein the total number of said multiple human region data is greater than or equal to M, the total number of body-sensing action data contained in said combined action data set to be compared is greater than or equal to M, and M is a natural number greater than 1.
6. The method according to claim 5, wherein each body-sensing action data contained in said combined action data set to be compared carries a time sequence identifier, and the step of matching the M human region data contained in said multiple human region data respectively with the M body-sensing action data contained in the combined action data set to be compared specifically comprises:
judging whether the order in which the M human region data contained in said multiple human region data appear in said video data matches the time sequence identifiers of the M body-sensing action data contained in the combined action data set to be compared; and
if so, determining that the M human region data contained in said multiple human region data respectively match the M body-sensing action data contained in the combined action data set to be compared.
7. The method according to any one of claims 1-6, wherein the step of performing human body segmentation on the multiple image frames in said video data to obtain the multiple human region data corresponding to said multiple image frames specifically comprises:
acquiring, in real time and according to the order in which the image frames appear in said video data, the currently pending image frame contained in said video data, and performing human body segmentation on said currently pending image frame to obtain the human region data corresponding to said currently pending image frame.
8. A processing device of video data, comprising:
a segmentation module, adapted to perform human body segmentation on multiple image frames in the video data to obtain multiple human region data corresponding to said multiple image frames;
a comparison module, adapted to compare said multiple human region data respectively with multiple body-sensing action data contained in a preset combined action data set;
an audio instruction determining module, adapted to determine an audio instruction according to audio data corresponding to said multiple image frames when it is determined that the comparison result meets a preset matching rule;
a judgment module, adapted to judge whether the combined action data set matching said multiple human region data matches said audio instruction;
a processing rule acquisition module, adapted to obtain the combined action processing rule corresponding to the combined action data set matching said multiple human region data if said combined action data set is judged to match said audio instruction;
a processing module, adapted to process said video data according to said combined action processing rule; and
a display module, adapted to display the processed video data.
9. A computing device, comprising: a processor, a memory, a communication interface, and a communication bus, said processor, said memory, and said communication interface communicating with one another via said communication bus;
said memory being used to store at least one executable instruction, said executable instruction causing said processor to perform operations corresponding to the processing method of video data according to any one of claims 1-7.
10. A computer storage medium, having stored therein at least one executable instruction, said executable instruction causing a processor to perform operations corresponding to the processing method of video data according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711395657.2A CN107995442A (en) | 2017-12-21 | 2017-12-21 | Processing method, device and the computing device of video data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711395657.2A CN107995442A (en) | 2017-12-21 | 2017-12-21 | Processing method, device and the computing device of video data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107995442A true CN107995442A (en) | 2018-05-04 |
Family
ID=62038171
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711395657.2A Pending CN107995442A (en) | 2017-12-21 | 2017-12-21 | Processing method, device and the computing device of video data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107995442A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109462776A (en) * | 2018-11-29 | 2019-03-12 | 北京字节跳动网络技术有限公司 | A kind of special video effect adding method, device, terminal device and storage medium |
WO2020082575A1 (en) * | 2018-10-26 | 2020-04-30 | 平安科技(深圳)有限公司 | Music generation method and device |
CN111107279A (en) * | 2018-10-26 | 2020-05-05 | 北京微播视界科技有限公司 | Image processing method, image processing device, electronic equipment and computer readable storage medium |
WO2020200081A1 (en) * | 2019-03-29 | 2020-10-08 | 广州虎牙信息科技有限公司 | Live streaming control method and apparatus, live streaming device, and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105100672A (en) * | 2014-05-09 | 2015-11-25 | 三星电子株式会社 | Display apparatus and method for performing videotelephony using the same |
CN105930072A (en) * | 2015-02-28 | 2016-09-07 | 三星电子株式会社 | Electronic Device And Control Method Thereof |
CN106204649A (en) * | 2016-07-05 | 2016-12-07 | 西安电子科技大学 | A kind of method for tracking target based on TLD algorithm |
KR101721231B1 (en) * | 2016-02-18 | 2017-03-30 | (주)다울디엔에스 | 4D media manufacture methods of MPEG-V standard base that use media platform |
CN106650668A (en) * | 2016-12-27 | 2017-05-10 | 上海葡萄纬度科技有限公司 | Method and system for detecting movable target object in real time |
2017-12-21: CN application CN201711395657.2A filed, patent CN107995442A (en), status: Pending
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105100672A (en) * | 2014-05-09 | 2015-11-25 | 三星电子株式会社 | Display apparatus and method for performing videotelephony using the same |
CN105930072A (en) * | 2015-02-28 | 2016-09-07 | 三星电子株式会社 | Electronic Device And Control Method Thereof |
KR101721231B1 (en) * | 2016-02-18 | 2017-03-30 | (주)다울디엔에스 | 4D media manufacture methods of MPEG-V standard base that use media platform |
CN106204649A (en) * | 2016-07-05 | 2016-12-07 | 西安电子科技大学 | A kind of method for tracking target based on TLD algorithm |
CN106650668A (en) * | 2016-12-27 | 2017-05-10 | 上海葡萄纬度科技有限公司 | Method and system for detecting movable target object in real time |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020082575A1 (en) * | 2018-10-26 | 2020-04-30 | 平安科技(深圳)有限公司 | Music generation method and device |
CN111107279A (en) * | 2018-10-26 | 2020-05-05 | 北京微播视界科技有限公司 | Image processing method, image processing device, electronic equipment and computer readable storage medium |
CN111107279B (en) * | 2018-10-26 | 2021-06-29 | 北京微播视界科技有限公司 | Image processing method, image processing device, electronic equipment and computer readable storage medium |
CN109462776A (en) * | 2018-11-29 | 2019-03-12 | 北京字节跳动网络技术有限公司 | A kind of special video effect adding method, device, terminal device and storage medium |
WO2020200081A1 (en) * | 2019-03-29 | 2020-10-08 | 广州虎牙信息科技有限公司 | Live streaming control method and apparatus, live streaming device, and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Molchanov et al. | Online detection and classification of dynamic hand gestures with recurrent 3d convolutional neural network | |
Chaplot et al. | Gated-attention architectures for task-oriented language grounding | |
US20230264109A1 (en) | System and method for toy recognition | |
CN112750140B (en) | Information mining-based disguised target image segmentation method | |
Yan et al. | Mirrornet: Bio-inspired camouflaged object segmentation | |
Yang et al. | Recurrent filter learning for visual tracking | |
CN107995442A (en) | Processing method, device and the computing device of video data | |
El-Nouby et al. | Tell, draw, and repeat: Generating and modifying images based on continual linguistic instruction | |
CN108090561B (en) | Storage medium, electronic device, and method and device for executing game operation | |
CN107204012A (en) | Reduce the power consumption of time-of-flight depth imaging | |
CN110569795A (en) | Image identification method and device and related equipment | |
WO2018089158A1 (en) | Natural language object tracking | |
CN110689093B (en) | Image target fine classification method under complex scene | |
CN110532883A (en) | On-line tracking is improved using off-line tracking algorithm | |
CN112527113A (en) | Method and apparatus for training gesture recognition and gesture recognition network, medium, and device | |
CN109325408A (en) | A kind of gesture judging method and storage medium | |
Vieriu et al. | On HMM static hand gesture recognition | |
CN111291612A (en) | Pedestrian re-identification method and device based on multi-person multi-camera tracking | |
Wake et al. | Verbal focus-of-attention system for learning-from-observation | |
Kirkland et al. | Perception understanding action: adding understanding to the perception action cycle with spiking segmentation | |
Liu et al. | A deep Q-learning network based active object detection model with a novel training algorithm for service robots | |
Ruiz-Santaquiteria et al. | Improving handgun detection through a combination of visual features and body pose-based data | |
CN102436301B (en) | Human-machine interaction method and system based on reference region and time domain information | |
CN108121963A (en) | Processing method, device and the computing device of video data | |
CN108509876A (en) | For the object detecting method of video, device, equipment, storage medium and program |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20180504 |