CN107995442A - Video data processing method, apparatus, and computing device - Google Patents
Video data processing method, apparatus, and computing device
- Publication number
- CN107995442A CN107995442A CN201711395657.2A CN201711395657A CN107995442A CN 107995442 A CN107995442 A CN 107995442A CN 201711395657 A CN201711395657 A CN 201711395657A CN 107995442 A CN107995442 A CN 107995442A
- Authority
- CN
- China
- Prior art keywords
- data
- human region
- combined action
- data set
- video data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/222—Studio circuitry; Studio devices; Studio equipment
- H04N5/262—Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/011—Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
Landscapes
- Engineering & Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Signal Processing (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Acoustics & Sound (AREA)
- General Physics & Mathematics (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a video data processing method, apparatus, and computing device. The method includes: performing human-body segmentation on multiple image frames in the video data to obtain multiple human-region data; comparing the multiple human-region data with multiple motion-sensing action data contained in a preset combined-action data set; when the comparison result is determined to satisfy a preset matching rule, determining an audio instruction from the audio data corresponding to the image frames, and judging whether the combined-action data set matched by the multiple human-region data matches the audio instruction; if so, processing the video data according to the combined-action processing rule and displaying the processed video data. By driving the processing of video data with both motion-sensing actions and audio data, the scheme improves processing accuracy, reduces the false-detection rate, and enhances the display effect of the video data; it is applicable to any mobile terminal with a camera, has strong resistance to infrared interference, and is low in cost.
Description
Technical field
The present invention relates to the field of image processing, and in particular to a video data processing method, apparatus, and computing device.
Background art
With the development of science and technology, advanced human-computer interaction theory places ever higher demands on interaction modes. In motion-sensing interaction, for example, people can interact with surrounding devices or the environment directly through body movements, without any complex control device, giving an immersive interactive experience.
However, the inventors found during the implementation of the present invention that motion-sensing interaction in the prior art usually needs to capture the user's movements precisely, for example by locating the joints of the human body to determine the user's motion-sensing actions. Moreover, prior-art motion-sensing interaction often relies on high-precision, high-depth cameras to predict the user's movements; such cameras are costly and can only be used in the absence of strong infrared interference, so interaction modes based on them are difficult to popularize on mobile terminals. In addition, motion-sensing action capture based on RGB images generally requires a very large amount of computation. Finally, the prior art often drives human-computer interaction by motion-sensing actions alone, which cannot guarantee processing accuracy and suffers a certain false-detection rate. It can thus be seen that the prior art lacks a solution that can solve the above problems well.
Summary of the invention
In view of the above problems, the present invention is proposed in order to provide a video data processing method, apparatus, and computing device that overcome, or at least partially solve, the above problems.
According to one aspect of the invention, a video data processing method is provided, including: performing human-body segmentation on multiple image frames in the video data to obtain multiple human-region data corresponding to the image frames; comparing the multiple human-region data with multiple motion-sensing action data contained in a preset combined-action data set; when the comparison result is determined to satisfy a preset matching rule, determining an audio instruction from the audio data corresponding to the image frames, and judging whether the combined-action data set matched by the multiple human-region data matches the audio instruction; if so, obtaining the combined-action processing rule corresponding to that combined-action data set, processing the video data according to the combined-action processing rule, and displaying the processed video data.
Optionally, the step of determining an audio instruction from the audio data corresponding to the multiple image frames specifically includes:
performing speech recognition on the audio data corresponding to the multiple image frames to obtain a speech recognition result;
determining the audio instruction corresponding to the speech recognition result according to a preset audio instruction library, wherein the audio instruction library is used to store the audio instructions.
Optionally, the audio instruction library is further used to store the mapping relations between the audio instructions and their corresponding combined-action data sets;
the step of judging whether the combined-action data set matched by the multiple human-region data matches the audio instruction then specifically includes:
determining, according to the audio instruction library, whether the combined-action data set matched by the multiple human-region data matches the audio instruction.
Optionally, the preset combined-action data set includes: multiple combined-action data sets stored in a preset motion-sensing action library, each combined-action data set containing at least two motion-sensing action data;
the step of comparing the multiple human-region data with the multiple motion-sensing action data contained in the preset combined-action data set then specifically includes:
comparing the multiple human-region data with the multiple motion-sensing action data contained in each combined-action data set stored in the motion-sensing action library.
Optionally, the preset matching rule includes:
when M human-region data among the multiple human-region data respectively match M motion-sensing action data contained in a combined-action data set to be compared, determining that the multiple human-region data and that combined-action data set satisfy the matching rule;
wherein the total number of the multiple human-region data is greater than or equal to M, the total number of motion-sensing action data contained in the combined-action data set to be compared is greater than or equal to M, and M is a natural number greater than 1.
Optionally, each motion-sensing action data contained in the combined-action data set to be compared has a time sequence number; the step of matching the M human-region data respectively with the M motion-sensing action data then specifically includes:
judging whether the order of appearance, in the video data, of the M human-region data matches the time sequence numbers of the M motion-sensing action data contained in the combined-action data set to be compared;
if so, determining that the M human-region data respectively match the M motion-sensing action data contained in the combined-action data set to be compared.
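The time-ordered matching just described can be sketched as follows. This is an illustrative sketch, not part of the original disclosure: the action identifiers and function names are assumptions, and a real implementation would compare per-frame segmentation results rather than plain strings.

```python
def matches_in_order(matched_ids, expected_sequence):
    """Check that the actions matched across frames appear in the same
    time-sequence order as the actions in the combined-action data set.

    matched_ids: action identifiers matched per frame, in frame order.
    expected_sequence: the M actions of the set, ordered by time sequence number.
    """
    # Keep only the matches that belong to the set, preserving frame order.
    relevant = [a for a in matched_ids if a in expected_sequence]
    # The frame-order appearance must equal the set's time-sequence order.
    return relevant == list(expected_sequence)
```

For instance, a set ordered "raise" then "lower" matches frames in which "raise" appears before "lower", but not the reverse order.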
Optionally, the step of performing human-body segmentation on the multiple image frames in the video data to obtain multiple human-region data corresponding to the image frames specifically includes:
according to the order in which the image frames appear in the video data, obtaining in real time the currently pending image frame contained in the video data, performing human-body segmentation on the currently pending image frame, and obtaining the human-region data corresponding to the currently pending image frame.
Optionally, the step of comparing the multiple human-region data with the multiple motion-sensing action data contained in the preset combined-action data set specifically includes:
comparing the human-region data corresponding to the currently pending image frame with the multiple motion-sensing action data contained in each combined-action data set;
determining a motion-sensing action data whose comparison is successful as a first action data, and determining the combined-action data set to which the first action data belongs as a first action data set;
comparing the human-region data corresponding to the N image frames following the currently pending image frame with each motion-sensing action data contained in the first action data set, wherein N is a natural number greater than or equal to 1.
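The narrowing strategy above — compare the first frame against every set, then restrict later frames to the set that produced the first hit — can be sketched as follows. This is an illustrative sketch under assumed names; the disclosure does not prescribe this exact code.

```python
def narrowed_match(frames, action_sets, match):
    """Compare each frame against candidate combined-action data sets.

    The first frame is compared against every set; once one of its actions
    matches, subsequent frames are compared only against that set (the
    'first action data set'), saving comparisons.
    """
    first_set = None
    matched = []
    for frame in frames:
        candidates = action_sets if first_set is None else [first_set]
        for action_set in candidates:
            hit = next((a for a in action_set if match(frame, a)), None)
            if hit is not None:
                matched.append(hit)
                first_set = action_set  # later frames only check this set
                break
    return first_set, matched
```

With toy string "actions" and exact-equality matching, two frames that hit the right-hand set select that set and match both of its actions.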
Optionally, the step of obtaining the combined-action processing rule corresponding to the combined-action data set matched by the multiple human-region data specifically includes:
determining, according to a preset combined-action processing library, the combined-action processing rule corresponding to the combined-action data set matched by the multiple human-region data;
wherein the combined-action processing library is used to store the combined-action processing rules corresponding to the combined-action data sets.
Optionally, the combined-action processing rule includes: processing the video data according to the effect texture corresponding to the combined-action data set.
Optionally, the step of processing the video data according to the combined-action processing rule specifically includes:
processing the currently pending image frame and/or the L image frames following the currently pending image frame, wherein L is a natural number greater than 1.
Optionally, the video data includes: video data captured in real time by an image capture device, and/or video data contained in a human-computer interaction game.
According to another aspect of the present invention, a video data processing apparatus is provided, including: a segmentation module, adapted to perform human-body segmentation on multiple image frames in the video data to obtain multiple human-region data corresponding to the image frames; a comparison module, adapted to compare the multiple human-region data with multiple motion-sensing action data contained in a preset combined-action data set; an audio instruction determining module, adapted to determine an audio instruction from the audio data corresponding to the image frames when the comparison result is determined to satisfy a preset matching rule; a judgment module, adapted to judge whether the combined-action data set matched by the multiple human-region data matches the audio instruction; a processing rule obtaining module, adapted to obtain the combined-action processing rule corresponding to the combined-action data set matched by the multiple human-region data if that combined-action data set is judged to match the audio instruction; a processing module, adapted to process the video data according to the combined-action processing rule; and a display module, adapted to display the processed video data.
Optionally, the audio instruction determining module is further adapted to:
perform speech recognition on the audio data corresponding to the multiple image frames to obtain a speech recognition result;
determine the audio instruction corresponding to the speech recognition result according to a preset audio instruction library, wherein the audio instruction library is used to store the audio instructions.
Optionally, the audio instruction library is further used to store the mapping relations between the audio instructions and their corresponding combined-action data sets;
the judgment module is then further adapted to:
determine, according to the audio instruction library, whether the combined-action data set matched by the multiple human-region data matches the audio instruction.
Optionally, the preset combined-action data set includes: multiple combined-action data sets stored in a preset motion-sensing action library, each combined-action data set containing at least two motion-sensing action data;
the comparison module is further adapted to:
compare the multiple human-region data with the multiple motion-sensing action data contained in each combined-action data set stored in the motion-sensing action library.
Optionally, the preset matching rule includes:
when M human-region data among the multiple human-region data respectively match M motion-sensing action data contained in a combined-action data set to be compared, determining that the multiple human-region data and that combined-action data set satisfy the matching rule;
wherein the total number of the multiple human-region data is greater than or equal to M, the total number of motion-sensing action data contained in the combined-action data set to be compared is greater than or equal to M, and M is a natural number greater than 1.
Optionally, each motion-sensing action data contained in the combined-action data set to be compared has a time sequence number; the comparison module is then further adapted to:
judge whether the order of appearance, in the video data, of the M human-region data matches the time sequence numbers of the M motion-sensing action data contained in the combined-action data set to be compared;
if so, determine that the M human-region data respectively match the M motion-sensing action data contained in the combined-action data set to be compared.
Optionally, the segmentation module is further adapted to:
according to the order in which the image frames appear in the video data, obtain in real time the currently pending image frame contained in the video data, perform human-body segmentation on the currently pending image frame, and obtain the human-region data corresponding to the currently pending image frame.
Optionally, the comparison module is further adapted to:
compare the human-region data corresponding to the currently pending image frame with the multiple motion-sensing action data contained in each combined-action data set;
determine a motion-sensing action data whose comparison is successful as a first action data, and determine the combined-action data set to which the first action data belongs as a first action data set;
compare the human-region data corresponding to the N image frames following the currently pending image frame with each motion-sensing action data contained in the first action data set, wherein N is a natural number greater than or equal to 1.
Optionally, the processing rule obtaining module is further adapted to:
determine, according to a preset combined-action processing library, the combined-action processing rule corresponding to the combined-action data set matched by the multiple human-region data;
wherein the combined-action processing library is used to store the combined-action processing rules corresponding to the combined-action data sets.
Optionally, the combined-action processing rule includes: processing the video data according to the effect texture corresponding to the combined-action data set.
Optionally, the processing module is further adapted to:
process the currently pending image frame and/or the L image frames following the currently pending image frame, wherein L is a natural number greater than 1.
Optionally, the video data includes: video data captured in real time by an image capture device, and/or video data contained in a human-computer interaction game.
According to yet another aspect of the invention, a computing device is provided, including: a processor, a memory, a communication interface, and a communication bus, through which the processor, the memory, and the communication interface communicate with each other; the memory is used to store at least one executable instruction, which causes the processor to perform operations corresponding to the above video data processing method.
According to a further aspect of the present invention, a computer storage medium is provided, in which at least one executable instruction is stored; the executable instruction causes a processor to perform operations corresponding to the above video data processing method.
According to the video data processing method, apparatus, and computing device provided by the present invention, the method can capture human motion-sensing actions quickly and accurately, and drives the processing of video data with both motion-sensing actions and audio data. Capturing motion-sensing actions does not depend on video shot by a high-precision, high-depth camera, so the method is applicable to any mobile terminal with a camera, has strong resistance to infrared interference, and is low in cost. It provides a human-computer interaction mode, based on human-region segmentation, that is driven by motion-sensing actions and audio data: the processing rule to be applied to the video data can be determined quickly from the motion-sensing actions and the audio data, and the processing step is performed only when both the image frames and the audio data match successfully. The accuracy of processing is therefore improved, the false-detection rate is reduced, and displaying the processed video data enhances the display effect and makes the human-computer interaction more entertaining.
The above description is only an overview of the technical solution of the present invention. In order that the technical means of the present invention may be understood more clearly and practiced according to the contents of the specification, and in order that the above and other objects, features, and advantages of the present invention may become more apparent, embodiments of the present invention are set forth below.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The accompanying drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered limiting of the present invention. Throughout the drawings, identical components are denoted by the same reference numerals. In the drawings:
Fig. 1 shows a flow chart of the video data processing method according to an embodiment of the present invention;
Fig. 2 shows a flow chart of the video data processing method according to another embodiment of the present invention;
Fig. 3 shows a schematic flow diagram of the sub-steps included in step S220;
Fig. 4 shows a schematic structural diagram of the video data processing apparatus according to a further embodiment of the present invention;
Fig. 5 shows a schematic structural diagram of a computing device according to an embodiment of the present invention.
Detailed description of the embodiments
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be embodied in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and so that its scope can be fully conveyed to those skilled in the art.
Fig. 1 shows a flow chart of the video data processing method according to an embodiment of the present invention. As shown in Fig. 1, the method comprises the following steps:
Step S110: perform human-body segmentation on multiple image frames in the video data to obtain multiple human-region data corresponding to the image frames.
The video data may be real-time video captured by a camera, video previously recorded by a camera and stored locally or in the cloud, or video composed from multiple pictures. The multiple image frames may be consecutive image frames, or image frames sampled from the video data at a preset time interval; the present invention does not limit the specific form or source of the video data.
Human-body segmentation of the multiple image frames may specifically be accomplished as follows. First, the human region in each image frame is detected; specifically, the pixels contained in each image frame can be classified to determine the human region in the frame. Then, the human region is segmented out of the corresponding image frame, specifically by splitting off the pixels corresponding to the human region, yielding the human-region data corresponding to each image frame.
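The per-pixel segmentation just described can be sketched as follows. This is a minimal illustration, not the patented implementation: the `is_person_pixel` classifier is a stand-in (a real system would use a trained segmentation model, such as the neural networks mentioned elsewhere in this patent).

```python
import numpy as np

def segment_human_region(frame, is_person_pixel):
    """Sketch of human-body segmentation: classify every pixel of the frame,
    then keep the pixel values and coordinate positions that belong to the
    person, i.e. the human-region data."""
    mask = np.zeros(frame.shape[:2], dtype=bool)
    for y in range(frame.shape[0]):
        for x in range(frame.shape[1]):
            mask[y, x] = is_person_pixel(frame[y, x])
    coords = np.argwhere(mask)   # coordinate positions of the person pixels
    pixels = frame[mask]         # the pixel values themselves
    return {"coords": coords, "pixels": pixels}
```

The returned dictionary mirrors the patent's description of human-region data as the region's pixels together with their coordinate positions.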
Step S120: compare the multiple human-region data with the multiple motion-sensing action data contained in a preset combined-action data set.
In the method of this embodiment, the operation of processing the video data is triggered by a combination of motion-sensing actions, so it must be judged whether the multiple human-region data satisfy the trigger condition; the multiple motion-sensing action data contained in the preset combined-action data set are the basis for this judgment. The human-region data may include the pixels contained in the human region and the coordinate positions of those pixels. This step may specifically judge whether the multiple human-region data are respectively consistent with the multiple motion-sensing action data, or whether the matching degree between the human-region data and the motion-sensing action data exceeds a preset matching-degree threshold. For example, a preset "Dragon-Subduing Eighteen Palms" combined-action data set contains multiple motion-sensing action data, and the multiple human-region data are compared with those action data respectively.
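The matching-degree comparison can be sketched as follows. This is an illustrative sketch only: the patent does not fix a similarity measure, so the overlap ratio (intersection over union) and the 0.8 threshold are assumptions.

```python
import numpy as np

def matching_degree(region_mask, action_mask):
    """Overlap ratio between a segmented human-region mask and a stored
    motion-sensing action mask, used here as the matching degree."""
    inter = np.logical_and(region_mask, action_mask).sum()
    union = np.logical_or(region_mask, action_mask).sum()
    return inter / union if union else 0.0

def region_matches(region_mask, action_mask, threshold=0.8):
    """The comparison step: a human region matches an action when the
    matching degree exceeds a preset matching-degree threshold."""
    return bool(matching_degree(region_mask, action_mask) >= threshold)
```

Identical masks score 1.0 and match; disjoint masks score 0.0 and do not.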
Step S130: when the comparison result is determined to satisfy a preset matching rule, determine an audio instruction from the audio data corresponding to the multiple image frames, and judge whether the combined-action data set matched by the multiple human-region data matches the audio instruction.
The preset matching rule can be configured for the specific application scenario. For example, in game or live-streaming scenarios with strong real-time and interactivity requirements, a relatively low matching degree between the multiple human-region data and the multiple motion-sensing action data may be considered to satisfy the preset matching rule, whereas in scenarios where the video data is post-processed, the rule may be considered satisfied only at a relatively high matching degree. In concrete applications, those skilled in the art can configure this according to actual needs.
In practical applications, the multiple motion-sensing action data contained in different combined-action data sets may be quite similar. For example, a "ground-pound" combined-action data set contains two motion-sensing action data, first raising the right hand and then lowering it, while a "flower-scattering" combined-action data set also contains two motion-sensing action data, first raising the right hand from the lower-left corner toward the upper-right corner and then lowering it. Determining the corresponding combined-action data set from the multiple human-region data alone may then go wrong; that is, the motion-sensing action data contained in the determined combined-action data set may be inconsistent with the user's actual motion-sensing actions.
Therefore, the method of this embodiment further determines an audio instruction from the audio data corresponding to the multiple image frames; specifically, the audio instruction can be obtained by performing speech recognition on the audio data, and it is then judged whether the combined-action data set matched by the multiple human-region data matches the audio instruction. Audio instructions can be set in advance for the combined-action data sets respectively. When the comparison result of the multiple human-region data against the multiple motion-sensing action data satisfies the preset matching rule, the combined-action data set corresponding to the multiple human-region data is determined, the audio instruction corresponding to the image frames of the multiple human-region data is further determined, and it is judged whether that audio instruction matches the audio instruction corresponding to the combined-action data set.
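The audio confirmation step can be sketched as follows. This is an illustrative sketch under assumed names: the instruction library is modeled as a plain dictionary from recognized speech text to a combined-action data set identifier, and the example phrases are invented.

```python
def audio_confirms_action_set(speech_text, action_set_id, instruction_map):
    """Look up the recognized speech in the audio instruction library and
    accept the combined-action data set matched by the human-region data
    only when the mapped instruction corresponds to that same set."""
    instruction = instruction_map.get(speech_text)  # speech -> instruction
    return instruction is not None and instruction == action_set_id
```

A matching utterance confirms the action set; an unrelated or unrecognized utterance rejects it, which is what keeps similar gestures (such as the ground-pound and flower-scattering examples above) from being confused.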
Step S140: if it is judged that the combinative movement data set matched with the multiple human region data matches the audio instruction, obtain the combinative movement processing rule corresponding to that combinative movement data set, process the video data according to the combinative movement processing rule, and display the processed video data.
Only when the combinative movement data set matched with the multiple human region data is judged to match the audio instruction is the combinative movement processing rule corresponding to that data set obtained. In other words, the method of the present embodiment determines the combinative movement data set corresponding to the multiple human region data according to both the human region data and the corresponding audio instruction, which improves the accuracy of the processing while making the human-computer interaction more entertaining. For example, the combinative movement processing rule corresponding to the 'dragon-subduing eighteen palms' action data set is obtained only when the multiple human region data match that action data set and the corresponding audio instruction also matches the audio instruction preset for it.
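The double check described above, where a processing rule is released only when both the gesture match and the audio instruction agree, can be sketched as follows. The set names, instruction strings, and rule table are illustrative assumptions, not part of the patent.

```python
# Hypothetical tables: the audio instruction and processing rule preset
# for each combinative movement data set (all names are made up).
SET_TO_AUDIO = {
    "dragon_subduing_18_palms": "dragon_subduing_18_palms",
    "ground_beating": "ground_beating",
}
SET_TO_RULE = {
    "dragon_subduing_18_palms": "add_palm_wave_effect",
    "ground_beating": "add_shockwave_effect",
}

def get_processing_rule(matched_set, audio_instruction):
    """Return the combinative movement processing rule only when the audio
    instruction preset for the gesture-matched set equals the recognized
    audio instruction; otherwise no processing is triggered."""
    if matched_set is None:
        return None
    if SET_TO_AUDIO.get(matched_set) != audio_instruction:
        return None
    return SET_TO_RULE[matched_set]
```

A mismatch in either factor, gesture or audio, yields no rule, which is exactly how the embodiment reduces false triggers.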
Processing the video data specifically means processing the image frames it contains, and the combinative movement processing rule may be any kind of processing rule, such as a special-effect addition rule. For example, each image frame contained in the video data is processed according to the combinative movement processing rule corresponding to the 'dragon-subduing eighteen palms' action data set, and the processed video data is displayed, so that the displayed video data contains the 'dragon-subduing eighteen palms' special effect. The present invention does not limit the specific rules of video processing, as long as the display effect of the video is enhanced.
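As a toy illustration of applying one processing rule frame by frame, the sketch below treats a frame as a plain dict and a special effect as a function that annotates it; real frames and effects would of course be pixel operations, and the names are illustrative.

```python
def apply_rule(frames, effect):
    """Apply one special-effect function to every image frame the video
    contains and return the processed frames for display."""
    return [effect(frame) for frame in frames]

def palm_effect(frame):
    """Stand-in effect: tag the frame with the overlay it received."""
    return {**frame, "overlay": "dragon_subduing_18_palms"}
```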
According to the processing method of video data provided in this embodiment, human body segmentation processing is performed on multiple image frames in the video data to obtain multiple human region data corresponding to the multiple image frames; the multiple human region data are respectively compared with the multiple body-sensing action datas contained in a preset combinative movement data set; when the comparison results are determined to meet the preset matching rule, the combinative movement processing rule corresponding to the combinative movement data set matched with the multiple human region data is obtained; the video data is processed according to the combinative movement processing rule, and the processed video data is displayed. This approach captures the body-sensing actions of the human body quickly and accurately, and determines the combinative movement data set corresponding to the multiple image frames on the basis of two factors, the action combination and the audio data, before processing the video data. It therefore improves the accuracy of the processing, reduces the false-trigger rate, and makes the human-computer interaction more entertaining. Moreover, capturing the actions does not depend on video data shot by a high-precision, high-depth camera, so the method is suitable for any mobile terminal equipped with a camera, has strong resistance to infrared interference, and is low in cost.
Fig. 2 shows a flow chart of a processing method of video data in accordance with another embodiment of the present invention. As shown in Fig. 2, the method includes:
Step S210: perform human body segmentation processing on multiple image frames in the video data to obtain multiple human region data corresponding to the multiple image frames.

Specifically, according to the order in which the image frames appear in the video data, the currently pending image frame contained in the video data is acquired in real time, human body segmentation processing is performed on the currently pending image frame, and the human region data corresponding to the currently pending image frame is obtained.
The video data may be video data captured by a camera, in which case the currently pending image frame contained in the video data is acquired in real time according to the order in which the image frames appear; since the method of the present embodiment triggers the processing of the video data according to multiple body-sensing actions, multiple image frames contained in the video data need to be acquired and processed. The video data may also be video data recorded in advance, in which case the method performs post-processing on the video data: each image frame contained in a specified time period of the video data may be determined in turn as the currently pending image frame in chronological order, or the currently pending image frame may be determined by a detection algorithm; specifically, an image frame containing a human region is detected by the detection algorithm, and that frame and the subsequent image frames containing human regions are determined in turn as the currently pending image frame. The video data may further include video data captured in real time by an image capture device, such as the video data of a live-streaming scene and/or the video data captured in real time in a human-computer-interaction game such as a somatosensory game interaction scenario; in this case each frame contained in the video data is determined in turn as the currently pending image frame in time order. The present invention does not limit this.
Performing human body segmentation processing on the currently pending image frame may specifically be accomplished as follows. First, the human region in the currently pending image frame is detected; specifically, the human region contained in the currently pending image frame can be detected by a neural network algorithm. The neural network algorithm can continuously learn the features of human regions by means such as deep learning, and detects the human region contained in the currently pending image frame according to the learning result. Then, the detected human region is segmented from the currently pending image frame; specifically, the pixels corresponding to the human region can be segmented out to obtain the multiple human region data corresponding to the respective image frames, where the human region data includes the pixels corresponding to the human region together with information such as the position information and colour information of those pixels.
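Given a segmentation mask produced by such a network, collecting the human region data (pixel positions plus colour) might look like the following sketch; the 2-D list representation of the image and mask is an assumption made for illustration only.

```python
def extract_human_region(image, mask):
    """image: 2-D list of colour values; mask: same-shape 2-D list where 1
    marks a human pixel. Returns the human region data: one record per
    human pixel with its position information and colour information."""
    region = []
    for y, row in enumerate(mask):
        for x, is_human in enumerate(row):
            if is_human:
                region.append({"x": x, "y": y, "color": image[y][x]})
    return region
```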
Detecting the human region contained in the currently pending image frame by a neural network algorithm, as described above, is a detection-based approach. In addition to detection, this step can also be combined with a tracking-based approach realized by a tracking algorithm to perform human body segmentation processing on the currently pending image frame. Specifically, after the human region in the currently pending image frame is detected by the detection approach, the position information of the human region is supplied to a tracker, and the tracker tracks the human region in subsequent image frames according to its position in the currently pending image frame. Since the same region is usually correlated across consecutive frames of the video data, the tracking approach can speed up the detection of subsequent image frames. Moreover, the tracker can also supply its tracking result to the detector, so that the detector determines a local region of the whole frame as the detection range and detects only within that range, thereby improving detection efficiency. In short, the combined use of the detection approach and the tracking approach can improve both the efficiency and the precision of detection.
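The interplay of detector and tracker can be sketched as a loop that detects periodically and tracks in between, feeding the last known position back to the detector as a restricted search window. The callback signatures and the re-detection interval are assumptions for illustration.

```python
def locate_human(frames, detect, track, redetect_every=5):
    """Run full detection on the first frame and on every
    `redetect_every`-th frame, and cheaper tracking on the frames in
    between; the last known position is handed to the detector as a
    local search range so it need not scan the whole frame."""
    positions, last = [], None
    for i, frame in enumerate(frames):
        if last is None or i % redetect_every == 0:
            last = detect(frame, search_range=last)
        else:
            last = track(frame, last)
        positions.append(last)
    return positions
```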
Step S220: compare the multiple human region data respectively with the multiple body-sensing action datas contained in the preset combinative movement data set.

The preset combinative movement data set includes multiple combinative movement data sets stored in a preset body-sensing action library, and each combinative movement data set contains at least two body-sensing action datas. The step of comparing the multiple human region data respectively with the multiple body-sensing action datas contained in the preset combinative movement data set then specifically includes: comparing the multiple human region data respectively with the multiple body-sensing action datas contained in each combinative movement data set stored in the body-sensing action library.
The body-sensing action library is preset. Since the method of the present embodiment triggers the processing of the video data according to a detected series of consecutive body-sensing actions, a single body-sensing action alone cannot trigger the processing. Therefore, in the present embodiment at least two body-sensing action datas are determined as one combinative movement data set, and each combinative movement data set is stored in the body-sensing action library in association with its corresponding at least two body-sensing action datas. After the multiple human region data are segmented from the respective image frames, they are respectively compared with the multiple body-sensing action datas to determine the combinative movement data set corresponding to the multiple human region data.
The multiple body-sensing action datas contained in the preset combinative movement data set each carry a time sequence number identifier. For example, a combinative movement data set may contain the two body-sensing action datas of raising the right hand and lowering the right hand: first raising and then lowering the right hand corresponds to the ground-beating combinative movement data set, while first lowering and then raising the right hand corresponds to the whip-cracking combinative movement data set. It follows that different combinative movement data sets may contain the same body-sensing action data, and the combinative movement data sets can be distinguished by setting the time sequence number identifier of each body-sensing action data.
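A minimal way to store such a library, with each combinative movement data set holding its body-sensing action datas under time sequence numbers, could be the following; the action and set names are illustrative.

```python
# Two sets share the same actions but differ in time sequence order,
# so the sequence numbers keep them distinguishable.
BODY_SENSING_ACTION_LIBRARY = {
    "ground_beating": {1: "raise_right_hand", 2: "lower_right_hand"},
    "whip_cracking":  {1: "lower_right_hand", 2: "raise_right_hand"},
}

def ordered_actions(set_name):
    """Return a set's body-sensing action datas sorted by their
    time sequence number identifier."""
    seq = BODY_SENSING_ACTION_LIBRARY[set_name]
    return [seq[k] for k in sorted(seq)]
```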
Specifically, in the present embodiment this step is further divided into several sub-steps. Fig. 3 shows a flow diagram of the sub-steps included in step S220; as shown in Fig. 3, step S220 specifically includes:
Sub-step S221: compare the human region data corresponding to the currently pending image frame respectively with the multiple body-sensing action datas contained in each combinative movement data set.

The human region data segmented from the currently pending image frame is compared with each body-sensing action data. Specifically, the contour and/or area of the human region can be determined from the pixel information contained in the human region data, and compared with the contour and/or area of the human region corresponding to each body-sensing action data contained in each combinative movement data set. In addition, to improve matching efficiency, the human region data corresponding to the currently pending image frame may be compared only with the first body-sensing action data of each combinative movement data set, or only with the first several body-sensing action datas of each combinative movement data set in order.
Sub-step S222: determine a body-sensing action data whose comparison result is successful as the first action data, and determine the combinative movement data set to which the first action data belongs as the first action data set.

According to the comparison of sub-step S221, in the present embodiment, if the contour of the human region corresponding to the currently pending image frame is consistent with the contour of the human region corresponding to a body-sensing action data, or their matching degree exceeds a preset contour matching threshold, and/or the area of the human region corresponding to the currently pending image frame is consistent with the area corresponding to a body-sensing action data, or the difference between the two is less than a preset difference threshold, the comparison result of the human region data of the currently pending image frame against that body-sensing action data is considered successful. The body-sensing action data whose comparison result is successful is determined as the first action data, and the combinative movement data set to which the first action data belongs is determined as the first action data set.
Sub-step S223: compare the human region data corresponding to the N image frames following the currently pending image frame with each body-sensing action data contained in the first action data set, where N is a natural number greater than or equal to 1.

The human region data corresponding to the N image frames following the currently pending image frame in the video data are respectively compared with each body-sensing action data contained in the first action data set; the comparison can be carried out as in sub-step S221 and is not repeated here. For example, segmentation processing is performed on the currently pending image frame to obtain the corresponding human region data, which is compared with each body-sensing action data contained in each combinative movement data set; if there is a body-sensing action data whose comparison result with the human region data is successful, the combinative movement data set to which that body-sensing action data belongs is determined as the first action data set, and the human region data corresponding to each subsequent image frame is then compared only with the body-sensing action datas contained in the first action data set. This narrows the scope of the comparison and speeds up the lookup of the action data set corresponding to each image frame.
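The narrowing strategy of sub-steps S221 to S223 can be sketched as follows: once the first frame's action matches the leading action of some set, later frames are checked only against that set. Actions are abstracted to strings, and string equality stands in for the contour/area comparison; both are simplifying assumptions.

```python
def find_combo(frame_actions, library):
    """frame_actions: actions recognized per frame, in order of appearance.
    library: set name -> ordered list of body-sensing action datas.
    S221: compare the first frame against every set; S222: a matching set
    becomes the first action data set; S223: the following frames are
    compared only against that set's remaining actions."""
    for name, actions in library.items():
        if frame_actions[:1] == actions[:1]:             # S221 / S222
            if frame_actions[:len(actions)] == actions:  # S223
                return name
    return None
```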
Step S230: when the comparison results are determined to meet the preset matching rule, perform speech recognition on the audio data corresponding to the multiple image frames to obtain a speech recognition result; determine the audio instruction corresponding to the speech recognition result according to a preset audio instruction library, where the audio instruction library is used to store the audio instructions; and judge whether the combinative movement data set matched with the multiple human region data matches the audio instruction.

The preset matching rule includes: when M human region data contained in the multiple human region data respectively match M body-sensing action datas contained in the combinative movement data set to be compared, determining that the multiple human region data and the combinative movement data set to be compared meet the matching rule. Here the total number of the multiple human region data is greater than or equal to M, the total number of the multiple body-sensing action datas contained in the combinative movement data set to be compared is greater than or equal to M, and M is a natural number greater than 1.
The combinative movement data set to be compared refers to the preset combinative movement data set, and a human region data matching a body-sensing action data means that the comparison result of that human region data against that body-sensing action data is successful. In practical applications, the user's multiple body-sensing actions may not be fully consistent with the body-sensing actions corresponding to a combinative movement data set; for example, relative to the multiple body-sensing actions corresponding to the combinative movement data set, the detected actions of the user may contain a wrong body-sensing action or omit one. If the processing of the video data were triggered only when the user's multiple body-sensing actions were strictly identical to the multiple body-sensing actions corresponding to the combinative movement data set, this would inconvenience the user and impair the user's body-sensing interaction experience.
Therefore, those skilled in the art can set the preset matching rule according to the specific application scenario, for example by setting a matching ratio threshold. The matching ratio refers to the ratio of the number of human region data whose comparison results against the multiple body-sensing action datas contained in a combinative movement data set are successful to the number of those body-sensing action datas; if the matching ratio is not less than the matching ratio threshold, the comparison results are determined to meet the preset matching rule. For example, if a combinative movement data set contains five body-sensing action datas and the above steps determine that the comparison results of four human region data against four of those five body-sensing action datas are each successful, the matching ratio is 80%, and the four human region data are considered to match the combinative movement data set. Alternatively, a priority sequence number may be set for each body-sensing action data in a combinative movement data set; if the comparison results of the multiple human region data against the body-sensing action datas with the highest priority sequence numbers in the combinative movement data set are each successful, the multiple human region data are considered to match the combinative movement data set.
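The matching-ratio rule can be written down directly. Here `results` holds one success/failure comparison outcome per body-sensing action data in the set to be compared, and the 0.8 default mirrors the 80% example above; the threshold value itself is a choice left to the implementer.

```python
def meets_matching_rule(results, threshold=0.8):
    """results: per-action comparison outcomes against one combinative
    movement data set (True = some human region data matched that
    action). The rule is met when successes / total actions is at
    least the matching ratio threshold."""
    return sum(results) / len(results) >= threshold
```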
Each body-sensing action data contained in the combinative movement data set to be compared carries a time sequence number identifier. The step in which the M human region data contained in the multiple human region data respectively match the M body-sensing action datas contained in the combinative movement data set to be compared then specifically includes: judging whether the order in which the M human region data appear in the video data matches the time sequence number identifiers of the M body-sensing action datas contained in the combinative movement data set to be compared; if so, determining that the M human region data contained in the multiple human region data respectively match the M body-sensing action datas contained in the combinative movement data set to be compared.
Since the multiple body-sensing action datas contained in the preset combinative movement data set each carry a time sequence number identifier, the order of appearance of the human region data must be compared against the multiple body-sensing action datas. For example, a combinative movement data set may contain the two body-sensing action datas of raising the right hand and lowering the right hand: first raising and then lowering the right hand corresponds to the ground-beating combinative movement data set, while first lowering and then raising the right hand corresponds to the whip-cracking combinative movement data set. It follows that different combinative movement data sets may contain the same multiple body-sensing action datas, and the combinative movement data sets can be distinguished by setting the time sequence number identifier of each body-sensing action data. Correspondingly, when querying the combinative movement data set matching the multiple human region data, it is necessary not only to determine the comparison results of the multiple human region data against the multiple body-sensing action datas, but also to determine whether the order in which the multiple human region data appear in the video data matches the time sequence number identifiers of the multiple body-sensing action datas contained in the combinative movement data set to be compared. Only when the comparison results of the multiple human region data against the multiple body-sensing action datas contained in a combinative movement data set meet the preset matching rule, and the order in which the multiple human region data appear in the video data matches the time sequence number identifiers of the multiple body-sensing action datas contained in that combinative movement data set, is it determined that the multiple human region data match that combinative movement data set.
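Checking the appearance order against the time sequence number identifiers amounts to a subsequence test: the matched actions, sorted by sequence number, must occur in the video in that same relative order. A sketch under that reading:

```python
def order_matches(observed, expected):
    """observed: actions in their order of appearance in the video data.
    expected: the set's matched actions sorted by time sequence number.
    True iff the expected actions occur in `observed` in the same
    relative order (other observations in between are tolerated)."""
    remaining = iter(observed)
    # `action in remaining` consumes the iterator, enforcing order.
    return all(action in remaining for action in expected)
```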
The video data contains image frames and audio data. The method of the present embodiment further performs speech recognition on the audio data corresponding to the multiple image frames in the video data to obtain a speech recognition result, and determines the audio instruction corresponding to the speech recognition result according to the preset audio instruction library, in which multiple audio instructions are stored. In practice, keywords can be set according to the audio characters contained in an audio instruction; when the corresponding audio instruction is determined from the speech recognition result, it is judged whether the audio characters contained in the speech recognition result include the keywords, and if so, the audio instruction corresponding to the speech recognition result can be determined. For example, the audio instruction library contains the 'dragon-subduing eighteen palms' audio instruction with the corresponding keyword set to 'dragon-subduing'; if the speech recognition result corresponding to the multiple image frames is 'dragon-subduing palm', the audio instruction corresponding to that speech recognition result can be determined to be 'dragon-subduing eighteen palms'.
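The keyword lookup from a speech recognition result to an audio instruction might be sketched as follows; the instruction names and keywords are illustrative stand-ins for the example in the text.

```python
# Hypothetical keyword table: audio instruction -> preset keywords.
INSTRUCTION_KEYWORDS = {
    "dragon_subduing_18_palms": ("dragon-subduing",),
    "ground_beating": ("beat the ground",),
}

def lookup_audio_instruction(recognized_text):
    """Return the first audio instruction whose preset keyword appears
    among the audio characters of the speech recognition result,
    or None when no keyword is found."""
    for instruction, keywords in INSTRUCTION_KEYWORDS.items():
        if any(k in recognized_text for k in keywords):
            return instruction
    return None
```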
The audio instruction library is further used to store the mapping relations between the audio instructions and their corresponding combinative movement data sets. Specifically, a data set identifier can be set in advance for each combinative movement data set, and the multiple data set identifiers are stored in the audio instruction library in association with the audio instructions corresponding to the respective combinative movement data sets; the combinative movement data set corresponding to the audio data can then be determined according to the speech recognition result of the audio data.
The step of judging whether the combinative movement data set matched with the multiple human region data matches the audio instruction then specifically includes: determining, according to the audio instruction library, whether the combinative movement data set matched with the multiple human region data matches the audio instruction. Since the audio instruction library stores the mapping relations between the audio instructions and their corresponding combinative movement data sets, the combinative movement data set matching the audio data can be determined according to the audio data corresponding to the multiple image frames; the above steps determine the combinative movement data set matched with the multiple human region data, and it is then further determined whether the combinative movement data set corresponding to the audio data is consistent with the combinative movement data set matched with the multiple human region data.
Step S240: if it is judged that the combinative movement data set matched with the multiple human region data matches the audio instruction, obtain the combinative movement processing rule corresponding to that combinative movement data set, and process the currently pending image frame and/or the L image frames following the currently pending image frame, where L is a natural number greater than 1.
According to the above steps, if it is judged that the combinative movement data set matched with the multiple human region data matches the audio instruction, that is, the combinative movement data set corresponding to the audio data is consistent with the combinative movement data set corresponding to the multiple human region data, the combinative movement processing rule corresponding to that combinative movement data set is obtained. The combinative movement processing rule may be a special-effect addition rule, an effect-sticker addition rule, an animation display rule, or the like. For example, in a live-streaming scene, the user makes the body-sensing action of first raising and then lowering the right hand, the corresponding combinative movement processing rule is determined to be a special-effect addition rule, and a special effect is added to the video data; for another example, in a somatosensory game, the user makes the body-sensing action of hitting a tennis ball with the right hand, the corresponding combinative movement processing rule is determined to be an animation display rule, and animation addition and display processing are performed on the video data. The present invention does not limit the content of the combinative movement processing rule.
After the combinative movement data set matched with the multiple human region data is determined according to the above steps, the video data is further processed according to the effect textures corresponding to that combinative movement data set, and the processed video data is displayed. Processing the video data means processing the image frames it contains; the video data is processed according to the combinative movement processing rule, for example the special-effect addition rule mentioned above, a special effect is added to the video data, and the processed video data is displayed. For example, each corresponding image frame in the video data is processed according to the combinative movement processing rule corresponding to the 'dragon-subduing eighteen palms' combinative movement data set, and the processed video data is displayed, so that the displayed video data contains the 'dragon-subduing eighteen palms' special effect.
Specifically, the currently pending image frame and/or the L image frames following the currently pending image frame are processed, where L is a natural number greater than 1. In practical applications, the corresponding combinative movement processing rule may be determined from the currently pending image frame alone, in which case the currently pending image frame and its L following image frames are processed accordingly; alternatively, the corresponding combinative movement processing rule may be determined jointly from the currently pending image frame and several frames after it, in which case the L image frames following the currently pending image frame are processed.
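The two cases, processing the current frame plus its L followers or only the L followers, reduce to a slice selection; the boolean flag distinguishing them is an illustrative assumption.

```python
def frames_to_process(frames, current, L, include_current=True):
    """Select the frames a combinative movement processing rule applies
    to: the currently pending frame and/or the L frames after it."""
    start = current if include_current else current + 1
    return frames[start:current + L + 1]
```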
According to the processing method of video data provided in this embodiment, the body-sensing actions of the human body can be captured quickly and accurately on the basis of human body segmentation, and the video data is processed with body-sensing actions as the driver. Moreover, the approach of detecting the human region by a neural network and segmenting it from the image places no special requirements on the capture equipment and does not depend on video data shot by a high-precision, high-depth camera; it is therefore suitable for any mobile terminal equipped with a camera, has strong resistance to infrared interference, and is low in cost. Furthermore, since the corresponding special effect is triggered by the combination of the actions corresponding to the human regions in multiple image frames together with the corresponding audio data, the subsequent steps are performed only on the premise that both the multiple image frames and the audio data are successfully matched, which improves the accuracy of the processing and reduces the false-trigger rate. A human-computer interaction mode driven by body-sensing actions and based on human region segmentation is thus provided, which can quickly determine the processing rule for the video data according to the body-sensing actions and the audio data and display the processed video data, improving the display effect of the video data.
Fig. 4 shows a structural diagram of a processing apparatus of video data according to a further embodiment of the present invention. As shown in Fig. 4, the apparatus includes:
a segmentation module 41, adapted to perform human body segmentation processing on multiple image frames in the video data to obtain multiple human region data corresponding to the multiple image frames;
a comparison module 42, adapted to compare the multiple human region data respectively with the multiple body-sensing action datas contained in a preset combinative movement data set;
an audio instruction determining module 43, adapted to determine an audio instruction according to the audio data corresponding to the multiple image frames when the comparison results are determined to meet a preset matching rule;
a judgment module 44, adapted to judge whether the combinative movement data set matched with the multiple human region data matches the audio instruction;
a processing rule acquisition module 45, adapted to obtain the combinative movement processing rule corresponding to the combinative movement data set matched with the multiple human region data if it is judged that the combinative movement data set matched with the multiple human region data matches the audio instruction;
a processing module 46, adapted to process the video data according to the combinative movement processing rule; and
a display module 47, adapted to display the processed video data.
Optionally, the audio instruction determining module 43 is further adapted to:
perform speech recognition on the audio data corresponding to the multiple image frames to obtain a speech recognition result; and
determine the audio instruction corresponding to the speech recognition result according to a preset audio instruction library, where the audio instruction library is used to store the audio instructions.
Optionally, the audio instruction library is further used to store the mapping relations between the audio instructions and their corresponding combinative movement data sets;
the judgment module 44 is then further adapted to:
determine, according to the audio instruction library, whether the combinative movement data set matched with the multiple human region data matches the audio instruction.
Optionally, the preset combinative movement data set includes multiple combinative movement data sets stored in a preset body-sensing action library, and each combinative movement data set contains at least two body-sensing action datas;
the comparison module 42 is then further adapted to:
compare the multiple human region data respectively with the multiple body-sensing action datas contained in each combinative movement data set stored in the body-sensing action library.
Optionally, the preset matching rule includes:
when M human region data among the multiple human region data respectively match M body-sensing action data contained in a combined action data set to be compared, determining that the multiple human region data meet the matching rule with respect to that combined action data set;
where the total number of the multiple human region data is greater than or equal to M, the total number of body-sensing action data contained in the combined action data set to be compared is greater than or equal to M, and M is a natural number greater than 1.
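The matching rule above can be sketched as follows. The `matches` predicate is an assumption standing in for whatever pose or region comparison the implementation actually performs; the rule itself only requires that at least M region data each match a distinct action data in the set being compared.

```python
def meets_matching_rule(region_data, action_set, m, matches):
    """Return True if at least m region data each match a distinct
    body-sensing action data in the set to be compared (m > 1)."""
    if m <= 1 or len(region_data) < m or len(action_set) < m:
        return False
    unmatched = list(action_set)
    hits = 0
    for region in region_data:
        for action in unmatched:
            if matches(region, action):
                unmatched.remove(action)  # each action data matched at most once
                hits += 1
                break
        if hits >= m:
            return True
    return False
```

Both collections may be larger than M; only M pairwise matches are required.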
Optionally, each body-sensing action data contained in the combined action data set to be compared carries a time sequence identifier, and the comparison module 42 is further adapted to:
judge whether the order in which the M human region data appear in the video data matches the time sequence identifiers of the M body-sensing action data contained in the combined action data set to be compared; and
if so, determine that the M human region data respectively match the M body-sensing action data contained in the combined action data set to be compared.
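One plausible reading of the time sequence check is that the sequence identifiers of the matched action data must be non-decreasing when the matched regions are listed in their order of appearance in the video. The pairing representation below is an assumption for illustration.

```python
def order_matches(matched_pairs):
    """matched_pairs: (frame_index, time_sequence_id) for each of the M
    matches. The appearance order in the video matches the time sequence
    identifiers when the ids are non-decreasing over appearance order."""
    ordered = sorted(matched_pairs, key=lambda p: p[0])  # appearance order
    seq_ids = [seq for _, seq in ordered]
    return seq_ids == sorted(seq_ids)
```

For example, a "raise arm then wave" combination only matches if the frame containing the raised arm precedes the frame containing the wave.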
Optionally, the segmentation module 41 is further adapted to:
acquire, in real time and according to the order in which the image frames appear in the video data, the currently pending image frame contained in the video data, and perform human body segmentation on the currently pending image frame to obtain the human region data corresponding to the currently pending image frame.
Optionally, the comparison module 42 is further adapted to:
compare the human region data corresponding to the currently pending image frame respectively with the multiple body-sensing action data contained in each combined action data set;
determine the body-sensing action data whose comparison result is successful as the first action data, and determine the combined action data set containing the first action data as the first action data set; and
compare the human region data corresponding to the N image frames following the currently pending image frame with each body-sensing action data contained in the first action data set, where N is a natural number greater than or equal to 1.
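This narrowing strategy can be sketched as follows: the first frame is compared against every set, and once a first action data is hit, only its containing set (the first action data set) is compared against the next N frames. The container shapes and the `matches` predicate are assumptions.

```python
def narrow_comparison(frame_regions, action_sets, matches, n=3):
    """frame_regions: human region data per frame, in appearance order.
    action_sets: dict of set_id -> list of body-sensing action data.
    Returns (first_action_set_id, hits_in_next_n_frames), or (None, 0)
    when no first action data is found."""
    first = frame_regions[0]
    for set_id, actions in action_sets.items():
        for action in actions:
            if matches(first, action):           # first action data found
                window = frame_regions[1:1 + n]  # only the next N frames
                hits = sum(
                    1 for region in window
                    if any(matches(region, a) for a in actions)
                )
                return set_id, hits              # restricted to this one set
    return None, 0
```

The payoff is that the per-frame cost drops from "all sets" to "one set" as soon as a combination starts.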
Optionally, the processing rule acquisition module 45 is further adapted to:
determine, according to a preset combined action processing library, the combined action processing rule corresponding to the combined action data set matching the multiple human region data;
where the combined action processing library is used to store the combined action processing rule corresponding to each combined action data set.
Optionally, the combined action processing rule includes: processing the video data according to an effect texture corresponding to the combined action data set.
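One possible shape for such a rule library is shown below: each combined action data set id points at an effect texture to be composited onto the matched frames. The texture file names and the compositing callable are invented for illustration only.

```python
# Hypothetical combined action processing library: set id -> processing rule.
PROCESSING_LIBRARY = {
    "punch_combo": {"effect_texture": "flame_overlay.png"},
    "jump_combo": {"effect_texture": "sparkle_overlay.png"},
}

def apply_rule(frame, action_set_id, composite):
    """Fetch the rule for the matched set and composite its effect texture."""
    rule = PROCESSING_LIBRARY.get(action_set_id)
    if rule is None:
        return frame  # no rule stored: the frame passes through unchanged
    return composite(frame, rule["effect_texture"])
```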
Optionally, the processing module 46 is further adapted to:
process the currently pending image frame and/or the L image frames following the currently pending image frame, where L is a natural number greater than 1.
Optionally, the video data includes video data captured in real time by an image capture device, and/or video data contained in a human-computer interaction game.
For the specific structure and working principle of the above modules, reference may be made to the description of the corresponding steps in the method embodiment, which is not repeated here.
Another embodiment of the present application provides a non-volatile computer storage medium storing at least one executable instruction, which can execute the processing method of video data in any of the above method embodiments.
Fig. 5 shows a schematic structural diagram of a computing device according to an embodiment of the present invention; the specific embodiments of the present invention do not limit the specific implementation of the computing device.
As shown in Fig. 5, the computing device may include a processor (processor) 502, a communications interface (Communications Interface) 504, a memory (memory) 506, and a communication bus 508, wherein the processor 502, the communication interface 504, and the memory 506 communicate with one another via the communication bus 508.
The communication interface 504 is used to communicate with network elements of other devices, such as clients or other servers.
The processor 502 is used to execute the program 510, and may specifically perform the relevant steps in the above embodiment of the processing method of video data.
Specifically, the program 510 may include program code, and the program code includes computer operation instructions.
The processor 502 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC, Application Specific Integrated Circuit), or one or more integrated circuits configured to implement the embodiments of the present invention. The one or more processors included in the computing device may be processors of the same type, such as one or more CPUs, or processors of different types, such as one or more CPUs and one or more ASICs.
The memory 506 is used to store the program 510. The memory 506 may include a high-speed RAM memory, and may further include a non-volatile memory (non-volatile memory), for example, at least one disk memory.
The program 510 may specifically be used to cause the processor 502 to perform the following operations: performing human body segmentation on multiple image frames in the video data to obtain multiple human region data corresponding to the multiple image frames; comparing the multiple human region data respectively with the multiple body-sensing action data contained in a preset combined action data set; when it is determined that the comparison result meets a preset matching rule, determining an audio instruction according to the audio data corresponding to the multiple image frames, and judging whether the combined action data set matching the multiple human region data matches the audio instruction; and if so, obtaining the combined action processing rule corresponding to the combined action data set matching the multiple human region data, processing the video data according to the combined action processing rule, and displaying the processed video data.
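The end-to-end control flow listed for program 510 can be sketched with every component (segmenter, comparator, recognizer, rule library, display) injected as a callable, so only the ordering of the operations is asserted; all concrete behavior is an assumption.

```python
def process_video(frames, audio, deps):
    """Sketch of program 510's operations, with components injected via deps."""
    regions = [deps["segment"](f) for f in frames]        # human body segmentation
    set_id = deps["compare"](regions)                     # match against action sets
    if set_id is None:                                    # matching rule not met
        return None
    instruction = deps["recognize"](audio)                # audio data -> instruction
    if not deps["instruction_matches"](instruction, set_id):
        return None                                       # instruction mismatch
    rule = deps["get_rule"](set_id)                       # combined action rule
    processed = [deps["apply"](f, rule) for f in frames]  # process the video data
    deps["display"](processed)                            # display the result
    return processed
```

Note the two gates: the effect is applied only when both the body-sensing match and the audio instruction match succeed.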
In an optional manner, the program 510 may specifically further cause the processor 502 to perform the following operations: performing speech recognition on the audio data corresponding to the multiple image frames to obtain a speech recognition result; and determining the audio instruction corresponding to the speech recognition result according to a preset audio instruction library, where the audio instruction library is used to store the audio instructions.
In an optional manner, the audio instruction library is further used to store mapping relations between each audio instruction and its corresponding combined action data set; the program 510 may specifically further cause the processor 502 to determine, according to the audio instruction library, whether the combined action data set matching the multiple human region data matches the audio instruction.
In an optional manner, the preset combined action data sets include multiple combined action data sets stored in a preset body-sensing action library, each combined action data set containing at least two body-sensing action data; the program 510 may specifically further cause the processor 502 to compare the multiple human region data respectively with the multiple body-sensing action data contained in each combined action data set stored in the body-sensing action library.
In an optional manner, the preset matching rule includes: when M human region data among the multiple human region data respectively match M body-sensing action data contained in a combined action data set to be compared, determining that the multiple human region data meet the matching rule with respect to that combined action data set; where the total number of the multiple human region data is greater than or equal to M, the total number of body-sensing action data contained in the combined action data set to be compared is greater than or equal to M, and M is a natural number greater than 1.
In an optional manner, each body-sensing action data contained in the combined action data set to be compared carries a time sequence identifier, and the program 510 may specifically further cause the processor 502 to: judge whether the order in which the M human region data appear in the video data matches the time sequence identifiers of the M body-sensing action data contained in the combined action data set to be compared; and if so, determine that the M human region data respectively match the M body-sensing action data contained in the combined action data set to be compared.
In an optional manner, the program 510 may specifically further cause the processor 502 to: acquire, in real time and according to the order in which the image frames appear in the video data, the currently pending image frame contained in the video data, and perform human body segmentation on the currently pending image frame to obtain the corresponding human region data.
In an optional manner, the program 510 may specifically further cause the processor 502 to: compare the human region data corresponding to the currently pending image frame respectively with the multiple body-sensing action data contained in each combined action data set; determine the body-sensing action data whose comparison result is successful as the first action data, and determine the combined action data set containing the first action data as the first action data set; and compare the human region data corresponding to the N image frames following the currently pending image frame with each body-sensing action data contained in the first action data set, where N is a natural number greater than or equal to 1.
In an optional manner, the program 510 may specifically further cause the processor 502 to: determine, according to a preset combined action processing library, the combined action processing rule corresponding to the combined action data set matching the multiple human region data; where the combined action processing library is used to store the combined action processing rule corresponding to each combined action data set.
In an optional manner, the combined action processing rule includes: processing the video data according to an effect texture corresponding to the combined action data set.
In an optional manner, the program 510 may specifically further cause the processor 502 to: process the currently pending image frame and/or the L image frames following the currently pending image frame, where L is a natural number greater than 1.
In an optional manner, the video data includes video data captured in real time by an image capture device, and/or video data contained in a human-computer interaction game.
The algorithms and displays provided herein are not inherently related to any particular computer, virtual system, or other apparatus. Various general-purpose systems may also be used with the teachings herein. The structure required to construct such a system is apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It should be understood that the content of the invention described herein may be implemented in various programming languages, and that the above description of a specific language is given to disclose the best mode of carrying out the invention.
In the specification provided here, numerous specific details are set forth. It is to be understood, however, that embodiments of the present invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it should be appreciated that, in order to streamline the disclosure and aid in the understanding of one or more of the various inventive aspects, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in the above description of exemplary embodiments of the invention. However, this method of disclosure is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following this detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in a device of an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units, or components in an embodiment may be combined into one module, unit, or component, and may furthermore be divided into multiple sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings), and all processes or units of any method or device so disclosed, may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by alternative features serving the same, equivalent, or similar purpose.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any one of the claimed embodiments may be used in any combination.
The various component embodiments of the present invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the video data processing computing device according to embodiments of the present invention. The present invention may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing some or all of the methods described herein. Such a program implementing the present invention may be stored on a computer-readable medium, or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art may design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a device claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, and third does not indicate any ordering; these words may be interpreted as names.
Claims (10)
1. A processing method of video data, comprising:
performing human body segmentation on multiple image frames in the video data to obtain multiple human region data corresponding to said multiple image frames;
comparing said multiple human region data respectively with multiple body-sensing action data contained in a preset combined action data set;
when it is determined that the comparison result meets a preset matching rule, determining an audio instruction according to audio data corresponding to said multiple image frames, and judging whether the combined action data set matching said multiple human region data matches said audio instruction; and
if so, obtaining the combined action processing rule corresponding to the combined action data set matching said multiple human region data, processing the video data according to said combined action processing rule, and displaying the processed video data.
2. The method according to claim 1, wherein the step of determining an audio instruction according to the audio data corresponding to said multiple image frames specifically comprises:
performing speech recognition on the audio data corresponding to said multiple image frames to obtain a speech recognition result; and
determining the audio instruction corresponding to said speech recognition result according to a preset audio instruction library, wherein said audio instruction library is used to store the audio instructions.
3. The method according to claim 2, wherein said audio instruction library is further used to store mapping relations between each audio instruction and its corresponding combined action data set; and
the step of judging whether the combined action data set matching said multiple human region data matches said audio instruction specifically comprises:
determining, according to said audio instruction library, whether the combined action data set matching said multiple human region data matches said audio instruction.
4. The method according to any one of claims 1-3, wherein said preset combined action data sets comprise multiple combined action data sets stored in a preset body-sensing action library, each combined action data set containing at least two body-sensing action data; and
the step of comparing said multiple human region data respectively with the multiple body-sensing action data contained in the preset combined action data set specifically comprises:
comparing said multiple human region data respectively with the multiple body-sensing action data contained in each combined action data set stored in said body-sensing action library.
5. The method according to any one of claims 1-4, wherein said preset matching rule comprises:
when M human region data among said multiple human region data respectively match M body-sensing action data contained in a combined action data set to be compared, determining that said multiple human region data meet said matching rule with respect to said combined action data set to be compared;
wherein the total number of said multiple human region data is greater than or equal to M, the total number of body-sensing action data contained in said combined action data set to be compared is greater than or equal to M, and M is a natural number greater than 1.
6. The method according to claim 5, wherein each body-sensing action data contained in said combined action data set to be compared carries a time sequence identifier, and the step of matching the M human region data contained in said multiple human region data respectively with the M body-sensing action data contained in the combined action data set to be compared specifically comprises:
judging whether the order in which the M human region data contained in said multiple human region data appear in said video data matches the time sequence identifiers of the M body-sensing action data contained in the combined action data set to be compared; and
if so, determining that the M human region data contained in said multiple human region data respectively match the M body-sensing action data contained in the combined action data set to be compared.
7. The method according to any one of claims 1-6, wherein the step of performing human body segmentation on the multiple image frames in said video data to obtain the multiple human region data corresponding to said multiple image frames specifically comprises:
acquiring, in real time and according to the order in which the image frames appear in said video data, the currently pending image frame contained in said video data, and performing human body segmentation on said currently pending image frame to obtain the human region data corresponding to said currently pending image frame.
8. A processing device of video data, comprising:
a segmentation module, adapted to perform human body segmentation on multiple image frames in the video data to obtain multiple human region data corresponding to said multiple image frames;
a comparison module, adapted to compare said multiple human region data respectively with multiple body-sensing action data contained in a preset combined action data set;
an audio instruction determining module, adapted to determine an audio instruction according to audio data corresponding to said multiple image frames when it is determined that the comparison result meets a preset matching rule;
a judgment module, adapted to judge whether the combined action data set matching said multiple human region data matches said audio instruction;
a processing rule acquisition module, adapted to obtain the combined action processing rule corresponding to the combined action data set matching said multiple human region data if said combined action data set is judged to match said audio instruction;
a processing module, adapted to process said video data according to said combined action processing rule; and
a display module, adapted to display the processed video data.
9. A computing device, comprising: a processor, a memory, a communication interface, and a communication bus, said processor, said memory, and said communication interface communicating with one another via said communication bus;
said memory being used to store at least one executable instruction, said executable instruction causing said processor to perform operations corresponding to the processing method of video data according to any one of claims 1-7.
10. A computer storage medium, having stored therein at least one executable instruction, said executable instruction causing a processor to perform operations corresponding to the processing method of video data according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711395657.2A CN107995442A (en) | 2017-12-21 | 2017-12-21 | Processing method, device and the computing device of video data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711395657.2A CN107995442A (en) | 2017-12-21 | 2017-12-21 | Processing method, device and the computing device of video data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107995442A true CN107995442A (en) | 2018-05-04 |
Family
ID=62038171
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711395657.2A Pending CN107995442A (en) | 2017-12-21 | 2017-12-21 | Processing method, device and the computing device of video data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107995442A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109462776A (en) * | 2018-11-29 | 2019-03-12 | 北京字节跳动网络技术有限公司 | A kind of special video effect adding method, device, terminal device and storage medium |
WO2020082575A1 (en) * | 2018-10-26 | 2020-04-30 | 平安科技(深圳)有限公司 | Music generation method and device |
CN111107279A (en) * | 2018-10-26 | 2020-05-05 | 北京微播视界科技有限公司 | Image processing method, image processing device, electronic equipment and computer readable storage medium |
WO2020200081A1 (en) * | 2019-03-29 | 2020-10-08 | 广州虎牙信息科技有限公司 | Live streaming control method and apparatus, live streaming device, and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105100672A (en) * | 2014-05-09 | 2015-11-25 | 三星电子株式会社 | Display apparatus and method for performing videotelephony using the same |
CN105930072A (en) * | 2015-02-28 | 2016-09-07 | 三星电子株式会社 | Electronic Device And Control Method Thereof |
CN106204649A (en) * | 2016-07-05 | 2016-12-07 | 西安电子科技大学 | A kind of method for tracking target based on TLD algorithm |
KR101721231B1 (en) * | 2016-02-18 | 2017-03-30 | (주)다울디엔에스 | 4D media manufacture methods of MPEG-V standard base that use media platform |
CN106650668A (en) * | 2016-12-27 | 2017-05-10 | 上海葡萄纬度科技有限公司 | Method and system for detecting movable target object in real time |
2017-12-21: CN application CN201711395657.2A filed, patent CN107995442A (en), status: Pending
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105100672A (en) * | 2014-05-09 | 2015-11-25 | 三星电子株式会社 | Display apparatus and method for performing videotelephony using the same |
CN105930072A (en) * | 2015-02-28 | 2016-09-07 | 三星电子株式会社 | Electronic Device And Control Method Thereof |
KR101721231B1 (en) * | 2016-02-18 | 2017-03-30 | (주)다울디엔에스 | 4D media manufacture methods of MPEG-V standard base that use media platform |
CN106204649A (en) * | 2016-07-05 | 2016-12-07 | 西安电子科技大学 | A kind of method for tracking target based on TLD algorithm |
CN106650668A (en) * | 2016-12-27 | 2017-05-10 | 上海葡萄纬度科技有限公司 | Method and system for detecting movable target object in real time |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020082575A1 (en) * | 2018-10-26 | 2020-04-30 | 平安科技(深圳)有限公司 | Music generation method and device |
CN111107279A (en) * | 2018-10-26 | 2020-05-05 | 北京微播视界科技有限公司 | Image processing method, image processing device, electronic equipment and computer readable storage medium |
CN111107279B (en) * | 2018-10-26 | 2021-06-29 | 北京微播视界科技有限公司 | Image processing method, image processing device, electronic equipment and computer readable storage medium |
CN109462776A (en) * | 2018-11-29 | 2019-03-12 | 北京字节跳动网络技术有限公司 | A kind of special video effect adding method, device, terminal device and storage medium |
WO2020200081A1 (en) * | 2019-03-29 | 2020-10-08 | 广州虎牙信息科技有限公司 | Live streaming control method and apparatus, live streaming device, and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Molchanov et al. | Online detection and classification of dynamic hand gestures with recurrent 3d convolutional neural network | |
Chaplot et al. | Gated-attention architectures for task-oriented language grounding | |
US20230264109A1 (en) | System and method for toy recognition | |
CN112750140B (en) | Information mining-based disguised target image segmentation method | |
Yan et al. | Mirrornet: Bio-inspired camouflaged object segmentation | |
Yang et al. | Recurrent filter learning for visual tracking | |
CN107995442A (en) | Processing method, device and the computing device of video data | |
El-Nouby et al. | Tell, draw, and repeat: Generating and modifying images based on continual linguistic instruction | |
CN108090561B (en) | Storage medium, electronic device, and method and device for executing game operation | |
CN107204012A (en) | Reduce the power consumption of time-of-flight depth imaging | |
CN110569795A (en) | Image identification method and device and related equipment | |
WO2018089158A1 (en) | Natural language object tracking | |
CN110689093B (en) | Image target fine classification method under complex scene | |
CN110532883A (en) | On-line tracking is improved using off-line tracking algorithm | |
CN112527113A (en) | Method and apparatus for training gesture recognition and gesture recognition network, medium, and device | |
CN109325408A (en) | A kind of gesture judging method and storage medium | |
Vieriu et al. | On HMM static hand gesture recognition | |
CN111291612A (en) | Pedestrian re-identification method and device based on multi-person multi-camera tracking | |
Wake et al. | Verbal focus-of-attention system for learning-from-observation | |
Kirkland et al. | Perception understanding action: adding understanding to the perception action cycle with spiking segmentation | |
Liu et al. | A deep Q-learning network based active object detection model with a novel training algorithm for service robots | |
Ruiz-Santaquiteria et al. | Improving handgun detection through a combination of visual features and body pose-based data | |
CN102436301B (en) | Human-machine interaction method and system based on reference region and time domain information | |
CN108121963A (en) | Processing method, device and the computing device of video data | |
CN108509876A (en) | For the object detecting method of video, device, equipment, storage medium and program |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20180504 |