CN107340852A - Gestural control method, device and terminal device - Google Patents
- Publication number
- CN107340852A CN107340852A CN201610694510.2A CN201610694510A CN107340852A CN 107340852 A CN107340852 A CN 107340852A CN 201610694510 A CN201610694510 A CN 201610694510A CN 107340852 A CN107340852 A CN 107340852A
- Authority
- CN
- China
- Prior art keywords
- business object
- video image
- gesture
- hand
- network model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/017—Gesture based interaction, e.g. based on a set of recognized hand gestures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/28—Recognition of hand or arm movements, e.g. recognition of deaf sign language
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Human Computer Interaction (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Psychiatry (AREA)
- Social Psychology (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
Embodiments of the present invention provide a gesture control method, apparatus and terminal device. The method includes: performing gesture detection on a currently played video image; when the detected gesture matches a predetermined gesture, determining a display position of a business object to be displayed in the video image; and drawing the business object at the display position by means of computer graphics. Embodiments of the invention can save network resources and/or client system resources, and add interest to the video image without disturbing the user's normal viewing, thereby reducing viewers' aversion to business objects presented in the video image, attracting viewers' attention to a certain extent, and increasing the impact of the business object.
Description
Technical field
The present invention relates to information processing technology, and in particular to a gesture control method, apparatus and terminal device.
Background
With the development of Internet technology, more and more people watch video over the Internet, which creates business opportunities for many new services. Because Internet video can become an important traffic entry point, it is regarded as a high-quality resource for advertisement placement.
Existing video advertising is mainly implanted by inserting a fixed-duration advertisement at some point during video playback, or by placing an advertisement at a fixed position in or around the video playback region.
However, on the one hand, this form of video advertising consumes network resources as well as client system resources; on the other hand, it often disturbs the viewer's normal viewing experience and causes aversion, so the anticipated advertising effect cannot be achieved.
Summary of the invention
It is an object of the present invention to provide a gesture control scheme.
According to one aspect of embodiments of the present invention, a gesture control method is provided. The method includes: performing gesture detection on a currently played video image; when the detected gesture matches a predetermined gesture, determining a display position of a business object to be displayed in the video image; and drawing the business object at the display position by means of computer graphics.
Optionally, in any gesture control method provided by the embodiments of the present invention, determining the display position of the business object to be displayed in the video image includes: extracting feature points of the hand in the hand candidate region corresponding to the detected gesture; and determining, according to the feature points of the hand, the display position in the video image of the business object to be displayed corresponding to the detected gesture.
Optionally, in any gesture control method provided by the embodiments of the present invention, determining, according to the feature points of the hand, the display position in the video image of the business object to be displayed corresponding to the detected gesture includes: determining the display position according to the feature points of the hand and the type of the business object to be displayed.
Optionally, in any gesture control method provided by the embodiments of the present invention, determining, according to the feature points of the hand and the type of the business object to be displayed, the display position in the video image of the business object corresponding to the detected gesture includes: determining a plurality of display positions in the video image for that business object; and selecting at least one display position from the plurality of display positions.
Optionally, in any gesture control method provided by the embodiments of the present invention, determining the display position of the business object to be displayed in the video image includes: obtaining, from a prestored correspondence between gestures and display positions, the target display position corresponding to the predetermined gesture as the display position in the video image of the business object to be displayed corresponding to the detected gesture.
Optionally, in any gesture control method provided by the embodiments of the present invention, the business object is a special effect containing semantic information, and the video image is a live-streaming video image.
Optionally, in any gesture control method provided by the embodiments of the present invention, the business object includes a special effect containing advertising information in at least one of the following forms: a two-dimensional sticker effect, a three-dimensional effect, or a particle effect.
Optionally, in any gesture control method provided by the embodiments of the present invention, the display position includes at least one of: the hair region, forehead region, cheek region, chin region, or a body region other than the head of a person in the video image; a background region in the video image; a region within a set range centered on the region where the hand is located in the video image; or a preset region in the video image.
Optionally, in any gesture control method provided by the embodiments of the present invention, the type of the business object includes at least one of: a forehead sticker type, a cheek sticker type, a chin sticker type, a virtual hat type, a virtual clothing type, a virtual makeup type, a virtual headwear type, a virtual hair accessory type, or a virtual jewelry type.
Optionally, in any gesture control method provided by the embodiments of the present invention, the gesture includes at least one of: waving, a scissors hand, a clenched fist, a cupped (palm-up) hand, clapping, an open palm, a closed palm, a thumbs-up, a finger-gun pose, a V sign, or an OK sign.
Optionally, in any gesture control method provided by the embodiments of the present invention, performing gesture detection on the currently played video image includes: detecting the video image using a pre-trained first convolutional network model to obtain first feature information of the video image and prediction information of a hand candidate region, the first feature information including hand feature information; taking the first feature information and the prediction information of the hand candidate region as second feature information of a pre-trained second convolutional network model, and performing gesture detection on the video image according to the second feature information using the second convolutional network model to obtain a gesture detection result of the video image; wherein the second convolutional network model and the first convolutional network model share a feature extraction layer.
Optionally, in any gesture control method provided by the embodiments of the present invention, before performing gesture detection on the currently played video image, the method further includes: training the first convolutional network model on sample images containing hand annotation information, and obtaining the first convolutional network model's prediction information of the hand candidate regions of the sample images; correcting the prediction information of the hand candidate regions; and training the second convolutional network model according to the corrected prediction information of the hand candidate regions and the sample images, wherein the second convolutional network model and the first convolutional network model share a feature extraction layer, and the parameters of the feature extraction layer are kept constant during training of the second convolutional network model.
Optionally, in any gesture control method provided by the embodiments of the present invention, determining the display position of the business object to be displayed in the video image includes: determining the display position of the business object to be displayed corresponding to the detected gesture by feeding the gesture to a pre-trained third convolutional network model for detecting display positions of business objects from video images.
According to another aspect of embodiments of the present invention, a gesture control apparatus is provided. The apparatus includes: a gesture detection module for performing gesture detection on a currently played video image; a display position determination module for determining, when the detected gesture matches a predetermined gesture, the display position of the business object to be displayed in the video image; and a business object drawing module for drawing the business object at the display position by means of computer graphics.
Optionally, in any gesture control apparatus provided by the embodiments of the present invention, the display position determination module includes: a feature point extraction unit for extracting feature points of the hand in the hand candidate region corresponding to the detected gesture; and a display position determination unit for determining, according to the feature points of the hand, the display position in the video image of the business object to be displayed corresponding to the detected gesture.
Optionally, in any gesture control apparatus provided by the embodiments of the present invention, the display position determination unit determines the display position according to the feature points of the hand and the type of the business object to be displayed.
Optionally, in any gesture control apparatus provided by the embodiments of the present invention, the display position determination unit determines, according to the feature points of the hand and the type of the business object to be displayed, a plurality of display positions in the video image for the business object corresponding to the detected gesture, and selects at least one display position from the plurality.
Optionally, in any gesture control apparatus provided by the embodiments of the present invention, the display position determination module, when it is determined that the detected gesture matches the corresponding predetermined gesture, takes the display position associated with that predetermined gesture as the display position in the video image of the business object to be displayed corresponding to the detected gesture.
Optionally, in any gesture control apparatus provided by the embodiments of the present invention, the display position determination module obtains, from a prestored correspondence between gestures and display positions, the target display position corresponding to the predetermined gesture as the display position in the video image of the business object to be displayed corresponding to the detected gesture.
Optionally, in any gesture control apparatus provided by the embodiments of the present invention, the business object is a special effect containing semantic information, and the video image is a live-streaming video image.
Optionally, in any gesture control apparatus provided by the embodiments of the present invention, the business object includes a special effect containing advertising information in at least one of the following forms: a two-dimensional sticker effect, a three-dimensional effect, or a particle effect.
Optionally, in any gesture control apparatus provided by the embodiments of the present invention, the display position includes at least one of: the hair region, forehead region, cheek region, chin region, or a body region other than the head of a person in the video image; a background region in the video image; a region within a set range centered on the region where the hand is located in the video image; or a preset region in the video image.
Optionally, in any gesture control apparatus provided by the embodiments of the present invention, the type of the business object includes at least one of: a forehead sticker type, a cheek sticker type, a chin sticker type, a virtual hat type, a virtual clothing type, a virtual makeup type, a virtual headwear type, a virtual hair accessory type, or a virtual jewelry type.
Optionally, in any gesture control apparatus provided by the embodiments of the present invention, the gesture includes at least one of: waving, a scissors hand, a clenched fist, a cupped (palm-up) hand, clapping, an open palm, a closed palm, a thumbs-up, a finger-gun pose, a V sign, or an OK sign.
Optionally, in any gesture control apparatus provided by the embodiments of the present invention, the gesture detection module detects the video image using a pre-trained first convolutional network model to obtain first feature information of the video image and prediction information of a hand candidate region, the first feature information including hand feature information; takes the first feature information and the prediction information of the hand candidate region as second feature information of a pre-trained second convolutional network model; and performs gesture detection on the video image according to the second feature information using the second convolutional network model to obtain the gesture detection result of the video image; wherein the second convolutional network model and the first convolutional network model share a feature extraction layer.
Optionally, in any gesture control apparatus provided by the embodiments of the present invention, the apparatus further includes: a hand region determination module for training the first convolutional network model on sample images containing hand annotation information, and obtaining the first convolutional network model's prediction information of the hand candidate regions of the sample images; a correction module for correcting the prediction information of the hand candidate regions; and a convolutional model training module for training the second convolutional network model according to the corrected prediction information of the hand candidate regions and the sample images, wherein the second convolutional network model and the first convolutional network model share a feature extraction layer, and the parameters of the feature extraction layer are kept constant during training of the second convolutional network model.
Optionally, in any gesture control apparatus provided by the embodiments of the present invention, the display position determination module determines the display position of the business object to be displayed corresponding to the detected gesture by feeding the gesture to a pre-trained third convolutional network model for detecting display positions of business objects from video images.
According to yet another aspect of embodiments of the present invention, a terminal device is provided. The terminal device includes: a processor, a memory, a communication interface and a communication bus, the processor, the memory and the communication interface communicating with one another through the communication bus. The memory stores at least one executable instruction that causes the processor to perform operations corresponding to the gesture control method provided above.
According to a further aspect of embodiments of the present invention, a computer-readable storage medium is also provided. The computer-readable storage medium stores: an executable instruction for performing gesture detection on a currently played video image; an executable instruction for determining, when the detected gesture matches a predetermined gesture, the display position of the business object to be displayed in the video image; and an executable instruction for drawing the business object at the display position by means of computer graphics.
With the gesture control method, apparatus and terminal device provided by embodiments of the present invention, hand and gesture detection is performed on the currently played video image, the display position corresponding to the detected gesture is determined, and the business object to be displayed is then drawn at that display position in the video image by means of computer graphics. When the business object is used to present an advertisement, compared with traditional video advertising: on the one hand, the business object is combined with video playback, so no additional advertising video data unrelated to the video needs to be transmitted over the network, saving network resources and/or client system resources; on the other hand, the business object is closely integrated with the gesture in the video image, preserving the main image and actions of the video subject (such as an anchor) while adding interest to the video image, without disturbing the user's normal viewing. This reduces viewers' aversion to business objects presented in the video image, attracts their attention to a certain extent, and increases the impact of the business object.
Brief description of the drawings
Fig. 1 is a flowchart of a gesture control method according to Embodiment 1 of the present invention;
Fig. 2 is a flowchart of a method for obtaining a first convolutional network model and a second convolutional network model according to Embodiment 2 of the present invention;
Fig. 3 is a flowchart of a gesture control method according to Embodiment 3 of the present invention;
Fig. 4 is a flowchart of a gesture control method according to Embodiment 4 of the present invention;
Fig. 5 is a structural block diagram of a gesture control apparatus according to Embodiment 5 of the present invention;
Fig. 6 is a structural block diagram of a gesture control apparatus according to Embodiment 6 of the present invention;
Fig. 7 is a schematic structural diagram of a terminal device according to Embodiment 7 of the present invention.
Detailed description
Exemplary embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Embodiment 1
Fig. 1 is a flowchart of a gesture control method according to Embodiment 1 of the present invention. The method is performed by a computer system that includes a gesture control apparatus.
Referring to Fig. 1, in step S110, gesture detection is performed on a currently played video image.
The video image may be a live video image being streamed, a video image in a recorded video, or a video image in the process of being recorded. The gesture may include waving, a scissors hand, a clenched fist, a cupped hand, closing or opening the palm, and so on.
In practice, taking live video streaming as an example, there are currently multiple live-streaming platforms, such as the Huajiao live platform and the YY live platform. Each platform includes multiple live rooms, each live room may include at least one anchor, and the anchor can stream live video images to the fans in his or her room through the camera of a terminal device (such as a mobile phone, tablet computer or PC). The subject of such a video image is usually one main person (the anchor) against a simple background, and the anchor usually occupies a large region of the video image. When a business object (such as an advertisement) needs to be inserted during the live stream, the video images of the current stream can be obtained as the video images to be processed.
In addition, the video image may be a video image in a recorded short video. In this case, the user can play the short video on a terminal device, and during playback the terminal device can obtain each frame of the video as a video image to be processed.
Furthermore, when the video image is one being recorded, the terminal device can obtain each recorded frame as a video image to be processed during recording.
Further, the terminal device playing the video image, or the terminal device used by the anchor, is provided with a mechanism for detecting the hand in a video image and detecting the gesture in the hand candidate region where the hand is located. Through this mechanism, each currently played frame (i.e., each video image to be processed) can be inspected to determine whether it contains the anchor's hand information. If it does, the video image is obtained for further processing; if not, the video image can be discarded without further processing, and the next frame is obtained and handled in the same way. The hand information may include, but is not limited to, the state and position of the fingers, the state and position of the palm, and whether the hand is closed or open.
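The per-frame dispatch described above can be sketched as a small loop (names and the hand-presence test are placeholders, not part of the original):

```python
# Frames without hand information are discarded; frames with it are handed
# on to the gesture-detection stage.
def process_stream(frames, contains_hand, handle_gesture_frame):
    for frame in frames:
        if contains_hand(frame):
            handle_gesture_frame(frame)
        # otherwise drop the frame and continue with the next one

# Toy example: pretend even-numbered "frames" contain a hand.
processed = []
process_stream([1, 2, 3, 4],
               contains_hand=lambda f: f % 2 == 0,
               handle_gesture_frame=processed.append)
assert processed == [2, 4]
```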
For a video image containing hand information (i.e., a hand), the hand candidate region where the hand is located can be detected from the video image. The hand candidate region may be the smallest rectangular region in the video image that covers the entire hand, or a region of another shape (such as an ellipse). One feasible procedure is as follows: the terminal device obtains a currently played frame as the video image to be processed, crops out the image containing the hand candidate region through a preset mechanism, and then analyzes the candidate-region image and extracts features, obtaining feature data of each part of the hand candidate region (including the fingers, the palm, and so on). By analyzing these feature data, it can be determined which gesture the hand candidate region contains, such as waving, a scissors hand, a clenched fist, a cupped hand, or a closed or open palm.
In addition, in order to subsequently determine the display position of the business object in the video image more quickly and accurately, the display position can be constrained by the hand position. The hand position may be the center of the hand candidate region, or a coordinate position determined from multiple edge positions of the rectangular or elliptical candidate region. For example, after the region where the hand is located has been determined in the video image, the hand candidate region can be analyzed to determine its center as the hand position. Specifically, if the hand candidate region is a rectangular region, the diagonal of the rectangle can be computed and its midpoint chosen as the hand position, yielding a hand position based on the hand candidate region. Besides the center, multiple edge positions of the rectangular or elliptical region can also serve as the hand position; the processing is similar to the center-based case and is not repeated here.
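A minimal sketch of the center computation just described, assuming the candidate region is an axis-aligned rectangle given as (left, top, width, height) in image coordinates:

```python
def hand_position(rect):
    """Return the hand position as the center of the hand candidate
    rectangle; the diagonal's midpoint is simply the rectangle center."""
    left, top, width, height = rect
    return (left + width / 2.0, top + height / 2.0)

# Example: a candidate region detected at (100, 50) of size 80 x 60.
assert hand_position((100, 50, 80, 60)) == (140.0, 80.0)
```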
In step S120, when the detected gesture matches a predetermined gesture, the display position of the business object to be displayed in the video image is determined.
The business object to be displayed is an object created according to a certain business demand, such as an advertisement. The display position may be the center of a designated region in the video image, or coordinate positions of multiple edges of that designated region.
In practice, feature data of a variety of different gestures can be stored in advance, and the different gestures can be labeled correspondingly to distinguish the meaning each gesture represents. Through the processing of step S110 above, the hand, the hand candidate region and the gesture in that region can be detected from the video image to be processed. The detected hand gesture can then be compared with each prestored gesture; if the prestored gestures include one identical to the detected gesture, it can be determined that the detected gesture matches the corresponding predetermined gesture.
To improve matching accuracy, the matching result can be determined by calculation. For example, a preset matching algorithm can compute the matching degree between any two gestures; for instance, the feature data of the detected gesture can be matched against the feature data of each prestored gesture to obtain a matching-degree value between the two. In this way, the matching-degree values between the detected gesture and every prestored gesture are computed, and the maximum value is selected from the results. If the maximum matching-degree value exceeds a predetermined matching threshold, it can be determined that the prestored gesture corresponding to that maximum value matches the detected hand gesture. If the maximum matching-degree value does not exceed the predetermined threshold, matching fails; that is, the detected gesture is not a predetermined gesture, and the processing of step S110 above can be continued.
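The patent does not specify the matching algorithm; as one plausible sketch, the following compares the detected gesture's feature vector against each stored template with cosine similarity, takes the best score, and accepts the match only if it exceeds a preset threshold. The feature vectors, template names and threshold are illustrative assumptions.

```python
import math

def cosine(a, b):
    # Cosine similarity between two feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def match_gesture(detected, templates, threshold=0.9):
    """Return the best-matching gesture name, or None if matching fails."""
    best_name, best_score = None, -1.0
    for name, feat in templates.items():
        score = cosine(detected, feat)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None

templates = {"wave": [1.0, 0.0, 0.2], "fist": [0.0, 1.0, 0.1]}
assert match_gesture([0.9, 0.05, 0.18], templates) == "wave"
assert match_gesture([0.5, 0.5, 0.5], templates) is None   # below threshold
```

On a failed match, as the text says, the caller would simply return to step S110 and process the next frame.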
Further, when it is determined that the detected gesture matches a corresponding predetermined gesture, the meaning represented by the matched gesture can first be determined, and a display position related to or corresponding to that meaning can be chosen from multiple preset display positions as the display position of the business object to be displayed in the video image. In addition, where a hand position has been determined in step S110 above, a display position related to or corresponding to both the gesture's meaning and the hand position can be chosen from the multiple preset display positions. For example, taking live streaming as an example, when a palm-up gesture of the anchor is detected, the region above the hand candidate region can be chosen as the related or corresponding display position; when a waving gesture of the anchor is detected, the palm region or its background region can be chosen as the related or corresponding display position.
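The stored correspondence between predetermined gestures and display positions could be as simple as a lookup table, sketched below. The gesture names and position labels are illustrative only; the original does not fix a concrete encoding.

```python
# Hypothetical gesture-to-position correspondence, following the examples
# in the text: a palm-up hand "holds" the object above the hand region,
# while a wave places it in the palm or background region.
GESTURE_TO_POSITION = {
    "palm-up": "above_hand_region",
    "wave":    "background_region",
    "victory": "beside_hand_region",
}

def display_position(detected_gesture, default="preset_region"):
    """Look up the target display position for a matched gesture,
    falling back to a preset region when no entry exists."""
    return GESTURE_TO_POSITION.get(detected_gesture, default)

assert display_position("palm-up") == "above_hand_region"
assert display_position("thumbs-up") == "preset_region"
```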
In step S130, the business object is drawn at the display position by means of computer graphics.
For example, taking live streaming as an example, when a palm-up gesture of the anchor is detected, the corresponding business object (such as an advertisement displaying a predetermined product logo) can be drawn by computer graphics in the region above the palm within the anchor's hand candidate region in the video image. If a fan is interested in the business object, the fan can click the region where it is located; the fan's terminal device then obtains the network link corresponding to the business object, enters the page related to the business object through that link, and the fan can obtain resources related to the business object on that page.
The business object may be drawn by any appropriate graphics or image drawing or rendering technique, including but not limited to drawing based on OpenGL, OpenCL, or the Unity graphics engine. OpenGL and OpenCL define cross-programming-language, cross-platform programming interface specifications that are hardware-independent and allow 2D or 3D graphics to be drawn conveniently. With OpenGL, OpenCL, or Unity, not only 2D effects such as 2D stickers, but also 3D effects and particle effects, can be drawn.
In the gesture control method provided by this embodiment of the present invention, hand and gesture detection is performed on the currently played video image, a display position corresponding to the detected gesture is determined, and the business object to be shown is then drawn at that display position in the video image by means of computer graphics. Thus, when the business object is used to show an advertisement, compared with conventional video advertising: on the one hand, the business object is combined with the video playback, so no additional advertisement video data unrelated to the video needs to be transmitted over the network, saving network resources and/or system resources of the client; on the other hand, the business object is closely combined with the gesture in the video image, which preserves the main image and actions of the video subject (such as the broadcaster) while adding interest to the video image, and does not disturb the user's normal viewing, thereby reducing the user's aversion to the business object shown in the video image, attracting the attention of viewers to a certain extent, and improving the influence of the business object.
Embodiment Two
Fig. 2 is a flow chart of a method for obtaining the first convolutional network model and the second convolutional network model according to Embodiment Two of the present invention.
The processing of step S110 in Embodiment One above, in which gesture detection is performed on the currently played video image, may be implemented using a corresponding feature extraction algorithm or using a neural network model such as a convolutional network model. This embodiment takes a convolutional network model as an example for performing hand and gesture detection on the hand candidate region of the video image; accordingly, a first convolutional network model for detecting hand candidate regions in an image and a second convolutional network model for detecting gestures from the hand candidate regions may be trained in advance. The gestures include: waving, scissors hand, fist, palm-up (holding) hand, applause, open palm, closed palm, thumbs-up, finger-gun, V sign, and OK sign. The business object is a special effect containing semantic information, and may include a special effect in at least one of the following forms containing advertisement information: a two-dimensional sticker effect, a three-dimensional effect, or a particle effect.
The gesture control method of this embodiment may be executed by any device having data acquisition, processing, and transmission functions, including but not limited to mobile terminals and PCs; the embodiments of the present invention impose no limitation in this regard.
Referring to Fig. 2, in step S210, the first convolutional network model is trained according to sample images containing hand annotation information, and prediction information of the first convolutional network model for the hand candidate regions of the sample images is obtained.
The sample images containing hand annotation information may originate from the video images of an image acquisition device, composed of successive frames, or may be single frames or single images; they may also originate from other devices, with the annotation operation then performed on the sample images. Multiple candidate regions may specifically be annotated in a sample image. This embodiment imposes no limitation on the source of, or means of obtaining, the sample images containing hand annotation information. In the embodiments of the present invention, the hand candidate region is the same as the hand candidate region mentioned above.
The prediction information of a hand candidate region may include: position information of the hand region in the sample image, for example coordinate-point information or pixel information; completeness information of the hand in the hand region, for example whether the hand region contains a complete hand or only a finger; and specific gesture information in the hand region, for example the gesture type. This embodiment imposes no limitation on the specific content of the prediction information of the hand candidate region.
In implementation, the higher the resolution of an image, the larger its data volume; in the subsequent hand candidate region and gesture detection, more computing resources are required and detection is slower. In view of this, in one specific implementation of the present invention, the sample images may be images satisfying a preset resolution condition. For example, the preset resolution condition may be that the longest edge of the image does not exceed 640 pixels and the shortest edge does not exceed 480 pixels.
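The example resolution condition above can be enforced by downscaling a sample image's dimensions before detection. The following is a minimal sketch under the stated example limits (640 px longest edge, 480 px shortest edge); the function name and scaling strategy are assumptions for illustration.

```python
# Sketch of the preset resolution condition: scale an image's dimensions so
# that the longest edge is at most 640 px and the shortest at most 480 px
# (values taken from the example in the text), preserving aspect ratio.

def fit_resolution(width, height, long_max=640, short_max=480):
    long_e, short_e = max(width, height), min(width, height)
    scale = min(1.0, long_max / long_e, short_max / short_e)
    return round(width * scale), round(height * scale)

print(fit_resolution(1920, 1080))  # -> (640, 360)
```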
After the sample images are obtained, the hand candidate region and gesture information may be annotated in each sample image (for example, by manual annotation), yielding multiple sample images annotated with hand candidate regions. The annotated hand candidate region may be the smallest rectangular or elliptical region in the image that can cover the whole hand.
The first convolutional network model may include a first input layer, a first output layer, and multiple first convolutional layers. The first input layer is used for inputting the image, the multiple first convolutional layers detect the image to obtain hand candidate regions, and the hand candidate regions are then output through the first output layer. The network parameters of each layer and the number of first convolutional layers may be set manually or randomly, as determined according to actual needs.
Specifically, when the first convolutional network model processes the sample image using the multiple first convolutional layers, it performs feature extraction on the sample image. When the first convolutional network model obtains the hand candidate regions in the sample image, the sample image is obtained through the first input layer, the features of the sample image are then extracted by the first convolutional layers, the extracted features are combined to determine the hand candidate regions in the sample image, and the result is output through the first output layer.
The annotation information of the hand regions in the sample images is obtained and used as the training reference; the sample images are fed into the initial model of the first convolutional network model, and the model may be trained using gradient descent and the back-propagation algorithm to obtain the first convolutional network model. When training the first convolutional network model, the first input layer parameters, the first output layer parameters, and the multiple first convolutional layer parameters may be trained first, and the first convolutional network model is then built from the obtained parameters.
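The training principle just described, gradient descent driven by back-propagated error against annotated targets, can be shown on a deliberately tiny stand-in model. The single linear "layer" below is an assumption for illustration only; the real first model trains convolutional layers in the same fashion, just with many more parameters.

```python
# Gradient descent + back-propagation on a tiny stand-in model: one linear
# "layer" y = w*x + b fitted to annotated targets by mean-squared-error
# gradients. The real model applies the same update rule to conv layers.

def train_linear(samples, lr=0.1, steps=300):
    w, b = 0.0, 0.0
    for _ in range(steps):
        # forward pass, then MSE gradients (the back-propagation step)
        gw = sum((w * x + b - y) * x for x, y in samples) / len(samples)
        gb = sum((w * x + b - y) for x, y in samples) / len(samples)
        w -= lr * gw                     # gradient descent parameter update
        b -= lr * gb
    return w, b

# annotated samples generated from the target function y = 2x + 1
w, b = train_linear([(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)])
```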
Specifically, the sample images containing hand annotation information may be used to train the first convolutional network model. To make the trained first convolutional network model more accurate, sample images covering a variety of situations may be selected: the sample images may include images annotated with hand information as well as images not annotated with hand information.
Moreover, in this embodiment the first convolutional network model may be an RPN (Region Proposal Network). Of course, this embodiment merely takes this as an illustrative example; in practical applications the first convolutional network model is not limited to this, and may, for example, also be MultiBox, YOLO, or the like.
In step S220, the prediction information of the hand candidate regions is corrected.

In this embodiment, the prediction information of the hand candidate regions of the sample images obtained by training the first convolutional network model is a rough judgment result and may contain a certain error rate. Moreover, in subsequent steps the prediction information of the hand candidate regions serves as the input for training the second convolutional network model; therefore, before the second convolutional network model is trained, the rough judgment result obtained by training the first convolutional network model is corrected. The specific correction process may be manual correction, or another convolutional network model may be introduced to filter out erroneous results. The purpose of the correction is to improve the accuracy of the trained second convolutional network model by ensuring that the input information of the second convolutional network model is accurate. This embodiment imposes no limitation on the specific correction process.
In step S230, the second convolutional network model is trained according to the corrected prediction information of the hand candidate regions and the sample images.

The second convolutional network model shares a feature extraction layer with the first convolutional network model, and the parameters of the feature extraction layer are kept unchanged during training of the second convolutional network model.
In implementation, the second convolutional network model may include a second input layer, a second output layer, multiple second convolutional layers, and multiple fully connected layers. The second convolutional layers are mainly used for feature extraction, and the fully connected layers act as a classifier, classifying the features extracted by the second convolutional layers. When the second convolutional network model obtains the gesture detection result for the sample image, the hand candidate region is obtained through the second input layer, the features of the hand candidate region are then extracted by the second convolutional layers, and the fully connected layers perform classification according to the features of the hand candidate region, determining whether the sample image contains a hand and, if so, the hand candidate region and the gesture of the hand; finally, the classification result is output through the second output layer.
Since both the first and the second convolutional network model contain convolutional layers, to facilitate model training and reduce the amount of computation, the network parameters of the feature extraction layers in the two convolutional network models may be set to the same network parameters; that is, the second convolutional network model shares a feature extraction layer with the first convolutional network model, and the parameters of the feature extraction layer are kept unchanged during training of the second convolutional network model.
On this basis, in this embodiment, when training the second convolutional network model, the network parameters of the input layer and of the classification layer may be trained first; the network parameters of the feature extraction layer of the first convolutional network model are then taken as the network parameters of the feature extraction layer of the second convolutional network model; and the second convolutional network model is built from the network parameters of the input layer, the classification layer, and the feature extraction layer.
Specifically, the corrected prediction information of the hand candidate regions and the sample images may be used to train the second convolutional network model. To make the trained second convolutional network model more accurate, sample images covering a variety of situations may be selected: the sample images may include images annotated with gestures as well as images not annotated with gestures.

Moreover, the sample images in this embodiment may be sample images satisfying the resolution condition above or other resolution conditions.
In the gesture control method provided by this embodiment, two convolutional network models are trained separately: the first convolutional network model is trained according to the sample images containing hand annotation information, yielding the prediction information of the first convolutional network model for the hand candidate regions of the sample images; the prediction information of the hand candidate regions is corrected; and the second convolutional network model is trained according to the corrected prediction information of the hand candidate regions and the sample images. The first and second convolutional network models are related as follows: they share a feature extraction layer, and the parameters of the feature extraction layer are kept unchanged during training of the second convolutional network model.
Since the prediction information of the hand candidate regions of the sample images obtained by training the first convolutional network model is a rough judgment result that may contain a certain error rate, before the second convolutional network model is trained, the rough judgment result obtained by training the first convolutional network model is first corrected (for example, corrected manually, or by introducing another convolutional network model to filter out erroneous results), and the corrected prediction information of the hand candidate regions and the sample images are then used as the input to the second convolutional network model, improving the accuracy of the trained second convolutional network model by ensuring that its input information is accurate.
Moreover, because the first and second convolutional network models share a feature extraction layer whose parameters are kept unchanged during training of the second convolutional network model, the feature extraction layer of the second convolutional network model can directly use that of the first convolutional network model, which provides convenience for training the second convolutional network model and reduces the amount of computation required.
In this embodiment, the trained first and second convolutional network models facilitate subsequent hand and gesture detection on the currently played video image and determination of the display position corresponding to the detected gesture, so that the business object to be shown can then be drawn at that display position in the video image by means of computer graphics. Thus, when the business object is used to show an advertisement, compared with conventional video advertising: on the one hand, the business object is combined with the video playback, so no additional advertisement video data unrelated to the video needs to be transmitted over the network, saving network resources and/or system resources of the client; on the other hand, the business object is closely combined with the gesture in the video image, which preserves the main image and actions of the video subject (such as the broadcaster) while adding interest to the video image, and does not disturb the user's normal viewing, thereby reducing the user's aversion to the business object shown in the video image, attracting the attention of viewers to a certain extent, and improving the influence of the business object.
Embodiment Three
Fig. 3 is a flow chart of the gesture control method according to Embodiment Three of the present invention. Here, the video image is a live-streaming video image, and the business object is a special effect containing semantic information, which may specifically include a special effect in at least one of the following forms containing advertisement information: a two-dimensional sticker effect, a three-dimensional effect, a particle effect, etc.
In step S310, the currently played video image is obtained.

The content of step S310 may refer to the related content of step S110 in Embodiment One above, and will not be repeated here.
In this embodiment, the hand candidate region corresponding to the hand information may be determined from the video image by a pre-trained convolutional network model, and the gesture of the hand is detected in the hand candidate region; for the corresponding processing, see the following steps S320 to S330.
In step S320, the video image is detected using the pre-trained first convolutional network model, obtaining first feature information of the video image and prediction information of the hand candidate regions.

The first feature information includes hand feature information. The first convolutional network model may be used to detect whether each of the multiple candidate regions into which the image is divided is a hand candidate region.
In implementation, the obtained video image containing hand information may be input into the first convolutional network model trained in Embodiment Two above; processing such as feature extraction, mapping, and transformation may be performed on the video image by the network parameters in the first convolutional network model, so as to perform hand candidate region detection on the video image and obtain the hand candidate regions contained in the video image. For the prediction information of the hand candidate regions, refer to the introduction and explanation in the embodiments above, which will not be repeated here.
In step S330, the first feature information and the prediction information of the hand candidate regions serve as second feature information for the pre-trained second convolutional network model, and gesture detection is performed on the video image by the second convolutional network model according to the second feature information, obtaining the gesture detection result of the video image.

The second convolutional network model shares a feature extraction layer with the first convolutional network model. The gesture includes at least one of: waving, scissors hand, fist, palm-up (holding) hand, applause, open palm, closed palm, thumbs-up, finger-gun, V sign, and OK sign.

The processing of step S330 above may refer to the related content in the embodiments above, and will not be repeated here.
In step S340, when it is detected that the gesture matches a predetermined gesture, the feature points of the hand in the hand candidate region corresponding to the detected gesture are extracted.
In implementation, each video image containing hand information contains certain feature points of the hand, such as feature points of the fingers, the palm, and the hand contour. The hand in the video image may be detected and its feature points determined in any appropriate manner from the related art; the embodiments of the present invention impose no limitation in this regard. Examples include linear feature extraction methods such as PCA (principal component analysis), LDA (linear discriminant analysis), and ICA (independent component analysis); nonlinear feature extraction methods such as Kernel PCA (kernel principal component analysis) and manifold learning; or a trained neural network model, such as the convolutional network model in the embodiments of the present invention, may also be used to extract the feature points of the hand.
Taking live streaming as an example, during live streaming, the hand is detected from the live video image and its feature points are determined; as another example, during playback of a video whose recording has been completed, the hand is detected from the played video image and its feature points are determined; as a further example, during recording of a video, the hand is detected from the recorded video image and its feature points are determined.
In step S350, the display position of the business object to be shown in the video image is determined according to the feature points of the hand.

In implementation, after the feature points of the hand are determined, one or more display positions of the business object to be shown in the video image may be determined on the basis of the feature points of the hand.
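As a minimal illustration of deriving a display position from hand feature points, the sketch below places the object relative to the bounding extent of the feature points. The placement rules (sticker centered above the hand, other effects beside it) are hypothetical examples, not prescribed by the embodiment.

```python
# Illustrative sketch: derive a display position from hand feature points
# and the type of business object. The placement rules are hypothetical.

def display_position(feature_points, obj_w, obj_h, obj_type="sticker"):
    xs = [x for x, _ in feature_points]
    ys = [y for _, y in feature_points]
    cx = (min(xs) + max(xs)) / 2         # horizontal center of the hand
    top = min(ys)                        # topmost feature point
    if obj_type == "sticker":            # sticker: centered above the hand
        return (cx - obj_w / 2, top - obj_h)
    return (max(xs), top)                # other effects: beside the hand

pos = display_position([(10, 40), (30, 20), (50, 44)], 20, 10)
```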
In this embodiment, when determining the display position of the business object to be shown in the video image according to the feature points of the hand, feasible implementations include:

Mode one: according to the feature points of the hand, using a pre-trained third convolutional network model for detecting the display position of the business object from the video image, determine the display position, in the video image, of the business object to be shown corresponding to the hand position. Mode two: according to the feature points of the hand and the type of the business object to be shown, determine the display position, in the video image, of the business object to be shown corresponding to the detected gesture.

Each of the two modes is described in detail below.
Mode one
When mode one is used to determine the display position of the business object to be shown in the video image, a convolutional network model (i.e., the third convolutional network model) needs to be trained in advance; the trained third convolutional network model has the function of determining the display position of the business object in the video image. Alternatively, a convolutional network model already trained to completion by a third party and having the function of determining the display position of the business object in the video image may be used directly.
It should be noted that this embodiment focuses on training with respect to the business object, but those skilled in the art should understand that the third convolutional network model may also be trained on the hand at the same time as the business object, realizing joint training of the hand and the business object.
When the third convolutional network model needs to be trained in advance, one feasible training method includes the following process:

(1) Obtain the feature vector of the business object sample image to be trained.

The feature vector contains position information and/or confidence information of the business object in the business object sample image. The confidence information of the business object indicates the probability that the business object, when shown at the current position, achieves a desired effect (such as being followed, clicked, or watched); this probability may be set according to statistical analysis of historical data, according to the results of simulation experiments, or according to human experience. In practical applications, according to actual needs, training may be performed only on the position information of the business object, only on its confidence information, or on both. Training on both enables the trained third convolutional network model to determine the position information and confidence information of the business object more effectively and accurately, providing a basis for the display of the business object.
In the embodiments of the present invention, the third convolutional network model is trained on a large number of sample images; business object sample images containing business objects need to be used to train the third convolutional network model, and those skilled in the art should understand that the business object sample images used for training should contain hand information in addition to the business object. Moreover, the business objects in the business object sample images of the embodiments of the present invention may be annotated in advance with position information, with confidence information, or with both. Of course, in practical applications this information may also be obtained by other means. By annotating the business object with the corresponding information in advance, the data volume and number of interactions of data processing can be effectively reduced and data processing efficiency improved.
Taking the business object sample images with position information and/or confidence information of the business objects as training samples, feature vector extraction is performed on them to obtain feature vectors containing the position information and/or confidence information of the business objects.

Optionally, the hand and the business object may be trained simultaneously using the third convolutional network model; in this case, the feature vector of the business object sample image should also contain features of the hand.

Feature vector extraction may be implemented in any appropriate manner from the related art, which will not be repeated here.
(2) Perform convolution processing on the feature vector to obtain the feature vector convolution result.

In implementation, the obtained feature vector convolution result contains the position information and/or confidence information of the business object; when joint training of the hand and the business object is performed, the feature vector convolution result also contains hand information. The number of convolution operations performed on the feature vector may be set according to actual needs; that is, in the third convolutional network model the number of convolutional layers is configured according to actual needs, which will not be repeated here. The convolution result is the result of performing feature extraction on the feature vector, and it can effectively characterize the features of the hand in the video image.
In the embodiments of the present invention, when the feature vector contains both the position information and the confidence information of the business object, that is, when both are trained, the feature vector convolution result is shared in the subsequent convergence condition judgments for each, without being reprocessed and recomputed, reducing the resource loss caused by data processing and improving data processing speed and efficiency.
(3) Judge whether the position information and/or confidence information of the corresponding business object in the feature vector convolution result satisfies the convergence condition.

The convergence condition is set appropriately by those skilled in the art according to actual requirements. When the information satisfies the convergence condition, the network parameters set in the third convolutional network model are considered appropriate; when the information cannot satisfy the convergence condition, the network parameters set in the third convolutional network model are considered inappropriate and need to be adjusted. The adjustment is an iterative process that continues until the result of performing convolution processing on the feature vector with the adjusted network parameters satisfies the convergence condition.
In one feasible mode, the convergence condition may be set according to a preset standard position and/or a preset standard confidence. For example, whether the distance between the position indicated by the position information of the business object in the feature vector convolution result and the preset standard position satisfies a certain threshold may serve as the convergence condition for the position information of the business object; whether the difference between the confidence indicated by the confidence information of the business object in the feature vector convolution result and the preset standard confidence satisfies a certain threshold may serve as the convergence condition for the confidence information of the business object.
Preferably, the preset standard position may be the average position obtained by averaging the positions of the business objects in the business object sample images to be trained; the preset standard confidence may be the average confidence obtained by averaging the confidences of the business objects in the business object sample images to be trained. Since the samples to be trained are numerous, the standard position and/or standard confidence may be set according to the positions and/or confidences of the business objects in the business object sample images to be trained, and the standard position and standard confidence thus set are more objective and accurate.
Specifically, when judging whether the position information and/or confidence information of the corresponding business object in the feature vector convolution result satisfies the convergence condition, one feasible mode includes:

obtaining the position information of the corresponding business object in the feature vector convolution result, computing the Euclidean distance between the position indicated by that position information and the preset standard position to obtain a first distance, and judging, according to the first distance, whether the position information of the corresponding business object satisfies the convergence condition;

and/or

obtaining the confidence information of the corresponding business object in the feature vector convolution result, computing the Euclidean distance between the confidence indicated by that confidence information and the preset standard confidence to obtain a third distance, and judging, according to the third distance, whether the confidence information of the corresponding business object satisfies the convergence condition. Using Euclidean distance is simple to implement and can effectively indicate whether the convergence condition is satisfied. The method is not limited to this, however; other measures, such as the Mahalanobis distance or the Bhattacharyya distance, are equally applicable.
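The Euclidean-distance convergence test just described can be sketched directly. The threshold values below are arbitrary placeholders, and the function and variable names are assumptions for illustration.

```python
# Sketch of the convergence test: compare the predicted position with the
# preset standard position by Euclidean distance ("first distance"), and the
# predicted confidence with the standard confidence ("third distance"),
# each against its threshold.

def euclidean(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def converged(pred_pos, std_pos, pred_conf, std_conf,
              pos_thresh=5.0, conf_thresh=0.05):
    first_dist = euclidean(pred_pos, std_pos)
    third_dist = abs(pred_conf - std_conf)
    return first_dist <= pos_thresh and third_dist <= conf_thresh

ok = converged((103.0, 204.0), (100.0, 200.0), 0.82, 0.80)
```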
Preferably, as mentioned above, the preset standard position is the average position obtained by averaging the positions of the business objects in the business object sample images to be trained; and/or the preset standard confidence is the average confidence obtained by averaging the confidences of the business objects in the business object sample images to be trained.
(4) If the convergence condition is satisfied, training of the convolutional network model is complete; if not, the network parameters of the third convolutional network model are adjusted according to the position information and/or confidence information of the corresponding business object in the feature-vector convolution result, and the third convolutional network model is iteratively trained with the adjusted network parameters until the position information and/or confidence information of the business object after iterative training satisfies the convergence condition.
Through the above training, the third convolutional network model can perform feature extraction and classification on the display positions of business objects displayed on the basis of the hand, and thus has the function of determining the display position of a business object in a video image. Where there are multiple display positions, the training on business-object confidence also lets the third convolutional network model rank the quality of the display effect at each position and thereby determine the optimal display position. In subsequent applications, whenever a business object needs to be displayed, an effective display position can be determined from the current image in the video.
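When the model yields several candidate display positions with predicted confidences, picking the optimal one reduces to ranking by confidence. The sketch below is a hypothetical illustration of that selection step, assuming candidates are simple `(position, confidence)` pairs.

```python
def rank_display_positions(candidates):
    """Sort (position, confidence) candidates by predicted display
    effect, best first."""
    return sorted(candidates, key=lambda c: c[1], reverse=True)

def best_display_position(candidates):
    """Pick the position whose predicted display effect ranks highest."""
    return rank_display_positions(candidates)[0][0]
```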
In addition, before the third convolutional network model undergoes the above training, the business object sample images can be pre-processed in advance, which includes: obtaining multiple business object sample images, each containing annotation information for a business object; determining the position of the business object from the annotation information; judging whether the distance between the determined position and a preset position is less than or equal to a set threshold; and taking the sample images whose business objects satisfy the threshold as the business object sample images to be trained. The preset position and the threshold may be set appropriately by those skilled in the art in any suitable way, for example according to statistical analysis of the data, a relevant distance calculation formula, or manual experience; the embodiments of the present invention place no limitation on this.
By pre-processing the business object sample images in advance, unqualified sample images can be filtered out, ensuring the accuracy of the training result.
The training of the third convolutional network model is achieved by the above process; once trained, the model can be used to determine the display position of a business object in a video image. For example, during a live video broadcast, if the anchor clicks a business object to instruct its display, then after the third convolutional network model obtains the feature points of the anchor's hand in the live video image, it can indicate the optimal position for displaying the business object, such as the anchor's forehead, and the live application is then controlled to display the business object at that position. Alternatively, during a live broadcast, if the anchor clicks a business object to instruct its display, the third convolutional network model can determine the display position of the business object directly from the live video image.
Mode two
According to the feature points of the hand and the type of the business object to be displayed, the display position of the business object to be displayed corresponding to the hand position is determined in the video image.
In implementation, after the feature points of the hand are obtained, the display position of the business object to be displayed can be determined according to set rules. The display position of the business object to be displayed includes at least one of: the palm area of a person in the video image, the area above the palm, the area below the palm, the background area around the palm, the body area other than the hand area, the background area in the video image, an area within a set range centred on the area where the hand is located, and a preset area in the video image.
After the display position is determined, the specific display position of the business object to be displayed in the video image can be further determined. For example, the centre point of the display region corresponding to the display position can serve as the display centre point of the business object; or a certain coordinate position within that display region can be determined as the centre point of the display position. The embodiments of the present invention place no limitation on this.
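Taking the centre of the display region as the object's display centre, as described above, is simple arithmetic. The sketch below is a minimal illustration under the assumption that a region is given as an `(x, y, width, height)` rectangle in image coordinates.

```python
def region_center(region):
    """Centre point of a rectangular display region (x, y, width, height),
    usable as the display centre point of the business object."""
    x, y, w, h = region
    return (x + w / 2.0, y + h / 2.0)
```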
In a preferred embodiment, when determining the display position of the business object to be displayed in the video image, not only the feature points of the hand but also the type of the business object to be displayed is taken into account. The type of the business object includes at least one of: a forehead sticker type, a cheek sticker type, a chin sticker type, a virtual hat type, a virtual clothing type, a virtual make-up type, a virtual headwear type, a virtual hair-accessory type, and a virtual jewellery type; in addition, a virtual bottle-cap type, a virtual cup type, a text type, and the like may also be included.
Thus, in addition to using the feature points of the hand and the hand position as a reference, an appropriate display position can be selected for the business object according to its type.
Furthermore, when multiple display positions of the business object to be displayed in the video image are obtained from the feature points of the hand and the type of the business object, at least one display position can be selected from them. For example, a business object of the text type can be displayed in the background area, in the palm area of a person, or in the area above the hand.
Moreover, the correspondence between gestures and display positions can be stored in advance. When it is determined that the detected gesture matches a corresponding predetermined gesture, the target display position corresponding to the predetermined gesture can be obtained from the pre-stored correspondence and used as the display position of the business object to be displayed in the video image. It should be noted that, although this correspondence between gestures and display positions exists, the relationship is not a necessary one: a gesture is merely one way of triggering the display of a business object, and the display position is likewise not necessarily tied to the human hand; that is, the business object can be presented in some region of the hand, or in regions other than the hand, such as the background area of the video image. Moreover, the same gesture can trigger the display of different business objects. For example, if the anchor makes a waving gesture twice in succession, the first gesture may display a two-dimensional sticker effect and the second a three-dimensional effect, and the advertising content corresponding to the two effects may be the same or different.
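The pre-stored gesture-to-position correspondence, and the behaviour of a repeated gesture cycling through effects, can be sketched as a simple lookup. All gesture names, position names, and the two-entry effect cycle below are hypothetical illustrations, not values from the patent.

```python
# Hypothetical pre-stored correspondence between predetermined gestures
# and target display positions.
GESTURE_TO_POSITION = {
    "wave": "background",
    "ok_sign": "above_palm",
    "thumbs_up": "palm",
}

# A repeated gesture may trigger a different effect each time,
# e.g. a 2-D sticker first and a 3-D effect second.
EFFECT_CYCLE = ["2d_sticker", "3d_effect"]

def resolve_display(gesture, repeat_count):
    """Look up the target display position and pick the effect for the
    given repetition of the gesture (0 = first occurrence)."""
    position = GESTURE_TO_POSITION.get(gesture)
    effect = EFFECT_CYCLE[repeat_count % len(EFFECT_CYCLE)]
    return position, effect
```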
In step S360, the business object to be displayed is drawn at the display position using computer graphics.
When the business object is a two-dimensional sticker effect containing semantic information, the sticker can be used for advertisement placement and display. Before the business object is drawn, its relevant information can first be obtained, such as its identifier and size. After the display position is determined, the business object can be scaled and rotated according to the coordinates of the display position, and then drawn by a corresponding drawing method, for example via OpenGL. In some cases, an advertisement can also be displayed in the form of a three-dimensional effect, for example displaying the text or logo of the advertisement by means of a particle effect. For instance, displaying the name of a certain product through a two-dimensional sticker effect of the virtual bottle-cap type attracts viewers and improves the efficiency of advertisement placement and display.
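The scaling step before drawing can be illustrated with a little geometry: fit the sticker into the display region while preserving its aspect ratio, leaving the actual blit to the rendering backend (such as OpenGL). This is a hedged sketch; `fit_sticker` and the rectangle convention are assumptions for illustration.

```python
def fit_sticker(sticker_w, sticker_h, region):
    """Scale a sticker uniformly so it fits the display region
    (x, y, width, height), and centre it there; returns the draw
    rectangle. The actual drawing is done by the graphics backend."""
    rx, ry, rw, rh = region
    scale = min(rw / sticker_w, rh / sticker_h)
    w, h = sticker_w * scale, sticker_h * scale
    # centre the scaled sticker inside the region
    return (rx + (rw - w) / 2.0, ry + (rh - h) / 2.0, w, h)
```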
In the gesture control method provided by the embodiments of the present invention, the display of a business object is triggered by a gesture. When the business object is used to display an advertisement, compared with traditional video advertising: on the one hand, the business object is combined with video playback, so no additional advertising video data unrelated to the video needs to be transmitted over the network, saving network resources and/or the system resources of the client; on the other hand, the business object is closely combined with the gesture in the video image, which preserves the main image and actions of the video subject (such as the anchor) while adding interest to the video image, and does not interrupt the user's normal viewing. This reduces the user's aversion to the business object displayed in the video image, and to a certain extent attracts the audience's attention and enhances the influence of the business object.
Embodiment four
Fig. 4 is a flow chart of a gesture control method according to Embodiment four of the present invention.
In this embodiment, the business object is an effect containing semantic information and includes an effect containing advertising information in at least one of the following forms: a two-dimensional sticker effect, a three-dimensional effect, or a particle effect. Taking a two-dimensional sticker effect as the specific example, the gesture control scheme of the embodiments of the present invention is described.
The gesture control method of this embodiment comprises the following steps:
In step S401, a first convolutional network model is trained on sample images containing human-hand annotation information, obtaining the first convolutional network model's prediction information for the human-hand candidate regions of the sample images.
In step S402, the prediction information of the human-hand candidate regions is corrected.
In step S403, a second convolutional network model is trained according to the corrected prediction information of the human-hand candidate regions and the sample images.
The second convolutional network model shares a feature extraction layer with the first convolutional network model, and the parameters of the feature extraction layer are kept constant during the training of the second convolutional network model.
For the content of steps S401 to S403, reference may be made to the relevant content in the above embodiments, which will not be repeated here.
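The constraint that the shared feature-extraction layer stays fixed while the second network trains can be sketched as follows. This is a minimal plain-Python illustration with hypothetical names; a real implementation would freeze layer parameters inside a deep-learning framework rather than manipulate dictionaries.

```python
def train_second_network_step(shared_features, head_params, grads, lr=0.1):
    """One illustrative update step: the shared feature-extraction
    parameters are returned unchanged (frozen), while only the second
    network's own head parameters take a gradient step."""
    frozen = dict(shared_features)                          # kept constant
    updated = {k: v - lr * grads[k] for k, v in head_params.items()}
    return frozen, updated
```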
In step S404, the feature vectors of the business object sample images to be trained are obtained.
The feature vector contains the position information and/or confidence information of the business object in the business object sample image, as well as the feature vector corresponding to the gesture. The business object sample images to be trained may be the above-mentioned sample images containing human-hand annotation information.
In implementation, some of the business object sample images do not meet the training standards of the third convolutional network model, and this part of the sample images needs to be filtered out by pre-processing the business object sample images.
First, in this embodiment, each business object sample image contains a business object, and each business object is annotated with position information and confidence information. In one feasible embodiment, the position information of the centre point of the business object serves as the position information of the business object. In this step, the sample images are filtered only according to the position information of the business objects: the coordinates of the position indicated by the position information are obtained and compared with the preset position coordinates for a business object of that type, and the positional deviation between the two is calculated. If the deviation is less than or equal to a set threshold, the business object sample image can serve as a sample image to be trained; if it is greater than the threshold, the sample image is filtered out. The preset position coordinates and the threshold can be set appropriately by those skilled in the art according to the actual situation. For example, since the images used to train the third convolutional network model generally have the same size, the threshold can be 1/20 to 1/5 of the length or width of the image, and preferably 1/10.
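The filtering rule above, with the illustrative 1/10-of-image-side threshold, can be sketched directly. The function name and the use of Euclidean distance for the "deviation" are assumptions made for illustration.

```python
def keep_sample(label_pos, preset_pos, image_side, frac=0.1):
    """Keep a business-object sample image when the deviation between
    its annotated centre and the preset position for that object type
    is no larger than frac (here 1/10) of the image side length."""
    dx = label_pos[0] - preset_pos[0]
    dy = label_pos[1] - preset_pos[1]
    threshold = frac * image_side
    return (dx * dx + dy * dy) ** 0.5 <= threshold
```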
In addition, the positions and confidences of the business objects in the determined business object sample images to be trained can be averaged to obtain a mean position and a mean confidence, which can serve as the basis for subsequently determining the convergence condition.
Taking a two-dimensional sticker effect as the business object as an example, the business object sample images used for training in this embodiment need to be annotated with the coordinates of the optimal advertising position and the confidence of that position. The optimal advertising position can be annotated at places such as the hand or the foreground/background, so joint training of the advertising position with the hand feature points and the foreground/background can be achieved; compared with a scheme trained solely on a single hand technique, this helps save computing resources. The magnitude of the confidence indicates the probability that the advertising position is the optimal one; for example, if the position is largely occluded, its confidence is low.
In step S405, convolution processing is performed on the feature vectors to obtain feature-vector convolution results.
In step S406, it is judged whether the position information and/or confidence information of the corresponding business object in the feature-vector convolution results satisfies the convergence condition.
In step S407, if it does, training of the third convolutional network model is complete; if not, the network parameters of the third convolutional network model are adjusted according to the position information and/or confidence information of the corresponding business object in the feature-vector convolution results, and the third convolutional network model is iteratively trained with the adjusted network parameters until the position information and/or confidence information of the business object after iterative training satisfies the convergence condition.
The specific processing of steps S404 to S407 may refer to the relevant content in the above embodiments and will not be repeated here.
A trained third convolutional network model can be obtained through the processing of steps S404 to S407. The structure of the third convolutional network model may refer to that of the first or second convolutional network model in Embodiment two above and will not be repeated here.
The first, second, and third convolutional network models obtained by the above training can then process video images accordingly, which may specifically include the following steps S408 to S413.
In step S408, the currently playing video image is obtained.
In step S409, the video image is detected using the pre-trained first convolutional network, obtaining first feature information of the video image and prediction information of the human-hand candidate regions.
In step S410, the first feature information and the prediction information of the human-hand candidate regions serve as second feature information for the pre-trained second convolutional network model, and the second convolutional network model performs gesture detection on the video image according to the second feature information, obtaining a gesture detection result for the video image.
After the human-hand candidate region detection, in the case where the video image is determined to contain a human hand, the gesture in the human-hand candidate region can be determined in the form of a probability. For example, taking a palm-open gesture and a palm-closed gesture as examples, when the probability of the palm-open gesture is high, the video image can be considered to contain a hand with a palm-open gesture; when the probability of the palm-closed gesture is high, the video image can be considered to contain a hand with a palm-closed gesture.
Further, in an optional implementation of the present application, the output of the second convolutional network model can include: the probability that the human-hand candidate region contains no human hand, the probability that it contains a hand with a palm-open gesture, the probability that it contains a hand with a palm-closed gesture, and so on.
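A probability output over classes such as no-hand, palm-open, and palm-closed is typically produced by a softmax over the network's final scores. The sketch below is a generic illustration of that step, not the patent's network; the label names are assumptions.

```python
import math

def gesture_probabilities(logits):
    """Numerically stable softmax over the candidate-region scores,
    e.g. for (no hand, palm open, palm closed)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def most_likely_gesture(logits, labels):
    """Label of the highest-probability class for the candidate region."""
    probs = gesture_probabilities(logits)
    return labels[probs.index(max(probs))]
```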
To improve detection speed, when the parameters of the first convolutional layers are consistent with those of the second convolutional layers, the second convolutional network model, in obtaining the gesture detection result for the video image according to the human-hand candidate regions and the features of the various predetermined gestures, can directly take the first features of the video image extracted by the multiple first convolutional layers as the second features of the human-hand candidate regions extracted by the multiple second convolutional layers, and then, according to these second features, classify the human-hand candidate regions through multiple fully connected layers to obtain the gesture detection result for the video image. This greatly reduces the amount of computation and improves detection speed.
In step S411, when it is determined that the gesture of the detected hand matches the corresponding predetermined gesture, the feature points of the hand in the human-hand candidate region corresponding to the detected gesture are extracted.
In step S412, according to the feature points of the hand, the display position in the video image of the business object to be displayed corresponding to the hand position is determined using the pre-trained third convolutional network model for determining the display position of a business object in a video image.
In step S413, the business object to be displayed is drawn at the display position using computer graphics.
With the rise of Internet live streaming and short-video sharing, more and more videos appear in the form of live streams or short videos. Such videos usually feature a person as the protagonist (a single person or a small number of people) against a simple background, and the audience mainly watches on mobile terminals such as mobile phones. In this case, for the placement of some business objects (such as advertisements): on the one hand, because the screen display area of a mobile terminal is limited, placing an advertisement at a traditional fixed position often occupies the main user-experience area, easily arousing the user's aversion; on the other hand, for anchor-style live applications, due to the immediacy of live streaming, inserting an advertisement of a traditional fixed duration noticeably interrupts the continuity of the exchange between the user and the anchor, affecting the viewing experience; furthermore, for short-video advertising, because the content duration of a live stream or short video is inherently short, inserting an advertisement of fixed duration in the traditional way is also difficult. With the scheme provided by this embodiment, the video images during playback can be detected in real time and the most effective advertising position provided without affecting the user's viewing experience, so the placement effect is better. By combining the business object with video playback, no additional advertising video data unrelated to the video needs to be transmitted over the network, saving network resources and/or the system resources of the client. Moreover, the business object is closely combined with the gesture in the video image, which preserves the main image and actions of the video subject (such as the anchor) while adding interest to the video image, and does not interrupt the user's normal viewing, thereby reducing the user's aversion to the business object displayed in the video image, attracting the audience's attention to a certain extent, and enhancing the influence of the business object.
Embodiment five
Based on the same technical concept, Fig. 5 is a block diagram of a gesture control apparatus according to Embodiment five of the present invention. Referring to Fig. 5, the apparatus includes a gesture detection module 501, a display position determination module 502, and a business object drawing module 503.
The gesture detection module 501 is configured to perform gesture detection on the currently playing video image.
The display position determination module 502 is configured to determine, when the detected gesture matches a predetermined gesture, the display position of the business object to be displayed in the video image.
The business object drawing module 503 is configured to draw the business object at the display position using computer graphics.
With the gesture control apparatus provided by this embodiment, human-hand candidate region detection and gesture detection are performed on the currently playing video image containing hand information, the detected gesture is matched against the corresponding predetermined gesture, and when the two match, the display position of the business object to be displayed in the video image is determined from the hand position. When the business object is used to display an advertisement, compared with traditional video advertising: on the one hand, the business object is combined with video playback, so no additional advertising video data unrelated to the video needs to be transmitted over the network, saving network resources and/or the system resources of the client; on the other hand, the business object is closely combined with the gesture in the video image, which preserves the main image and actions of the video subject (such as the anchor) while adding interest to the video image, and does not interrupt the user's normal viewing, thereby reducing the user's aversion to the business object displayed in the video image, attracting the audience's attention to a certain extent, and enhancing the influence of the business object.
Embodiment six
Based on the same technical concept, refer to the logic block diagram of the gesture control apparatus in Fig. 6.
The gesture control apparatus of this embodiment includes: a gesture detection module 501 for performing gesture detection on the currently playing video image; a display position determination module 502 for determining, when the detected gesture matches a predetermined gesture, the display position of the business object to be displayed in the video image; and a business object drawing module 503 for drawing the business object at the display position using computer graphics.
Optionally, the display position determination module 502 includes: a feature point extraction unit for extracting the feature points of the hand in the human-hand candidate region corresponding to the detected gesture; and a display position determination unit for determining, according to the feature points of the hand, the display position in the video image of the business object to be displayed corresponding to the detected gesture.
Optionally, the display position determination unit is configured to determine, according to the feature points of the hand and the type of the business object to be displayed, the display position in the video image of the business object to be displayed corresponding to the detected gesture.
Optionally, the display position determination unit is configured to determine, according to the feature points of the hand and the type of the business object to be displayed, multiple display positions in the video image of the business object to be displayed corresponding to the detected gesture, and to select at least one display position from them.
Optionally, the display position determination module 502 is configured to determine, when the detected gesture matches the corresponding predetermined gesture, the display position in the video image of the business object to be displayed corresponding to the predetermined gesture as the display position in the video image of the business object to be displayed corresponding to the detected gesture.
Optionally, the display position determination module 502 is configured to obtain, from the pre-stored correspondence between gestures and display positions, the target display position corresponding to the predetermined gesture as the display position in the video image of the business object to be displayed corresponding to the detected gesture.
Optionally, the business object is an effect containing semantic information, and the video image is a live-streaming video image.
Optionally, the business object includes an effect containing advertising information in at least one of the following forms: a two-dimensional sticker effect, a three-dimensional effect, or a particle effect.
Optionally, the display position includes at least one of: the hair area, forehead area, cheek area, or chin area of a person in the video image, the body area other than the head, the background area in the video image, an area within a set range centred on the area where the hand is located in the video image, and a preset area in the video image.
Optionally, the type of the business object includes at least one of: a forehead sticker type, a cheek sticker type, a chin sticker type, a virtual hat type, a virtual clothing type, a virtual make-up type, a virtual headwear type, a virtual hair-accessory type, and a virtual jewellery type.
Optionally, the gesture includes at least one of: waving, a scissors hand, a fist, a cupped hand, applause, palm open, palm closed, a thumbs-up, a finger-gun pose, a V sign, and an OK sign.
Optionally, the gesture detection module 501 is configured to detect the video image using the pre-trained first convolutional network, obtaining first feature information of the video image and prediction information of the human-hand candidate regions, where the first feature information includes hand feature information; to take the first feature information and the prediction information of the human-hand candidate regions as second feature information for the pre-trained second convolutional network model; and to perform gesture detection on the video image with the second convolutional network model according to the second feature information, obtaining a gesture detection result for the video image. The second convolutional network model shares a feature extraction layer with the first convolutional network model.
Optionally, the apparatus also includes: a human-hand region determination module 504 for training the first convolutional network model on sample images containing human-hand annotation information and obtaining the first convolutional network model's prediction information for the human-hand candidate regions of the sample images; a correction module 505 for correcting the prediction information of the human-hand candidate regions; and a convolution model training module 506 for training the second convolutional network model according to the corrected prediction information of the human-hand candidate regions and the sample images, where the second convolutional network model shares a feature extraction layer with the first convolutional network model and the parameters of the feature extraction layer are kept constant during the training of the second convolutional network model.
Optionally, the display position determination module 502 is configured to determine the display position of the business object to be displayed corresponding to the detected gesture by means of the gesture and the pre-trained third convolutional network model for detecting the display position of a business object from the video image.
With the gesture control apparatus provided by this embodiment, human-hand candidate region detection and gesture detection are performed on the currently playing video image containing hand information, the detected gesture is matched against the corresponding predetermined gesture, and when the two match, the display position of the business object to be displayed in the video image is determined from the hand position. When the business object is used to display an advertisement, compared with traditional video advertising: on the one hand, the business object is combined with video playback, so no additional advertising video data unrelated to the video needs to be transmitted over the network, saving network resources and/or the system resources of the client; on the other hand, the business object is closely combined with the gesture in the video image, which preserves the main image and actions of the video subject (such as the anchor) while adding interest to the video image, and does not interrupt the user's normal viewing, thereby reducing the user's aversion to the business object displayed in the video image, attracting the audience's attention to a certain extent, and enhancing the influence of the business object.
Embodiment seven
Reference picture 7, a kind of structural representation of according to embodiments of the present invention seven terminal device is shown, the present invention is specifically
Embodiment is not limited the specific implementation of terminal device.
As shown in fig. 7, the terminal device can include:Processor (processor) 702, communication interface
(Communications Interface) 704, memory (memory) 706 and communication bus 708.
Wherein:
Processor 702, communication interface 704 and memory 706 complete mutual communication by communication bus 708.
Communication interface 704, the network element for clients such as other with miscellaneous equipment or server etc. communicate.
Processor 702, for configuration processor 710, it can specifically perform the correlation step in above method embodiment.
Specifically, program 710 can include program code, and the program code includes computer-managed instruction.
The processor 702 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), one or more integrated circuits configured to implement embodiments of the present invention, or a graphics processing unit (GPU). The one or more processors included in the terminal device may be processors of the same type, for example one or more CPUs, or one or more GPUs; they may also be processors of different types, for example one or more CPUs and one or more GPUs.
The memory 706 is used to store the program 710. The memory 706 may include a high-speed RAM memory, and may also include a non-volatile memory, for example at least one disk memory.
Specifically, the program 710 may be used to cause the processor 702 to perform the following operations: performing gesture detection on a currently played video image; when it is detected that a gesture matches a predetermined gesture, determining a display position of a business object to be shown in the video image; and drawing the business object at the display position by means of computer graphics.
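The operations the program 710 causes the processor to perform can be sketched as a simple per-frame loop. The helpers below (`detect_gesture`, `find_display_position`, `draw_object`) are hypothetical stand-ins for the trained detection networks and the graphics backend, not part of the patent:

```python
PREDEFINED_GESTURES = {"wave", "scissors", "fist", "ok"}

def detect_gesture(frame):
    # Stand-in for the two-stage convolutional detector: here we simply read
    # the gesture label and hand bounding box carried by the fake frame.
    return frame["gesture"], frame["hand_box"]

def find_display_position(hand_box, obj_type):
    # Place the object centred above the detected hand region (illustrative rule).
    x0, y0, x1, y1 = hand_box
    return ((x0 + x1) // 2, max(0, y0 - 40))

def draw_object(frame, obj_type, position):
    # Stand-in for the computer-graphics draw call.
    return {**frame, "overlay": (obj_type, position)}

def process_frame(frame, obj_type):
    gesture, hand_box = detect_gesture(frame)      # gesture detection
    if gesture not in PREDEFINED_GESTURES:
        return frame                               # no match: frame passes through unchanged
    position = find_display_position(hand_box, obj_type)
    return draw_object(frame, obj_type, position)  # draw at the display position

frame = {"gesture": "wave", "hand_box": (100, 120, 180, 200)}
print(process_frame(frame, "2d-sticker")["overlay"])  # → ('2d-sticker', (140, 80))
```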
In an optional implementation, the program 710 is further used to cause the processor 702, when determining the display position of the business object to be shown in the video image, to: extract feature points of a hand in a human-hand candidate region corresponding to the detected gesture; and determine, according to the feature points of the hand, the display position in the video image of the business object to be shown corresponding to the detected gesture.
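The feature-point step can be illustrated with a minimal sketch: take the centroid of the hand feature points and apply an offset so the object sits near, but not on, the hand. The centroid-plus-offset rule is an assumption for illustration, not the patent's specified computation:

```python
def display_position_from_keypoints(keypoints, offset=(0, -50)):
    # Centroid of the hand feature points, shifted so the object sits above the hand.
    xs = [p[0] for p in keypoints]
    ys = [p[1] for p in keypoints]
    cx, cy = sum(xs) / len(xs), sum(ys) / len(ys)
    return (cx + offset[0], cy + offset[1])

pts = [(10, 100), (30, 120), (20, 110)]  # example hand feature points
print(display_position_from_keypoints(pts))  # → (20.0, 60.0)
```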
In an optional implementation, the program 710 is further used to cause the processor 702, when determining, according to the feature points of the hand, the display position in the video image of the business object to be shown corresponding to the detected gesture, to: determine, according to the feature points of the hand and the type of the business object to be shown, the display position in the video image of the business object to be shown corresponding to the detected gesture.
In an optional implementation, the program 710 is further used to cause the processor 702, when determining, according to the feature points of the hand and the type of the business object to be shown, the display position in the video image of the business object to be shown corresponding to the detected gesture, to: determine, according to the feature points of the hand and the type of the business object to be shown, a plurality of display positions in the video image of the business object to be shown corresponding to the detected gesture; and select at least one display position from the plurality of display positions.
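One plausible realisation of the selection step scores each candidate position and keeps the best. The scoring rule here (prefer the candidate farthest from the hand, so the object does not occlude the gesture) is an assumption, not specified by the patent:

```python
import math

def select_position(candidates, hand_center):
    # Keep the candidate display position farthest from the hand centre.
    return max(candidates, key=lambda p: math.dist(p, hand_center))

cands = [(50, 50), (300, 60), (120, 400)]   # candidate display positions
print(select_position(cands, (100, 100)))   # → (120, 400)
```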
In an optional implementation, the program 710 is further used to cause the processor 702, when determining the display position of the business object to be shown in the video image, to: obtain, from a pre-stored correspondence between gestures and display positions, the target display position corresponding to the predetermined gesture as the display position in the video image of the business object to be shown corresponding to the detected gesture.
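The pre-stored correspondence can be as simple as a lookup table. The gesture names and region names below are illustrative placeholders, not values from the patent:

```python
# Hypothetical pre-stored correspondence between predetermined gestures and
# target display positions (named regions in the video image).
GESTURE_POSITION_MAP = {
    "thumbs_up": "forehead",
    "ok":        "background",
    "wave":      "near_hand",
}

def lookup_position(gesture, default="near_hand"):
    # Fall back to a default region when the gesture has no stored entry.
    return GESTURE_POSITION_MAP.get(gesture, default)

print(lookup_position("ok"))        # → background
print(lookup_position("scissors"))  # → near_hand (fallback)
```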
In an optional implementation, the business object is a special effect containing semantic information, and the video image is a live-streaming video image.
In an optional implementation, the business object includes a special effect containing advertising information in at least one of the following forms: a two-dimensional sticker effect, a three-dimensional effect, or a particle effect.
In an optional implementation, the display position includes at least one of: the hair region, forehead region, cheek region, or chin region of a person in the video image; a body region other than the head; a background region in the video image; a region within a set range centered on the region where the hand is located in the video image; or a predetermined region in the video image.
In an optional implementation, the type of the business object includes at least one of: a forehead sticker type, a cheek sticker type, a chin sticker type, a virtual hat type, a virtual clothing type, a virtual makeup type, a virtual headwear type, a virtual hair accessory type, or a virtual jewelry type.
In an optional implementation, the gesture includes at least one of: waving, a scissors hand, a fist, a cupped hand, applause, an open palm, a closed palm, a thumbs-up, a finger-gun pose, a V sign, or an OK sign.
In an optional implementation, the program 710 is further used to cause the processor 702, when performing gesture detection on the currently played video image, to: detect the video image using a pre-trained first convolutional network model to obtain first feature information of the video image and prediction information of a human-hand candidate region, the first feature information including hand feature information; use the first feature information and the prediction information of the human-hand candidate region as second feature information of a pre-trained second convolutional network model; and perform gesture detection on the video image according to the second feature information using the second convolutional network model to obtain a gesture detection result of the video image, where the second convolutional network model and the first convolutional network model share a feature extraction layer.
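The point of the shared feature extraction layer is that the expensive shared computation runs once per frame and feeds both stages. A toy, non-convolutional sketch of that structure (the arithmetic is a placeholder for real convolutional layers, and the decision rules are invented for illustration):

```python
def feature_extractor(image):
    # Shared layers: in the real model these are convolutional layers.
    # Here each "feature" is just the sum of a row of pixel values.
    return [sum(row) for row in image]

def first_network(features):
    # First stage: predicts a candidate hand region from the shared features.
    best = max(range(len(features)), key=lambda i: features[i])
    return {"row": best}

def second_network(features, candidate):
    # Second stage: classifies the gesture from the SAME shared features
    # plus the candidate-region prediction (placeholder threshold rule).
    return "fist" if features[candidate["row"]] > 10 else "open_palm"

image = [[1, 2], [8, 9], [0, 1]]
feats = feature_extractor(image)          # computed once, shared by both stages
candidate = first_network(feats)
print(second_network(feats, candidate))   # → fist
```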
In an optional implementation, the program 710 is further used to cause the processor 702, before performing gesture detection on the currently played video image, to: train the first convolutional network model on sample images containing human-hand annotation information to obtain prediction information of the first convolutional network model for the human-hand candidate regions of the sample images; correct the prediction information of the human-hand candidate regions; and train the second convolutional network model according to the corrected prediction information of the human-hand candidate regions and the sample images, where the second convolutional network model and the first convolutional network model share a feature extraction layer, and the parameters of the feature extraction layer are kept unchanged during the training of the second convolutional network model.
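The two-phase schedule (train the first model together with the shared layer, then freeze the shared layer while training the second model) can be sketched with scalar "parameters" and a fixed gradient. Everything numeric here is purely illustrative:

```python
params = {"shared": 0.5, "net1": 0.1, "net2": 0.1}
frozen = set()

def sgd_step(trainable, lr=0.01, grad=1.0):
    # Apply one gradient step to every trainable, non-frozen parameter.
    for name in trainable:
        if name not in frozen:
            params[name] -= lr * grad

# Phase 1: train the first network; the shared feature layer is updated too.
sgd_step({"shared", "net1"})

# Phase 2: freeze the shared feature layer, then train the second network.
frozen.add("shared")
before = params["shared"]
sgd_step({"shared", "net2"})
assert params["shared"] == before   # shared-layer parameters held constant
print(params)
```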
In an optional implementation, the program 710 is further used to cause the processor 702, when determining the display position in the video image of the business object to be shown corresponding to the detected gesture, to: determine the display position of the business object to be shown corresponding to the detected gesture by applying, to the gesture, a pre-trained third convolutional network model for detecting display positions of business objects from video images.
With the terminal device provided by this embodiment, human-hand candidate-region detection and gesture detection are performed on a currently played video image containing hand information, and the detected gesture is matched against a corresponding predetermined gesture. When the two match, the display position of the business object to be shown in the video image is determined from the hand position. When the business object is used to show an advertisement, compared with conventional video advertising, on the one hand the business object is combined with video playback, so no additional advertisement video data unrelated to the video needs to be transmitted over the network, saving network resources and/or client system resources; on the other hand, the business object is closely integrated with the gesture in the video image, which preserves the main image and actions of the video subject (e.g., an anchor) while adding interest to the video image, without disturbing the user's normal viewing. This reduces user aversion to the business object shown in the video image, attracts viewers' attention to a certain extent, and enhances the influence of the business object.
It should be noted that, depending on the needs of implementation, each step/component described in this application may be split into more steps/components, and two or more steps/components, or partial operations of steps/components, may be combined into new steps/components to achieve the purpose of the present invention.
The above method according to the present invention may be implemented in hardware or firmware, or implemented as software or computer code storable in a recording medium (such as a CD-ROM, RAM, floppy disk, hard disk, or magneto-optical disk), or implemented as computer code originally stored in a remote recording medium or non-transitory machine-readable medium, downloaded over a network, and stored in a local recording medium, so that the method described here can be processed by such software stored in a recording medium using a general-purpose computer, a special-purpose processor, or programmable or dedicated hardware (such as an ASIC or FPGA). It can be understood that a computer, processor, microprocessor controller, or programmable hardware includes a storage component (e.g., RAM, ROM, flash memory, etc.) that can store or receive software or computer code; when the software or computer code is accessed and executed by the computer, processor, or hardware, the processing method described here is realized. In addition, when a general-purpose computer accesses code for implementing the processing shown here, execution of the code converts the general-purpose computer into a special-purpose computer for performing the processing shown here.
The foregoing are only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any person familiar with the technical field can readily conceive of changes or substitutions within the technical scope disclosed by the present invention, and these should all be covered within the protection scope of the present invention. Therefore, the protection scope of the present invention should be subject to the protection scope of the claims.
Claims (10)
1. A gesture control method, characterized in that the method comprises:
performing gesture detection on a currently played video image;
when it is detected that a gesture matches a predetermined gesture, determining a display position of a business object to be shown in the video image; and
drawing the business object at the display position by means of computer graphics.
2. The method according to claim 1, characterized in that the determining a display position of a business object to be shown in the video image comprises:
extracting feature points of a hand in a human-hand candidate region corresponding to the detected gesture; and
determining, according to the feature points of the hand, the display position in the video image of the business object to be shown corresponding to the detected gesture.
3. The method according to claim 2, characterized in that the determining, according to the feature points of the hand, the display position in the video image of the business object to be shown corresponding to the detected gesture comprises:
determining, according to the feature points of the hand and the type of the business object to be shown, the display position in the video image of the business object to be shown corresponding to the detected gesture.
4. The method according to claim 3, characterized in that the determining, according to the feature points of the hand and the type of the business object to be shown, the display position in the video image of the business object to be shown corresponding to the detected gesture comprises:
determining, according to the feature points of the hand and the type of the business object to be shown, a plurality of display positions in the video image of the business object to be shown corresponding to the detected gesture; and
selecting at least one display position from the plurality of display positions.
5. The method according to any one of claims 1-4, characterized in that the gesture comprises at least one of: waving, a scissors hand, a fist, a cupped hand, applause, an open palm, a closed palm, a thumbs-up, a finger-gun pose, a V sign, and an OK sign.
6. The method according to any one of claims 1-5, characterized in that the performing gesture detection on a currently played video image comprises:
detecting the video image using a pre-trained first convolutional network model to obtain first feature information of the video image and prediction information of a human-hand candidate region, the first feature information including hand feature information; and
using the first feature information and the prediction information of the human-hand candidate region as second feature information of a pre-trained second convolutional network model, and performing gesture detection on the video image according to the second feature information using the second convolutional network model to obtain a gesture detection result of the video image, wherein the second convolutional network model and the first convolutional network model share a feature extraction layer.
7. The method according to claim 6, characterized in that, before the performing gesture detection on a currently played video image, the method further comprises:
training the first convolutional network model on sample images containing human-hand annotation information to obtain prediction information of the first convolutional network model for the human-hand candidate regions of the sample images;
correcting the prediction information of the human-hand candidate regions; and
training the second convolutional network model according to the corrected prediction information of the human-hand candidate regions and the sample images, wherein the second convolutional network model and the first convolutional network model share a feature extraction layer, and the parameters of the feature extraction layer are kept unchanged during the training of the second convolutional network model.
8. The method according to any one of claims 1-7, characterized in that the determining a display position of a business object to be shown in the video image comprises:
determining the display position of the business object to be shown corresponding to the detected gesture by applying, to the gesture, a pre-trained third convolutional network model for detecting display positions of business objects from video images.
9. A gesture control apparatus, characterized in that the apparatus comprises:
a gesture detection module, configured to perform gesture detection on a currently played video image;
a display position determination module, configured to determine, when it is detected that a gesture matches a predetermined gesture, a display position of a business object to be shown in the video image; and
a business object drawing module, configured to draw the business object at the display position by means of computer graphics.
10. A terminal device, comprising: a processor, a memory, a communications interface, and a communication bus, the processor, the memory, and the communications interface communicating with one another via the communication bus;
the memory being configured to store at least one executable instruction that causes the processor to perform operations corresponding to the gesture control method according to any one of claims 1 to 8.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610694510.2A CN107340852A (en) | 2016-08-19 | 2016-08-19 | Gestural control method, device and terminal device |
PCT/CN2017/098182 WO2018033154A1 (en) | 2016-08-19 | 2017-08-19 | Gesture control method, device, and electronic apparatus |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610694510.2A CN107340852A (en) | 2016-08-19 | 2016-08-19 | Gestural control method, device and terminal device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107340852A true CN107340852A (en) | 2017-11-10 |
Family
ID=60223091
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610694510.2A Pending CN107340852A (en) | 2016-08-19 | 2016-08-19 | Gestural control method, device and terminal device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107340852A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030063133A1 (en) * | 2001-09-28 | 2003-04-03 | Fuji Xerox Co., Ltd. | Systems and methods for providing a spatially indexed panoramic video |
CN102455898A (en) * | 2010-10-29 | 2012-05-16 | 张明 | Cartoon expression based auxiliary entertainment system for video chatting |
CN105451029A (en) * | 2015-12-02 | 2016-03-30 | 广州华多网络科技有限公司 | Video image processing method and device |
CN105728878A (en) * | 2016-04-27 | 2016-07-06 | 昆山星锐普思电子科技有限公司 | Vacuum heating brazing equipment on basis of intermediate-frequency power eddy current magnetic fields |
Cited By (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108108707A (en) * | 2017-12-29 | 2018-06-01 | 北京奇虎科技有限公司 | Gesture processing method and processing device based on video data, computing device |
CN108614995A (en) * | 2018-03-27 | 2018-10-02 | 深圳市智能机器人研究院 | Gesture data collection acquisition method, gesture identification method and device for YOLO networks |
CN108932053A (en) * | 2018-05-21 | 2018-12-04 | 腾讯科技(深圳)有限公司 | Drawing practice, device, storage medium and computer equipment based on gesture |
CN108921081B (en) * | 2018-06-27 | 2020-10-09 | 百度在线网络技术(北京)有限公司 | User operation detection method and device |
CN108921081A (en) * | 2018-06-27 | 2018-11-30 | 百度在线网络技术(北京)有限公司 | The detection method and device of user's operation |
CN109327760A (en) * | 2018-08-13 | 2019-02-12 | 北京中科睿芯科技有限公司 | A kind of intelligent sound and its control method for playing back |
CN109327760B (en) * | 2018-08-13 | 2019-12-31 | 北京中科睿芯科技有限公司 | Intelligent sound box and playing control method thereof |
CN110879946A (en) * | 2018-09-05 | 2020-03-13 | 武汉斗鱼网络科技有限公司 | Method, storage medium, device and system for combining gesture with AR special effect |
CN110971924A (en) * | 2018-09-30 | 2020-04-07 | 武汉斗鱼网络科技有限公司 | Method, device, storage medium and system for beautifying in live broadcast process |
CN109492577B (en) * | 2018-11-08 | 2020-09-18 | 北京奇艺世纪科技有限公司 | Gesture recognition method and device and electronic equipment |
CN109492577A (en) * | 2018-11-08 | 2019-03-19 | 北京奇艺世纪科技有限公司 | A kind of gesture identification method, device and electronic equipment |
CN109614953A (en) * | 2018-12-27 | 2019-04-12 | 华勤通讯技术有限公司 | A kind of control method based on image recognition, mobile unit and storage medium |
CN109799905B (en) * | 2018-12-28 | 2022-05-17 | 深圳云天励飞技术有限公司 | Hand tracking method and advertising machine |
CN109799905A (en) * | 2018-12-28 | 2019-05-24 | 深圳云天励飞技术有限公司 | A kind of hand tracking and advertisement machine |
CN115119004A (en) * | 2019-05-13 | 2022-09-27 | 阿里巴巴集团控股有限公司 | Data processing method, information display method, device, server and terminal equipment |
CN115119004B (en) * | 2019-05-13 | 2024-03-29 | 阿里巴巴集团控股有限公司 | Data processing method, information display device, server and terminal equipment |
CN110287891A (en) * | 2019-06-26 | 2019-09-27 | 北京字节跳动网络技术有限公司 | Gestural control method, device and electronic equipment based on human body key point |
CN110442238A (en) * | 2019-07-31 | 2019-11-12 | 腾讯科技(深圳)有限公司 | A kind of method and device of determining dynamic effect |
CN110942005A (en) * | 2019-11-21 | 2020-03-31 | 网易(杭州)网络有限公司 | Object recognition method and device |
CN111078011A (en) * | 2019-12-11 | 2020-04-28 | 网易(杭州)网络有限公司 | Gesture control method and device, computer readable storage medium and electronic equipment |
CN111341013A (en) * | 2020-02-10 | 2020-06-26 | 北京每日优鲜电子商务有限公司 | Moving method, device and equipment of intelligent vending machine and storage medium |
CN111625102A (en) * | 2020-06-03 | 2020-09-04 | 上海商汤智能科技有限公司 | Building display method and device |
CN113301356A (en) * | 2020-07-14 | 2021-08-24 | 阿里巴巴集团控股有限公司 | Method and device for controlling video display |
CN111897436A (en) * | 2020-08-13 | 2020-11-06 | 北京未澜科技有限公司 | Hand-grabbing object grip strength prediction method based on single RGB image |
CN111931762A (en) * | 2020-09-25 | 2020-11-13 | 广州佰锐网络科技有限公司 | AI-based image recognition solution method, device and readable storage medium |
CN112767357A (en) * | 2021-01-20 | 2021-05-07 | 沈阳建筑大学 | Yolov 4-based concrete structure disease detection method |
CN113191403A (en) * | 2021-04-16 | 2021-07-30 | 上海戏剧学院 | Generation and display system of theater dynamic poster |
WO2022247650A1 (en) * | 2021-05-28 | 2022-12-01 | 北京字节跳动网络技术有限公司 | Gesture-based interaction method and device, and client |
CN116030411A (en) * | 2022-12-28 | 2023-04-28 | 宁波星巡智能科技有限公司 | Human privacy shielding method, device and equipment based on gesture recognition |
CN116030411B (en) * | 2022-12-28 | 2023-08-18 | 宁波星巡智能科技有限公司 | Human privacy shielding method, device and equipment based on gesture recognition |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107340852A (en) | Gestural control method, device and terminal device | |
CN107343211B (en) | Method of video image processing, device and terminal device | |
CN107341434A (en) | Processing method, device and the terminal device of video image | |
US10748324B2 (en) | Generating stylized-stroke images from source images utilizing style-transfer-neural networks with non-photorealistic-rendering | |
CN107341435A (en) | Processing method, device and the terminal device of video image | |
WO2018033156A1 (en) | Video image processing method, device, and electronic apparatus | |
US8265351B2 (en) | Method, system and computer program product for automatic and semi-automatic modification of digital images of faces | |
US11551337B2 (en) | Boundary-aware object removal and content fill | |
CN107347166B (en) | Video image processing method and device and terminal equipment | |
US8660319B2 (en) | Method, system and computer program product for automatic and semi-automatic modification of digital images of faces | |
CN107343225B (en) | The method, apparatus and terminal device of business object are shown in video image | |
CN111787242B (en) | Method and apparatus for virtual fitting | |
CN109508681A (en) | The method and apparatus for generating human body critical point detection model | |
WO2018033154A1 (en) | Gesture control method, device, and electronic apparatus | |
CN108229325A (en) | Method for detecting human face and system, electronic equipment, program and medium | |
US20120299945A1 (en) | Method, system and computer program product for automatic and semi-automatic modificatoin of digital images of faces | |
CN108109010A (en) | A kind of intelligence AR advertisement machines | |
CN107341436B (en) | Gestures detection network training, gestures detection and control method, system and terminal | |
CN109740571A (en) | The method of Image Acquisition, the method, apparatus of image procossing and electronic equipment | |
US8019182B1 (en) | Digital image modification using pyramid vignettes | |
CN107343220A (en) | Data processing method, device and terminal device | |
WO2022089166A1 (en) | Facial image processing method and apparatus, facial image display method and apparatus, and device | |
CN104852892B (en) | A kind of autonomous login method of novel Internet of Things web station system | |
CN111182350B (en) | Image processing method, device, terminal equipment and storage medium | |
CN104809288A (en) | Trying method or customizing nail art |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20171110 |