CN102081918A

CN102081918A - Video image display control method and video image display device

Info

Publication number: CN102081918A
Application number: CN 201010612804
Authority: CN
Inventors: 方伟; 赵勇; 袁誉乐; 罗卫
Original assignee: Peking University Shenzhen Graduate School
Current assignee: Shenzhen Rui Technology Co., Ltd.
Priority date: 2010-09-28
Filing date: 2010-12-29
Publication date: 2011-06-01
Anticipated expiration: 2030-12-29
Also published as: CN102081918B

Abstract

The invention discloses a video image display control method and a video image display device, and the video image display control method comprises the following steps: collecting a scene before the display device in a real-time manner; acquiring a human body region image from a collected real-time scene image; further performing hand gesture detection on the human body region image, and determining a control command in a hand gesture database, which corresponds to a hand gesture according to detection result; and finally outputting the control command, and utilizing the video image display device to control a video image to be displayed on the display device according to the control command, thereby completing active interaction between a user and the video image, enabling the user to select required information according to interests, improving the interaction efficiency between the user and advertisement content and simultaneously bringing a new experience to the user.

Description

Video image display control method and video image display device

Technical field

The present invention relates to the fields of image processing and human-computer interaction, and in particular to a video image display control method and a video image display device.

Background technology

In recent years competition among advertising media has been fierce, and the digital billboard has come to the fore as a brand-new advertising medium. A product of the digitization of advertising media, the digital billboard is a digital media system that publishes advertising information through terminal display devices. It can deliver advertising content dynamically, serve personalized and differentiated needs, and play advertising information to specific audiences at specific places and times, and it therefore achieves a good display effect. Its application potential is large in public places with heavy foot traffic, such as shopping malls, supermarkets, hotels, medical facilities and cinemas, and its market prospects are broad.

Current digital billboards all play advertising pictures or video animation clips automatically in a predefined playback mode. A passer-by can see only what the billboard is currently showing and cannot choose the advertising content of interest. To see advertising content that is not currently displayed, the passer-by must stop and wait for a long time. This is a passive mode of interaction in which the content cannot be anticipated: people often cannot easily obtain the useful advertising content they want, and the effectiveness of the advertisement is greatly reduced.

Summary of the invention

The main technical problem to be solved by the present invention is to provide a video image display control method and a video image display device. The invention realizes active interaction between the user and the video image and lets users easily select the information they are interested in, thereby improving the efficiency of information interaction.

To solve the above technical problem, the present invention adopts the following technical solution:

A video image display control method comprises the steps of:

A. acquiring a real-time scene image in front of a display device;

B. performing human body detection on the real-time scene image to obtain a human body region image;

C. detecting a gesture in the human body region image;

D. determining the control command corresponding to the gesture;

E. controlling the display of a video image on the display device according to the control command.

Step B comprises: comparing the currently acquired real-time scene image frame with a reference image obtained from a background model, thereby detecting the human body region image.

Further, the step of detecting the human body region image comprises:

performing a pixel-level subtraction between the currently acquired real-time scene image frame and the reference image obtained from the background model to obtain a difference image;

binarizing the difference image to obtain a binarized difference image;

performing morphological processing on the binarized difference image;

connecting the parts of the binarized difference image that satisfy a predetermined connectivity rule to obtain connected regions;

determining whether each connected region is a noise region and, if so, deleting it;

taking the region image formed by all remaining connected regions as the human body region image, and outputting the human body region image.

Further, the method also comprises the step of: determining whether each pixel in the currently acquired real-time scene image frame belongs to the detected human body region; if so, the background model remains unchanged; otherwise the background model is updated.

The gesture comprises the hand shape of a palm, and step C comprises:

performing palm target detection on the human body region image to obtain a palm target region image;

extracting hand shape features from the palm target region image;

performing hand shape recognition using the extracted hand shape features and a pre-established hand shape classifier, and determining whether the hand shape of the palm is a valid hand shape;

in step D, when the hand shape of the palm is determined to be a valid hand shape, determining the control command corresponding to that valid hand shape from a pre-established gesture database; or

the gesture comprises the hand shape of a palm and the motion trajectory of the palm, and step C comprises:

performing hand shape recognition using the extracted hand shape features of the palm and the pre-established hand shape classifier, determining whether the hand shape of the palm is a valid hand shape, and, when it is, marking the palm as the currently active palm;

detecting the motion trajectory of the currently active palm and determining its motion type;

in step D, determining the corresponding control command in the pre-established gesture database according to the valid hand shape and the motion type of the currently active palm;

in step E, switching to the corresponding video image, or operating on the currently displayed video image, according to the control command.

Further, the step of performing palm target detection on the human body region image comprises:

performing skin color detection on the human body region image to obtain a region image containing the face, arm or palm;

obtaining the region image of the arm and/or palm according to a pre-established face detection model;

detecting the palm in the region image of the arm and/or palm.

Further, the step of detecting the palm in the region image of the arm and/or palm comprises:

determining whether the aspect ratio of the region image of the arm and/or palm is greater than 2; if so, the region is an arm-and-palm region image; otherwise it is a palm region image;

when the region is determined to be an arm-and-palm region image, performing edge detection on it to obtain edge information and hence the region contour;

fitting a minimum circumscribed ellipse to the region contour to obtain the parameters of the ellipse;

obtaining the orientation of the region contour from the ellipse parameters, and thereby the pointing direction of the arm and palm;

rectifying the arm-and-palm region image whose pointing direction has been obtained, so that the arm and palm point vertically upward;

performing palm localization on the rectified arm-and-palm region image to obtain the palm target region image.

Corresponding to the above method, the present invention also provides a video image display device, comprising:

a camera device for acquiring a real-time scene image in front of the display device;

a human body detection device for performing human body detection on the real-time scene image to obtain a human body region image;

a gesture detection device for detecting a gesture in the human body region image;

a control command determination device for determining the control command corresponding to the gesture;

an image display control device for controlling the display of a video image on the display device according to the control command.

Further, the human body detection device is configured to compare the currently acquired real-time scene image frame with a reference image obtained from a background model, thereby detecting the human body region image.

The video image display device also comprises a background update device for determining whether each pixel in the currently acquired real-time scene image frame belongs to the detected human body region; if so, the background model remains unchanged; otherwise the background model is updated.

The gesture comprises the hand shape of a palm, and the gesture detection device comprises:

a palm detection unit for performing palm target detection on the human body region image to obtain a palm target region image;

a hand shape feature extraction unit for extracting hand shape features from the palm target region image;

a hand shape recognition unit for performing hand shape recognition using the extracted hand shape features of the palm and a pre-established hand shape classifier, and determining whether the hand shape of the palm is a valid hand shape;

when the hand shape of the palm is determined to be a valid hand shape, the control command determination device determines the control command corresponding to that valid hand shape from a pre-established gesture database; or

the gesture comprises the hand shape of a palm and the motion trajectory of the palm, and the gesture detection device comprises:

a hand shape feature extraction unit for extracting hand shape features from the palm target region image;

a hand shape recognition unit for performing hand shape recognition using the extracted hand shape features of the palm and the pre-established hand shape classifier, determining whether the hand shape of the palm is a valid hand shape, and, when it is, marking the palm as the currently active palm;

a palm tracking unit for detecting the motion trajectory of the currently active palm and determining its motion type;

the control command determination device determines the corresponding control command in the pre-established gesture database according to the valid hand shape and the motion type of the currently active palm;

the image display control device switches to the corresponding video image, or operates on the currently displayed video image, according to the control command.

The beneficial effects of the present invention are as follows:

With the video image display control method and video image display device of the present invention, the scene in front of the video image display device is captured, the human body region image is extracted from it, and the user's gesture is then extracted from the human body region image. The control command corresponding to the gesture is determined, and the video image display device controls the display of the corresponding video image according to that command, completing active interaction between the user and the video image. With the method and device of the invention, the user can actively choose to view the content of interest. The technical solution realizes active interaction between user and device and improves the efficiency of interaction between the video image and the user, which improves the publicity effect of the video image itself and gives the user a new experience.

Description of drawings

Fig. 1 is a block diagram of one embodiment of the video image display device of the present invention;

Fig. 2 is a block diagram of another embodiment of the video image display device of the present invention;

Fig. 3a is a block diagram of one embodiment of the gesture detection device in Fig. 1;

Fig. 3b is a block diagram of another embodiment of the gesture detection device in Fig. 1;

Fig. 4 is a schematic diagram of one embodiment of the palm detection unit in Fig. 1;

Fig. 5 is a flowchart of one embodiment of the video image display control method of the present invention;

Fig. 6 is a flowchart of obtaining the human body region image in Fig. 5;

Fig. 7 is a flowchart of obtaining the difference image in Fig. 6;

Fig. 8 is a flowchart of the region connectivity analysis in Fig. 6;

Fig. 9 is a flowchart of updating the background model in Fig. 7;

Figure 10 is a flowchart of the gesture detection in Fig. 5;

Figure 11 is a flowchart of obtaining the palm target region in Figure 10;

Figure 12 is a flowchart of palm localization and acquisition in Figure 11;

Figure 13 is a flowchart of determining the palm motion type in Figure 11;

Figures 14a, 14b, 14c, 14d, 14e and 14f are schematic diagrams of one embodiment of the palm localization and acquisition of Figure 12;

Figures 15a, 15b, 15c, 15d, 15e, 15f, 15g, 15h and 15i are schematic diagrams of one embodiment of the motion type classification of the active palm in Figure 13;

Figure 16 is a schematic diagram of one embodiment of determining the control command in Fig. 6.

Embodiment

The present invention is described in further detail below through embodiments and with reference to the accompanying drawings.

In recent years computer vision technology has matured and been widely applied in many fields. Against this background, it has become possible to use computer vision to recognize human hand shapes and gestures, and so to understand and interpret human actions and complete human-machine interaction. The present invention is a video image display control method and video image display device based on this computer vision technology.

Referring to Fig. 1, one embodiment of the video image display device of the present invention comprises a camera device 1, a human body detection device 2, a gesture detection device 3, a control command determination device 4 and an image display control device 5. The camera device 1 is connected to the human body detection device 2, which is connected to the gesture detection device 3; the gesture detection device 3 is connected to the control command determination device 4, which is connected to the image display control device 5. The camera device 1 acquires the real-time scene image in front of the image display control device 5 and sends it to the human body detection device 2. The human body detection device 2 performs human body detection on the received real-time scene image, obtains the human body region image, and sends it to the gesture detection device 3. The gesture detection device 3 performs gesture detection on the received human body region image and sends the gesture to the control command determination device 4. The control command determination device 4 determines the corresponding control command from the received gesture and sends it to the image display control device 5. The image display control device 5 controls the display of the video image on the display device according to the control command.

Referring to Fig. 2, in another embodiment of the invention the video image display device also comprises a background update device 6 connected to the human body detection device 2. It determines whether each pixel in the currently acquired real-time scene image frame belongs to the detected human body region image; if so, the background model remains unchanged; otherwise the background model is updated.

Referring to Fig. 3a, in one embodiment of the invention, when the gesture detected by the gesture detection device 3 comprises the hand shape of the palm, the gesture detection device 3 comprises a palm detection unit 31, a hand shape feature extraction unit 32 and a hand shape recognition unit 33. The palm detection unit 31 is connected to the hand shape feature extraction unit 32; it performs palm target detection on the human body region image obtained by the human body detection device 2, obtains the palm target image, and sends it to the hand shape feature extraction unit 32. The hand shape feature extraction unit 32 is connected to the hand shape recognition unit 33; it extracts hand shape features from the received palm target image and sends them to the hand shape recognition unit 33. The hand shape recognition unit 33 is connected to the control command determination device 4; it performs hand shape recognition using the received hand shape features of the palm and a pre-established hand shape classifier, and determines whether the hand shape of the palm is a valid hand shape. If it is valid, the control command determination device 4 determines the control command corresponding to this valid hand shape from the pre-established gesture database. The image display control device 5 then switches to the corresponding video image, or operates on the currently displayed video image, according to the control command. The currently displayed video image may be one that the user has not switched, or one that has just been switched to by a user gesture.

Referring to Fig. 3b, in another embodiment of the invention, when the gesture detected by the gesture detection device 3 comprises the hand shape of the palm and the motion trajectory of the palm, the gesture detection device 3 comprises the palm detection unit 31, the hand shape feature extraction unit 32, the hand shape recognition unit 33 and a palm tracking unit 34 connected to the hand shape recognition unit 33. When the hand shape recognition unit 33 determines that the hand shape of the palm is a valid hand shape, it marks the palm as the active palm and sends it to the palm tracking unit 34 and the control command determination device 4. The palm tracking unit 34 is connected to the control command determination device 4; it detects the motion trajectory of the received active palm and determines the motion type of the currently active palm. The control command determination device 4 then determines the corresponding control command in the pre-established gesture database according to the motion type of the currently active palm and the valid hand shape. The image display control device 5 switches to the corresponding video image, or operates on the currently displayed video image, according to the control command.
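The lookup performed by the control command determination device can be pictured as a table keyed by hand shape and motion type. The following is a minimal sketch only: the source does not list the contents of the gesture database, so every key and command name below (`open_palm`, `move_left`, `next_video`, and so on) is a hypothetical placeholder.

```python
# Hypothetical gesture database: (hand shape, motion type) -> control command.
# None of these entries come from the source; they only illustrate the lookup.
GESTURE_DATABASE = {
    ("open_palm", "move_left"): "previous_video",
    ("open_palm", "move_right"): "next_video",
    ("fist", "still"): "pause",
}


def lookup_command(hand_shape, motion_type):
    """Return the control command for a gesture, or None if it is not registered."""
    return GESTURE_DATABASE.get((hand_shape, motion_type))
```

A gesture that is not in the table yields `None`, which a caller could treat as "no command", leaving the display unchanged.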

Referring to Fig. 4, in one embodiment of the invention, the palm detection unit 31 comprises a skin color detection module 311, a face detection module 312 and a palm target acquisition module 313. The skin color detection module 311 is connected to the face detection module 312; it examines the obtained human body region image according to human skin color features and extracts the face, palm and/or arm regions. The face detection module 312 is connected to the palm target acquisition module 313; it detects the face region within the extracted regions and sends the detection result to the palm target acquisition module 313, which deletes the face region according to the detection result and obtains the palm target region image.

Referring to Fig. 4, when the gesture detected by the gesture detection device 3 comprises the hand shape of the palm and the motion trajectory of the palm, the palm target acquisition module 313 comprises a palm recognition submodule 3131 and a palm acquisition submodule 3132 connected to it. The palm recognition submodule 3131 determines, for the palm and/or arm region from which the face region has been deleted, whether the region contains only the palm. If so, it identifies the region as the palm target region image; otherwise it identifies the region image as a palm-and-arm region image and sends it to the palm acquisition submodule 3132, which obtains the palm target region image from the palm-and-arm region image.

In another embodiment of the invention, the palm detection unit 31 also comprises a palm target correction module 314 connected to the palm target acquisition module 313. It performs region connectivity analysis on the palm target region image obtained by the palm target acquisition module 313, thereby obtaining a complete palm target region image.

Based on the above video image display device, the present invention proposes a video image display control method. The method is described in detail below with reference to the drawings and specific embodiments.

Referring to Fig. 5, a video image display control method comprises the steps of:

S1. Acquiring a real-time scene image in front of the display device.

S2. Performing human body detection on the real-time scene image to obtain a human body region image.

S3. Detecting a gesture in the human body region image.

S4. Determining the control command corresponding to the gesture.

S5. Controlling the display of a video image on the display device according to the control command.

In one embodiment of the invention, each captured frame is also cached, so after the real-time scene image is acquired this embodiment further comprises step S6: caching the acquired real-time scene image in a frame data buffer.

To control the image data better and keep data acquisition and processing smooth, the frame data buffer in this embodiment uses a double-buffered queue for the video stream, so that the buffer into which frame image data is written is separate from the buffer from which data is read.
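One way to picture the double-buffered queue is as two lists that trade roles: the capture side appends to one while the processing side drains the other. The sketch below is an illustration under that assumption, not the embodiment's actual data structure; the class name `DoubleFrameBuffer` and its methods are hypothetical.

```python
import threading


class DoubleFrameBuffer:
    """Two frame lists: one receives new frames while the other is being drained."""

    def __init__(self):
        self._write = []  # filled by the capture side
        self._read = []   # drained by the processing side
        self._lock = threading.Lock()

    def put(self, frame):
        """Store a newly captured frame in the write buffer."""
        with self._lock:
            self._write.append(frame)

    def swap_and_take(self):
        """Swap the two buffers and return the frames captured since the last swap."""
        with self._lock:
            self._write, self._read = self._read, self._write
        frames = list(self._read)
        self._read.clear()
        return frames
```

Because only the swap is done under the lock, the capture side is never blocked for longer than a pointer exchange while the processing side works through its batch.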

In one embodiment of the invention, in order to obtain a more accurate image, the acquired real-time scene image is preprocessed, comprising the steps of:

S7. Converting the color space of the acquired real-time scene image from RGB to HSV.

Skin color is quite concentrated in its color-space distribution but is strongly affected by illumination and ethnicity. To facilitate the human body detection of step S2 and to reduce the influence of illumination intensity on skin color, this embodiment converts the real-time scene image to a color space in which luminance and chrominance are separated, and then discards the luminance component.

The HSV space represents color by three elements, hue (H), saturation (S) and value (V, the brightness), and is a nonlinear color representation system. The HSV representation is consistent with human color perception, and perceived color differences are more uniform in HSV space, so HSV is well suited to the characteristics of human vision. After conversion from RGB to HSV the information structure is more compact, the components are more independent, and little color information is lost. This embodiment therefore uses the HSV color space.

Of course, the color space model in this embodiment may also be another color space, such as YCbCr.

The conversion from RGB space to HSV space is as follows, where R, G and B lie in [0, 1]:

V = \max(R, G, B)
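As an illustration of the conversion and of discarding the brightness component, the sketch below uses the standard RGB-to-HSV conversion from Python's `colorsys` module. This is the textbook conversion, which agrees with V = max(R, G, B); whether its H and S expressions match the embodiment's own formulas is an assumption, and the helper name `chroma_features` is hypothetical.

```python
import colorsys


def chroma_features(r, g, b):
    """Convert one RGB pixel (components in [0, 1]) to HSV and drop the V channel.

    Returns (hue, saturation), both in [0, 1] as produced by colorsys.
    """
    h, s, _v = colorsys.rgb_to_hsv(r, g, b)
    return h, s


# The V channel returned by colorsys is the maximum of the three components.
h, s, v = colorsys.rgb_to_hsv(0.8, 0.4, 0.2)
```

Keeping only hue and saturation is what makes a later skin color test less sensitive to how brightly the scene is lit.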

S8. Denoising the image after color space conversion; in this embodiment, median filtering is used for denoising.

The real-time scene image acquired in step S1 contains noise, so the image must be denoised to obtain a better result.
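A minimal sketch of median filtering on a single-channel image follows. The source does not give the window size, so the 3x3 window is an assumption, as is the border handling (the window is simply clipped at the image edge); the function name is hypothetical.

```python
def median_filter_3x3(img):
    """Median-filter a 2D list of pixel values with a 3x3 window (assumed size).

    At the borders the window is clipped to the image, and the upper median
    is used so that the output always contains values present in the input.
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            window = sorted(
                img[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
            )
            out[y][x] = window[len(window) // 2]
    return out
```

A single bright outlier surrounded by uniform pixels is replaced by the surrounding value, which is the behaviour that makes the median filter suitable for impulse noise.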

Referring to Fig. 6, in one embodiment of the invention, performing human body detection and obtaining the human body region image in step S2 comprises the steps of:

S21. Performing a pixel-level subtraction between the currently acquired real-time scene image frame and the reference image obtained from the background model to obtain a difference image.

S22. Binarizing the difference image to obtain a binarized difference image.

S23. Performing morphological processing on the binarized difference image.

In some cases, for example when the shooting direction of the camera is substantially the same as the direction of human motion, the preliminary binarized difference image contains holes and noise points, so morphological processing must be applied to it.

In one embodiment of the invention, the morphological processing of step S23 comprises: using an erosion operation to remove isolated noise points from the binarized difference image, and using a dilation operation to fill the holes in the binarized difference image. The structuring element for both the erosion and the dilation is a cross-shaped structuring element whose length and width are both 3.
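The two operations can be sketched on a binary mask as follows, using the 3x3 cross-shaped structuring element described above. This is an illustrative sketch: treating pixels outside the image as background is an assumption, and the source does not state the order in which erosion and dilation are applied, so the two functions are shown independently.

```python
# Offsets of the 3x3 cross-shaped structuring element: centre plus 4 neighbours.
CROSS = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]


def _pixel(mask, y, x):
    """Read a mask pixel, treating positions outside the image as 0 (assumption)."""
    if 0 <= y < len(mask) and 0 <= x < len(mask[0]):
        return mask[y][x]
    return 0


def erode(mask):
    """Keep a pixel only if every pixel under the cross is foreground."""
    return [
        [1 if all(_pixel(mask, y + dy, x + dx) for dy, dx in CROSS) else 0
         for x in range(len(mask[0]))]
        for y in range(len(mask))
    ]


def dilate(mask):
    """Set a pixel if any pixel under the cross is foreground."""
    return [
        [1 if any(_pixel(mask, y + dy, x + dx) for dy, dx in CROSS) else 0
         for x in range(len(mask[0]))]
        for y in range(len(mask))
    ]
```

Erosion deletes an isolated foreground pixel because its four neighbours are background; dilation closes a one-pixel hole because at least one neighbour of the hole is foreground.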

S24. Connecting the parts of the binarized difference image that satisfy a predetermined connectivity rule to obtain connected regions.

Because the binarized image contains scattered regions or pixels, the parts of the image that satisfy the predetermined rule must be connected by region connectivity analysis. In this embodiment the predetermined connectivity rule in step S24 is the 8-connectivity rule; other connectivity rules, such as the 4-connectivity rule, may of course be used.

S25. Determining whether the total number of pixels in each connected region is less than a set threshold; if so, the connected region is regarded as a noise region and deleted. The region image formed by all remaining connected regions is the human body region image, which is output. The threshold may be set empirically.
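Step S25 can be sketched as a filter over a label image in which every connected region carries its own ID and 0 marks the background. The input format and the names `keep_large_regions` and `min_pixels` are assumptions made for illustration; the source only says that the threshold is set empirically.

```python
from collections import Counter


def keep_large_regions(labels, min_pixels):
    """Return a binary mask that keeps only regions with at least min_pixels pixels.

    labels: 2D list in which 0 is background and each region has a unique ID.
    Regions smaller than min_pixels are treated as noise and removed.
    """
    counts = Counter(v for row in labels for v in row if v != 0)
    return [
        [1 if v != 0 and counts[v] >= min_pixels else 0 for v in row]
        for row in labels
    ]
```

With a threshold of 2, for example, a one-pixel region is discarded while a three-pixel region survives into the human body region image.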

When gestures are detected directly, the extracted palm region image often contains noise, and this noise lies very close to the palm and so affects the judgment of the gesture. To obtain a more accurate gesture, the present invention first performs human body detection and then detects the gesture, so that the noise is removed during human body detection and the detected gesture is more accurate.

Because the human body may be in constant motion, the background of each acquired scene image also changes; to obtain a more accurate background image, the background model must be updated.

Therefore, in another embodiment of the invention, step S2 also comprises the step of:

S26. Determining whether each pixel in the currently acquired real-time scene image belongs to the detected human body region; if so, the background model remains unchanged; otherwise the background model is updated.

Referring to Fig. 7, in one embodiment of the invention, step S21 comprises the steps of:

S211. Obtaining the preprocessed image.

S212. Determining whether a background model has been established; if so, performing step S213; otherwise performing step S214.

S213. Subtracting, pixel by pixel, the pixel values of the background reference image b_k(x, y) obtained from the background model from the pixel values of the current preprocessed frame f_k(x, y) to obtain the difference image D_k(x, y), that is, D_k(x, y) = |f_k(x, y) - b_k(x, y)|.

S214. Establishing for each pixel a model B = [μ, δ²] represented by a single Gaussian distribution, where μ is the mean and δ² is the variance.

S215. Outputting the difference image.
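The subtraction of step S213 can be sketched directly from the formula D_k(x, y) = |f_k(x, y) - b_k(x, y)|. The sketch assumes single-channel images stored as 2D lists of equal size; the function name is hypothetical.

```python
def difference_image(frame, background):
    """Pixel-wise absolute difference between the current frame and the reference image."""
    return [
        [abs(f - b) for f, b in zip(frame_row, background_row)]
        for frame_row, background_row in zip(frame, background)
    ]
```

Pixels that match the background give values near 0, while pixels covered by a moving person give large values.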

In one embodiment of the invention, binarizing the difference image in step S22 comprises:

S221. Presetting an image segmentation threshold T = kδ against which each pixel value of the difference image is compared. The threshold may be set empirically or computed by an existing adaptive algorithm. In this embodiment, T is set to 3 times the standard deviation of the pixel value at the current pixel.

S222. Comparing the pixel value of each pixel in the difference image with the segmentation threshold T and segmenting the difference image according to the comparison result, thereby obtaining the binarized difference image:

M_{k}(x, y) = \begin{cases} 1 \ (\text{foreground}), & D_{k}(x, y) > T \\ 0 \ (\text{background}), & \text{otherwise} \end{cases}

In this embodiment, if the pixel value of the current pixel is greater than the threshold T, its value is set to 1; if it is less than or equal to T, its value is set to 0. The difference image is thereby binarized, giving the binarized difference image.

Of course, the opposite convention may also be used in this embodiment: pixels whose value is greater than the threshold are set to 0, and pixels whose value is less than or equal to the threshold are set to 1.
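The thresholding rule can be sketched as follows, with T = kδ and k = 3 as in this embodiment. Passing the per-pixel standard deviation as a second 2D list (`sigma`) is an assumption about how the single-Gaussian background model is stored.

```python
def binarize(diff, sigma, k=3.0):
    """Mark a pixel as foreground (1) when its difference exceeds T = k * sigma.

    diff:  2D list, the difference image D_k(x, y).
    sigma: 2D list, per-pixel standard deviation from the background model.
    """
    return [
        [1 if d > k * s else 0 for d, s in zip(diff_row, sigma_row)]
        for diff_row, sigma_row in zip(diff, sigma)
    ]
```

A difference exactly equal to the threshold is classified as background, matching the "less than or equal to T" branch of the rule.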

Referring to Fig. 8, in an embodiment of the present invention, the connected-region analysis of the binarized difference image in step S24 comprises the steps:

S241, scan the current binarized difference image from top to bottom and from left to right.

S242, determine whether the current pixel is a foreground point; if so, label it with a new ID; otherwise execute step S241.

A foreground point here is a pixel whose value has changed because of human motion in the corresponding real scene.

S243, determine whether each pixel in the 8-connected neighborhood of this foreground point is a foreground point; if so, label it with the same ID and push it onto the stack.

S244, after the above 8 pixels have been checked, check whether the stack is empty; if it is not empty, pop the top element; if it is empty, end the scan and execute step S246.

S245, continue the 8-connectivity check on the popped pixel and repeat the above process until the stack is empty, thereby obtaining a foreground region whose pixels share the same ID.

S246, once the entire image has been scanned, all connected regions have been obtained, each with a unique ID.
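The stack-based labeling of steps S241 to S246 might be sketched in Python as follows. This is an illustrative interpretation rather than the patent's implementation: it assumes the mask is a nested list of 0/1 values and that the raster scan resumes after each region has been filled; the name `label_regions` is hypothetical.

```python
def label_regions(mask):
    """Label 8-connected foreground regions of a 0/1 mask with IDs 1, 2, ..."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    labels = [[0] * width for _ in range(height)]
    next_id = 0
    for y in range(height):          # top to bottom (S241)
        for x in range(width):       # left to right
            if mask[y][x] != 1 or labels[y][x] != 0:
                continue             # background or already labeled
            next_id += 1             # new ID for a new region (S242)
            labels[y][x] = next_id
            stack = [(y, x)]
            while stack:             # repeat until the stack is empty (S244, S245)
                cy, cx = stack.pop()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and mask[ny][nx] == 1 and labels[ny][nx] == 0):
                            labels[ny][nx] = next_id   # same ID (S243)
                            stack.append((ny, nx))
    return labels, next_id


labels, count = label_regions([[1, 0, 0],
                               [0, 1, 0],
                               [0, 0, 0],
                               [1, 0, 1]])
```

In this example the two diagonal pixels at the top form one region because 8-connectivity includes diagonal neighbors, while the two pixels in the bottom row are separate regions.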

Referring to Fig. 9, updating the background model in step S26 of the present embodiment comprises the steps:

S261, obtain the foreground mask, i.e. the pixels whose value is 1.

S262, determine whether this pixel belongs to the human region obtained in step S26; if so, execute step S263; otherwise execute step S264.

S263, keep the parameters of the statistical model of the background pixel unchanged. In the present embodiment, let the current frame image be I_{i}, α the learning rate, μ the mean and δ the standard deviation; the background update formulas are:

μ_{i+1} = μ_{i},

δ_{i+1}^{2} = δ_{i}^{2}.

S264, update the parameters of the statistical model of the background pixel; the background update formulas are then:

μ_{i+1} = (1 - α) μ_{i} + α I_{i},

δ_{i+1}^{2} = (1 - α) δ_{i}^{2} + α (I_{i} - μ_{i})^{2},

where the learning rate α may be set to 0.002 in the present embodiment, although other values may of course be used.
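The selective update of steps S262 to S264 can be sketched in Python as below. The sketch assumes the mean, variance, frame and human-region mask are nested lists of equal shape; the function name `update_background` and the test values are hypothetical.

```python
def update_background(mu, var, frame, human_mask, alpha=0.002):
    """Update the per-pixel Gaussian model in place, skipping human-region pixels."""
    for y, row in enumerate(frame):
        for x, value in enumerate(row):
            if human_mask[y][x]:
                continue  # S263: parameters stay unchanged inside the human region
            residual = value - mu[y][x]  # I_i - mu_i, using the mean before the update
            mu[y][x] = (1 - alpha) * mu[y][x] + alpha * value
            var[y][x] = (1 - alpha) * var[y][x] + alpha * residual * residual
    return mu, var


# Hypothetical 1x2 example with a large learning rate so the change is visible.
mu, var = update_background([[10.0, 10.0]], [[4.0, 4.0]],
                            [[20.0, 20.0]], [[1, 0]], alpha=0.5)
```

With the small learning rate suggested in the embodiment (α = 0.002), the model adapts slowly, so a person who briefly stands still is less likely to be absorbed into the background.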

Referring to Figure 10, in an embodiment of the present invention, when the gesture in step S3 comprises the hand shape of a palm, step S3 comprises:

S31, perform palm target detection on the obtained human region and obtain the palm target region image.

S32, perform hand-shape feature extraction on this palm target region image.

S33, perform hand-shape recognition using the extracted hand-shape features and a pre-built hand-shape classifier, and determine whether the hand shape of the palm is a valid hand shape; if so, execute step S4.

Referring to Figure 11, in an embodiment of the present invention, performing palm target detection and obtaining the palm target region image in step S31 comprises the steps:

S311, perform skin-color detection on the obtained human region image to obtain region images containing a face, palm or arm.

Because the hue of human skin falls within a certain range, the face and the arm and palm portions can be extracted from the human region by skin-color features.

Skin color is quite concentrated in color space but is strongly affected by illumination and ethnicity. To reduce the effect of illumination intensity on skin color, the present embodiment converts the color space of the scene image to HSV in step S6, thereby separating luminance from chrominance. In addition, to avoid the effect of brightness changes within the same shot and of other brightness changes, the present embodiment discards the luminance component during skin-color detection in step S311 and uses only the H component of the image as the basis for detection.

Skin pixels are then segmented according to the clustering of skin color on the H component: a threshold in HSV space is derived by statistical analysis, and skin-color regions are segmented according to this threshold, thereby separating out the face, palm and/or arm regions.
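A hue-only skin test of this kind might look like the following Python sketch, which uses the standard-library `colorsys` module. The hue bounds `h_min` and `h_max` are placeholder assumptions for illustration only; the patent derives its threshold by statistical analysis and does not state numeric values. The function name `skin_mask` is likewise hypothetical.

```python
import colorsys


def skin_mask(rgb_image, h_min=0.02, h_max=0.14):
    """Mark pixels whose hue (0..1 scale) lies in an assumed skin range."""
    mask = []
    for row in rgb_image:
        mask_row = []
        for r, g, b in row:
            # Convert to HSV and keep only H; saturation and value are discarded.
            h, _s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
            mask_row.append(1 if h_min <= h <= h_max else 0)
        mask.append(mask_row)
    return mask


# Hypothetical 2x2 image: a skin-like tone, blue, mid-gray and green.
image = [[(224, 172, 105), (0, 0, 255)],
         [(128, 128, 128), (0, 255, 0)]]
mask = skin_mask(image)
```

The lower bound is kept above zero here because `colorsys` reports a hue of 0.0 for gray pixels, which would otherwise be classified as skin.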

S312, select one region from the above region images.

S313, perform face detection on this region according to a pre-built face model; if a face is detected, discard this region and execute step S314; otherwise output this palm and/or arm region image and execute step S315.

S314, determine whether any regions remain to be examined; if so, execute step S313; otherwise end the operation.

S315, if the aspect ratio of this region image is not greater than 2, determine that the region image is a palm target region image and execute step S317; otherwise determine that the region is a palm-and-arm region image and execute step S316.

S316, locate the palm within this palm-and-arm region using a palm location algorithm, and obtain the palm region.

To obtain a complete palm region image, in an embodiment of the present invention, step S315 further comprises:

When the image is determined to be a palm region image, execute step S318: perform region connectivity analysis on the palm region image to obtain a complete palm region image, then execute step S317;

When the image is determined to be a palm-and-arm region image, execute step S318 before step S316, i.e. perform region connectivity analysis on the palm-and-arm region image to obtain a complete palm-and-arm region image.

In the present embodiment, the connected-component analysis uses the 8-connectivity rule: for the pixel at the seed-point coordinates in the original frame image and its 8 neighboring pixels, determine whether the value of the H component is less than a set threshold; if so, the pixels are regarded as belonging to the same class and are added to the connected region, yielding a complete palm and/or arm region image.

In this example, face detection is used to delete the face region. Face detection comprises two kinds of methods:

The first is the knowledge-based face detection method: the positions of different facial features are detected, and the face is then located according to a set of knowledge rules. Because the distribution of local facial features always follows certain rules (for example, the eyes are always symmetrically distributed in the upper half of the face), a set of rules describing the distribution of local facial features can be used for face detection, with two detection strategies: top-down and bottom-up.

The second is the appearance-based method: faces share a unified structural pattern, and the classifier can be realized with different strategies, such as neural network methods and traditional statistical methods. A classifier that can correctly distinguish face samples from non-face samples is therefore first built by learning on a large set of training samples; the image under test is then scanned in full, and the classifier checks whether each scanned image window contains a face; if so, the position of the face is given.

In an embodiment of the present invention, face detection uses the appearance-based method and comprises: S313a, collecting a large number of face image samples offline; S313b, extracting multidimensional feature vectors of the faces and reducing their dimensionality by PCA (Principal Component Analysis); S313c, training a neural network with the extracted feature vectors to obtain a face classifier; S313d, performing face detection on the human region image in the face classifier according to the above feature vectors; S313e, if a face is detected, deleting the face region, thereby obtaining the palm and/or arm region image.

Referring to Figure 12, in an embodiment of the present invention, performing palm location in step S316 comprises the steps:

S316a, perform edge detection on the palm-and-arm region image with the Canny operator to obtain edge information and hence the region contour, as shown in Figure 14a.

S316b, fit a minimum circumscribed ellipse to the region contour and obtain the ellipse parameters, comprising: the major axis, the minor axis and the angle to the horizontal axis, as shown in Figure 14b.

S316c, obtain the orientation of the region contour from the major axis of the circumscribed ellipse and its angle to the horizontal axis, and thus finally the pointing direction of the arm and palm, as shown in Figure 14c.

S316d, rectify the region whose pointing direction has been obtained by an image geometric coordinate transformation, so that the arm and palm point vertically upward, as shown in Figure 14d.

S316e, perform palm location detection on the rectified arm and palm region and obtain the palm target region image.

As shown in Figures 14e and 14f, the present embodiment locates the palm with a palm location algorithm, specifically: project the edge pixels of the palm-and-arm region in the vertical direction to find the end at which the palm lies; then project all pixels of the palm-and-arm region in the vertical direction and, starting from the palm end, search for the peak point on the projection axis; take the valley point that appears after this peak as the dividing point between arm and palm; split the palm-and-arm region in the vertical direction at this dividing point to obtain the palm portion, removing the arm and thereby obtaining the palm target region image.

S317, output the palm target region image and execute step S32 to perform hand-shape feature extraction on it.
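The peak-then-valley split described for Figures 14e and 14f can be sketched in Python under simplifying assumptions: the region is a 0/1 mask that has already been rectified so the hand points upward, the palm end is taken to be the top row (the edge-projection step that finds the palm end is omitted), and the projection is the count of region pixels in each row. The name `split_palm` is hypothetical, and a real profile would likely need smoothing before searching for extrema.

```python
def split_palm(mask):
    """Return (cut_row, palm_rows) for an upright hand mask with the palm at the top."""
    profile = [sum(row) for row in mask]   # projection: region width in each row
    last = len(profile) - 1
    # Walk from the palm end while the width is not decreasing: the peak.
    peak = 0
    while peak < last and profile[peak + 1] >= profile[peak]:
        peak += 1
    # Walk on while the width strictly decreases: the valley (wrist).
    cut = peak
    while cut < last and profile[cut + 1] < profile[cut]:
        cut += 1
    return cut, mask[:cut]                 # rows above the valley are the palm


# Hypothetical width profile: palm widens to 5, narrows to 2 at the wrist, arm is 3 wide.
widths = [2, 4, 5, 4, 2, 3, 3, 3]
hand = [[1] * w + [0] * (5 - w) for w in widths]
cut_row, palm = split_palm(hand)
```

Here the peak is found at row 2 and the valley at row 4, so the four rows above the wrist are kept as the palm and the arm rows are dropped.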

Referring to Figure 10, in an embodiment of the present invention, if the gesture in step S3 comprises the hand shape of a palm, then after hand-shape feature extraction, step S33 comprises: performing hand-shape recognition using the extracted hand-shape features and the pre-built hand-shape classifier, and determining whether the hand shape of the palm is valid; if valid, executing step S4: determining the control command corresponding to this valid hand shape from the pre-built gesture database; otherwise discarding the hand shape.

Referring to Figure 13, in another embodiment of the present invention, if the gesture in step S3 comprises both the hand shape of a palm and the motion trajectory of the palm, then after hand-shape feature extraction, step S33 further comprises: marking the palm as the active palm and tracking the motion trajectory of the current active palm to determine its motion type.

When the hand shape of the current palm is determined to be valid, execute step S4: determine the corresponding control command in the pre-built gesture database according to the motion type of the current active palm.

Finally, execute step S5: switch to the corresponding video image or operate on the current video image according to the determined control command.

In an embodiment of the present invention, gestures are divided into static and moving gestures. For a static gesture, the corresponding control command is obtained from the valid hand shape. For a moving gesture, the motion type must first be determined, and the corresponding control command is then obtained from the motion type and/or the valid hand shape of the palm. The motion comprises upward, downward, leftward, rightward and so on.

Valid hand shapes: N1, left five-finger palm and right five-finger palm, as shown in Figure 15c; N2, left five-finger palm and right fist, as shown in Figure 15d; N3, left five-finger palm and right finger-extended palm, as shown in Figure 15e; N4, left one-finger palm and right five-finger palm, as shown in Figure 15f.

Motion types: leftward motion comprises M1, a single five-finger palm moving left, as shown in Figure 15b; rightward motion comprises M2, a single five-finger palm moving right, as shown in Figure 15a; combined left-right motion comprises M3, the left five-finger palm moving left and the right five-finger palm moving right, as shown in Figure 15g, and M4, the left five-finger palm moving right and the right five-finger palm moving left, as shown in Figure 15h; combined static and moving: NAM, the left five-finger palm stationary and the right finger-extended palm moving, as shown in Figure 15i.

Of course, other motion types may also be used in the present embodiment.

As shown in Figure 13, in an embodiment of the present invention, building and training the hand-shape classifier comprises: collecting a large set of hand-shape image samples offline; extracting hand-shape features from them; and training a neural network with the extracted hand-shape features to obtain the hand-shape classifier.

In the present embodiment, each of the above sample sets consists of image templates representing a different hand shape; the above hand-shape features comprise: hand-shape contour, hand-shape curvature, hand-shape perimeter, hand-shape area, hand-shape convexity-concavity, vertical projection of the hand-shape edge, and horizontal projection of the hand-shape edge. Of course, other features may also be used as hand-shape features in the present embodiment. The neural network in the present embodiment uses a three-layer neural network model, although other neural network models may of course be used.

Referring to Figure 16, in an embodiment of the present invention, determining the control command in step S4 comprises:

S41, obtain the motion type of the active palm marked in step S3.

S42, according to the motion type of the active palm, look up the corresponding gesture in the pre-built hand-shape and gesture database; if the corresponding gesture is found in the database, obtain the command corresponding to the gesture; otherwise take no action. The command comprises the operation the gesture is to perform and the object of the operation.

S43, determine whether the operation object is a video animation file or a picture file; if it is a video animation file, execute step S44; if it is a picture file, execute step S45.

S44, interpret the gesture and output the corresponding control command, for example:

If the gesture of the current active palm is M1, it corresponds in the gesture database to the control command for switching to and playing the previous video animation file, and the corresponding control command is output;

If the gesture of the current active palm is M2, the gesture is interpreted as playing the next video animation file, and the corresponding control command is output;

If the gesture of the current active palm is N1, the gesture is interpreted as playing the current video animation file, and the corresponding control command is output;

If the gesture of the current active palm is N2, the gesture is interpreted as pausing playback of the current video animation file, and the corresponding control command is output;

If the gesture of the current active palm is N3, the gesture is interpreted as fast-forwarding the current video animation file, and the corresponding control command is output;

If the gesture of the current active palm is N4, the gesture is interpreted as rewinding the current video image, and the corresponding control command is output.

S45, interpret the gesture and output the corresponding control command, for example:

If the gesture of the current active palm is M1, the gesture is interpreted as showing the previous picture, and the corresponding control command is output;

If the gesture of the current active palm is M2, the gesture is interpreted as showing the next picture, and the corresponding control command is output;

If the gesture of the current active palm is M3, the gesture is interpreted as enlarging the picture, and the corresponding control command is output;

If the gesture of the current active palm is M4, the gesture is interpreted as shrinking the picture, and the corresponding control command is output;

If the gesture of the current active palm is NAM, the gesture is interpreted as moving the picture, and the corresponding control command is output.
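The lookup of steps S42 to S45 amounts to a table keyed by the operation object and the gesture. The Python sketch below encodes the example mappings listed above; the command strings, the `"video"` and `"picture"` keys and the function name `determine_command` are hypothetical labels chosen for illustration, since the patent does not specify how the gesture database is stored.

```python
# Example mappings from steps S44 (video animation files) and S45 (picture files).
GESTURE_DATABASE = {
    "video": {
        "M1": "play_previous",
        "M2": "play_next",
        "N1": "play",
        "N2": "pause",
        "N3": "fast_forward",
        "N4": "rewind",
    },
    "picture": {
        "M1": "show_previous",
        "M2": "show_next",
        "M3": "zoom_in",
        "M4": "zoom_out",
        "NAM": "move",
    },
}


def determine_command(gesture, target_kind):
    """Return the command for a gesture, or None when no entry exists (no action)."""
    return GESTURE_DATABASE.get(target_kind, {}).get(gesture)


command = determine_command("M3", "picture")
```

Returning `None` for an unknown gesture mirrors the "otherwise take no action" branch of step S42.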

With the video image display control method of the present invention, the user need only make the corresponding gesture, whether static or moving, to select the video image to be displayed or to operate on the currently displayed video image. This achieves active interaction between the user and the video image display device and improves the efficiency of interaction between the video image and the user.

The above video image display control method may be used for displaying video advertisement pictures or animations, and may also be used for displaying other pictures or animations.

The above is a further detailed description of the present invention in connection with specific embodiments, and the specific implementation of the present invention shall not be deemed limited to these descriptions. A person of ordinary skill in the art to which the present invention belongs may make a number of simple deductions or substitutions without departing from the inventive concept, all of which shall be regarded as falling within the protection scope of the present invention.

Claims

1. A video image display control method, characterized by comprising the steps:

A, collecting a real-time scene image in front of the display device;

C, detecting a gesture in the human region image;

D, determining the control command corresponding to the gesture;

2. The method of claim 1, characterized in that step B comprises: comparing the currently obtained real-time scene image frame with a reference image derived from a background model, thereby detecting the human region image.

3. The method of claim 2, characterized in that the step of detecting the human region image comprises:

determining whether each connected region is a noise region, and if so, deleting it;

4. The method of claim 2 or 3, characterized by further comprising the step: determining whether each pixel in the currently obtained real-time scene image frame belongs to the detected human region; if so, the background model remains unchanged; otherwise the background model is updated.

5. The method of any one of claims 1 to 4, characterized in that the gesture comprises the hand shape of a palm, and step C comprises:

6. The method of claim 5, characterized in that the step of performing palm target detection on the human region image comprises:

detecting the palm in the region image of the arm and/or palm.

7. The method of claim 6, characterized in that the step of detecting the palm in the region image of the arm and/or palm comprises:

8. A video image display device, characterized by comprising:

9. The video image display device of claim 8, characterized in that the human body detection device is configured to compare the currently obtained real-time scene image frame with a reference image derived from a background model, thereby detecting the human region image.

10. The video image display device of claim 8 or 9, characterized by further comprising a background update device configured to determine whether each pixel in the currently obtained real-time scene image frame belongs to the detected human region; if so, the background model remains unchanged; otherwise the background model is updated.

11. The video image display device of any one of claims 8 to 10, characterized in that the gesture comprises the hand shape of a palm, and the gesture detection device comprises: