CN104504378A - Method and device for detecting image information - Google Patents

Method and device for detecting image information

Info

Publication number
CN104504378A
CN104504378A CN201410838292.6A
Authority
CN
China
Prior art keywords
human eye
video
frame
eye area
num
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410838292.6A
Other languages
Chinese (zh)
Inventor
王涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd filed Critical Beijing QIYI Century Science and Technology Co Ltd
Priority to CN201410838292.6A priority Critical patent/CN104504378A/en
Publication of CN104504378A publication Critical patent/CN104504378A/en
Pending legal-status Critical Current

Links

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18: Eye characteristics, e.g. of the iris
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/40: Scenes; Scene-specific elements in video content

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Ophthalmology & Optometry (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

An embodiment of the invention discloses a method and a device for detecting image information, relating to the field of image processing technology. The method includes: determining the human eye region in each video frame of a video segment of preset duration; computing, for each video frame in the segment, the optical flow of each pixel in the eye region relative to the preceding frame; counting, from the computed optical flow, the number of downward-moving pixels and the number of upward-moving pixels in the eye region of each frame; obtaining the eye-state feature of each frame from the counting results; and detecting, from the obtained eye-state features, whether a blink occurs in the segment. Because the eye-state features are obtained from image information between video frames rather than from a single still image, and the blink decision is based on those features, the accuracy of blink detection can be improved.

Description

Image information detection method and device
Technical field
The present invention relates to the field of image processing technology, and in particular to an image information detection method and device.
Background art
With the development of intelligent terminals, new modes of human-computer interaction keep emerging, for example interaction through speech recognition or action recognition. Compared with speech recognition, action recognition is not affected by the surrounding acoustic environment and can therefore adapt to application scenarios with all kinds of acoustic conditions.
Blink recognition is one kind of action recognition. To detect whether a blink occurs, the usual approach is first to obtain the eye-state feature of each video frame in a video segment of certain duration, where the eye-state feature represents the state of the eye in the frame, for example an eye-closing motion state or an eye-opening motion state, and then to judge from the eye-state features of the frames whether a blink occurs. In the prior art, the eye-state feature of each frame is usually obtained with a method based on still images: the eye key points in each frame are detected; from the detected key points, the positions of the upper and lower eyelids are determined; and these eyelid positions are taken as the eye-state feature of the frame. Under normal circumstances this method can detect whether a blink occurs. However, the detected eye key points are strongly affected by factors such as ambient lighting, user position and individual differences between users' eyes, so they carry a large error; the resulting eye-state features then fail to correctly reflect the change of eye state from frame to frame, and blink-detection accuracy is low.
Summary of the invention
Embodiments of the invention disclose an image information detection method and device that obtain the eye-state feature of a video frame from the image information between video frames, so as to improve the accuracy of blink detection.
To achieve the above object, an embodiment of the invention discloses an image information detection method, comprising:
determining the human eye region in each video frame of a video segment of preset duration;
computing, for each video frame in the segment, the optical flow of each pixel in the eye region relative to the preceding frame;
counting, from the computed optical flow, the number of downward-moving pixels and the number of upward-moving pixels in the eye region of each video frame in the segment;
obtaining, from the counting results, the eye-state feature of each video frame in the segment;
detecting, from the obtained eye-state features, whether a blink occurs in the segment.
Specifically, determining the human eye region in each video frame of the video segment of preset duration comprises determining the eye region of each frame according to the following steps:
detecting the eye region A1 in any video frame P1 of the segment;
detecting the eye region A2 in the frame preceding P1;
determining the union of A1 and A2 within P1 as the eye region of P1.
Specifically, detecting the eye region A1 in any video frame P1 of the segment comprises:
detecting the face region in P1;
detecting eye key points within the detected face region;
determining the eye region A1 in P1 from the detected eye key points.
Specifically, counting, from the computed optical flow, the number of downward-moving pixels and the number of upward-moving pixels in the eye region of each video frame comprises counting them according to the following steps:
when the vertical component H_i of the optical flow of any pixel i in the eye region of any video frame P2 of the segment satisfies H_i < -Th_1, updating Num_u to its current value + 1, where Num_u denotes the number of downward-moving pixels in the eye region of P2, its initial value is 0, and Th_1 is a preset first threshold;
when the vertical component H_i of the optical flow of any pixel i in the eye region of P2 satisfies H_i > Th_2, updating Num_o to its current value + 1, where Num_o denotes the number of upward-moving pixels in the eye region of P2, its initial value is 0, and Th_2 is a preset second threshold.
Specifically, obtaining the eye-state feature of each video frame from the counting results comprises, for any video frame P2 of the segment:
when (Num_u - Num_o) / Num_t > Th_3, determining the eye-state feature of P2 to be the eye-closing motion state;
when (Num_u - Num_o) / Num_t < -Th_3, determining the eye-state feature of P2 to be the eye-opening motion state;
otherwise, determining the eye-state feature of P2 to be the stationary state;
where Num_t is the total number of pixels in the eye region of P2 and Th_3 is a preset third threshold.
To achieve the above object, an embodiment of the invention also discloses an image information detection device, comprising:
an eye-region determination module, configured to determine the human eye region in each video frame of a video segment of preset duration;
an optical-flow computation module, configured to compute, for each video frame in the segment, the optical flow of each pixel in the eye region relative to the preceding frame;
a pixel-counting module, configured to count, from the computed optical flow, the number of downward-moving pixels and the number of upward-moving pixels in the eye region of each video frame in the segment;
an eye-state feature acquisition module, configured to obtain, from the counting results, the eye-state feature of each video frame in the segment;
a blink-detection module, configured to detect, from the obtained eye-state features, whether a blink occurs in the segment.
Specifically, the eye-region determination module comprises a first eye-region detection submodule, a second eye-region detection submodule and an eye-region determination submodule, and determines the eye region of each video frame of the segment through these submodules;
the first eye-region detection submodule is configured to detect the eye region A1 in any video frame P1 of the segment;
the second eye-region detection submodule is configured to detect the eye region A2 in the frame preceding P1;
the eye-region determination submodule is configured to determine the union of A1 and A2 within P1 as the eye region of P1.
Specifically, the first eye-region detection submodule comprises:
a face-region detection unit, configured to detect the face region in any video frame P1 of the segment;
an eye key-point detection unit, configured to detect eye key points within the detected face region;
an eye-region determination unit, configured to determine the eye region A1 in P1 from the detected eye key points.
Specifically, the pixel-counting module counts the number of downward-moving pixels and the number of upward-moving pixels in the eye region of each video frame as follows:
when the vertical component H_i of the optical flow of any pixel i in the eye region of any video frame P2 of the segment satisfies H_i < -Th_1, Num_u is updated to its current value + 1, where Num_u denotes the number of downward-moving pixels in the eye region of P2, its initial value is 0, and Th_1 is a preset first threshold;
when H_i > Th_2 for any pixel i in the eye region of P2, Num_o is updated to its current value + 1, where Num_o denotes the number of upward-moving pixels in the eye region of P2, its initial value is 0, and Th_2 is a preset second threshold.
Specifically, the eye-state feature acquisition module obtains, for any video frame P2 of the segment, the eye-state feature of the frame from the counting results as follows:
when (Num_u - Num_o) / Num_t > Th_3, the eye-state feature of P2 is determined to be the eye-closing motion state;
when (Num_u - Num_o) / Num_t < -Th_3, the eye-state feature of P2 is determined to be the eye-opening motion state;
otherwise, the eye-state feature of P2 is determined to be the stationary state;
where Num_t is the total number of pixels in the eye region of P2 and Th_3 is a preset third threshold.
As can be seen from the above, in the scheme provided by the embodiments of the invention, the eye-state feature of each video frame is obtained from the numbers of downward-moving and upward-moving pixels in the eye region of each frame of a video segment of preset duration, and whether a blink occurs in the segment is detected from these features. In practical applications the eye region determined in each frame may contain an error caused by factors such as ambient lighting, the user's face pose, whether glasses are worn and individual differences between users' eyes; the movement trend of each pixel in the eye region relative to the preceding frame, however, is unaffected by such errors. The scheme therefore bases blink detection on eye-state features obtained from image information between video frames, and can improve the accuracy of blink detection.
Brief description of the drawings
To explain the technical schemes of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or of the prior art are briefly introduced below. Obviously, the drawings described below illustrate only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of an image information detection method provided by an embodiment of the present invention;
Fig. 2 is a schematic flowchart of another image information detection method provided by an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of an image information detection device provided by an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of another image information detection device provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical schemes of the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only part of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Fig. 1 is a schematic flowchart of an image information detection method provided by an embodiment of the present invention. The method comprises:
S101: determine the human eye region in each video frame of a video segment of preset duration.
In practical applications a video frame may contain several persons; the eye regions determined in the frame may then belong to some of the persons or to all of them.
Preferably, since a person's two eyes generally share the same state during a blink, the region of a single eye can be taken as the person's eye region to reduce the amount of computation.
When determining the eye region in each video frame of the segment, the eye region of each frame can be detected directly from the image information of that frame alone. In practice, however, a detection that uses only the current frame's image information carries a relatively large error, while consecutive frames of a video segment are temporally correlated, that is, their image content is similar. Therefore, when determining the eye region of a frame, both the eye region detected in the current frame and the eye region detected in its preceding frame can be taken into account.
In a specific embodiment of the present invention, see Fig. 2, a schematic flowchart of another image information detection method is provided. In this embodiment, the eye region in each video frame of the video segment of preset duration is determined according to the following steps.
S101A: detect the eye region A1 in any video frame P1 of the segment.
In practical applications, the eye region A1 in any video frame P1 can be detected by first detecting the face region in P1, then detecting eye key points within the detected face region, and finally determining A1 from the detected key points.
The face region in a video frame can be detected with the relatively mature boosting face detectors of the prior art based on Haar or LBP (Local Binary Patterns) features; of course, the method of detecting the face region in a video frame is not limited to these in practical applications.
The eye key points in the face region can be detected with relatively mature methods of the prior art such as AAM (Active Appearance Model) or ESR (Explicit Shape Regression); likewise, the method of detecting eye key points in the face region is not limited to these.
S101B: detect the eye region A2 in the frame preceding P1.
S101C: determine the union of A1 and A2 within P1 as the eye region of P1.
As can be seen, the embodiment shown in Fig. 2 determines the eye region of P1 as the union of the eye region detected in P1 itself and the eye region detected in its preceding frame. This exploits the temporal correlation between video frames and allows the eye region of P1 to be determined as completely as possible.
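The union of S101C can be sketched as follows; representing each eye region as an axis-aligned (x, y, w, h) rectangle and taking the bounding box that covers both is an assumption of this example (a pixel-wise union mask would serve equally well).

```python
def union_region(a, b):
    """Bounding box covering two rectangles a and b, each (x, y, w, h)."""
    x = min(a[0], b[0])
    y = min(a[1], b[1])
    x2 = max(a[0] + a[2], b[0] + b[2])  # right edge of the union
    y2 = max(a[1] + a[3], b[1] + b[3])  # bottom edge of the union
    return (x, y, x2 - x, y2 - y)

# For example, two partially overlapping eye boxes from P1 and its
# preceding frame:
# union_region((40, 50, 30, 20), (44, 52, 30, 20)) -> (40, 50, 34, 22)
```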
S102: compute, for each video frame in the segment, the optical flow of each pixel in the eye region relative to the preceding frame.
Those skilled in the art understand that the optical-flow method is an important method of present-day moving-image analysis. Optical flow refers to the apparent velocity of pattern motion in a time-varying image: when an object moves, the brightness pattern of its corresponding points on the image moves with it, and this apparent motion of the image brightness pattern is the optical flow. Optical flow expresses the change of the image; because it contains information about the target's motion, an observer can use it to determine how the target moves. From the definition of optical flow one obtains the optical-flow field, the two-dimensional instantaneous velocity field formed by all pixels of the image, in which each two-dimensional velocity vector is the projection onto the imaging plane of the three-dimensional velocity vector of a visible point in the scene. Optical flow therefore contains not only the motion information of the observed object but also rich information about the three-dimensional structure of the scene.
From the above description, the aforementioned optical-flow information can be two-dimensional or three-dimensional. When it is two-dimensional, it comprises a horizontal component and a vertical component, which represent the apparent velocity of the pixel relative to the preceding video frame in the horizontal and vertical directions respectively.
Specifically, the optical-flow information can be obtained with functions such as cvCalcOpticalFlowFarneback(), cvCalcOpticalFlowLK() or cvCalcOpticalFlowHS() in OpenCV, of which cvCalcOpticalFlowFarneback() yields the highest precision.
S103: count, from the computed optical flow, the number of downward-moving pixels and the number of upward-moving pixels in the eye region of each video frame in the segment.
Under normal circumstances, when the eye is in the eye-closing motion state the pixels of the upper eyelid move downward while the pixels of the lower eyelid are almost motionless; when the eye is in the eye-opening motion state the pixels of the upper eyelid move upward while the pixels of the lower eyelid are almost motionless. In practical applications, the motion of the upper-eyelid pixels in these different states can thus be used to judge whether a blink occurs in the video segment of preset duration.
In an alternative embodiment of the present invention, the numbers of downward-moving and upward-moving pixels in the eye region of each video frame are counted according to the following steps:
when the vertical component H_i of the optical flow of any pixel i in the eye region of any video frame P2 of the segment satisfies H_i < -Th_1, update Num_u to its current value + 1, where Num_u denotes the number of downward-moving pixels in the eye region of P2, its initial value is 0, and Th_1 is a preset first threshold;
when H_i > Th_2 for any pixel i in the eye region of P2, update Num_o to its current value + 1, where Num_o denotes the number of upward-moving pixels in the eye region of P2, its initial value is 0, and Th_2 is a preset second threshold.
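The counting rule above can be sketched directly; vectorizing the per-pixel comparison with NumPy is a presentational choice, while the thresholds follow the text (a vertical component below -Th_1 counts as downward-moving, above Th_2 as upward-moving).

```python
import numpy as np

def count_moving_pixels(vertical_flow, th1, th2):
    """Return (Num_u, Num_o) for one frame's eye region.

    vertical_flow: array of vertical optical-flow components H_i.
    Num_u counts pixels with H_i < -th1 (downward-moving, per the text);
    Num_o counts pixels with H_i > th2 (upward-moving).
    """
    v = np.asarray(vertical_flow)
    num_u = int(np.count_nonzero(v < -th1))
    num_o = int(np.count_nonzero(v > th2))
    return num_u, num_o
```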
S104: obtain, from the counting results, the eye-state feature of each video frame in the segment.
The eye-motion states mentioned above can include the eye-closing motion state, the eye-opening motion state and the stationary state, among others.
Under normal circumstances, when the eye is in the eye-closing motion state most pixels in the eye region move downward, and when the eye is in the eye-opening motion state most pixels in the eye region move upward. Even if the eye key points detected in S101A for determining the eye region carry an error, in other words even if the eye region determined in the frame has an error, the motion state of most pixels in the eye region does not change.
For any video frame P2 of the segment, the eye-state feature of the frame can be obtained from the counting results according to the following steps:
when (Num_u - Num_o) / Num_t > Th_3, more pixels in the eye region of P2 move downward, and the eye-state feature of P2 is determined to be the eye-closing motion state;
when (Num_u - Num_o) / Num_t < -Th_3, more pixels in the eye region of P2 move upward, and the eye-state feature of P2 is determined to be the eye-opening motion state;
otherwise, the eye-state feature of P2 is determined to be the stationary state;
where Num_t is the total number of pixels in the eye region of P2 and Th_3 is a preset third threshold.
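The three-way decision of S104 can be sketched as follows; the string labels are informal names, chosen for this example, for the eye-closing, eye-opening and stationary states.

```python
def eye_state(num_u, num_o, num_t, th3):
    """Classify one frame from the S103 counts.

    (Num_u - Num_o) / Num_t > Th_3  -> eye-closing motion state
    (Num_u - Num_o) / Num_t < -Th_3 -> eye-opening motion state
    otherwise                       -> stationary state
    """
    ratio = (num_u - num_o) / num_t
    if ratio > th3:
        return "closing"
    if ratio < -th3:
        return "opening"
    return "stationary"
```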
When the face is rotated in the image plane by some angle, the optical-flow motion direction of any pixel i in the eye region can be decomposed along the direction of the line joining the two eyes and the direction perpendicular to it; the motion component perpendicular to the eye line is then used in the computation above.
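The decomposition for an in-plane rotation is a plain rotation of coordinates; in the sketch below theta is the in-plane rotation angle of the eye line in radians, and the sign convention (with theta = 0 the result reduces to the vertical component v) is an assumption of this example.

```python
import math

def component_perpendicular_to_eye_line(u, v, theta):
    """Project a pixel's flow vector (u, v) onto the axis perpendicular
    to the line joining the two eyes, when that line is rotated in-plane
    by theta radians. With theta = 0 this is simply v."""
    return -u * math.sin(theta) + v * math.cos(theta)
```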
Of course, the method of obtaining the eye-state feature of each video frame is not limited to the above in practical applications. For example, the ratio of the number of downward-moving pixels to the total number of pixels in the eye region of P2, together with the ratio of the number of upward-moving pixels to that total, can also serve as the basis for obtaining the eye-state feature of each video frame in the segment.
S105: detect, from the obtained eye-state features, whether a blink occurs in the segment.
It can be understood that a blink comprises an eye-closing motion of the eye followed by an eye-opening motion. After the eye-state feature of each video frame in the video segment of preset duration has been obtained, if the eye-state features of several consecutive video frames change from the eye-closing motion state to the eye-opening motion state, or from the eye-opening motion state to the eye-closing motion state, a blink can be considered to occur in the segment.
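The transition test of S105 can be sketched over a list of per-frame state labels; tolerating a few stationary frames between the two motion runs (the max_gap parameter) is an assumption of this example, not something the text specifies.

```python
def has_blink(states, max_gap=2):
    """True if an eye-closing frame is followed by an eye-opening frame
    (or vice versa) with at most max_gap stationary frames in between."""
    motion = [(i, s) for i, s in enumerate(states) if s != "stationary"]
    for (i, a), (j, b) in zip(motion, motion[1:]):
        if a != b and j - i - 1 <= max_gap:
            return True
    return False
```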
As can be seen, in the schemes provided by the above embodiments, the eye-state feature of each video frame is obtained from the numbers of downward-moving and upward-moving pixels in the eye region of each frame of a video segment of preset duration, and whether a blink occurs in the segment is detected from these features. Although in practical applications the eye region determined in each frame may contain an error caused by factors such as ambient lighting, the user's pose, whether glasses are worn and individual differences between users' eyes, the movement trend of each pixel in the eye region relative to the preceding frame is unaffected. The schemes provided by the above embodiments therefore base blink detection on eye-state features obtained from image information between video frames, and can improve the accuracy of blink detection.
Corresponding to the image information detection method above, an embodiment of the present invention further provides an image information detection device.
Fig. 3 is a schematic structural diagram of an image information detection device provided by an embodiment of the present invention. The device comprises an eye-region determination module 301, an optical-flow computation module 302, a pixel-counting module 303, an eye-state feature acquisition module 304 and a blink-detection module 305.
The eye-region determination module 301 is configured to determine the human eye region in each video frame of a video segment of preset duration;
the optical-flow computation module 302 is configured to compute, for each video frame in the segment, the optical flow of each pixel in the eye region relative to the preceding frame;
the pixel-counting module 303 is configured to count, from the computed optical flow, the number of downward-moving pixels and the number of upward-moving pixels in the eye region of each video frame in the segment;
the eye-state feature acquisition module 304 is configured to obtain, from the counting results, the eye-state feature of each video frame in the segment;
the blink-detection module 305 is configured to detect, from the obtained eye-state features, whether a blink occurs in the segment.
In a specific embodiment of the present invention, see Fig. 4, a schematic structural diagram of another image information detection device is provided. Compared with the embodiment shown in Fig. 3, the eye-region determination module 301 of this embodiment comprises a first eye-region detection submodule 3011, a second eye-region detection submodule 3012 and an eye-region determination submodule 3013, and determines the eye region in each video frame of the video segment of preset duration through these submodules.
The first eye-region detection submodule 3011 is configured to detect the eye region A1 in any video frame P1 of the segment;
the second eye-region detection submodule 3012 is configured to detect the eye region A2 in the frame preceding P1;
the eye-region determination submodule 3013 is configured to determine the union of A1 and A2 within P1 as the eye region of P1.
Specifically, the first eye-region detection submodule 3011 can comprise a face-region detection unit, an eye key-point detection unit and an eye-region determination unit (not shown in the figure).
The face-region detection unit is configured to detect the face region in any video frame P1 of the segment;
the eye key-point detection unit is configured to detect eye key points within the detected face region;
the eye-region determination unit is configured to determine the eye region A1 in P1 from the detected eye key points.
In an alternative embodiment of the present invention, the above pixel counting module 303 is specifically configured to count the number of downward-moving pixels and the number of upward-moving pixels in the eye region of each video frame in the video segment according to the following rules:
When the vertical component H_i of the optical flow of any pixel i in the eye region of any video frame P_2 in the video segment satisfies H_i < -Th_1, Num_u is updated to its current value + 1, where Num_u denotes the number of downward-moving pixels in the eye region of video frame P_2, its initial value is 0, and Th_1 is a preset first threshold;
When the vertical component H_i of the optical flow of any pixel i in the eye region of video frame P_2 satisfies H_i > Th_2, Num_o is updated to its current value + 1, where Num_o denotes the number of upward-moving pixels in the eye region of video frame P_2, its initial value is 0, and Th_2 is a preset second threshold.
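The two counting rules can be written directly with NumPy; the function name and the representation of the vertical flow components as a flat array are assumptions for illustration:

```python
import numpy as np

def count_moving_pixels(flow_v, th1, th2):
    """Num_u: pixels whose vertical flow component H_i < -Th_1 (downward);
    Num_o: pixels whose vertical flow component H_i > Th_2 (upward)."""
    num_u = int(np.count_nonzero(flow_v < -th1))
    num_o = int(np.count_nonzero(flow_v > th2))
    return num_u, num_o
```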
Optionally, the above human eye state feature obtaining module 304 is specifically configured to obtain, for any video frame P_2 in the video segment and according to the counting results, the eye state feature of that frame as follows:
When (Num_u - Num_o)/Num_t > Th_3, the eye state feature of video frame P_2 is determined to be: eye-closing motion state;
When (Num_u - Num_o)/Num_t < -Th_3, the eye state feature of video frame P_2 is determined to be: eye-opening motion state;
In all other cases, the eye state feature of video frame P_2 is determined to be: stationary state;
Wherein Num_t is the number of pixels in the eye region of video frame P_2 and Th_3 is a preset third threshold.
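The three-way classification above amounts to thresholding a normalized vote over the eye region's pixels; a sketch (the function name and string labels are hypothetical):

```python
def eye_state(num_u, num_o, num_t, th3):
    """Map the counts of downward (Num_u) and upward (Num_o) moving
    pixels, normalized by the region size Num_t, to an eye state feature."""
    score = (num_u - num_o) / num_t
    if score > th3:
        return "closing"   # eye-closing motion state
    if score < -th3:
        return "opening"   # eye-opening motion state
    return "still"         # stationary state
```

Normalizing by Num_t makes the decision independent of the size of the detected eye region, so the same Th_3 works across face scales.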
When the face is rotated by an angle within the image plane, the optical flow motion direction of any pixel i in the eye region can be decomposed along the directions perpendicular and parallel to the line connecting the two eyes; the motion component perpendicular to the eye line is then used in the above calculation.
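This rotation compensation can be done by projecting each pixel's 2-D flow vector onto the unit vector perpendicular to the line joining the two eye centers. The sketch below assumes the eye centers are available from the key point detector, and the sign convention is illustrative:

```python
import math

def flow_perpendicular_to_eye_line(flow, eye_left, eye_right):
    """Component of a pixel's optical flow (dx, dy) perpendicular to the
    eye line, replacing the raw vertical component H_i when the face is
    rotated in the image plane."""
    ex = eye_right[0] - eye_left[0]
    ey = eye_right[1] - eye_left[1]
    norm = math.hypot(ex, ey)
    px, py = -ey / norm, ex / norm   # eye axis rotated by 90 degrees
    return flow[0] * px + flow[1] * py
```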
As can be seen from the above, in the solutions provided by the above embodiments, the eye state feature of each video frame is obtained from the number of downward-moving pixels and the number of upward-moving pixels in the eye region of each video frame in a video segment of preset duration, and whether a blink occurs in the video segment is detected according to the obtained eye state features. In practical applications, the eye region determined in each video frame may contain errors due to factors such as ambient lighting conditions, the user's face pose, whether glasses are worn, and individual differences among users' eyes. However, the movement trend of each pixel in the eye region relative to the previous frame is unaffected by such errors. Therefore, by applying the solutions provided by the above embodiments, blink detection makes use of eye state features derived from inter-frame image information, which can improve the accuracy of blink detection.
As for the system/device embodiments, since they are substantially similar to the method embodiments, their description is relatively brief; for relevant details, refer to the description of the method embodiments.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include" or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device comprising a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the statement "comprising a ..." does not exclude the presence of other identical elements in the process, method, article or device comprising that element.
Those of ordinary skill in the art will appreciate that all or part of the steps in the above method embodiments can be implemented by hardware instructed by a program, and the program may be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disc, etc.
The foregoing is merely a preferred embodiment of the present invention and is not intended to limit its protection scope. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.

Claims (10)

1. An image information detecting method, characterized in that the method comprises:
determining the human eye region in each video frame of a video segment of preset duration;
calculating, for each pixel in the eye region of each video frame in the video segment, the optical flow information relative to the previous frame of that video frame;
counting, according to the calculated optical flow information, the number of downward-moving pixels and the number of upward-moving pixels in the eye region of each video frame in the video segment;
obtaining, according to the counting results, the eye state feature of each video frame in the video segment;
detecting, according to the obtained eye state features, whether a blink occurs in the video segment.
2. The method according to claim 1, characterized in that determining the eye region in each video frame of the video segment of preset duration comprises:
determining the eye region in each video frame of the video segment of preset duration according to the following steps:
detecting the eye region A_1 in any video frame P_1 of the video segment;
detecting the eye region A_2 in the previous frame of video frame P_1;
determining the union of A_1 and A_2 in video frame P_1 as the eye region of video frame P_1.
3. The method according to claim 2, characterized in that detecting the eye region A_1 in any video frame P_1 of the video segment comprises:
detecting the face region in any video frame P_1 of the video segment;
detecting eye key points within the detected face region;
determining the eye region A_1 in video frame P_1 according to the detected eye key points.
4. The method according to any one of claims 1-3, characterized in that counting, according to the calculated optical flow information, the number of downward-moving pixels and the number of upward-moving pixels in the eye region of each video frame in the video segment comprises:
counting the number of downward-moving pixels and the number of upward-moving pixels in the eye region of each video frame in the video segment according to the following steps:
when the vertical component H_i of the optical flow of any pixel i in the eye region of any video frame P_2 in the video segment satisfies H_i < -Th_1, updating Num_u to its current value + 1, where Num_u denotes the number of downward-moving pixels in the eye region of video frame P_2, its initial value is 0, and Th_1 is a preset first threshold;
when the vertical component H_i of the optical flow of any pixel i in the eye region of video frame P_2 satisfies H_i > Th_2, updating Num_o to its current value + 1, where Num_o denotes the number of upward-moving pixels in the eye region of video frame P_2, its initial value is 0, and Th_2 is a preset second threshold.
5. The method according to claim 4, characterized in that obtaining, according to the counting results, the eye state feature of each video frame in the video segment comprises:
for any video frame P_2 in the video segment, obtaining the eye state feature of that frame according to the counting results and the following steps:
when (Num_u - Num_o)/Num_t > Th_3, determining the eye state feature of video frame P_2 to be: eye-closing motion state;
when (Num_u - Num_o)/Num_t < -Th_3, determining the eye state feature of video frame P_2 to be: eye-opening motion state;
in all other cases, determining the eye state feature of video frame P_2 to be: stationary state;
wherein Num_t is the number of pixels in the eye region of video frame P_2 and Th_3 is a preset third threshold.
6. An image information detecting device, characterized in that the device comprises:
a human eye region determination module, configured to determine the eye region in each video frame of a video segment of preset duration;
an optical flow calculation module, configured to calculate, for each pixel in the eye region of each video frame in the video segment, the optical flow information relative to the previous frame of that video frame;
a pixel counting module, configured to count, according to the calculated optical flow information, the number of downward-moving pixels and the number of upward-moving pixels in the eye region of each video frame in the video segment;
an eye state feature obtaining module, configured to obtain, according to the counting results, the eye state feature of each video frame in the video segment;
a blink detection module, configured to detect, according to the obtained eye state features, whether a blink occurs in the video segment.
7. The device according to claim 6, characterized in that the human eye region determination module comprises: a first eye region detection submodule, a second eye region detection submodule and an eye region determination submodule;
the human eye region determination module is specifically configured to determine, through the above submodules, the eye region in each video frame of the video segment of preset duration;
wherein the first eye region detection submodule is configured to detect the eye region A_1 in any video frame P_1 of the video segment;
the second eye region detection submodule is configured to detect the eye region A_2 in the previous frame of video frame P_1;
the eye region determination submodule is configured to determine the union of A_1 and A_2 in video frame P_1 as the eye region of video frame P_1.
8. The device according to claim 7, characterized in that the first eye region detection submodule comprises:
a face region detecting unit, configured to detect the face region in any video frame P_1 of the video segment;
an eye key point detecting unit, configured to detect eye key points within the detected face region;
an eye region determining unit, configured to determine the eye region A_1 in video frame P_1 according to the detected eye key points.
9. The device according to any one of claims 6-8, characterized in that the pixel counting module is specifically configured to count the number of downward-moving pixels and the number of upward-moving pixels in the eye region of each video frame in the video segment according to the following rules:
when the vertical component H_i of the optical flow of any pixel i in the eye region of any video frame P_2 in the video segment satisfies H_i < -Th_1, Num_u is updated to its current value + 1, where Num_u denotes the number of downward-moving pixels in the eye region of video frame P_2, its initial value is 0, and Th_1 is a preset first threshold;
when the vertical component H_i of the optical flow of any pixel i in the eye region of video frame P_2 satisfies H_i > Th_2, Num_o is updated to its current value + 1, where Num_o denotes the number of upward-moving pixels in the eye region of video frame P_2, its initial value is 0, and Th_2 is a preset second threshold.
10. The device according to claim 9, characterized in that the eye state feature obtaining module is specifically configured to obtain, for any video frame P_2 in the video segment and according to the counting results, the eye state feature of that frame as follows:
when (Num_u - Num_o)/Num_t > Th_3, the eye state feature of video frame P_2 is determined to be: eye-closing motion state;
when (Num_u - Num_o)/Num_t < -Th_3, the eye state feature of video frame P_2 is determined to be: eye-opening motion state;
in all other cases, the eye state feature of video frame P_2 is determined to be: stationary state;
wherein Num_t is the number of pixels in the eye region of video frame P_2 and Th_3 is a preset third threshold.
CN201410838292.6A 2014-12-29 2014-12-29 Method and device for detecting image information Pending CN104504378A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410838292.6A CN104504378A (en) 2014-12-29 2014-12-29 Method and device for detecting image information


Publications (1)

Publication Number Publication Date
CN104504378A true CN104504378A (en) 2015-04-08

Family

ID=52945774

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410838292.6A Pending CN104504378A (en) 2014-12-29 2014-12-29 Method and device for detecting image information

Country Status (1)

Country Link
CN (1) CN104504378A (en)


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005013626A (en) * 2003-06-27 2005-01-20 Nissan Motor Co Ltd Awakefullness detector
CN101877051A (en) * 2009-10-30 2010-11-03 江苏大学 Driver attention state monitoring method and device
CN101908140A (en) * 2010-07-29 2010-12-08 中山大学 Biopsy method for use in human face identification


Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106778611A (en) * 2016-12-16 2017-05-31 天津牧瞳星科技有限公司 Method for tracking blink activity on line
CN110223322A (en) * 2019-05-31 2019-09-10 腾讯科技(深圳)有限公司 Image-recognizing method, device, computer equipment and storage medium
CN110223322B (en) * 2019-05-31 2021-12-14 腾讯科技(深圳)有限公司 Image recognition method and device, computer equipment and storage medium
CN111091058A (en) * 2019-11-18 2020-05-01 京东方科技集团股份有限公司 Eye state detection method, device, equipment and storage medium
CN111091058B (en) * 2019-11-18 2024-05-17 京东方科技集团股份有限公司 Eye state detection method, device, equipment and storage medium
CN113516017A (en) * 2021-04-22 2021-10-19 平安科技(深圳)有限公司 Method and device for supervising medicine taking process, terminal equipment and storage medium
CN113516017B (en) * 2021-04-22 2023-07-11 平安科技(深圳)有限公司 Supervision method and device for medicine taking process, terminal equipment and storage medium
CN115937958A (en) * 2022-12-01 2023-04-07 北京惠朗时代科技有限公司 Blink detection method, device, equipment and storage medium
CN115937958B (en) * 2022-12-01 2023-12-15 北京惠朗时代科技有限公司 Blink detection method, blink detection device, blink detection equipment and storage medium

Similar Documents

Publication Publication Date Title
AU2013200807B2 (en) Method and portable terminal for correcting gaze direction of user in image
US10733783B2 (en) Motion smoothing for re-projected frames
CN112823328B (en) Method for performing an internal and/or external calibration of a camera system
CN104504378A (en) Method and device for detecting image information
KR102212209B1 (en) Method, apparatus and computer readable recording medium for eye gaze tracking
JP2021504856A (en) Forward collision control methods and devices, electronics, programs and media
CN105940430B (en) Personnel&#39;s method of counting and its device
CN110959160A (en) Gesture recognition method, device and equipment
CN109711304A (en) A kind of man face characteristic point positioning method and device
CN104899563A (en) Two-dimensional face key feature point positioning method and system
CN102194443A (en) Display method and system for window of video picture in picture and video processing equipment
Patel et al. Moving object detection with moving background using optic flow
CN102473282A (en) External light glare assessment device, line of sight detection device and external light glare assessment method
US10803604B1 (en) Layered motion representation and extraction in monocular still camera videos
JP6221292B2 (en) Concentration determination program, concentration determination device, and concentration determination method
CN113920167A (en) Image processing method, device, storage medium and computer system
CN103049748B (en) Behavior monitoring method and device
CN111382705A (en) Reverse behavior detection method and device, electronic equipment and readable storage medium
JP2014170978A (en) Information processing device, information processing method, and information processing program
Shahid et al. Eye-gaze and augmented reality framework for driver assistance
GB2467643A (en) Improved detection of people in real world videos and images.
WO2021239000A1 (en) Method and apparatus for identifying motion blur image, and electronic device and payment device
CN111618856B (en) Robot control method and system based on visual excitation points and robot
CN106485713B (en) Video foreground detection method
KR101909326B1 (en) User interface control method and system using triangular mesh model according to the change in facial motion

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20150408