CN111898407B

CN111898407B - Human-computer interaction operating system based on human face action recognition

Info

Publication number: CN111898407B
Application number: CN202010508604.2A
Authority: CN
Inventors: 李昱昂; 梁星辰; 张聪昱; 张�雄; 樊兆雯
Original assignee: Southeast University
Current assignee: Southeast University
Priority date: 2020-06-06
Filing date: 2020-06-06
Publication date: 2022-03-29
Anticipated expiration: 2040-06-06
Also published as: CN111898407A

Abstract

The invention belongs to the field of human-computer interaction and relates in particular to a human-computer interaction operating system based on facial action recognition. First, video is captured in real time by a camera, and each frame is preprocessed (for example, mirror-flipped). A classifier detects the face region; feature regions such as the eyes and mouth are then detected within the face's bounding rectangle, facial feature points are extracted, the face is authenticated, and the direction and instantaneous speed of the face's motion are calculated. Second, forward-backward movement of the face and actions of the eyes, mouth and other facial parts are detected by an inter-frame difference method against preset thresholds. Finally, based on the detected position parameters and actions of each region, the system performs the corresponding mouse movement, click and scroll-wheel control, touch control, and simulated keyboard shortcut combinations in three modes: system, application and general. Simple actions switch between the modes. The system thereby replaces traditional manual input by mouse, keyboard and touch, and provides contactless human-computer interaction.

Description

Human-computer interaction operating system based on human face action recognition

Technical Field

The invention relates to the technical field of human-computer interaction, in particular to a human-computer interaction operating system based on human face action recognition.

Background

With the rapid development of computers and the internet, the age of information and intelligence has arrived in full force. The emerging Internet of Things has also set off the third wave of structural upgrading and transformation in the world's information industry, and has become a new strategic industry offering new points of economic growth and high-quality market returns. Meanwhile, a variety of human-computer interaction (HCI) modes have emerged. HCI is a multidisciplinary field that combines theory and practical experience from computer science, anthropology, cognitive and behavioral psychology, industrial design and other disciplines. HCI aims at efficient interconnection and cooperation between people and machines through multi-channel information exchange, emotion analysis, natural language recognition and similar means between user and machine. "Machine" here refers not only to computers and their software but also to mobile smart devices and household appliances. As HCI technology matures, machines are moving from passively perceiving data to autonomously understanding it and responding, and the objects of interaction can be extended to anything, realizing truly comprehensive, multimodal interaction with all things.

However, as interaction methods become more varied and convenient, terminal devices grow more intelligent, and the barrier to use falls, people spend ever more time absorbed in electronic devices, and long-term strain on the cervical spine, back and wrists seriously affects health and quality of life. At the same time, almost all information terminals and information services, and interaction designs in particular, are designed for able-bodied users and do not consider the needs of special user groups (such as people with disabilities), which hinders genuine information sharing. A new kind of human-computer interaction technology based on face recognition is therefore emerging, and the market urgently needs a face-recognition-based interaction mode to replace traditional mouse and keyboard operation.

Face recognition technology is mainly divided into four steps: image acquisition and preprocessing, face detection, face alignment, and face identification. Its advantages include contactless operation, high security, simplicity, speed and high recognition accuracy. A video stream of face images is captured by a camera; noise reduction and contrast enhancement are applied so that facial features and the relative positions of the main facial organs can be extracted accurately and reliably; global and local features of the face are then extracted according to an established grayscale face model and classified; finally, these features are compared against the images in a database using a preset threshold, the best-matching image is selected, and the required information is fed back to the output. Face image analysis is a core part of visual perception: the rich expressions and movements of the face convey nonverbal information such as mental activity and emotion directly and vividly. This biometric modality has strong descriptive power, high information capacity and high confidence, and compared with media such as written text and speech it offers more direct and convenient interaction and multidimensional expressiveness.

Patent CN108108029 discloses a visual mouse operating system and method based on face recognition. That system controls the mouse through facial actions in order to operate a computer. It comprises an image acquisition module that captures face images with a camera, a face image processing module that processes the captured images, a face recognition module that identifies the nose and mouth in the face image, and a mouse operation module that performs mouse operations according to the positions of the nose and mouth. That system and method can only simulate the interactive functions of a traditional mouse; they cannot simulate the keyboard or the shortcut keys of frequently used application software, are complicated to use, and offer a mechanical, unnatural mode of interaction.

In contrast to the visual mouse operating system and method disclosed in patent CN108108029, the human-computer interaction system proposed in this patent provides three modes designed around how computers are commonly operated today: a system mode, a general mode and an application mode. Users can switch freely among them according to the application environment and their needs, which makes interaction fast, natural and efficient. Because the user actively moves the head, face and eyes, the system also effectively reduces the physical harm caused by prolonged use of computers and similar products.

Disclosure of Invention

Traditional interaction methods for smart electronic products, such as keyboard, mouse and touch, tend to cause fatigue damage to the eyes and cervical spine and feel unnatural. To address these problems, this patent proposes a system that achieves contactless interaction through facial actions and gives simple, convenient control over the common functions of application programs. The user actively moves the face and neck while interacting, which makes interaction more natural and reduces physical harm; people with disabilities can also control smart terminals, making information sharing accessible. By detecting facial actions with a camera, the system performs the corresponding interactive functions: waking and exiting modes, and mouse, touch and keyboard-shortcut functions. It provides three interaction modes for different application scenarios and user needs: a system mode, a general mode and an application mode. The system mode previews and switches among running tasks, application programs and common system programs. The general mode simulates traditional mouse, keyboard and touch functions. The application mode selects a browsing mode or a dialog-box mode depending on the application program, providing fast interaction functions adapted to that program. To this end, the present invention provides a human-computer interaction operating system based on facial action recognition, which comprises:

the image acquisition module comprises a camera and an image preprocessing unit;

the facial action recognition module comprises a unit that detects the positions and actions of the face, eyes, nose and mouth, and a face identity authentication unit;

the host module comprises a central processing unit, a storage unit, a data and control bus, a display, a power supply and management unit thereof, other peripheral units, an operating system, an interactive control unit and an application program.

Further, the operating system is WINDOWS, LINUX, ANDROID, IOS or other derivative operating systems, and the interactive host module is one of a desktop computer, a workstation, a notebook computer, a mobile phone and a tablet computer.

The system is implemented through the following steps: face image acquisition and preprocessing; segmentation, detection and localization of the face and facial-feature regions; facial feature point extraction and motion speed calculation; face identity authentication; facial action tracking and judgment; waking and exiting of interaction modes and functions; and simulation of mouse, touch and keyboard shortcut combinations. The interaction modes are a system mode, a general mode and an application mode. The system mode provides scrolling preview of, and switching among, running tasks, application programs and common system programs; the general mode simulates the functions of a traditional mouse, keyboard and touch screen; the application mode provides fast interaction functions specific to each application program and is entered automatically when such a program is opened. The three modes can be switched.

The application mode includes:

in a browsing mode:

when the face moves quickly left, right, up or down and returns slowly to the front, a single page turn of the displayed content, a rapid progress adjustment of multimedia audio or video, or a rapid adjustment of volume or brightness is triggered; if the face does not return, page turning continues in that direction until the face slowly returns to the front;

when the face moves slowly left, right, up or down and returns quickly to the front, a single movement of the displayed content up, down, left or right, a single slow progress adjustment of multimedia audio or video, or a single slow adjustment of volume or brightness is triggered; if the face does not return, the control function for that direction continues until the face quickly returns to the front;

the front and back movement of the face triggers the zooming-in and zooming-out of the displayed content;

in the dialog mode:

one or more slow up-and-down movements of the face trigger the confirm function;

one or more slow left-right movements of the face trigger the cancel function;

the general mode includes:

up, down, left and right movements of the face trigger cursor-following movement on the screen; when the face stops moving, a rapid opening and closing of the eyes or mouth within a period Ts triggers a mouse click at the cursor position, and two or more rapid openings and closings of the eyes or mouth trigger a mouse double-click at the cursor position; if the cursor rests in an input box and the box is selected, a virtual keyboard pops up on the screen, and keyboard input is performed by simulated mouse control in this mode; Ts ranges from 0.01 to 2 seconds;

the system mode includes:

when the face moves quickly left, right, up or down and returns slowly to the front, the task or application preview is turned a single page up, down, left or right; if the face does not return, page turning continues in that direction until the face slowly returns to the front;

when the face moves slowly left, right, up or down and returns quickly to the front, the task or application preview moves continuously up, down, left or right; if the face does not return, the preview keeps moving in that direction until the face quickly returns to the front;

one or more slow up-and-down movements of the face trigger the confirm function and switch to the task or application shown in the preview;

one or more slow left-right movements of the face trigger the cancel function and return to the task or application that was running before.

One fast back-and-forth movement of the face, up-down or left-right, switches between the general mode and the application mode; several fast back-and-forth movements, up-down or left-right, turn the system mode on or off.
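The mode-switching rule above can be read as a small state machine. The following is only a sketch under stated assumptions: the names `Mode`, `ModeSwitcher` and `on_fast_shake` are hypothetical, the count of back-and-forth movements is assumed to be supplied by the action detector, and ignoring a single movement while in system mode is a design choice the text does not specify.

```python
from enum import Enum


class Mode(Enum):
    SYSTEM = "system"
    GENERAL = "general"
    APPLICATION = "application"


class ModeSwitcher:
    """Tracks the current interaction mode (hypothetical helper)."""

    def __init__(self, start=Mode.GENERAL):
        self.mode = start
        self._previous = start  # mode restored when system mode is turned off

    def on_fast_shake(self, repetitions):
        """Handle `repetitions` fast back-and-forth movements of the face."""
        if repetitions <= 0:
            return self.mode
        if repetitions == 1:
            # One movement toggles between general and application mode.
            if self.mode is Mode.GENERAL:
                self.mode = Mode.APPLICATION
            elif self.mode is Mode.APPLICATION:
                self.mode = Mode.GENERAL
            # Assumption: a single movement is ignored in system mode.
        else:
            # Several movements turn the system mode on or off.
            if self.mode is Mode.SYSTEM:
                self.mode = self._previous
            else:
                self._previous = self.mode
                self.mode = Mode.SYSTEM
        return self.mode
```

Remembering the previous mode lets the user leave system mode and land back where they were, which seems the least surprising behavior, though the text leaves this open.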

These functions are realized by sending the operating system the corresponding left and right mouse button and scroll-wheel messages, keyboard arrow-key messages, touch messages, application keyboard shortcuts, and combined keyboard-and-mouse messages; the operating system then dispatches the messages to the specific application program, whose message response module carries out the corresponding function.

First, face image acquisition and preprocessing: video is captured in real time by a camera, and each captured frame is preprocessed by grayscale conversion, filtering and denoising, contrast enhancement, scaling, mirror flipping and the like.
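A few of these preprocessing steps can be sketched in plain Python on a frame stored as rows of (R, G, B) tuples. This is an illustration, not the patent's implementation: the function names are hypothetical, the grayscale weights are the common BT.601 luma coefficients, and min-max stretching is only one possible form of contrast enhancement. Filtering and scaling are omitted.

```python
def to_gray(frame):
    """Convert rows of (r, g, b) tuples to rows of 0-255 gray levels."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in frame]


def mirror(gray):
    """Flip the image left-right so motion matches the user's direction."""
    return [row[::-1] for row in gray]


def stretch_contrast(gray):
    """Linearly stretch gray levels to the full 0-255 range."""
    lo = min(min(row) for row in gray)
    hi = max(max(row) for row in gray)
    if hi == lo:  # flat image: nothing to stretch
        return [row[:] for row in gray]
    return [[round((p - lo) * 255 / (hi - lo)) for p in row] for row in gray]


def preprocess(frame):
    """Grayscale, mirror and contrast-stretch one captured frame."""
    return stretch_contrast(mirror(to_gray(frame)))
```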

Segmentation, detection and localization of the face and facial features can use Haar classifiers trained with the Viola-Jones detection framework. The frontal, left-profile and right-profile face are detected; the coordinates of the top-left corner of the target box and the box's width and height are obtained; the corresponding bounding rectangle is determined; and the cropped face image is output, eliminating interference from the background. The cropped face image is then checked by any of various face recognition algorithms to confirm whether it is an authorized face, and if authentication passes, detection continues within the cropped image to locate the eyes, nose and mouth.

Facial feature point extraction and motion speed calculation can use a model trained with an ensemble of regression trees to locate the nose tip and all feature points of the eyes and mouth, and can build a two-dimensional optical flow field with a sparse optical flow algorithm to obtain the instantaneous speed and displacement vector of the facial feature points in each frame.
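Once feature points have been matched between two frames, the displacement vector and instantaneous speed reduce to simple arithmetic. The sketch below assumes the point tracking itself (the sparse optical flow step) has already been done elsewhere; `mean_displacement` and `instantaneous_speed` are hypothetical names, and averaging over all points is one simple way to summarize the flow.

```python
import math


def mean_displacement(prev_pts, curr_pts):
    """Average (dx, dy) of matched feature points between two frames."""
    if not prev_pts or len(prev_pts) != len(curr_pts):
        raise ValueError("point lists must be non-empty and equally long")
    n = len(prev_pts)
    dx = sum(c[0] - p[0] for p, c in zip(prev_pts, curr_pts)) / n
    dy = sum(c[1] - p[1] for p, c in zip(prev_pts, curr_pts)) / n
    return dx, dy


def instantaneous_speed(displacement, fps):
    """Speed in pixels per second, given per-frame displacement and frame rate."""
    dx, dy = displacement
    return math.hypot(dx, dy) * fps
```

For example, a mean shift of (3, 4) pixels between frames captured at 10 frames per second corresponds to a speed of 50 pixels per second.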

Facial action tracking and judgment combines the face detection result with the instantaneous speed of the feature points to detect the direction and gradient of the face's motion, and applies double thresholds to distinguish fast from slow movement of the face to the left, right, up and down. In parallel, the aspect ratio of the eye or mouth region is computed from the coordinates of its feature points, and an opening or closing action of the eyes or mouth is reported when the absolute difference between the aspect ratios in two consecutive frames exceeds a preset threshold. In addition, the change in the horizontal extent of the face bounding rectangle is computed by differencing two consecutive frames; if this change exceeds a preset horizontal threshold, the face is judged to have moved forward or backward.
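The aspect-ratio test for eye or mouth opening and closing might look like the following. This is a sketch: the aspect ratio is taken here as bounding-box height over width of the region's feature points, which is one plausible definition, and the threshold value 0.15 is purely illustrative.

```python
def aspect_ratio(points):
    """Height / width of the bounding box around a region's feature points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    width = max(xs) - min(xs)
    height = max(ys) - min(ys)
    if width == 0:
        raise ValueError("degenerate region: zero width")
    return height / width


def detect_open_close(prev_points, curr_points, threshold=0.15):
    """Return 'open', 'close' or None by comparing two consecutive frames."""
    delta = aspect_ratio(curr_points) - aspect_ratio(prev_points)
    if abs(delta) <= threshold:
        return None  # change too small to count as an action
    return "open" if delta > 0 else "close"
```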

In one scheme of the invention, mouse control and touch operation take the coordinate difference of the facial feature points between two consecutive captured frames as a displacement vector, and obtain the displacement vector of the mouse cursor or touch point by linear mapping. The mouse moves in any direction on the screen starting from the current cursor position; the touch point moves in any direction on the screen starting from the current touch point. The system also uses the Haar classifier's left and right profile face detection, its smile detection, and face stillness as trigger signals for scroll-wheel movement and mouse clicks: when a captured frame contains a left-profile face, the scroll wheel moves up by several pixels, otherwise it moves down by several pixels; when a smile region is detected in the face image, or the face remains continuously still, the mouse clicks at the cursor position. When the mouth is detected opening and closing quickly twice, a mouse double-click is performed.
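The linear mapping from face displacement to cursor displacement can be sketched as a per-axis gain followed by clamping to the screen. The gain values and the 1920 × 1080 screen size are assumptions chosen for illustration, and `move_cursor` is a hypothetical name.

```python
def move_cursor(cursor, displacement, gain=(8.0, 8.0), screen=(1920, 1080)):
    """Map a face displacement (pixels in the image) to a new cursor position.

    cursor       -- current (x, y) cursor position on screen
    displacement -- (dx, dy) of the facial feature points between two frames
    gain         -- linear scale factors per axis (illustrative values)
    """
    x = round(cursor[0] + gain[0] * displacement[0])
    y = round(cursor[1] + gain[1] * displacement[1])
    # Keep the cursor inside the visible screen area.
    x = min(max(x, 0), screen[0] - 1)
    y = min(max(y, 0), screen[1] - 1)
    return x, y
```

The same function applies to a touch point by passing the current touch position instead of the cursor position.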

In one scheme of the invention, when the face is judged to have moved forward, the system enlarges the screen content by simulating a zoom-in touch gesture, mouse action or keyboard combination, such as Ctrl and +; when the face is judged to have moved backward, the system simulates the keyboard combination Ctrl and -, and the screen content is reduced. A continuous sequence of video frames is extracted and the eye region is checked for the presence of a pupil; when no pupil region is found in three consecutive frames, the face state is marked as eyes closed and the screenshot function is triggered: the current screen is captured and saved to the corresponding folder under a preset file path. Shortcut operations for page turning, video speed adjustment, volume adjustment and system mode switching are executed according to the judged direction and speed of the facial action.
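The mapping from judged actions to simulated key combinations can be kept in a small table. Everything named here is hypothetical: the action labels, the `SHORTCUTS` table, and the `send_keys` callback that stands in for whatever input-injection facility the host platform offers. Reading the zoom-out combination as Ctrl with the minus key is also an assumption.

```python
# Hypothetical action labels mapped to key combinations.
SHORTCUTS = {
    "forward": ("ctrl", "+"),        # face moves toward the screen: zoom in
    "backward": ("ctrl", "-"),       # face moves away: zoom out (assumed key)
    "eyes_closed": ("screenshot",),  # pupil absent for three frames
}


def dispatch(action, send_keys):
    """Send the key combination for `action`; return False if unmapped."""
    keys = SHORTCUTS.get(action)
    if keys is None:
        return False
    send_keys(keys)  # caller supplies the platform-specific sender
    return True
```

Passing the sender in as a parameter keeps the table testable without touching the real keyboard, for example `dispatch("forward", print)`.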

Technical effects: the system performs face identity authentication from the parameters detected in each frame of the face, wakes and exits the interaction system automatically, and combines mouse cursor, scroll-wheel and click control with simulated keyboard input to control web page zoom, volume and video playback progress. With a single camera it replaces traditional mouse and keyboard operation and essentially meets a user's needs for controlling web browsing. These functions require only head movement and facial expressions, with no manual operation, which greatly benefits special user groups. The system relies only on a PC and its built-in front camera to perform detection, recognition and interaction, so it places low demands on hardware and offers good extensibility and generality. The system also provides three interaction modes for different application scenarios and user needs: a system mode, a general mode and an application mode, which can be switched flexibly by facial actions. The system mode previews and switches among running tasks, application programs and common system programs. The general mode simulates traditional mouse, keyboard and touch functions. The application mode selects a browsing mode or a dialog-box mode depending on the application environment, providing fast interaction adapted to it.
In addition, in actual operation on a Windows system, the interaction system shows good real-time performance and high processing speed, reaching up to 10 image frames per second, with high accuracy.

Drawings

FIG. 1 is a block diagram of the overall design of the present invention.

FIG. 2 is a flow chart of a specific implementation of the present invention.

FIG. 3 is a functional overview of the three interaction modes of the present invention.

FIG. 4 shows the operating principle of the cascade classifier.

Detailed Description

To describe the embodiments of the present invention in detail, the technical solutions and technical effects are described more clearly below with reference to the drawings. The described embodiments are only some of the embodiments of the present invention. Other embodiments that a person of ordinary skill in the art can derive from the disclosed embodiments without inventive effort also fall within the scope of the invention.

Example 1

As shown in FIG. 1, a human-computer interaction operating system based on facial action recognition includes:

1. the image acquisition module, which comprises a camera and an image preprocessing unit;

2. the facial action recognition module, which comprises a unit that detects the positions and actions of the face, eyes, nose and mouth, and a face identity authentication unit;

3. the host module, which comprises a central processing unit, a storage unit, a data and control bus, a display, a power supply and its management unit, other peripheral units, an operating system, an interaction control unit and application programs; the operating system is WINDOWS, LINUX, ANDROID, IOS or another derivative operating system, and the interactive host module is one of a desktop computer, a workstation, a notebook computer, a mobile phone and a tablet computer.

Example 2

In this embodiment, a specific work flow is shown in fig. 2, and the working method of the system includes:

After the system is powered on, the host module connects and initializes the other modules of the system, and the image acquisition module begins capturing video in real time and preprocessing images: each captured frame undergoes grayscale conversion, filtering and denoising, contrast enhancement, scaling, mirror flipping and the like.

When the facial action recognition module detects a face in the video captured by the camera, it performs face identity authentication; after successful authentication it tracks the face region, determines the positions and actions of the face, eyes, nose and mouth, and wakes the interaction control unit of the host module.

Based on the data from the facial action recognition module, the interaction control unit wakes and exits the corresponding interaction modes and functions and performs mouse, touch-screen and keyboard-shortcut interaction, thereby realizing human-computer interaction.

The interaction mode comprises a system mode, a general mode and an application mode, wherein the system mode realizes the rolling preview and switching of each running task, application program and system common program; the interactive function of the general mode is to simulate the functions of a traditional mouse, a keyboard and a touch screen; the application mode realizes a specific quick interaction function according to different application programs, and automatically enters the application mode when the specific application program is opened. The three modes can be switched.

The application mode includes:

in a browsing mode:

in the dialog mode:

the generic mode includes:

the system mode includes:

These functions may be implemented by sending the operating system the corresponding left and right mouse button and scroll-wheel messages, keyboard arrow-key messages, touch messages, application keyboard shortcuts, and combined keyboard-and-mouse messages; the operating system dispatches the messages to the specific application program, whose message response module carries out the corresponding function.

The human-computer interaction functions of each mode are shown in FIG. 3. When a front camera is used, the image, or the computed result, must be mirrored so that the control direction matches the user's movement. In this embodiment, the face detection algorithm may analyze the input grayscale image using Haar feature detection based on the Viola-Jones detection framework. The framework first computes the integral image of the input and selects three-rectangle Haar feature templates to extract facial features. It then uses a trained AdaBoost classifier feature library and applies the cascade method to reduce the scale of the classifier. The feature library used by the system consists of 22 cascaded strong classifiers, each made up of several weak classifiers. As shown in FIG. 4, the system first extracts all 80 × 80 sub-windows of the image, and each sub-window passes through the cascade in sequence so that non-face sub-windows are rejected stage by stage. If only one sub-window passes all 22 stages, it is taken as the face sub-window; if several sub-windows pass all 22 stages, the adjacent 6 × 6 sub-windows among the candidate face sub-windows are merged and screened, and the best face sub-window is selected. If no matching sub-window is detected, the sub-window size is increased by a factor of 1.16 to build an image pyramid, and each image, from large to small, is scanned and matched by the cascade classifier in turn.
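The integral image is what makes Haar features cheap: any rectangle sum takes four lookups regardless of its size. The sketch below shows the integral image, a rectangle sum, and a horizontal three-rectangle feature in plain Python. The function names are hypothetical, and the sign convention (centre minus the two outer rectangles) follows the usual description of the Viola-Jones three-rectangle feature; the trained cascade itself is not reproduced here.

```python
def integral_image(gray):
    """Return a (h+1) x (w+1) table where ii[y][x] = sum of gray[:y][:x]."""
    h, w = len(gray), len(gray[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += gray[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the rectangle with top-left (x, y), width w, height h."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def three_rect_feature(ii, x, y, w, h):
    """Horizontal three-rectangle Haar feature: centre third minus outer thirds."""
    third = w // 3
    left = rect_sum(ii, x, y, third, h)
    centre = rect_sum(ii, x + third, y, third, h)
    right = rect_sum(ii, x + 2 * third, y, third, h)
    return centre - (left + right)
```

In a cascade, a weak classifier would compare such a feature value against a learned threshold, and a sub-window is rejected as soon as any stage fails.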

In this embodiment, forward and backward movement of the face and eye closure are judged by a frame difference method. The pixel difference between the face bounding rectangles obtained by the face detection module in two consecutive video frames is used: when the face moves forward toward the screen, the bounding rectangle in the captured frame grows, and when the face moves backward it shrinks; a threshold decides whether a forward or backward movement has occurred. The threshold must balance the sensitivity of action judgment against the false detection rate, and an optimal value must be found. A smaller threshold makes action judgment more sensitive and recognition faster but raises the false detection rate; a larger threshold gives higher recognition accuracy but makes the system less responsive.
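The forward/backward judgment can be sketched as a thresholded difference of bounding-box widths. The box format `(x, y, w, h)` and the threshold of 12 pixels are assumptions for illustration; `detect_depth_motion` is a hypothetical name.

```python
def detect_depth_motion(prev_box, curr_box, threshold=12):
    """Compare face box widths in two consecutive frames.

    Boxes are (x, y, w, h). Returns 'forward' when the box grew by more
    than `threshold` pixels, 'backward' when it shrank by more than that,
    and None otherwise.
    """
    width_change = curr_box[2] - prev_box[2]
    if width_change > threshold:
        return "forward"
    if width_change < -threshold:
        return "backward"
    return None
```

Lowering `threshold` makes the detector react to smaller movements at the cost of more false triggers, which is the trade-off described above.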

Eye closure is judged over three consecutive frames, which lowers the false detection rate for eye closing and gives a better detection result.
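Requiring three consecutive frames is a simple debounce. A minimal sketch, assuming a per-frame boolean that says whether a pupil was found in the eye region (the pupil check itself is not shown) and a hypothetical class name:

```python
class EyeClosureDetector:
    """Reports an eye closure after N consecutive frames without a pupil."""

    def __init__(self, required_frames=3):
        self.required = required_frames
        self.run = 0  # length of the current run of pupil-less frames

    def update(self, pupil_visible):
        """Feed one frame; return True exactly once per closure."""
        if pupil_visible:
            self.run = 0
            return False
        self.run += 1
        # Fire only when the run first reaches the required length,
        # so a long closure does not trigger repeatedly.
        return self.run == self.required
```

A single dropped detection or a normal blink shorter than three frames resets the counter and produces no event.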

In addition, an optical flow method is used: the motion of each tracked point in the sparse optical flow image yields an instantaneous velocity vector, and the vector field formed by the distribution of these vectors over the image is the optical flow field. The displacement of the feature points between two consecutive frames is computed to obtain the sparse optical flow field; the instantaneous speed and the displacement vector give the direction and speed of the face's movement, and double thresholds separate slow actions from fast ones.
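Given a displacement vector and a speed, the double-threshold classification can be sketched as follows. The threshold values are illustrative, image coordinates are assumed to have y increasing downward, and the frame is assumed to be already mirrored; `classify_motion` is a hypothetical name.

```python
def classify_motion(displacement, speed, slow_threshold=20.0, fast_threshold=120.0):
    """Classify a face movement by direction and pace.

    Returns None below the lower threshold (treated as no action),
    otherwise a (direction, pace) pair such as ('left', 'fast').
    """
    if speed < slow_threshold:
        return None
    dx, dy = displacement
    # The dominant axis decides the direction.
    if abs(dx) >= abs(dy):
        direction = "right" if dx > 0 else "left"
    else:
        direction = "down" if dy > 0 else "up"
    pace = "fast" if speed >= fast_threshold else "slow"
    return direction, pace
```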

Face stillness is judged from the feature point coordinates, the speed values and the size of the face detection box over five consecutive frames; if these values are essentially unchanged, or vary by less than a set amount, the face is judged to be still.
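The five-frame stillness test fits naturally on a sliding window. In this sketch each frame contributes one representative point, a speed and the face box width; the tolerances and the class name `StillnessDetector` are assumptions.

```python
from collections import deque


class StillnessDetector:
    """Judges the face still when the last N frames barely differ."""

    def __init__(self, window=5, max_shift=2.0, max_speed=5.0, max_size_change=3):
        self.samples = deque(maxlen=window)
        self.max_shift = max_shift
        self.max_speed = max_speed
        self.max_size_change = max_size_change

    def update(self, point, speed, box_width):
        """Add one frame's measurements; return True if the face is still."""
        self.samples.append((point, speed, box_width))
        if len(self.samples) < self.samples.maxlen:
            return False  # not enough history yet
        xs = [p[0] for p, _, _ in self.samples]
        ys = [p[1] for p, _, _ in self.samples]
        speeds = [s for _, s, _ in self.samples]
        widths = [w for _, _, w in self.samples]
        return (max(xs) - min(xs) <= self.max_shift
                and max(ys) - min(ys) <= self.max_shift
                and max(speeds) <= self.max_speed
                and max(widths) - min(widths) <= self.max_size_change)
```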

The thresholds used to judge facial actions can be customized to the user's habits or adjusted automatically during use.

In one technical scheme of the invention, mouse control and touch operation take the coordinate difference of the facial feature points between two consecutive captured frames as a displacement vector, and obtain the displacement vector of the mouse cursor or touch point by linear mapping. The mouse moves in any direction on the screen starting from the current cursor position; the touch point moves in any direction on the screen starting from the current touch point. The system also uses the Haar classifier's left and right profile face detection, its smile detection, and face stillness as trigger signals for scroll-wheel movement and mouse clicks: when a captured frame contains a left-profile face, the scroll wheel moves up by several pixels, otherwise it moves down by several pixels; when a smile region is detected in the face image, or the face remains continuously still, the mouse clicks at the cursor position. When the mouth is detected opening and closing quickly twice, a mouse double-click is performed.

In one technical scheme of the invention, when the face is judged to have moved forward, the system enlarges the screen content by simulating a zoom-in touch gesture, mouse action or keyboard combination, such as Ctrl and +; when the face is judged to have moved backward, the system simulates the keyboard combination Ctrl and -, and the screen content is reduced. A continuous sequence of video frames is extracted and the eye region is checked for the presence of a pupil; when no pupil region is found in three consecutive frames, the face state is marked as eyes closed and the screenshot function is triggered: the current screen is captured and saved to the corresponding folder under a preset file path. Shortcut operations for page turning, video speed adjustment, volume adjustment and system mode switching are executed according to the judged direction and speed of the facial action.

The embodiments of the present invention are described above, but the embodiments of the present invention are not limited to the above, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and they are included in the scope of the present invention.

Claims

1. A human-computer interaction operating system based on human face action recognition, characterized in that the system comprises:

a host module, comprising a central processing unit, a storage unit, data and control buses, a display, a power supply and its management unit, other peripheral units, an operating system, an interaction control unit and application programs;

the working method of the system comprises the following steps:

after the system is powered on, the host module completes the connection and initialization of other modules of the system, and the image acquisition module starts to acquire videos in real time and preprocess images;

according to the data obtained by the face action recognition module, the interaction control unit wakes and exits the corresponding interaction modes and functions and executes the mouse, touch-screen and keyboard-shortcut interaction functions, thereby realizing human-computer interaction;

the interaction mode comprises a system mode, a general mode and an application mode, wherein the system mode realizes the rolling preview and switching of each running task, application program and system common program; the interactive function of the general mode is to simulate the functions of a traditional mouse, a keyboard and a touch screen; the application mode realizes a specific quick interaction function according to different application programs, and automatically enters the application mode when the specific application program is opened; the three modes can be switched;

the application mode includes:

in a browsing mode:

in the dialog mode:

the generic mode includes:

the system mode includes:

2. A human-computer interaction operating system based on human face action recognition according to claim 1, characterized in that a single rapid up-down or left-right back-and-forth movement switches between the general mode and the application mode, and several rapid up-down or left-right movements trigger or close the system mode.

3. The human-computer interaction operating system based on human face action recognition as claimed in claim 2, wherein corresponding mouse left and right button messages, scroll-wheel messages, keyboard direction-key messages, touch messages, application keyboard shortcuts, and combined keyboard and mouse messages are sent to the operating system, the operating system distributes the messages to the specific application, and the message response module of the application finally realizes the corresponding function.

4. A human-computer interaction operating system based on human face action recognition as claimed in claim 1, wherein the operating system is WINDOWS, LINUX, ANDROID, IOS or other derivative operating systems.

5. The human-computer interaction operating system based on human face action recognition, characterized in that the host module is one of a desktop computer, a workstation, a notebook computer, a mobile phone and a tablet computer.
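The switching among the three interaction modes can be illustrated with a small state machine. The sketch assumes one reading of the switching rule in claim 2: one rapid back-and-forth movement toggles between the general and application modes, and several such movements enter or leave the system mode. The class name, the count of two movements for the system mode, the return to the previously active mode on leaving the system mode, and the choice to ignore single movements while in the system mode are all hypothetical.

```python
from enum import Enum


class Mode(Enum):
    SYSTEM = "system"
    GENERAL = "general"
    APPLICATION = "application"


class ModeSwitcher:
    """Tracks the active interaction mode from counted rapid
    back-and-forth face movements."""

    def __init__(self, system_round_trips: int = 2) -> None:
        self.mode = Mode.GENERAL
        self._previous = Mode.GENERAL           # mode to restore after SYSTEM
        self.system_round_trips = system_round_trips  # assumed threshold

    def on_rapid_movement(self, round_trips: int) -> Mode:
        """Update and return the mode for a detected gesture made of
        `round_trips` rapid back-and-forth movements."""
        if round_trips >= self.system_round_trips:
            if self.mode is Mode.SYSTEM:
                self.mode = self._previous      # close the system mode
            else:
                self._previous = self.mode
                self.mode = Mode.SYSTEM         # trigger the system mode
        elif round_trips == 1 and self.mode is not Mode.SYSTEM:
            # Toggle between the general and application modes.
            self.mode = (Mode.APPLICATION if self.mode is Mode.GENERAL
                         else Mode.GENERAL)
        return self.mode
```

Starting in the general mode, one movement enters the application mode, three movements open the system mode, and a further two movements close it and restore the application mode.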