CN112099634A - Interactive operation method and device based on head action, storage medium and terminal - Google Patents


Info

Publication number
CN112099634A
Authority
CN
China
Prior art keywords
head
action
face
screen interface
preset
Prior art date
Legal status
Pending
Application number
CN202010981138.XA
Other languages
Chinese (zh)
Inventor
黄斌
Current Assignee
Gree Electric Appliances Inc of Zhuhai
Original Assignee
Gree Electric Appliances Inc of Zhuhai
Priority date
Filing date
Publication date
Application filed by Gree Electric Appliances Inc of Zhuhai filed Critical Gree Electric Appliances Inc of Zhuhai
Priority to CN202010981138.XA
Publication of CN112099634A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/012Head tracking input arrangements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification

Abstract

The invention provides an interactive operation method and device based on head actions, a storage medium, and a terminal. The method comprises the following steps: when a screen interface is being browsed, acquiring a face image and performing face recognition; when the recognized face matches a preset face, acquiring a depth image of the head action and performing head action recognition; and executing the corresponding interactive operation on the screen interface according to the head action recognition result. According to the invention, even when a user browsing the screen interface of the terminal cannot effectively perform touch operations on it, interactive operations can still be performed based on head actions.

Description

Interactive operation method and device based on head action, storage medium and terminal
Technical Field
The invention relates to the technical field of intelligent terminals, and provides an interactive operation method and device based on head actions, a storage medium and a terminal.
Background
With the development of intelligent terminal devices, more and more people use terminals such as mobile phones in daily life. To improve the flexibility of mobile terminal use, a physical key or a touch key is generally arranged below the display screen, and the display interface on the screen of the mobile terminal is operated by pressing the physical or touch key, thereby realizing interface navigation.
In practical applications, a user can perform a series of touch gestures on the touch display screen to complete a desired function. For example, in a browser interface, a user may refresh the page by a sliding operation to load more content into the current screen. However, a touch gesture performed on the screen requires the user's hand to make contact with the screen, which can be inconvenient in some scenarios. For example, when the user's hands are wet, touch operations on the screen may not be performed effectively.
Therefore, realizing interface navigation only through physical keys or touch keys below the display screen increasingly fails to meet users' convenience requirements. How to improve the flexibility of triggering navigation events on a mobile terminal without affecting the use of the current application program, and thereby improve the user experience, is a technical problem that urgently needs to be solved.
Disclosure of Invention
The invention provides an interactive operation method and device based on head actions, a storage medium, and a terminal, which are used to improve the flexibility of triggering navigation events on a mobile terminal, avoid affecting the use of the current application program, and improve the user experience.
The technical scheme of the invention is as follows:
in a first aspect, the present invention provides a head action-based interactive operation method, including:
when a screen interface is browsed, acquiring a face image and carrying out face recognition;
when the recognized face is matched with a preset face, obtaining a depth image of the head action and recognizing the head action;
and executing corresponding interactive operation on the screen interface according to the head action recognition result.
According to the embodiment of the present invention, preferably, the acquiring a face image and performing face recognition when browsing a screen interface includes:
when a screen interface is browsed, a face image at a preset position away from the screen interface is obtained, and face recognition is carried out on the face image.
According to the embodiment of the present invention, preferably, when the recognized face matches a preset face, acquiring a depth image of a head motion and performing head motion recognition includes:
when the recognized face matches a preset face, displaying a preset icon on the screen interface to prompt the user to perform a head action;
acquiring a depth image of the head action and extracting head motion trajectory features;
comparing the extracted head motion trajectory features with the head motion trajectory features of a preset model;
and if the extracted head motion trajectory features match the head motion trajectory features of the preset model, the head action recognition result is a match with the preset model.
According to an embodiment of the present invention, preferably, the method further comprises:
establishing a preset model, wherein the input of the preset model is the head motion track characteristic, and the output of the preset model is the head action;
and training a preset model by utilizing the existing head motion track characteristics and the corresponding head motions.
According to an embodiment of the present invention, preferably, the performing, according to the head action recognition result, a corresponding interactive operation on the screen interface includes:
when the head action recognition result is matched with the preset model, executing interactive operation corresponding to the preset model;
wherein, the preset model at least comprises: a head raising action model, a nodding action model, a leftward head swinging action model and a rightward head swinging action model;
the interactive operation at least comprises: page up, page down, page left, page right.
In a second aspect, the present invention provides a head-action-based interactive operation device, comprising:
the face recognition module is used for acquiring a face image and performing face recognition when a screen interface is browsed;
the action recognition module is used for acquiring head action and recognizing the head action when the recognized face is matched with a preset face;
and the execution module is used for executing corresponding interactive operation on the screen interface according to the head action recognition result.
In a third aspect, the present invention provides a storage medium having stored thereon a computer program which, when executed by one or more processors, implements the head-action-based interactive operation method described in the first aspect.
In a fourth aspect, the present invention provides a terminal comprising a memory and a processor, wherein the memory stores a computer program, and the computer program is executed by the processor to implement the head action-based interactive operation method according to the first aspect.
According to the embodiment of the present invention, preferably, the terminal further includes:
and the camera is connected with the processor and is used for acquiring a face image when a screen interface is browsed.
According to the embodiment of the present invention, preferably, the terminal further includes:
and the depth sensor is connected with the processor and used for acquiring a depth image of the head action when the recognized face is matched with a preset face.
Compared with the prior art, the invention has at least the following beneficial effects:
According to the interactive operation method based on head actions, a face image is acquired and face recognition is performed when a screen interface is being browsed; when the recognized face matches a preset face, a depth image of the head action is acquired and head action recognition is performed; and the corresponding interactive operation is executed on the screen interface according to the head action recognition result. In this way, even when a user browsing the screen interface of the terminal cannot effectively perform touch operations on it, interactive operations can still be performed based on head actions.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present invention and therefore should not be regarded as limiting the scope; for those skilled in the art, other related drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a flowchart of an interactive operation method based on head movement according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a head skeleton information capture node according to an embodiment of the present invention;
fig. 3 is a schematic diagram of a head-up motion model, a head nodding motion model, a left head swinging motion model, and a right head swinging motion model according to a first embodiment of the present invention;
FIG. 4 is a schematic diagram illustrating a flow of interaction operations based on head movements according to a second embodiment of the present invention;
fig. 5 is a block diagram of an interactive operation device based on head movement according to a third embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
Embodiment One
Fig. 1 shows a flowchart of an interactive operation method based on head actions. The method may be applied to a terminal such as a mobile phone or a tablet computer: when a user browsing the screen interface of the terminal cannot effectively perform touch operations on it, the method can be applied to perform interactive operations based on head actions. This embodiment provides an interactive operation method based on head actions which, as shown in Fig. 1, includes the following steps:
step S110, when a screen interface is browsed, a face image is obtained and face recognition is carried out.
Further, when browsing a screen interface, acquiring a face image and performing face recognition, including:
When the screen interface is being browsed, a face image at a preset position away from the screen interface is acquired, and face recognition is performed on that face image. Acquiring the face image at the preset position may mean acquiring it at a position 20 cm to 30 cm from the screen interface. Collecting the face image only at the preset position prevents face recognition, and the head action recognition it triggers, from running while the user is still operating the screen interface normally by hand, which would interfere with normal use of the terminal.
Taking a mobile phone as an example, the user holds the phone screen 20 cm to 30 cm in front of the head, so that a face image 20 cm to 30 cm from the screen interface is obtained; specifically, the face image can be acquired by the phone's camera.
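As an illustration of this step, the following minimal sketch detects a face with OpenCV's bundled Haar cascade and gates recognition on an approximate pinhole-camera distance estimate computed from the detected face width; the focal-length and face-width constants, and the distance heuristic itself, are illustrative assumptions rather than values from this disclosure.

```python
import cv2

# Approximate pinhole model: distance ≈ (real face width × focal length) / pixel width.
# FOCAL_LENGTH_PX and FACE_WIDTH_CM are illustrative values, not from the patent.
FOCAL_LENGTH_PX = 600.0
FACE_WIDTH_CM = 16.0
MIN_DIST_CM, MAX_DIST_CM = 20.0, 30.0

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def capture_face_in_range(camera_index=0):
    """Grab one frame and return the face crop only if it lies 20-30 cm away."""
    cap = cv2.VideoCapture(camera_index)
    ok, frame = cap.read()
    cap.release()
    if not ok:
        return None
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        distance_cm = FACE_WIDTH_CM * FOCAL_LENGTH_PX / w
        if MIN_DIST_CM <= distance_cm <= MAX_DIST_CM:
            return frame[y:y + h, x:x + w]   # face image at the preset position
    return None                              # no face inside the preset range
```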
Step S120: when the recognized face matches a preset face, acquire a depth image of the head action and perform head action recognition.
In practical application, after the face image at a distance of 20 cm to 30 cm from the screen interface is acquired, keeping the phone screen 20 cm to 30 cm in front of the face for about one second is enough to obtain a face recognition result. It can be understood that face recognition can be implemented with existing algorithms, so the details are not repeated here.
To prevent other people's head actions from triggering interactive operations and degrading the user experience, this embodiment first performs face recognition and only recognizes the head actions of a user whose face matches the preset face before triggering the corresponding interactive operation. This improves the triggering efficiency of the interactive operation and, at the same time, prevents interactive operations from being mistakenly triggered by the head actions of non-designated users.
The preset face can be obtained by photographing the face in advance with the terminal's camera, or a face image can be imported into the terminal; in either case the image is then stored.
Users can set the preset face as needed. On the one hand, the face images of several users who frequently use the terminal can be enrolled, so that when interactive operation through head actions is needed, any of those users can directly use the method to pass face recognition verification and trigger the interactive operation. On the other hand, an enrolled preset face can be deleted or modified as required.
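The following minimal sketch illustrates enrolling, deleting, and matching preset faces as described above; the `extract_embedding` helper is a hypothetical stand-in for whatever face-recognition algorithm the terminal uses, and the similarity threshold is illustrative.

```python
import numpy as np

def extract_embedding(face_image) -> np.ndarray:
    """Hypothetical stand-in for the terminal's face-recognition feature extractor."""
    raise NotImplementedError

class PresetFaceStore:
    """Holds the preset faces that users enroll, modify, or delete."""
    def __init__(self, threshold: float = 0.6):
        self.threshold = threshold          # illustrative similarity threshold
        self.embeddings: dict[str, np.ndarray] = {}

    def enroll(self, user_id: str, face_image) -> None:
        self.embeddings[user_id] = extract_embedding(face_image)

    def delete(self, user_id: str) -> None:
        self.embeddings.pop(user_id, None)

    def matches(self, face_image) -> bool:
        """True if the recognized face matches any enrolled preset face."""
        probe = extract_embedding(face_image)
        for stored in self.embeddings.values():
            sim = float(np.dot(probe, stored) /
                        (np.linalg.norm(probe) * np.linalg.norm(stored) + 1e-9))
            if sim >= self.threshold:
                return True
        return False
```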
Further, step S120 of acquiring a depth image of the head action and performing head action recognition when the recognized face matches the preset face may include the following sub-steps:
and S121, when the recognized face is matched with a preset face, displaying a preset icon on a screen interface to prompt the head to act.
The preset icon can be used for prompting the user that the face recognition is successful and the head action can be performed. When the recognized face is matched with the preset face, the screen interface displays the preset icon to prompt the user that the face recognition is successful, and the user can start head action after seeing the preset icon.
Step S122: acquire a depth image of the head motion and extract head motion trajectory features.
In practical application, the depth image of the head action can be acquired by a Kinect depth sensor, and the head motion trajectory features can be extracted with the feature extraction algorithm that comes with the Kinect depth sensor.
Specifically, head movements are captured with the Kinect depth sensor: depth images of the head movement are collected, and it is judged whether these depth images contain head movement information. If they do, a spatial three-dimensional coordinate system is established with the head skeleton extraction algorithm that comes with the Kinect depth sensor, the three-dimensional coordinates of 4 joint points are extracted from the human body in each frame, and features of the obtained head skeleton are extracted. The head skeleton information capture nodes are the frontal bone, temporal bone, maxilla, and mandible, as shown in Fig. 2.
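The following minimal sketch shows one way the per-frame 3D coordinates of the four head joints could be turned into a motion trajectory feature sequence; the frame layout and the choice of frame-to-frame displacements as features are assumptions, not the Kinect SDK's own output format.

```python
import numpy as np

HEAD_JOINTS = ("frontal_bone", "temporal_bone", "maxilla", "mandible")

def trajectory_features(frames: list[dict]) -> np.ndarray:
    """Build a (T-1, 12) feature sequence from per-frame joint coordinates.

    `frames` is assumed to be a list of dicts mapping each of the four head
    joints to an (x, y, z) tuple in the depth sensor's spatial coordinates.
    Each feature row is the frame-to-frame displacement of all four joints.
    """
    coords = np.array([[frame[j] for j in HEAD_JOINTS] for frame in frames])
    # coords has shape (T, 4, 3); displacements capture how the head moves.
    displacements = np.diff(coords, axis=0)             # (T-1, 4, 3)
    return displacements.reshape(len(frames) - 1, -1)   # (T-1, 12)
```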
Step S123: compare the extracted head motion trajectory features with the head motion trajectory features of the preset models.
The preset models include at least a head-raising action model, a nodding action model, a leftward head-swinging action model, and a rightward head-swinging action model, as shown in Fig. 3. The preset models are obtained by pre-training, and each preset model and its corresponding head motion trajectory features are stored in a head action database for comparison.
Further, before step S110 (acquiring a face image and performing face recognition while a screen interface is browsed) is executed, the method further includes:
Step S100-1: establishing a preset model, wherein the input of the preset model is the head motion trajectory features and the output is the head action; and
Step S100-2: training the preset model using existing head motion trajectory features and the corresponding head actions.
For example, a large amount of data for the 4 head actions is collected and preset models of the head actions are established; the preset models are used to train the terminal to understand the operation intention corresponding to each of the 4 head gestures. A head continuous-action recognition system recognizes action instructions: sequence feature data of the continuous head actions is extracted to build the preset models, and a model combining a deep belief network with hidden Markov models is used to model the feature data and recognize the head actions. For the four predefined action types (raising the head, nodding, swinging the head left, and swinging the head right), one hidden Markov model is trained per action type, with a shared deep belief network model, and an effective action threshold for each action type is obtained by training; that is, an action is recognized as an effective action of a given type when its similarity exceeds that type's effective action threshold.
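The following minimal sketch illustrates the per-class hidden Markov models described above using the hmmlearn package; the deep belief network front end is omitted, the feature sequences are fed to the HMMs directly, and the way the effective action thresholds are derived from the training data is an assumption.

```python
import numpy as np
from hmmlearn import hmm

ACTIONS = ("raise_head", "nod", "swing_left", "swing_right")

def train_action_models(samples: dict[str, list[np.ndarray]]):
    """Fit one GaussianHMM per head action from its recorded feature sequences."""
    models, thresholds = {}, {}
    for action in ACTIONS:
        sequences = samples[action]
        X = np.vstack(sequences)
        lengths = [len(s) for s in sequences]
        model = hmm.GaussianHMM(n_components=4, covariance_type="diag", n_iter=50)
        model.fit(X, lengths)
        models[action] = model
        # Effective action threshold: an illustrative choice is the lowest
        # per-frame log-likelihood the model assigns to its own training data.
        thresholds[action] = min(model.score(s) / len(s) for s in sequences)
    return models, thresholds

def recognize(models, thresholds, features: np.ndarray):
    """Return the best-matching action, or None if no model clears its threshold."""
    best_action, best_score = None, -np.inf
    for action, model in models.items():
        score = model.score(features) / len(features)
        if score >= thresholds[action] and score > best_score:
            best_action, best_score = action, score
    return best_action
```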
Step S124: if the extracted head motion trajectory features match the head motion trajectory features of a preset model, the head action recognition result is a match with that preset model.
It can be understood that, if the extracted head motion trajectory feature does not match the head motion trajectory feature of the preset model, the head motion recognition result is not matched with the preset model.
In practical application, whether head motion trajectory features match can be determined with a matching threshold. Some of the feature points in the extracted head motion trajectory features will be consistent with feature points in a preset model's head motion trajectory features; if the proportion of consistent feature points to the total number of feature points is greater than a preset proportion, the two are considered a match. The preset proportion is the matching threshold, and the proportion reflects the similarity between the current head action and the preset model: the higher the value, the greater the similarity. For example, if the matching threshold for the head-raising action is 80% and the proportion of consistent feature points reaches 85%, the two are considered a match, and the head action corresponding to the currently extracted head motion trajectory features is that preset model.
It is understood that the effective action threshold of each type of action described in the previous example can be used as the matching threshold herein, and in practical applications, the matching thresholds of each type of action may be the same or different, and are not limited herein.
In some embodiments, the acquired depth image of the head movement is matched against motion trajectory features in a head action database collected and recorded in advance. If the similarity is below the matching threshold, the movement does not correspond to any preset model (head action); action recognition fails and the result is an abnormal form. If the similarity is not below the matching threshold, the movement corresponds to a preset model (head action); action recognition succeeds, the result is a normal form, and the corresponding operation intention on the screen interface is carried out.
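The following minimal sketch illustrates the ratio-based matching described above; how "consistent" feature points are decided (a Euclidean tolerance here) and the per-action thresholds other than the 80% example are illustrative assumptions.

```python
import numpy as np

# Illustrative per-action matching thresholds; the 80% figure for raising the
# head comes from the example above, the remaining values are assumptions.
MATCH_THRESHOLDS = {"raise_head": 0.80, "nod": 0.80,
                    "swing_left": 0.80, "swing_right": 0.80}

def match_ratio(extracted: np.ndarray, template: np.ndarray,
                tolerance: float = 0.05) -> float:
    """Fraction of extracted feature points consistent with the template."""
    n = min(len(extracted), len(template))
    consistent = np.linalg.norm(extracted[:n] - template[:n], axis=1) <= tolerance
    return consistent.sum() / len(extracted)

def is_match(action: str, extracted: np.ndarray, template: np.ndarray) -> bool:
    # e.g. an 85% ratio against the 80% threshold counts as a raise-head match
    return match_ratio(extracted, template) >= MATCH_THRESHOLDS[action]
```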
Step S130: perform the corresponding interactive operation on the screen interface according to the head action recognition result.
Specifically, when the head action recognition result is matched with the preset model, the interactive operation corresponding to the preset model is executed.
Wherein, the interactive operation at least comprises: page up, page down, page left, page right.
For example, a head-raising action corresponds to paging up, a nodding action corresponds to paging down, a leftward head swing corresponds to paging left, and a rightward head swing corresponds to paging right, as shown in the following table:
Head action              Interactive operation
Head swings leftward     Screen interface pages left
Head swings rightward    Screen interface pages right
Nodding                  Screen interface slides down
Raising the head         Screen interface slides up
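The following minimal sketch dispatches a recognized head action to the screen operation in the table above; `scroll_page` is a hypothetical hook standing in for whatever scrolling or paging API the terminal's screen interface exposes.

```python
def scroll_page(dx: int, dy: int) -> None:
    """Hypothetical UI hook; replace with the terminal's actual scrolling call."""
    print(f"scroll page: dx={dx}, dy={dy}")

# Maps each recognized head action to the screen-interface operation above.
ACTION_TO_OPERATION = {
    "swing_left":  lambda: scroll_page(dx=-1, dy=0),   # page left
    "swing_right": lambda: scroll_page(dx=+1, dy=0),   # page right
    "nod":         lambda: scroll_page(dx=0, dy=+1),   # slide down
    "raise_head":  lambda: scroll_page(dx=0, dy=-1),   # slide up
}

def execute_interaction(action: str) -> None:
    """Run the interactive operation for a recognized head action, if any."""
    operation = ACTION_TO_OPERATION.get(action)
    if operation is not None:
        operation()
```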
For example, suppose a user wants the phone's screen interface to page to the left. The user holds the head about 25 cm in front of the screen interface; after about one second face recognition succeeds and the screen interface displays the head-action recognition icon (the preset icon). The user then swings the head to the left, and by recognizing this head action the phone's screen interface starts paging to the left.
In some embodiments, a head action recognition and tracking algorithm is used: head detection is performed with the Kinect depth sensor, head motion trajectory features are extracted, and the similarity of the head motion to the trained preset models is calculated. When the similarity exceeds the preset matching threshold, the head position is predicted by particle filtering to achieve head motion tracking, and interactive operations such as sliding up or turning pages are then performed on the terminal's screen interface.
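The following minimal sketch shows a particle filter of the kind mentioned above for predicting the head position between depth frames; the constant-position motion model and the noise parameters are assumptions chosen only to illustrate the predict/weight/resample loop.

```python
import numpy as np

class HeadParticleFilter:
    """Tracks a 3D head position with a simple predict/weight/resample loop."""

    def __init__(self, initial_position, n_particles=500,
                 motion_noise=0.01, measurement_noise=0.03):
        self.particles = np.tile(np.asarray(initial_position, dtype=float),
                                 (n_particles, 1))
        self.motion_noise = motion_noise
        self.measurement_noise = measurement_noise

    def predict(self) -> np.ndarray:
        """Diffuse particles with Gaussian motion noise and return the estimate."""
        self.particles += np.random.normal(0.0, self.motion_noise,
                                           self.particles.shape)
        return self.particles.mean(axis=0)

    def update(self, measured_position) -> None:
        """Re-weight particles by closeness to the measured head joint position."""
        errors = np.linalg.norm(self.particles - measured_position, axis=1)
        weights = np.exp(-0.5 * (errors / self.measurement_noise) ** 2)
        weights /= weights.sum() + 1e-12
        indices = np.random.choice(len(self.particles), len(self.particles),
                                   p=weights)
        self.particles = self.particles[indices]
```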
It can be understood that, according to user requirements, more head action models can be trained and mapped to additional interactive operations, realizing further interaction with the screen interface.
Embodiment Two
Fig. 4 is a schematic diagram of the interactive operation flow based on head actions. This embodiment illustrates, for each type of head action, the interactive operation process carried out with the terminal screen interface:
(1) The head swings to the left and the screen interface pages to the left; the specific processing is as follows:
Step S1.0: collect a large amount of leftward head-swing data and establish a model for training the terminal to understand the operation intention corresponding to a leftward head swing, namely controlling the screen interface to page left;
Step S1.1: upon entering the head action recognition scene, recognize the current face and locate the collected motion trajectory of the target head swinging leftward;
Step S1.2: extract features from the collected motion trajectory of the target head swinging leftward, and detect whether it matches the head action features in the head action library collected and recorded in the database;
Step S1.3: if the currently collected head gesture motion trajectory matches the head action features recorded in the database, control the terminal screen interface in response, executing the interactive operation corresponding to the leftward head swing and processing a leftward page-turning event on the terminal device's screen interface;
Step S1.4: if the currently collected head gesture motion trajectory does not match the head action features recorded in the database, the feature comparison fails and the corresponding processing event is not executed.
(2) The head swings to the right and the screen interface pages to the right; the specific processing is as follows:
Step S2.0: collect a large amount of rightward head-swing data and establish a model for training the terminal to understand the operation intention corresponding to this head action, namely controlling the screen interface to page right;
Step S2.1: upon entering the head action recognition scene, recognize the current face and locate the collected motion trajectory of the target head swinging rightward;
Step S2.2: extract features from the collected motion trajectory of the target head swinging rightward, and detect whether it matches the head action features in the head action library collected and recorded in the database;
Step S2.3: if the currently collected head action motion trajectory matches the head action features recorded in the database, control the terminal screen interface in response, executing the interactive operation corresponding to the rightward head swing and processing a rightward page-turning event on the terminal device's screen interface;
Step S2.4: if the currently collected head gesture motion trajectory does not match the head action features recorded in the database, the feature comparison fails and the corresponding processing event is not executed.
(3) The head performs a nodding action and the screen interface slides down; the specific processing is as follows:
Step S3.0: collect a large amount of nodding data and establish a model for training the terminal to understand the operation intention corresponding to this head action, namely controlling the screen interface to slide down;
Step S3.1: upon entering the head action recognition scene, recognize the current face and locate the collected motion trajectory of the target head nodding;
Step S3.2: extract features from the collected motion trajectory of the target head nodding, and detect whether it matches the head action features collected and recorded in the database;
Step S3.3: if the currently collected head action motion trajectory matches the head action motion trajectory recorded in the database, control the terminal screen interface in response, executing the interactive operation corresponding to nodding and processing a downward-slide event on the terminal device's screen interface;
Step S3.4: if the currently collected head action motion trajectory does not match the head gesture motion trajectory library recorded in the system library, the feature comparison fails and the corresponding processing event is not executed.
(4) The head performs a head-raising gesture and the terminal device's screen interface slides up; the specific processing is as follows:
Step S4.0: collect a large amount of head-raising data and establish a model for training the terminal to understand the operation intention corresponding to this head action, namely controlling the screen interface to slide up;
Step S4.1: upon entering the head action recognition scene, recognize the current face and locate the collected motion trajectory of the target head-raising action;
Step S4.2: extract features from the collected motion trajectory of the target head-raising action, and detect whether it matches the head action features collected and recorded in the database;
Step S4.3: if the currently collected head action motion trajectory matches the head action motion trajectory recorded in the system library, control the terminal screen interface in response, executing the corresponding interactive operation and processing an upward-slide event on the terminal device's screen interface;
Step S4.4: if the currently collected head action motion trajectory does not match the head action motion trajectory recorded in the system library, the feature comparison fails and the corresponding processing event is not executed.
In summary, a large amount of data for the 4 head actions is collected and preset models of the head actions are established; the preset models are used to train the terminal to understand the operation intentions corresponding to the 4 head gestures. The head is automatically detected, matched, and recognized through facial biometric recognition; after the detected face is successfully matched with the stored face, head action instructions are acquired with the Kinect depth sensor. The Kinect sensor collects the depth image of the head action, and the data processing area is adjusted in real time to improve tracking efficiency and accuracy. For example, if the head performs a nodding action, the interactive operation of paging down on the phone's screen interface can be realized; if the head is raised, the interactive operation of paging up on the phone's screen interface can be realized.
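The following minimal sketch ties the steps of this embodiment together, reusing the helpers sketched in Embodiment One; the prompt-icon and depth-capture functions are placeholders for the terminal's UI and Kinect APIs rather than calls defined by this disclosure.

```python
def show_prompt_icon() -> None:
    """Hypothetical UI call: display the preset icon prompting a head action."""
    print("face recognized - please perform a head action")

def capture_depth_frames():
    """Placeholder for the Kinect depth-frame acquisition call."""
    raise NotImplementedError

def handle_head_interaction(face_store, models, thresholds) -> None:
    """One pass of the flow in Fig. 4: verify the face, then act on the head action.

    `capture_face_in_range`, `trajectory_features`, `recognize`, and
    `execute_interaction` are the helpers sketched in Embodiment One.
    """
    face = capture_face_in_range()
    if face is None or not face_store.matches(face):
        return                       # face recognition failed: do nothing
    show_prompt_icon()               # prompt the user to perform a head action
    frames = capture_depth_frames()  # depth images of the head action
    features = trajectory_features(frames)
    action = recognize(models, thresholds, features)
    if action is None:
        return                       # abnormal form: comparison failed, no event
    execute_interaction(action)      # normal form: run the matching screen event
```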
Embodiment Three
Corresponding to the first embodiment, this embodiment provides an interactive operation device based on head actions which, as shown in Fig. 5, includes the following modules:
a face recognition module 510, configured to obtain a face image and perform face recognition when browsing a screen interface;
the motion recognition module 520 is configured to, when the recognized face matches a preset face, acquire a head motion and perform head motion recognition;
and the executing module 530 is configured to execute corresponding interactive operations on the screen interface according to the head action recognition result.
It is understood that the face recognition module 510 may be configured to execute the step S110 in the first embodiment, the action recognition module 520 may be configured to execute the step S120 in the first embodiment, and the execution module 530 may be configured to execute the step S130 in the first embodiment.
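The following minimal sketch shows one possible shape for these three modules, delegating to the helpers sketched in Embodiment One; the constructor arguments and method names are illustrative.

```python
class FaceRecognitionModule:
    """Acquires a face image while the screen interface is browsed and verifies it."""
    def __init__(self, face_store):
        self.face_store = face_store

    def verify(self) -> bool:
        face = capture_face_in_range()          # helper sketched in Embodiment One
        return face is not None and self.face_store.matches(face)

class ActionRecognitionModule:
    """Acquires the head-action depth image and recognizes the head action."""
    def __init__(self, models, thresholds):
        self.models, self.thresholds = models, thresholds

    def recognize_action(self, depth_frames):
        return recognize(self.models, self.thresholds,
                         trajectory_features(depth_frames))

class ExecutionModule:
    """Executes the screen-interface operation for the recognized head action."""
    def run(self, action) -> None:
        if action is not None:
            execute_interaction(action)
```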
Embodiment Four
The present invention provides a storage medium on which a computer program is stored; when the computer program is executed by one or more processors, it implements the head-action-based interactive operation method provided by Embodiment One.
In this embodiment, the storage medium may be implemented by any type of volatile or nonvolatile storage device or combination thereof, such as a Static Random Access Memory (SRAM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), an Erasable Programmable Read-Only Memory (EPROM), a Programmable Read-Only Memory (PROM), a Read-Only Memory (ROM), a magnetic Memory, a flash Memory, a magnetic disk, or an optical disk.
Embodiment Five
The present invention provides a terminal including a memory and a processor. The memory stores a computer program, and when the computer program is executed by the processor it implements the head-action-based interactive operation method provided by Embodiment One.
In this embodiment, the terminal may be a mobile phone, a tablet computer, etc., and the Processor may be implemented by an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a controller, a microcontroller, a microprocessor, or other electronic components, and is configured to execute the method in the foregoing embodiments. For the head-action-based interactive operation method implemented when the computer program is executed on the processor, reference may be made to the specific implementation provided in Embodiment One; the details are not repeated here.
The above terminal further includes:
and the camera is connected with the processor and is used for acquiring a human face image when a screen interface is browsed.
And the depth sensor is connected with the processor and used for acquiring a depth image of the head action when the recognized face is matched with a preset face. The depth sensor may be a kinect depth sensor, but is not limited thereto.
In this embodiment the camera is combined with the depth sensor: the camera collects the face image and the processor recognizes it; when the recognized face matches the preset image, the depth sensor collects a depth image of the head action, the head action (nodding, raising, swinging left, swinging right, etc.) is recognized, and the interactive operation corresponding to that head action is executed according to the recognition result. In this way, even when the user cannot effectively perform touch operations on the terminal's screen interface while browsing it, interactive operations can still be performed based on head actions.
In the embodiments provided in the present invention, it should be understood that the disclosed system and method can be implemented in other ways. The system and method embodiments described above are merely illustrative.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
Although the embodiments of the present invention have been described above, the above descriptions are only for the convenience of understanding the present invention, and are not intended to limit the present invention. It will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A method of interactive operation based on head movements, comprising:
when a screen interface is browsed, acquiring a face image and carrying out face recognition;
when the recognized face is matched with a preset face, obtaining a depth image of the head action and recognizing the head action;
and executing corresponding interactive operation on the screen interface according to the head action recognition result.
2. The method of claim 1, wherein the obtaining a face image and performing face recognition while browsing a screen interface comprises:
when a screen interface is browsed, a face image at a preset position away from the screen interface is obtained, and face recognition is carried out on the face image.
3. The method of claim 1, wherein when the recognized face matches a preset face, acquiring a depth image of a head motion and performing head motion recognition, comprising:
when the recognized face matches a preset face, displaying a preset icon on the screen interface to prompt the user to perform a head action;
acquiring a depth image of head movement and extracting head movement track characteristics;
comparing the extracted head movement track characteristics with the head movement track characteristics of a preset model;
and if the extracted head motion trajectory features match the head motion trajectory features of the preset model, the head action recognition result is a match with the preset model.
4. The method of claim 1, further comprising:
establishing a preset model, wherein the input of the preset model is the head motion track characteristic, and the output of the preset model is the head action;
and training a preset model by utilizing the existing head motion track characteristics and the corresponding head motions.
5. The method according to claim 1, wherein the performing corresponding interactive operation on the screen interface according to the head action recognition result comprises:
when the head action recognition result is matched with the preset model, executing interactive operation corresponding to the preset model;
wherein, the preset model at least comprises: a head raising action model, a nodding action model, a leftward head swinging action model and a rightward head swinging action model;
the interactive operation at least comprises: page up, page down, page left, page right.
6. A head-action-based interactive operation device, comprising:
the face recognition module is used for acquiring a face image and performing face recognition when a screen interface is browsed;
the action recognition module is used for acquiring head action and recognizing the head action when the recognized face is matched with a preset face;
and the execution module is used for executing corresponding interactive operation on the screen interface according to the head action recognition result.
7. A storage medium having stored thereon a computer program which, when executed by one or more processors, implements the head-action-based interoperation method according to any one of claims 1 to 5.
8. A terminal, characterized in that it comprises a memory and a processor, the memory having stored thereon a computer program which, when executed by the processor, implements the head action based interoperation method of any one of claims 1 to 5.
9. The terminal of claim 8, further comprising:
and the camera is connected with the processor and is used for acquiring a face image when a screen interface is browsed.
10. The terminal of claim 8, further comprising:
and the depth sensor is connected with the processor and used for acquiring a depth image of the head action when the recognized face is matched with a preset face.
CN202010981138.XA 2020-09-17 2020-09-17 Interactive operation method and device based on head action, storage medium and terminal Pending CN112099634A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010981138.XA CN112099634A (en) 2020-09-17 2020-09-17 Interactive operation method and device based on head action, storage medium and terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010981138.XA CN112099634A (en) 2020-09-17 2020-09-17 Interactive operation method and device based on head action, storage medium and terminal

Publications (1)

Publication Number Publication Date
CN112099634A 2020-12-18

Family

ID=73760285

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010981138.XA Pending CN112099634A (en) 2020-09-17 2020-09-17 Interactive operation method and device based on head action, storage medium and terminal

Country Status (1)

Country Link
CN (1) CN112099634A (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101349944A (en) * 2008-09-03 2009-01-21 宏碁股份有限公司 Gesticulation guidance system and method for controlling computer system by touch control gesticulation
CN101697199A (en) * 2009-08-11 2010-04-21 北京盈科成章科技有限公司 Detection method of head-face gesture and disabled assisting system using same to manipulate computer
CN103839040A (en) * 2012-11-27 2014-06-04 株式会社理光 Gesture identification method and device based on depth images
CN106156578A (en) * 2015-04-22 2016-11-23 深圳市腾讯计算机系统有限公司 Auth method and device
CN107544673A (en) * 2017-08-25 2018-01-05 上海视智电子科技有限公司 Body feeling interaction method and body feeling interaction system based on depth map information
CN107704086A (en) * 2017-10-20 2018-02-16 维沃移动通信有限公司 A kind of mobile terminal operating method and mobile terminal
CN108089891A (en) * 2017-11-30 2018-05-29 维沃移动通信有限公司 A kind of application program launching method, mobile terminal
CN109561208A (en) * 2018-11-20 2019-04-02 努比亚技术有限公司 Recognition of face starts method, apparatus, terminal and storage medium
CN111144896A (en) * 2019-12-16 2020-05-12 中国银行股份有限公司 Identity verification method and device

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115933874A (en) * 2022-11-23 2023-04-07 深圳市江元智造科技有限公司 Intelligent sliding control method and system based on face control and storage medium
CN115933874B (en) * 2022-11-23 2023-08-29 深圳市江元智造科技有限公司 Intelligent sliding control method, system and storage medium based on face control

Similar Documents

Publication Publication Date Title
WO2019033957A1 (en) Interaction position determination method and system, storage medium and smart terminal
CN107643828B (en) Vehicle and method of controlling vehicle
US20170192500A1 (en) Method and electronic device for controlling terminal according to eye action
WO2016127437A1 (en) Live body face verification method and system, and computer program product
CN104092932A (en) Acoustic control shooting method and device
CN103399632A (en) Gesture control method and mobile terminal
TW201113743A (en) Method, electronic apparatus and computer program product for creating biologic feature data
CN109446961A (en) Pose detection method, device, equipment and storage medium
CN108469772B (en) Control method and device of intelligent equipment
CN107273869B (en) Gesture recognition control method and electronic equipment
CN111240482A (en) Special effect display method and device
CN114138121B (en) User gesture recognition method, device and system, storage medium and computing equipment
EP4180933A1 (en) Device control method and apparatus, and storage medium and electronic device
CN112445341B (en) Keyboard perspective method and device of virtual reality equipment and virtual reality equipment
CN112099634A (en) Interactive operation method and device based on head action, storage medium and terminal
CN114489331A (en) Method, apparatus, device and medium for interaction of separated gestures distinguished from button clicks
CN107979701B (en) Method and device for controlling terminal display
CN113986093A (en) Interaction method and related device
CN112949689A (en) Image recognition method and device, electronic equipment and storage medium
CN111275874B (en) Information display method, device and equipment based on face detection and storage medium
CN115421590B (en) Gesture control method, storage medium and image pickup device
CN110164444A (en) Voice input starting method, apparatus and computer equipment
CN113138676B (en) Expression symbol display method and device
CN112527103B (en) Remote control method and device for display equipment, equipment and computer readable storage medium
CN115061577A (en) Hand projection interaction method, system and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201218