WO2019134527A1 - Method and device for man-machine interaction, medium, and mobile terminal - Google Patents

Method and device for man-machine interaction, medium, and mobile terminal Download PDF

Info

Publication number
WO2019134527A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
camera
face
depth
facial
Application number
PCT/CN2018/122308
Other languages
French (fr)
Chinese (zh)
Inventor
陈岩 (CHEN Yan)
刘耀勇 (LIU Yaoyong)
Original Assignee
Oppo广东移动通信有限公司
Priority date: 2018-01-03 (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Oppo广东移动通信有限公司
Publication of WO2019134527A1 publication Critical patent/WO2019134527A1/en


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/012Head tracking input arrangements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • G06T2207/30201Face

Definitions

  • The embodiments of the present application relate to the technical field of mobile terminals, for example, to a human-computer interaction method, device, medium, and mobile terminal.
  • With the development of mobile terminal technology, the use of mobile terminals is no longer limited to making calls and sending messages; more and more users install applications such as video players, music players, and e-readers in their mobile terminals for convenience.
  • In the related art, applications are usually controlled manually. During use, the user often needs to repeatedly input simple operations, which affects the convenience of human-computer interaction and easily leads to accidental touches.
  • the embodiment of the present application provides a human-computer interaction method, device, medium, and mobile terminal, which can optimize a human-computer interaction solution and improve the convenience and accuracy of application control.
  • The embodiment of the present application provides a human-computer interaction method, including: controlling a three-dimensional (3D) depth camera to acquire facial information when it is detected that a target application is started, where the facial information includes a facial image having depth of field information; determining a user state according to the facial information; and determining a control indication according to the user state, and controlling the target application according to the control indication.
  • the embodiment of the present application further provides a human-machine interaction device, and the device includes:
  • An information acquiring module configured to control a 3D depth camera to acquire facial information when detecting that the target application is activated, wherein the facial information includes a facial image having depth information;
  • a state determining module configured to determine a user state according to the face information
  • the application control module is configured to determine a control indication according to the user state, and control the target application according to the control indication.
  • The embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the human-computer interaction method described above.
  • The embodiment of the present application further provides a mobile terminal, including a 3D depth camera, a memory, a processor, and a computer program stored in the memory and runnable on the processor; the 3D depth camera includes a normal camera and an infrared camera and is configured to capture a facial image having depth of field information, and the processor, when executing the computer program, implements the human-computer interaction method described above.
  • The human-computer interaction solution provided by the embodiments of the present application tracks the user's face based on a facial image carrying depth of field information, thereby obtaining the motion state of the user's head; the corresponding control indication is determined from the preset correspondence between control indications and user states, and the target application is then controlled according to that indication. Because the user image carries depth information, more detail can be detected, which improves the accuracy of motion detection, avoids incorrect application responses caused by accidental touches, improves the accuracy and convenience of human-computer interaction, enables the mobile terminal to "see" the user, makes human-computer interaction more intelligent, and enriches the application scenarios of the human-computer interaction function.
  • FIG. 1 is a flowchart of a human-computer interaction method according to an embodiment
  • FIG. 2 is a flowchart of another human-computer interaction method according to an embodiment
  • FIG. 3 is a schematic diagram of a solution for calculating a reference offset angle according to an embodiment
  • FIG. 4 is a structural block diagram of a human-machine interaction apparatus according to an embodiment
  • FIG. 5 is a structural block diagram of a mobile terminal according to an embodiment
  • FIG. 6 is a structural block diagram of a smart phone according to an embodiment.
  • FIG. 1 is a flowchart of a human-computer interaction method according to an embodiment.
  • the method can be performed by a human-machine interaction device, wherein the device can be implemented by software and/or hardware, and can generally be integrated in a mobile terminal, such as a mobile terminal having a 3D depth camera.
  • the method includes the following steps.
  • Step 110 Control the 3D depth camera to acquire facial information when detecting that the target application is started.
  • When the human-computer interaction function is initialized, the user is prompted to input the applications to be controlled through facial information; these are recorded as target applications and stored in a whitelist. Target applications include video applications, audio applications, and e-books.
  • The target application may also be a system-default application that can be controlled through facial information; it is configured in the mobile terminal in the form of a configuration file before the mobile terminal leaves the factory.
  • the 3D depth camera can be used to capture an image with depth of field information, can detect a variety of user actions, and provides various control actions for the target application, enriching the types of control actions.
  • the 3D depth camera includes a depth camera based on structured light depth ranging and a depth camera based on Time Of Flight (TOF) ranging.
  • A depth camera based on structured light depth ranging includes an ordinary camera (for example, a Red Green Blue (RGB) camera) and an infrared camera. The infrared camera projects a light pattern of a certain mode onto the scene to be photographed; the persons or objects in the scene modulate the light strips, forming a three-dimensional light-strip image on their surfaces, and the ordinary camera then captures that image to obtain a two-dimensional distorted image of the light strips.
  • The degree of distortion of the light strips depends on the relative position between the ordinary camera and the infrared camera and on the surface profile or height of the persons or objects in the scene. Since that relative position is fixed within the depth camera, the image coordinates of the two-dimensional distorted light-strip image can reproduce the three-dimensional contour of the surface of the persons or objects in the scene, thereby yielding the depth information.
  • Structured light depth ranging has high resolution and measurement accuracy, which can improve the accuracy of acquired depth information.
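  • The application gives no explicit ranging formula for this step; as a hedged illustration, a textbook triangulation relation for structured-light systems (an assumption here, not text from this application) links the observed strip displacement to depth as follows:

```latex
% Illustrative structured-light triangulation (assumed, not from the application):
% Z : depth of a surface point
% f : focal length of the observing (ordinary) camera, in pixels
% b : baseline between the infrared projector and the ordinary camera
% d : disparity, i.e., lateral displacement of the light strip in the image
Z = \frac{f\,b}{d}
```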
  • In an embodiment, the 3D depth camera may instead be a depth camera based on TOF ranging: a sensor records the phase change of modulated infrared light emitted from a light-emitting unit, reflected by the object, and received back; within a certain wavelength range, the depth of the entire scene can be obtained in real time from the speed of light.
  • Persons or objects in the scene sit at different depths, so the time taken by the modulated infrared light from emission to reception differs; in this way, the depth information of the scene can be obtained.
  • A depth camera based on TOF depth ranging is unaffected by the gray level and surface features of the object when calculating depth information, can compute depth quickly, and offers high real-time performance.
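  • Again as a hedged illustration (the application states the principle but no formula), the standard phase-shift TOF relation recovers distance from the measured phase change:

```latex
% Illustrative phase-based TOF relation (assumed, not from the application):
% d : distance to the object,  c : speed of light
% \Delta t : round-trip time of the modulated infrared light
% \Delta\varphi : measured phase change,  f_{mod} : modulation frequency
d = \frac{c\,\Delta t}{2} = \frac{c\,\Delta\varphi}{4\pi f_{\mathrm{mod}}}
```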
  • the face information includes a face image having depth information.
  • the state of the target application is monitored by the mobile terminal. If it is detected that the target application is started, the operation of opening the 3D depth camera is performed in parallel with the startup operation of the target application. After the 3D depth camera is turned on, the 3D depth camera is controlled to shoot the user.
  • a face image is obtained by photographing a face of the user through a 3D depth camera. If it is detected that the user's complete facial image is not acquired by the 3D depth camera, the user is prompted to adjust the facial gesture.
  • a prompt box may be displayed in the preview interface of the camera to prompt the user to align the face with the prompt box.
  • the method of controlling the 3D depth camera to capture the user may be to control the 3D depth camera to capture the face of the user according to the set period to obtain a multi-frame facial image.
  • Step 120 Determine a user status according to the facial information.
  • In an embodiment, the user states corresponding to preset control indications are configured in advance, including but not limited to: swinging the head left or right corresponding to control indications such as page turning or song switching; turning the head to a set position and staying there for a set time corresponding to the video fast-forward control indication; and a head offset angle exceeding a set angle threshold corresponding to the video-switching control indication.
  • the user swings the head to the left and right, corresponding to the page turning instruction of the electronic book, that is, the user swings the head to the right corresponding to the control instruction "next page", and the user swings the head to the left corresponding to the control instruction "previous page”.
  • As another example, if the head deflects rightward to a set position with an offset angle smaller than the set angle threshold, and the dwell time at that position falls within a set time interval, the video playing in the target video application is fast-forwarded by a first time length. If the offset angle is smaller than the set angle threshold and the dwell time at that position exceeds a set time threshold, the video keeps fast-forwarding until a change in the user state is detected, at which point the fast-forward operation stops.
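  • A minimal sketch of such a state-to-indication mapping is shown below; all threshold values, the time interval, and the action names are hypothetical placeholders chosen for illustration, not values from the application.

```python
# Hypothetical thresholds for mapping a rightward head deflection to an action.
ANGLE_THRESHOLD = 30.0               # deg; the "set angle threshold"
FAST_FORWARD_INTERVAL = (1.0, 3.0)   # s; the "set time interval"
DWELL_THRESHOLD = 3.0                # s; the "set time threshold"

def control_indication(offset_angle_deg: float, dwell_time_s: float) -> str:
    """Map a user state (offset angle, dwell time) to a control indication."""
    if offset_angle_deg >= ANGLE_THRESHOLD:
        return "switch_video"                     # large deflection: switch videos
    lo, hi = FAST_FORWARD_INTERVAL
    if lo <= dwell_time_s <= hi:
        return "fast_forward_first_time_length"   # short dwell: bounded fast-forward
    if dwell_time_s > DWELL_THRESHOLD:
        return "fast_forward_until_state_change"  # long dwell: keep fast-forwarding
    return "none"

print(control_indication(15.0, 2.0))  # -> fast_forward_first_time_length
```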
  • In this embodiment, the offset angle of the face is determined from the depth of field information of the facial image. Since the depth of field information reflects the spatial positional relationship of the facial pixels, the offset angle of the face can be calculated from it. In an embodiment, the positions of the two eyes in the facial image are identified, and the facial symmetry axis is determined from them.
  • When the face directly faces the 3D depth camera, the left face region and the right face region are at substantially the same distance from the camera, so the set sampling points extracted from the two regions carry substantially the same depth of field information. If the user's head deflects, the depth of field information of the left and right face regions changes accordingly, placing the two regions in different depth planes, and the depth of field information of the set sampling points is no longer the same. The offset angle of the face can then be calculated from the triangular relationship between the depth of field information of the left face region and that of the right face region.
  • In an embodiment, a set number of sampling points are selected from the left face region, and the same number from the corresponding positions of the right face region, forming set sampling point pairs. Based on the depth of field information of each pair, the arctangent function is used to calculate a reference offset angle for each pair; the average of the reference offset angles is then taken as the offset angle of the face.
  • In an embodiment, the pixel points at the left and right inner eye corners (on either side of the nose bridge) may be paired to form a set sampling point pair, or sampling points may be selected correspondingly on the set lines passing through the inner eye corners and perpendicular to the line connecting the two eyes, and so on.
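  • A minimal sketch of this computation, assuming each set sampling point pair is given as the depth values of the left and right points together with their spatial separation (the data layout and names are illustrative):

```python
import math

def face_offset_angle(pairs) -> float:
    """pairs: iterable of (depth_left, depth_right, separation) tuples, one per
    set sampling point pair. Returns the mean of the per-pair reference offset
    angles (degrees), used as the offset angle of the face."""
    angles = [
        math.degrees(math.atan(abs(d_left - d_right) / separation))
        for d_left, d_right, separation in pairs
    ]
    return sum(angles) / len(angles)

# Three illustrative pairs; the depth difference grows as the head rotates.
print(face_offset_angle([(52.0, 50.0, 6.0), (53.1, 50.2, 8.0), (51.5, 50.1, 4.0)]))
```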
  • There are many ways to determine the user state from the facial information, and this application does not limit them; for example, facial images of the user's face at multiple preset angles may be captured in advance and stored as image templates. When the user state needs to be determined from the facial information, the captured facial image can be matched against the image templates to determine the offset angle of the face.
  • the start time of the user's head rotation and the time when the user's head stops rotating can be determined by comparing the face images corresponding to the two adjacent shooting times.
  • When it is detected that the user's head has stopped rotating, the offset angle of the face is determined from the depth of field information of the facial image at that moment. In addition, a timer is triggered to start timing at that moment and stops when the head is detected to move again, so as to record how long the head stays at the position corresponding to the offset angle.
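  • The timer logic can be pictured as below (a sketch; the motion-detection callbacks are assumed to be driven by the frame-comparison step above):

```python
import time

class DwellTimer:
    """Records how long the head stays at the position reached when
    rotation stops, per the timing logic described above."""
    def __init__(self):
        self._start = None

    def on_head_stopped(self) -> None:
        self._start = time.monotonic()          # head stopped: start timing

    def on_head_moved_again(self) -> float:
        if self._start is None:
            return 0.0
        dwell = time.monotonic() - self._start  # head moved again: stop timing
        self._start = None
        return dwell                            # dwell time at the offset position
```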
  • Step 130 Determine a control indication according to the user state, and control the target application according to the control indication.
  • The control indication is an operation indication corresponding to a control instruction of the target application, including but not limited to fast forward, backward, switching to the next file, switching to the previous file, and page turning.
  • The user states corresponding to the preset control indications are configured in advance, and each control indication is stored in the whitelist in association with its user state.
  • After determining the user state, the mobile terminal queries the preset whitelist according to the user state to determine the control indication corresponding to that state, determines the instruction corresponding to the control indication (an instruction the target application can recognize and execute), and sends the instruction to the target application. Upon receiving the instruction, the target application performs the corresponding operation in response to the control indication.
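  • The whitelist lookup and dispatch can be sketched as below; the state keys, instruction names, and the `send_to_app` hook are hypothetical illustrations, not identifiers from the application.

```python
# Hypothetical whitelist associating user states with app instructions.
WHITELIST = {
    ("swing_right", "none"): "next_page",
    ("swing_left", "none"): "previous_page",
    ("turn_right", "short_dwell"): "fast_forward",
}

def dispatch(user_state, send_to_app) -> None:
    """Query the preset whitelist and send the matching instruction, if any,
    to the target application, which recognizes and executes it."""
    instruction = WHITELIST.get(user_state)
    if instruction is not None:
        send_to_app(instruction)

dispatch(("swing_right", "none"), print)  # -> next_page
```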
  • For example, if the determined control indication is to fast-forward the video by 5 minutes (the duration is not limited to 5 minutes; a system default may be used, or it may be set by the user), the corresponding instruction is sent to the target application.
  • In the technical solution of this embodiment, the 3D depth camera is controlled to acquire facial information when the start of the target application is detected; a user state is determined from the facial information; a control indication is determined from the user state; and the target application is controlled accordingly. The user's face is thus tracked based on a facial image carrying depth of field information, yielding the motion state of the user's head, and the corresponding control indication is found through the preset correspondence between control indications and user states. Because the user image carries depth information, more detail can be detected, which improves the accuracy of motion detection, avoids incorrect application responses caused by accidental touches, improves the accuracy and convenience of human-computer interaction, enables the mobile terminal to "see" the user, makes human-computer interaction more intelligent, and enriches the application scenarios of the human-computer interaction function.
  • In an embodiment, the correspondence between user states and control indications is displayed in a guide interface to prompt the user on how to input control actions.
  • FIG. 2 is a flowchart of another human-computer interaction method according to an embodiment. As shown in FIG. 2, the method includes the following steps.
  • Step 210 Control a normal camera included in the 3D depth camera to acquire a two-dimensional image corresponding to the face according to a set period.
  • the 3D depth camera includes a normal camera and an infrared camera.
  • When an application is detected to be started, its application identifier (which may be a package name, a process name, or the like) is obtained, and the preset whitelist is queried with the identifier to determine whether the application is a target application. If it is, the normal camera is controlled to turn on and to capture the two-dimensional image corresponding to the face according to the set period.
  • After the normal camera is turned on, whether a human face is included in the preview image is detected. If the preview image includes a human face, the normal camera is controlled to capture the two-dimensional image corresponding to the face according to the set period; if no face is detected in the preview image, the user is prompted to adjust the facial posture until a face is detected. Whether the user turns the head is determined by comparing the two-dimensional images at adjacent shooting moments; when a head turn is detected, the corresponding one-frame two-dimensional facial image is taken as the first image, i.e., the image at the starting moment.
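  • Comparing adjacent frames can be as simple as thresholding a mean absolute pixel difference, as in this sketch (grayscale NumPy frames assumed; the threshold value is a hypothetical placeholder):

```python
import numpy as np

TURN_THRESHOLD = 12.0  # hypothetical mean-absolute-difference threshold

def head_turned(prev_frame: np.ndarray, curr_frame: np.ndarray) -> bool:
    """Detect a head turn by comparing the two-dimensional images taken
    at two adjacent shooting moments."""
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    return float(diff.mean()) > TURN_THRESHOLD
```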
  • Step 220 Determine facial features corresponding to the two-dimensional image.
  • In an embodiment, contour detection is used to find the face region included in the two-dimensional image and determine the contour of the face; the face area is then determined from the contour.
  • The facial feature may also be the proportion of face pixels in the preview image. For example, the face region included in the two-dimensional image is determined; the maximum vertical resolution of the face region along the long side of the mobile terminal's touch screen and the maximum horizontal resolution along the short side are obtained; the size of the face region is derived from these two values; and this size is divided by the size of the touch screen to obtain the proportion of face pixels in the preview image.
  • Step 230 Determine whether the two-dimensional image satisfies the setting condition according to the facial feature. If the setting condition is satisfied, step 240 is performed; otherwise, the process returns to step 210.
  • If the face area difference is smaller than a set threshold, it is determined that the two-dimensional image does not satisfy the setting condition. This prevents small head movements from triggering false controls and improves the control accuracy of the mobile terminal; for example, it can prevent a sneeze while watching a video or reading an e-book from triggering a false response. If the face area difference exceeds the set threshold, it is determined that the two-dimensional image satisfies the setting condition.
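  • The setting-condition test then reduces to an area comparison between the first image (motion start) and the second image (motion stop); the bounding-box layout and threshold below are illustrative assumptions.

```python
AREA_DIFF_THRESHOLD = 2000  # hypothetical threshold, in pixels

def face_area(region) -> int:
    """region: (max_horizontal_resolution, max_vertical_resolution)
    of the detected face region in a two-dimensional image."""
    width, height = region
    return width * height

def satisfies_setting_condition(first_region, second_region) -> bool:
    """True only if the face area changed enough between the start and stop
    frames, filtering out small movements such as a sneeze."""
    return abs(face_area(first_region) - face_area(second_region)) > AREA_DIFF_THRESHOLD
```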
  • Step 240 Turn on the infrared camera included in the 3D depth camera, capture a facial image through the infrared camera and the normal camera, and then turn off the infrared camera.
  • In an embodiment, the infrared camera included in the 3D depth camera is turned on, the facial information at the moment the head motion stops is captured by the infrared camera to obtain a depth image, and at least one frame of the two-dimensional facial image is re-captured by the ordinary camera; the depth image and the re-captured two-dimensional image form a three-dimensional facial image.
  • A single facial motion and its end point are usually detected first by the ordinary camera. The single facial motion may be the motion process from the above-mentioned starting moment to the moment the head motion stops, and the end point of the single facial motion is the head-motion stop moment.
  • When the end point is detected, the infrared camera is turned on to capture the three-dimensional facial image; after the depth image is captured by the infrared camera, the infrared camera is turned off, which reduces the power consumption of the mobile terminal.
  • In an embodiment, the second image captured by the normal camera at the moment the head motion stops and the depth image captured by the infrared camera may also form the three-dimensional facial image.
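  • Putting steps 210 through 240 together, the power-saving capture flow looks roughly like this; the camera objects and their methods are hypothetical stand-ins for platform camera APIs.

```python
def capture_facial_image(normal_cam, ir_cam, motion_detector):
    """Sketch of the power-saving flow: track motion with the normal camera
    only, and power the infrared camera just long enough to grab one depth
    image at the moment the head motion stops."""
    second_image = None
    while not motion_detector.motion_stopped():
        second_image = normal_cam.capture()   # periodic two-dimensional frames

    ir_cam.turn_on()
    depth_image = ir_cam.capture()            # depth at the stop moment
    ir_cam.turn_off()                         # minimize power consumption

    return depth_image, second_image          # together they form the 3D facial image
```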
  • Step 250 Determine a user state according to the three-dimensional facial image.
  • In an embodiment, the three-dimensional facial image is recognized, and the positions of the facial features in the three-dimensional image are determined, thereby determining the face region and the symmetry axis of the face region. The face region is divided by the symmetry axis into a left face region and a right face region; a set number of feature points are extracted from set positions in the left face region, the mirrored feature points of those feature points in the right face region are determined based on the symmetry axis, and each feature point and its mirrored feature point form a sampling point pair.
  • FIG. 3 is a schematic diagram of a solution for calculating a reference offset angle according to an embodiment.
  • As shown in FIG. 3, L1 and L2 are the distances from the feature point 320 and the mirrored feature point 330, respectively, to the 3D depth camera 310, i.e., the depth of field information corresponding to the feature point 320 and the mirrored feature point 330, and W is the distance between the feature point 320 and the mirrored feature point 330. When the head deflects, the axis of symmetry AB changes from the first position 340 to the corresponding second position 350, and the feature point 320 and the mirrored feature point 330 remain symmetric about the axis of symmetry AB at the second position. From the triangular relationship among L1, L2, and W, the reference offset angle of each set sampling point pair can be calculated with the arctangent function, e.g., θ = arctan(|L1 − L2| / W), so that the offset angle of the face is determined according to the reference offset angles.
  • In an embodiment, the average of the reference offset angles can be calculated and used as the offset angle of the face. Alternatively, the reference offset angles may be arranged in descending order and the maximum reference offset angle used as the offset angle of the face; the minimum reference offset angle, or the reference offset angle in the middle of the queue, may also be used.
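  • Whichever statistic is chosen, it is a one-line aggregation over the per-pair reference angles from the arctangent step (a sketch; `angles` is assumed non-empty):

```python
import statistics

def aggregate(angles, mode="mean") -> float:
    """Pick the face offset angle from the per-pair reference offset angles."""
    if mode == "mean":
        return statistics.mean(angles)
    if mode == "max":
        return max(angles)                # head of the descending queue
    if mode == "min":
        return min(angles)                # tail of the descending queue
    return statistics.median(angles)      # middle of the queue
```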
  • Step 260 Query a preset whitelist according to the user status, and determine a control indication corresponding to the user status.
  • the user state includes an offset angle of the face and a time at which the head stays at the position corresponding to the offset angle.
  • Step 270 Send the instruction corresponding to the control indication to the target application.
  • In the technical solution of this embodiment, the ordinary camera included in the 3D depth camera is controlled to acquire two-dimensional facial images according to a set period; if a two-dimensional image satisfies the setting condition, the infrared camera included in the 3D depth camera is turned on, and the facial information at the head-motion stop moment is captured by the infrared camera to obtain a depth image. Because a single facial motion and its end point are first detected by the ordinary camera, and the infrared camera is turned on to capture the three-dimensional facial image only when the end point is detected, the power consumption of the mobile terminal is reduced and battery life is extended. In addition, determining whether the two-dimensional image satisfies the setting condition effectively prevents erroneous detection from causing erroneous control of the target application, improving the control accuracy of the mobile terminal.
  • FIG. 4 is a structural block diagram of a human-machine interaction apparatus according to an embodiment.
  • the device may be implemented in software and/or hardware, and may be integrated into a mobile terminal, such as a mobile terminal having a 3D depth camera, configured to perform the human-computer interaction method provided by the embodiment.
  • the apparatus includes: an information acquisition module 410 configured to control a 3D depth camera to acquire facial information when detecting that the target application is activated, wherein the facial information includes a facial image having depth information;
  • the state determination module 420 is configured to determine a user state according to the face information;
  • the application control module 430 is configured to determine a control indication according to the user state, and control the target application according to the control indication.
  • The human-machine interaction device tracks the user's face based on the facial image having depth of field information, thereby obtaining the motion state of the user's head; the corresponding control indication is determined from the preset correspondence between control indications and user states, and the target application is controlled accordingly. Because the user image carries depth information, more detail can be detected, which improves the accuracy of motion detection, avoids incorrect application responses caused by accidental touches, improves the accuracy and convenience of human-computer interaction, enables the mobile terminal to "see" the user, makes human-computer interaction more intelligent, and enriches the application scenarios of the human-computer interaction function.
  • In an embodiment, the information acquisition module 410 includes: a two-dimensional image acquisition sub-module, configured to control the normal camera included in the 3D depth camera to acquire a two-dimensional image corresponding to the face according to a set period when it is detected that the target application is started; and a facial image capturing sub-module, configured to turn on the infrared camera included in the 3D depth camera and capture a facial image through the infrared camera and the normal camera if the two-dimensional image satisfies a setting condition.
  • the facial image capturing sub-module is further configured to turn off the infrared camera after the facial image is captured by the infrared camera and the normal camera.
  • In an embodiment, the apparatus further includes: a feature determining module, configured to determine the facial feature corresponding to the two-dimensional image after the normal camera included in the 3D depth camera is controlled to acquire the two-dimensional image corresponding to the face according to the set period; and a condition determining module, configured to determine whether the two-dimensional image satisfies the setting condition according to the facial feature.
  • In an embodiment, the condition determining module is configured to: determine the face area difference between a first image and a second image, where the first image is the two-dimensional image captured at the head-motion start moment and the second image is the two-dimensional image captured at the head-motion stop moment; compare the face area difference with a set threshold; and determine whether the two-dimensional image satisfies the setting condition according to the comparison result.
  • In an embodiment, the facial image capturing sub-module captures the facial image through the infrared camera and the ordinary camera by: capturing the facial information at the head-motion stop moment through the infrared camera to obtain a depth image; the depth image and the second image form the facial image.
  • the state determining module 420 is configured to determine an offset angle of the face according to the depth information of the face image, and record a time at which the head stays at a position corresponding to the offset angle.
  • In an embodiment, the application control module 430 is configured to: query the preset whitelist according to the user state and determine the control indication corresponding to the user state, where the control indication includes fast forward, backward, switching to the next file, switching to the previous file, and page turning; and send the instruction corresponding to the control indication to the target application, where the instruction is used to instruct the target application to respond to the control indication. Target applications include video applications, audio applications, and e-books.
  • In an embodiment, the two-dimensional image acquisition sub-module is configured to: control the normal camera included in the 3D depth camera to turn on and detect whether a human face is included in the preview image; and, if the preview image includes a human face, control the normal camera to acquire the two-dimensional image corresponding to the face according to the set period.
  • This embodiment further provides a storage medium containing computer-executable instructions which, when executed by a computer processor, perform a human-computer interaction method, the method including: controlling the 3D depth camera to acquire facial information when it is detected that the target application is started, where the facial information includes a facial image having depth of field information; determining a user state according to the facial information; and determining a control indication according to the user state and controlling the target application according to the control indication.
  • The storage medium may be any of various types of memory devices or storage devices.
  • the term "storage medium” is intended to include: a mounting medium such as a Compact Disc Read-Only Memory (CD-ROM), a floppy disk or a tape device; a computer system memory or a random access memory such as a dynamic random Random Access Memory (DRAM), Double Data Rate Random Access Memory (DDR RAM), Static Random Access Memory (SRAM), Extended Data Output Random Extended Data Output Random Access Memory (EDO RAM), Rambus Random Access Memory (RAM), etc.; non-volatile memory such as flash memory, magnetic media (such as hard disk or light) Storage); registers or other similar types of memory elements, etc.
  • the storage medium may also include other types of memory or multiple types of memory combinations.
  • The storage medium may be located in a first computer system, in which the program is executed, or in a different second computer system connected to the first computer system through a network, such as the Internet.
  • the second computer system can provide program instructions to the first computer for execution.
  • the term "storage medium" can include two or more storage media that can reside in different locations (eg, in different computer systems connected through a network).
  • a storage medium may store program instructions (eg, program instructions implemented as a computer program) executable by one or more processors.
  • The storage medium includes computer-executable instructions, and these instructions are not limited to the human-computer interaction operations described above; they may also perform related operations in the human-computer interaction method provided by any embodiment of the present application.
  • the embodiment provides a mobile terminal, and the mobile terminal has an operating system, and the human-machine interaction device provided in this embodiment can be integrated into the mobile terminal.
  • The mobile terminal may be a smart phone, a tablet computer (PAD), a handheld game console, or the like.
  • FIG. 5 is a structural block diagram of a mobile terminal according to an embodiment. As shown in FIG. 5, the mobile terminal includes a 3D depth camera 510, a memory 520, and a processor 530.
  • The 3D depth camera 510 includes a normal camera and an infrared camera and is configured to capture a facial image having depth of field information; the memory 520 is configured to store a computer program, facial images, the associations between user states and control indications, and the like; and the processor 530 is configured to read and execute the computer program stored in the memory 520.
  • When executing the computer program, the processor 530 implements the following steps: controlling the 3D depth camera to acquire facial information when it is detected that the target application is started, where the facial information includes a facial image having depth of field information; determining a user state according to the facial information; and determining a control indication according to the user state and controlling the target application according to the control indication.
  • FIG. 6 is a structural block diagram of a smart phone according to an embodiment.
  • As shown in FIG. 6, the smart phone may include: a memory 601, a central processing unit (CPU) 602 (also referred to as a processor, hereinafter CPU), a peripheral interface 603, a radio frequency (RF) circuit 605, an audio circuit 606, a speaker 611, a touch screen 612, a camera 613, a power management chip 608, an input/output (I/O) subsystem 609, and other input/control devices 610, and these components communicate through one or more communication buses or signal lines 607.
  • The smart phone 600 is merely an example of a mobile terminal; the smart phone 600 may have more or fewer components than shown in FIG. 6, may combine two or more components, or may have a different configuration of components.
  • the various components shown in FIG. 6 may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing and/or application specific integrated circuits.
  • The following describes a smart phone integrated with the human-machine interaction device provided in this embodiment.
  • The memory 601 can be accessed by the CPU 602, the peripheral interface 603, and the like. The memory 601 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic disk storage devices, flash memory devices, or other non-volatile solid-state storage devices.
  • The computer program is stored in the memory 601, which may also store facial information, a whitelist of the associations between user states and control indications, a whitelist of target applications, and the like.
  • Peripheral interface 603, which can connect the input and output peripherals of the device to CPU 602 and memory 601.
  • The I/O subsystem 609 can connect input and output peripherals on the device, such as the touch screen 612 and other input/control devices 610, to the peripheral interface 603.
  • I/O subsystem 609 can include display controller 6091 and one or more input controllers 6092 that are configured to control other input/control devices 610.
  • The one or more input controllers 6092 receive electrical signals from, or send electrical signals to, other input/control devices 610, which may include physical buttons (push buttons, rocker buttons, etc.), dials, slide switches, joysticks, and click wheels.
  • the input controller 6092 can be connected to any of the following: a keyboard, an infrared port, a Universal Serial Bus (USB) interface, and a pointing device such as a mouse.
  • the touch screen 612 is an input interface and an output interface between the user terminal and the user, and displays the visual output to the user.
  • the visual output may include graphics, text, icons, videos, and the like.
  • the camera 613 may be a 3D depth camera.
  • A three-dimensional facial image is acquired by the camera 613, converted into an electrical signal, and stored in the memory 601 through the peripheral interface 603.
  • The display controller 6091 in the I/O subsystem 609 receives electrical signals from the touch screen 612 or sends electrical signals to the touch screen 612.
  • The touch screen 612 detects contact on the touch screen, and the display controller 6091 converts the detected contact into interaction with the user interface objects displayed on the touch screen 612, i.e., realizes human-computer interaction. The user interface objects displayed on the touch screen 612 may include icons of running games, icons for connecting to corresponding networks, and the like.
  • the device may also include a light mouse, which is a touch sensitive surface that does not display a visual output, or an extension of a touch sensitive surface formed by the touch screen.
  • the RF circuit 605 is mainly configured to establish communication between the mobile phone and the wireless network (ie, the network side), and implement data reception and transmission between the mobile phone and the wireless network. For example, sending and receiving short messages, emails, and the like.
  • The RF circuit 605 receives and transmits RF signals, also referred to as electromagnetic signals; it converts electrical signals into electromagnetic signals or electromagnetic signals into electrical signals, and communicates with communication networks and other devices through the electromagnetic signals.
  • The RF circuitry 605 may include known circuitry for performing these functions, including but not limited to an antenna system, an RF transceiver, one or more amplifiers, a tuner, one or more oscillators, a digital signal processor, a coder-decoder (CODEC) chipset, a Subscriber Identity Module (SIM), and the like.
  • the audio circuit 606 is primarily configured to receive audio data from the peripheral interface 603, convert the audio data into an electrical signal, and transmit the electrical signal to the speaker 611.
  • the speaker 611 is arranged to restore the voice signal received by the mobile phone from the wireless network through the RF circuit 605 to sound and play the sound to the user.
  • the power management chip 608 is configured to provide power and power management for the hardware connected to the CPU 602, the I/O subsystem, and the peripheral interface.
  • The mobile terminal provided in this embodiment tracks the user's face based on the facial image having depth of field information, thereby obtaining the motion state of the user's head; the corresponding control indication is determined from the preset correspondence between control indications and user states, and the target application is controlled accordingly. Because the user image carries depth information, more detail can be detected, which improves the accuracy of motion detection, avoids incorrect application responses caused by accidental touches, improves the accuracy and convenience of human-computer interaction, enables the mobile terminal to "see" the user, makes human-computer interaction more intelligent, and enriches the application scenarios of the human-computer interaction function.
  • The human-machine interaction device, the storage medium, and the mobile terminal provided in the above embodiments can perform the human-computer interaction method provided by any embodiment of the present application and have the corresponding functional modules and beneficial effects for performing the method. For technical details not described in detail above, reference may be made to the human-computer interaction method provided by any embodiment of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • User Interface Of Digital Computer (AREA)
  • Telephone Function (AREA)

Abstract

Disclosed in an embodiment of the present invention are a method and device for man-machine interaction, a medium, and a mobile terminal. The method comprises: upon detection of activation of a target application, controlling a 3D depth camera to acquire facial information; determining a user state according to the facial information; and determining a control instruction according to the user state, and controlling the target application according to the control instruction.

Description

Human-computer interaction method, device, medium and mobile terminal
The present application claims priority to Chinese Patent Application No. 201810005036.7, filed with the China Patent Office on January 3, 2018, the entire contents of which are incorporated herein by reference.
Technical Field
The embodiments of the present application relate to the technical field of mobile terminals, for example, to a human-computer interaction method, device, medium, and mobile terminal.
Background

With the development of mobile terminal technology, the use of mobile terminals is no longer limited to making calls and sending messages; more and more users install applications such as video players, music players, and e-readers in their mobile terminals for convenience.

In the related art, applications are usually controlled manually. During use, the user often needs to repeatedly input simple operations, which affects the convenience of human-computer interaction and easily leads to accidental touches.
Summary

The embodiments of the present application provide a human-computer interaction method, device, medium, and mobile terminal, which can optimize the human-computer interaction solution and improve the convenience and accuracy of application control.

An embodiment of the present application provides a human-computer interaction method, including:

controlling a three-dimensional (3 Dimensions, 3D) depth camera to acquire facial information when it is detected that a target application is started, where the facial information includes a facial image having depth of field information;

determining a user state according to the facial information; and

determining a control indication according to the user state, and controlling the target application according to the control indication.

An embodiment of the present application further provides a human-machine interaction device, including:

an information acquisition module, configured to control a 3D depth camera to acquire facial information when it is detected that a target application is started, where the facial information includes a facial image having depth of field information;

a state determining module, configured to determine a user state according to the facial information; and

an application control module, configured to determine a control indication according to the user state and control the target application according to the control indication.

An embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the human-computer interaction method described above.

An embodiment of the present application further provides a mobile terminal, including a 3D depth camera, a memory, a processor, and a computer program stored in the memory and runnable on the processor; the 3D depth camera includes a normal camera and an infrared camera and is configured to capture a facial image having depth of field information, and the processor, when executing the computer program, implements the human-computer interaction method described above.

The human-computer interaction solution provided by the embodiments of the present application tracks the user's face based on a facial image carrying depth of field information, thereby obtaining the motion state of the user's head; the corresponding control indication is determined from the preset correspondence between control indications and user states, and the target application is then controlled according to that indication. Because the user image carries depth information, more detail can be detected, which improves the accuracy of motion detection, avoids incorrect application responses caused by accidental touches, improves the accuracy and convenience of human-computer interaction, enables the mobile terminal to "see" the user, makes human-computer interaction more intelligent, and enriches the application scenarios of the human-computer interaction function.
Brief Description of the Drawings

FIG. 1 is a flowchart of a human-computer interaction method according to an embodiment;

FIG. 2 is a flowchart of another human-computer interaction method according to an embodiment;

FIG. 3 is a schematic diagram of a solution for calculating a reference offset angle according to an embodiment;

FIG. 4 is a structural block diagram of a human-machine interaction apparatus according to an embodiment;

FIG. 5 is a structural block diagram of a mobile terminal according to an embodiment;

FIG. 6 is a structural block diagram of a smart phone according to an embodiment.
Detailed Description

The present application is described below with reference to the accompanying drawings and embodiments. The specific embodiments described herein merely illustrate the application and do not limit it. In addition, for ease of description, only the parts related to the present application, rather than the entire structure, are shown in the drawings.

FIG. 1 is a flowchart of a human-computer interaction method according to an embodiment. The method may be performed by a human-machine interaction device, which may be implemented in software and/or hardware and can generally be integrated in a mobile terminal, such as a mobile terminal having a 3D depth camera. As shown in FIG. 1, the method includes the following steps.

Step 110: Control the 3D depth camera to acquire facial information when it is detected that the target application is started.
When the human-computer interaction function is initialized, the user is prompted to input the applications to be controlled through facial information; these are recorded as target applications and stored in a whitelist. Target applications include video applications, audio applications, and e-books. In an embodiment, the target application may also be a system-default application that can be controlled through facial information, configured in the mobile terminal in the form of a configuration file before the mobile terminal leaves the factory.

In this embodiment, the 3D depth camera can capture images with depth of field information and detect a variety of user actions, providing multiple control actions for the target application and enriching the types of control actions. In an embodiment, the 3D depth camera includes a depth camera based on structured light depth ranging and a depth camera based on Time Of Flight (TOF) ranging.

For example, a depth camera based on structured light depth ranging includes an ordinary camera (for example, a Red Green Blue (RGB) camera) and an infrared camera. The infrared camera projects a light pattern of a certain mode onto the scene to be photographed; the persons or objects in the scene modulate the light strips, forming a three-dimensional light-strip image on their surfaces, and the ordinary camera then captures that image to obtain a two-dimensional distorted image of the light strips. The degree of distortion of the light strips depends on the relative position between the ordinary camera and the infrared camera and on the surface profile or height of the persons or objects in the scene. Since that relative position is fixed within the depth camera, the image coordinates of the two-dimensional distorted light-strip image can reproduce the three-dimensional contour of the surface of the persons or objects in the scene, thereby yielding the depth information. Structured light depth ranging has high resolution and measurement accuracy, which can improve the accuracy of the acquired depth information.

In an embodiment, the 3D depth camera may instead be a depth camera based on TOF ranging: a sensor records the phase change of modulated infrared light emitted from a light-emitting unit, reflected by the object, and received back; within a certain wavelength range, the depth of the entire scene can be obtained in real time from the speed of light. Persons or objects in the scene sit at different depths, so the time taken by the modulated infrared light from emission to reception differs; in this way, the depth information of the scene can be obtained. A depth camera based on TOF depth ranging is unaffected by the gray level and surface features of the object when calculating depth information, can compute depth quickly, and offers high real-time performance.

In this embodiment, the facial information includes a facial image having depth of field information. The state of the target application is monitored by the mobile terminal; if it is detected that the target application is started, the operation of opening the 3D depth camera is performed in parallel with the startup of the target application. After the 3D depth camera is turned on, it is controlled to photograph the user, and a facial image is obtained by photographing the user's face. If it is detected that the user's complete facial image has not been acquired by the 3D depth camera, the user is prompted to adjust the facial posture. In an embodiment, a prompt box may be displayed in the preview interface of the camera to prompt the user to align the face with the prompt box.

In an embodiment, the 3D depth camera may be controlled to photograph the user's face according to a set period, obtaining multiple frames of facial images.
步骤120、根据所述面部信息确定用户状态。Step 120: Determine a user status according to the facial information.
在一实施例中,预先设定与预设的控制指示对应的用户状态,包括但不限于用户左右摆动头部对应翻页或切歌等控制指示,用户将头部转至设定位置并停留设定时间的状态与视频快进的控制指示对应,以及,用户的头部偏移角度超过设定角度阈值与视频切换的控制指示对应。例如,用户左右摆动头部,对应于电子书的翻页指令,即用户向右摆动头部与控制指示“下一页”对应,用户向左摆动头部与控制指示“上一页”对应。又如,头部向右偏转至设定位置的偏移角度小于设定角度阈值,如果在该设置位置的停留时间属于设定时间区间,则控制目标视频应用中播放的视频快进第一时间长度。如果用户头部向右偏转至设定位置的偏移角度小于设定角度阈值,且在该设置位置的停留时间超过设定时间阈值,则控制视频持续快进,直至检测到用户状态发生变化,才停止对该视频的快进操作。In an embodiment, the user state corresponding to the preset control indication is preset, including but not limited to a control instruction such as turning the page or cutting the song by the user's left and right swinging heads, and the user turns the head to the set position and stays. The state of the set time corresponds to the control instruction of the video fast forward, and the head offset angle of the user exceeds the set angle threshold corresponding to the control instruction of the video switching. For example, the user swings the head to the left and right, corresponding to the page turning instruction of the electronic book, that is, the user swings the head to the right corresponding to the control instruction "next page", and the user swings the head to the left corresponding to the control instruction "previous page". For another example, the offset angle of the head to the right to the set position is less than the set angle threshold, and if the dwell time at the set position belongs to the set time interval, the video played in the target video application is fast forwarded for the first time. length. If the offset angle of the user's head to the right to the set position is less than the set angle threshold, and the dwell time at the set position exceeds the set time threshold, the control video continues to fast forward until the user state is detected to change. The fast forward operation of the video is stopped.
In this embodiment, the offset angle of the face is determined from the depth-of-field information of the facial image. Because the depth information reflects the spatial positional relationship of the facial pixels, the face's offset angle can be computed from it. In an embodiment, the positions of the two eyes in the facial image are identified, and the face's axis of symmetry is determined from them. When the face squarely faces the 3D depth camera, the left-face region and the right-face region are at essentially the same distance from the camera, so set sampling points extracted from the left-face region and the right-face region carry essentially the same depth information. If the user's head deflects, the depth information of the left-face and right-face regions changes accordingly: the two regions fall into different depth planes, and the depth values at the set sampling points are no longer the same. The face's offset angle can then be computed from the triangular relationship of the depth information of the left-face and right-face regions. In an embodiment, a set number of sampling points is selected from the left-face region and the same number of sampling points is selected from the corresponding positions of the right-face region, forming set sampling-point pairs. From the depth information of each pair, the arctangent function is used to compute a reference offset angle per pair; the average of the reference offset angles is computed and taken as the face's offset angle. In an embodiment, the pixels at the inner corners of the left and right eyes (the corners nearest the bridge of the nose) may be selected to form a sampling-point pair, or sampling points may be selected at corresponding positions on the set lines through the inner corners of the left and right eyes (set lines perpendicular to the line connecting the eyes), and so on.

In an embodiment, there are many ways to determine the user state from the facial information, and this application does not specifically limit them. For example, facial images of the user's face oriented at multiple preset angles may be captured in advance and stored as image templates. When the user state needs to be determined from the facial information, the captured facial image can be matched against the image templates to determine the face's offset angle.
In this embodiment, the moment the user's head starts turning and the moment it stops turning can be determined by comparing the facial images captured at two adjacent shooting times. When the head is detected to have stopped turning, the face's offset angle is determined from the depth-of-field information of the facial image at that moment. In addition, when the head is detected to have stopped turning, a timer is triggered to start counting; it stops when the head is detected to move again, thereby recording how long the head stayed at the position corresponding to that offset angle.
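A sketch of this start/stop detection and dwell timing follows, assuming periodic grayscale frames and using a crude mean-absolute-difference change test in place of whatever frame comparison the terminal actually uses:

```python
import time
import numpy as np

def frames_differ(prev_gray, cur_gray, eps=8.0):
    # Crude change test: mean absolute pixel difference between frames.
    diff = np.abs(cur_gray.astype(np.int16) - prev_gray.astype(np.int16))
    return float(diff.mean()) > eps

class HeadMotionTimer:
    """Detects motion start/stop across periodic captures and times how
    long the head dwells once it stops (thresholds are assumptions)."""
    def __init__(self):
        self.moving = False
        self.stopped_at = None

    def update(self, prev_frame, cur_frame):
        # Returns the dwell time in seconds when motion resumes, else None.
        if frames_differ(prev_frame, cur_frame):
            dwell = None
            if not self.moving and self.stopped_at is not None:
                dwell = time.monotonic() - self.stopped_at  # timer stops
                self.stopped_at = None
            self.moving = True
            return dwell
        if self.moving:
            self.moving = False
            self.stopped_at = time.monotonic()  # timer starts at motion stop
        return None
```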
Step 130: Determine a control indication according to the user state, and control the target application according to the control indication.

In an embodiment, a control indication is an operation indication corresponding to a control instruction of the target application, including but not limited to fast forward, rewind, switching to the next file, switching to the previous file, and page turning. The user states corresponding to the preset control indications are set in advance, and each control indication is stored in a whitelist in association with its user state.
In this embodiment, after determining the user state, the mobile terminal queries the preset whitelist with that state to determine the corresponding control indication and the instruction to which it maps; the instruction can be recognized and executed by the target application, and is sent to it. Upon receiving the instruction, the target application performs the corresponding operation in response to the control indication. For example, while the target video application is running, suppose the user's head is detected to deflect to the right to a set angle and to stay at the corresponding position for 3 seconds (s); if the set angle is smaller than the set angle threshold and the dwell time falls within the set time interval, the control indication is determined to be fast-forwarding the video by 5 minutes (this is not limited to 5 minutes and may be a system default or set by the user), and the instruction corresponding to that control indication is sent to the target video application to fast-forward the currently playing video file by 5 minutes. As another example, while the target video application is running, if the user's head is detected to deflect to the right to a set angle that exceeds the set angle threshold, the control indication is determined to be switching the video (that is, playing the next episode), and the corresponding instruction is sent to the target video application to play the next episode of the current video.
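A sketch of the whitelist lookup and dispatch, reusing the hypothetical state names from the earlier sketch; the command names and the target_app.send() interface are assumptions, not an API defined by this application:

```python
# Hypothetical whitelist associating user states with app-level commands.
CONTROL_WHITELIST = {
    "fast_forward_fixed_length":        ("seek_relative", {"seconds": 300}),
    "fast_forward_until_state_changes": ("fast_forward", {}),
    "switch_next":                      ("play_next_episode", {}),
    "switch_previous":                  ("play_previous_episode", {}),
}

def dispatch_control(user_state, target_app):
    entry = CONTROL_WHITELIST.get(user_state)
    if entry is None:
        return  # state not whitelisted: do nothing
    command, args = entry
    # The instruction must be one the target application can recognise and
    # execute (in practice delivered via an intent/IPC mechanism).
    target_app.send(command, **args)
```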
In the technical solution of this embodiment, when the target application is detected to have been started, the 3D depth camera is controlled to acquire facial information; the user state is determined according to that facial information; a control indication is determined according to the user state; and the target application is controlled according to the control indication. The user's face is thus tracked on the basis of facial images carrying depth-of-field information, yielding the motion state of the user's head, and the corresponding control indication is determined from the preset correspondence between control indications and user states. Because the user images carry depth information, more detail can be detected and the accuracy of motion detection is improved, avoiding mis-responses of the application caused by accidental touches. This improves the accuracy and convenience of human-computer interaction, enables the mobile terminal to "see" the user, makes human-computer interaction more intelligent, and enriches the application scenarios of the human-computer interaction function.

In an embodiment, when the user is detected to be using this human-computer interaction function for the first time, the correspondence between user states and control indications is displayed in a guide interface to indicate the control actions the user can enter.

FIG. 2 is a flowchart of another human-computer interaction method provided by an embodiment. As shown in FIG. 2, the method includes the following steps.

Step 210: Control the ordinary camera included in the 3D depth camera to acquire two-dimensional images of the face at a set period.
In this embodiment, the 3D depth camera includes an ordinary camera and an infrared camera. When the user is detected to have started an application, the application identifier of that application (which may be a package name, a process name, or the like) is obtained, the preset whitelist is queried with that identifier, and it is judged whether the application is a target application. If the application is a target application, the ordinary camera is controlled to turn on and photographs two-dimensional images of the face at a set period. In an embodiment, after the ordinary camera is turned on, it is detected whether the preview image contains a face; if the preview image is detected to contain a face, the ordinary camera is controlled to capture two-dimensional images of the face at the set period, and if it is not, the user is prompted to adjust the facial pose until a face is detected in the preview. Whether the user turns the head is determined by comparing the two-dimensional images at adjacent shooting times. When the user is detected to turn the head, one two-dimensional frame of the face is captured as the first image, at the start moment. The currently captured two-dimensional images are then acquired in sequence, and each is compared with the image from the previous shooting time to determine the moment the head motion stops; when the head motion is detected to have stopped, one two-dimensional frame of the face is captured and recorded as the second image.
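For illustration, the application-identifier check and the preview face check might look as follows; the package names are hypothetical, and a stock OpenCV Haar cascade stands in for whatever face detector the terminal actually uses on the preview stream:

```python
import cv2

# Hypothetical whitelist of target-application package names.
TARGET_APP_WHITELIST = {"com.example.videoplayer", "com.example.ebook"}

def is_target_app(package_name):
    return package_name in TARGET_APP_WHITELIST

_face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def preview_contains_face(frame_bgr):
    # Detect at least one frontal face in the current preview frame.
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = _face_cascade.detectMultiScale(gray, scaleFactor=1.1,
                                           minNeighbors=5)
    return len(faces) > 0
```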
Step 220: Determine the facial features corresponding to the two-dimensional image.

In this embodiment, contour detection is applied to the face region contained in the two-dimensional image to determine the face contour, and the face area is then determined from the face contour.
This embodiment does not limit the meaning of the facial features; a facial feature may also be the proportion of face pixels in the preview image. For example, determine the face region contained in the two-dimensional image; obtain the maximum vertical resolution of the face region along the direction parallel to the long side of the mobile terminal's touch screen, and the maximum horizontal resolution along the direction parallel to its short side; obtain the size of the face region from the maximum vertical resolution and the maximum horizontal resolution; and divide the size of the face region by the size of the touch screen to obtain the proportion of face pixels in the preview image.
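A minimal sketch of this proportion computation, with illustrative resolutions:

```python
def face_to_screen_ratio(face_w_px, face_h_px, screen_w_px, screen_h_px):
    # Face size = max horizontal extent x max vertical extent of the
    # detected contour; the ratio divides it by the touch-screen size.
    return (face_w_px * face_h_px) / float(screen_w_px * screen_h_px)

# Illustrative numbers: a 480x640 face box on a 1080x2160 screen.
ratio = face_to_screen_ratio(480, 640, 1080, 2160)  # ~0.13
```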
Step 230: Judge, according to the facial features, whether the two-dimensional image satisfies a set condition; if it does, perform step 240, and if it does not, return to step 210.
Determine the face-area difference between the first image and the second image described above, compare the difference with a set threshold, and judge from the comparison whether the two-dimensional image satisfies the set condition. In an embodiment, when the face-area difference is smaller than the set threshold, the two-dimensional image is determined not to satisfy the set condition; this prevents small head movements of the user from being detected and triggering mis-control, improving the control accuracy of the mobile terminal. For example, it prevents a sneeze from triggering mis-control while the user is watching a video or reading an e-book. When the face-area difference exceeds the set threshold, the two-dimensional image is determined to satisfy the set condition.
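The set-condition check itself reduces to a threshold comparison; the threshold value below is an assumption:

```python
def satisfies_set_condition(first_area_px, second_area_px,
                            set_threshold_px=5000):
    # Only a face-area change larger than the threshold counts as an
    # intentional head movement; smaller changes (e.g. a sneeze) are ignored.
    return abs(second_area_px - first_area_px) > set_threshold_px
```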
Step 240: Turn on the infrared camera included in the 3D depth camera, capture a facial image through the infrared camera and the ordinary camera, and turn off the infrared camera.
When the two-dimensional image satisfies the set condition, the infrared camera included in the 3D depth camera is turned on; the facial information at the moment the head motion stops is photographed through the infrared camera to obtain a depth image, and at least one more two-dimensional frame of the face is captured through the ordinary camera. The depth image and the re-captured two-dimensional image together form a three-dimensional facial image.
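A sketch of composing the three-dimensional facial image, assuming the depth image and the two-dimensional image are already registered to the same viewpoint and resolution (a real pipeline must guarantee this via calibration between the two cameras):

```python
import numpy as np

def compose_3d_face_image(rgb_frame, depth_map):
    # Attach the infrared camera's depth map to the ordinary camera's RGB
    # frame as a fourth channel, yielding an (H, W, 4) array.
    assert rgb_frame.shape[:2] == depth_map.shape
    return np.dstack([rgb_frame, depth_map])
```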
In this embodiment, during user-state detection, facial motion and the endpoint of a single facial motion are usually detected with the ordinary camera. A single facial motion may comprise the movement from the start moment described above to the moment the head motion stops, and the endpoint of a single facial motion is the moment the head motion stops. When that endpoint is detected, the infrared camera is turned on to capture the three-dimensional facial image; after the depth image has been captured through the infrared camera, the infrared camera is turned off, which reduces the power consumption of the mobile terminal.

In an embodiment, the three-dimensional facial image may instead be formed from the second image, captured by the ordinary camera at the moment the head motion stops, and the depth image captured by the infrared camera.

Step 250: Determine the user state according to the three-dimensional facial image.

The face's offset angle is determined from the depth-of-field information corresponding to the three-dimensional facial image, and the time the head stays at the position where the head motion stopped is recorded; the user state includes the offset angle and the time the head stays at the head-motion stop position.

The three-dimensional facial image is recognized and the positions of the facial features in it are determined, from which the face region and the axis of symmetry of the face region are determined. The axis of symmetry divides the face region into a left-face region and a right-face region. A set number of feature points is extracted from set positions in the left-face region, and for each feature point its mirrored feature point in the right-face region is determined with respect to the axis of symmetry; a feature point and its mirrored feature point form a set sampling-point pair. The depth-of-field information of each pair is obtained, together with the distance between the feature point and the mirrored feature point in the pair, and the arctangent function is used to compute a reference offset angle for each pair. Taking one sampling-point pair as an example, the computation of the reference offset angle is as follows. FIG. 3 is a schematic diagram of a scheme for computing the reference offset angle provided by an embodiment. As shown in FIG. 3, L1 and L2 are the distances from feature point 320 and mirrored feature point 330 to the 3D depth camera 310, that is, the depth-of-field information of feature point 320 and mirrored feature point 330, and W is the distance between feature point 320 and mirrored feature point 330. Suppose the user's head deflects to the left; the axis of symmetry AB then moves from the first position 340 to the corresponding second position 350, and feature point 320 and mirrored feature point 330 are symmetric about the axis AB at the second position. Taking the offset angle of the axis of symmetry AB as the reference offset angle α corresponding to feature point 320 and mirrored feature point 330, α can be computed with the formula α = arctan((L2 − L1) / W).
In this embodiment, the formula above yields the reference offset angle of each set sampling-point pair, and the face's offset angle is then determined from these reference offset angles. For example, the average of the reference offset angles may be taken as the face's offset angle. Alternatively, the reference offset angles may be sorted in descending order and the maximum reference offset angle taken as the face's offset angle; the minimum reference offset angle, or the reference offset angle in the middle of the sorted sequence, may also be used.
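A sketch of the per-pair angle computation and the aggregation choices described above, with illustrative depths and point separations:

```python
import numpy as np

def reference_offset_angle_deg(l1, l2, w):
    # alpha = arctan((L2 - L1) / W): L1, L2 are the pair's distances to the
    # depth camera, W the distance between the two points (see Figure 3).
    return np.degrees(np.arctan2(l2 - l1, w))

def face_offset_angle_deg(sampling_point_pairs, reduce="mean"):
    # Each pair is an (L1, L2, W) tuple; any of the aggregations named
    # above (mean, maximum, minimum, median) may be used.
    angles = sorted(reference_offset_angle_deg(l1, l2, w)
                    for l1, l2, w in sampling_point_pairs)
    if reduce == "mean":
        return float(np.mean(angles))
    if reduce == "max":
        return angles[-1]
    if reduce == "min":
        return angles[0]
    return angles[len(angles) // 2]  # middle of the sorted sequence

# Illustrative values in metres: depths ~0.5 m, point separations 6-8 cm.
angle = face_offset_angle_deg([(0.52, 0.55, 0.06), (0.51, 0.55, 0.08)])
```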
Step 260: Query a preset whitelist according to the user state, and determine the control indication corresponding to the user state.

In this embodiment, the user state includes the face's offset angle and the time the head stays at the position corresponding to that offset angle.

Step 270: Send the instruction corresponding to the control indication to the target application.

In the technical solution of this embodiment, the ordinary camera included in the 3D depth camera is controlled to acquire two-dimensional images of the face at a set period, and when a two-dimensional image satisfies the set condition, the infrared camera included in the 3D depth camera is turned on and the facial information at the moment the head motion stops is photographed through it to obtain a depth image. Facial motion and the endpoint of a single facial motion are thereby detected first with the ordinary camera, and the infrared camera is turned on to capture the three-dimensional facial image only when that endpoint is detected, which reduces the power consumption of the mobile terminal and extends its battery life. In addition, judging whether the two-dimensional image satisfies the set condition effectively prevents false detections from mis-controlling the target application, improving the control accuracy of the mobile terminal.

FIG. 4 is a structural block diagram of a human-computer interaction apparatus provided by an embodiment. The apparatus may be implemented in software and/or hardware and may be integrated in a mobile terminal, for example a mobile terminal with a 3D depth camera, configured to perform the human-computer interaction method provided by this embodiment. As shown in FIG. 4, the apparatus includes: an information acquisition module 410, configured to control the 3D depth camera to acquire facial information when the target application is detected to have been started, where the facial information includes a facial image carrying depth-of-field information; a state determination module 420, configured to determine the user state according to the facial information; and an application control module 430, configured to determine a control indication according to the user state and to control the target application according to the control indication.

The human-computer interaction apparatus provided by this embodiment tracks the user's face on the basis of facial images carrying depth-of-field information, obtaining the motion state of the user's head; the corresponding control indication is determined from the preset correspondence between control indications and user states, and the target application is then controlled according to that control indication. Because the user images carry depth information, more detail can be detected and the accuracy of motion detection is improved, avoiding mis-responses of the application caused by accidental touches. This improves the accuracy and convenience of human-computer interaction, enables the mobile terminal to "see" the user, makes human-computer interaction more intelligent, and enriches the application scenarios of the human-computer interaction function.
In an embodiment, the information acquisition module 410 includes: a two-dimensional image acquisition sub-module, configured to control the ordinary camera included in the 3D depth camera to acquire two-dimensional images of the face at a set period when the target application is detected to have been started; and a facial image capture sub-module, configured to turn on the infrared camera included in the 3D depth camera when a two-dimensional image satisfies the set condition, and to capture a facial image through the infrared camera and the ordinary camera.

In an embodiment, the facial image capture sub-module is further configured to turn off the infrared camera after the facial image has been captured through the infrared camera and the ordinary camera.

In an embodiment, the apparatus further includes: a feature determination module, configured to determine the facial features corresponding to the two-dimensional image after the ordinary camera included in the 3D depth camera has been controlled to acquire the two-dimensional images of the face at the set period; and a condition judgment module, configured to judge, according to the facial features, whether the two-dimensional image satisfies the set condition.

In an embodiment, the condition judgment module is configured to: determine the face-area difference between a first image and a second image, where the first image is a two-dimensional image captured at the moment the head motion starts and the second image is a two-dimensional image captured at the moment the head motion stops; and compare the face-area difference with a set threshold, judging from the comparison whether the two-dimensional image satisfies the set condition.

In an embodiment, the facial image capture sub-module is configured to capture the facial image through the infrared camera and the ordinary camera as follows: the facial information at the moment the head motion stops is photographed through the infrared camera to obtain a depth image, and the depth image and the second image form the facial image.

In an embodiment, the state determination module 420 is configured to determine the face's offset angle from the depth-of-field information of the facial image and record the time the head stays at the position corresponding to that offset angle.

In an embodiment, the application control module 430 is configured to: query a preset whitelist according to the user state and determine the control indication corresponding to the user state, where the control indications include fast forward, rewind, switching to the next file, switching to the previous file, and page turning; and send the instruction corresponding to the control indication to the target application, where the instruction instructs the target application to respond to the control indication, and the target applications include video applications, audio applications, and e-books.

In an embodiment, the two-dimensional image acquisition sub-module is configured to: turn on the ordinary camera included in the 3D depth camera and detect whether the preview image contains a face; and, if the preview image is detected to contain a face, control the ordinary camera to acquire two-dimensional images of the face at the set period.

This embodiment further provides a storage medium containing computer-executable instructions which, when executed by a computer processor, perform a human-computer interaction method. The method includes: when the target application is detected to have been started, controlling the 3D depth camera to acquire facial information, where the facial information includes a facial image carrying depth-of-field information; determining the user state according to the facial information; and determining a control indication according to the user state and controlling the target application according to the control indication.

Storage medium: any of at least one type of memory device or storage device. The term "storage medium" is intended to include: installation media, such as a Compact Disc Read-Only Memory (CD-ROM), a floppy disk, or a tape device; computer-system memory or random-access memory, such as Dynamic Random Access Memory (DRAM), Double Data Rate Random Access Memory (DDR RAM), Static Random Access Memory (SRAM), Extended Data Output Random Access Memory (EDO RAM), or Rambus Random Access Memory (RAM); non-volatile memory, such as flash memory or magnetic media (for example, a hard disk or optical storage); and registers or other similar types of memory elements. The storage medium may further include other types of memory or combinations of multiple types of memory. In addition, the storage medium may be located in the first computer system in which the program is executed, or in a different, second computer system connected to the first computer system over a network (such as the Internet); the second computer system may provide the program instructions to the first computer for execution. The term "storage medium" may include two or more storage media that may reside in different locations (for example, in different computer systems connected through a network). The storage medium may store program instructions executable by one or more processors (for example, program instructions implemented as a computer program).

In the storage medium containing computer-executable instructions provided by this embodiment, the computer-executable instructions are not limited to the human-computer interaction operations described above, and may also perform the relevant operations of the human-computer interaction method provided by any embodiment of this application.
This embodiment provides a mobile terminal with an operating system, in which the human-computer interaction apparatus provided by this embodiment may be integrated. The mobile terminal may be a smartphone, a tablet computer (PAD), a handheld game console, or the like. FIG. 5 is a structural block diagram of a mobile terminal provided by an embodiment. As shown in FIG. 5, the mobile terminal includes a 3D depth camera 510, a memory 520, and a processor 530. The 3D depth camera 510 includes an ordinary camera and an infrared camera and is configured to capture facial images carrying depth-of-field information; the memory 520 is configured to store the computer program, the facial images, the associations between user states and control indications, and the like; and the processor 530 is configured to read and execute the computer program stored in the memory 520. When executing the computer program, the processor 530 implements the following steps: when the target application is detected to have been started, controlling the 3D depth camera to acquire facial information, where the facial information includes a facial image carrying depth-of-field information; determining the user state according to the facial information; and determining a control indication according to the user state and controlling the target application according to the control indication. The 3D depth camera, memory, and processor listed in the example above are only some of the components of the mobile terminal, which may also include other components. Taking a smartphone as an example, a possible structure of the mobile terminal is described below. FIG. 6 is a structural block diagram of a smartphone provided by an embodiment. As shown in FIG. 6, the smartphone may include: a memory 601, a Central Processing Unit (CPU) 602 (also called a processor, hereinafter CPU), a peripheral interface 603, a Radio Frequency (RF) circuit 605, an audio circuit 606, a speaker 611, a touch screen 612, a camera 613, a power management chip 608, an input/output (I/O) subsystem 609, other input/control devices 610, and an external port 604. These components communicate through one or more communication buses or signal lines 607.

The smartphone 600 shown in FIG. 6 is only one example of a mobile terminal; the smartphone 600 may have more or fewer components than shown in FIG. 6, may combine two or more components, or may have a different configuration of components. The components shown in FIG. 6 may be implemented in hardware, software, or a combination of hardware and software, including one or more signal-processing and/or application-specific integrated circuits.

The smartphone in which the human-computer interaction apparatus provided by this embodiment is integrated is described below.
Memory 601: the memory 601 may be accessed by the CPU 602, the peripheral interface 603, and the like. The memory 601 may include high-speed random-access memory, and may also include non-volatile memory, such as one or more magnetic disk storage devices, flash memory devices, or other non-volatile solid-state storage devices. The memory 601 stores the computer program, and may also store facial information, the whitelist recording the associations between user states and control indications, the whitelist of target applications, and the like.
Peripheral interface 603: the peripheral interface 603 may connect the input and output peripherals of the device to the CPU 602 and the memory 601.

I/O subsystem 609: the I/O subsystem 609 may connect the input/output peripherals on the device, such as the touch screen 612 and the other input/control devices 610, to the peripheral interface 603. The I/O subsystem 609 may include a display controller 6091 and one or more input controllers 6092 configured to control the other input/control devices 610. The one or more input controllers 6092 receive electrical signals from, or send electrical signals to, the other input/control devices 610, which may include physical buttons (press buttons, rocker buttons, etc.), dials, slide switches, joysticks, and click wheels. It is worth noting that an input controller 6092 may be connected to any of the following: a keyboard, an infrared port, a Universal Serial Bus (USB) interface, or a pointing device such as a mouse.

Touch screen 612: the touch screen 612 is the input and output interface between the user terminal and the user, displaying visual output to the user; the visual output may include graphics, text, icons, video, and the like.

Camera 613: this may be a 3D depth camera. Through the camera 613, a three-dimensional image of the user's face is acquired and converted into an electrical signal, which is stored in the memory 601 through the peripheral interface 603.

The display controller 6091 in the I/O subsystem 609 receives electrical signals from, or sends electrical signals to, the touch screen 612. The touch screen 612 detects contact on the screen, and the display controller 6091 converts the detected contact into interaction with the user interface objects displayed on the touch screen 612, thereby implementing human-computer interaction; the user interface objects displayed on the touch screen 612 may be icons of running games, icons for connecting to the corresponding networks, and the like. In an embodiment, the device may also include an optical mouse, which is a touch-sensitive surface that does not display visual output, or an extension of the touch-sensitive surface formed by the touch screen.

RF circuit 605: mainly configured to establish communication between the handset and the wireless network (that is, the network side), implementing data reception and transmission between the handset and the wireless network, for example sending and receiving short messages and e-mail. In an embodiment, the RF circuit 605 receives and sends RF signals, also called electromagnetic signals: the RF circuit 605 converts electrical signals into electromagnetic signals or electromagnetic signals into electrical signals, and communicates with communication networks and other devices through the electromagnetic signals. The RF circuit 605 may include known circuits for performing these functions, including but not limited to an antenna system, an RF transceiver, one or more amplifiers, a tuner, one or more oscillators, a digital signal processor, a COder-DECoder (CODEC) chipset, a Subscriber Identity Module (SIM), and so on.

Audio circuit 606: mainly configured to receive audio data from the peripheral interface 603, convert the audio data into an electrical signal, and send the electrical signal to the speaker 611.

Speaker 611: configured to restore the voice signals the handset receives from the wireless network through the RF circuit 605 into sound and play the sound to the user.

Power management chip 608: configured to supply power to, and manage the power of, the hardware connected to the CPU 602, the I/O subsystem, and the peripheral interface.

The mobile terminal provided by this embodiment tracks the user's face on the basis of facial images carrying depth-of-field information, obtaining the motion state of the user's head; the corresponding control indication is determined from the preset correspondence between control indications and user states, and the target application is then controlled according to that control indication. Because the user images carry depth information, more detail can be detected and the accuracy of motion detection is improved, avoiding mis-responses of the application caused by accidental touches. This improves the accuracy and convenience of human-computer interaction, enables the mobile terminal to "see" the user, makes human-computer interaction more intelligent, and enriches the application scenarios of the human-computer interaction function.

The human-computer interaction apparatus, storage medium, and mobile terminal provided in the embodiments above can perform the human-computer interaction method provided by any embodiment of this application, and have the corresponding functional modules and beneficial effects for executing that method. For technical details not described in detail in the embodiments above, see the human-computer interaction method provided by any embodiment of this application.

Claims (20)

  1. A human-computer interaction method, comprising:
    in a case where a target application is detected to have been started, controlling a three-dimensional (3D) depth camera to acquire facial information, wherein the facial information comprises a facial image carrying depth-of-field information;
    determining a user state according to the facial information; and
    determining a control indication according to the user state, and controlling the target application according to the control indication.
  2. The method according to claim 1, wherein controlling the 3D depth camera to acquire facial information comprises:
    controlling an ordinary camera included in the 3D depth camera to acquire two-dimensional images of a face at a set period; and
    in a case where a two-dimensional image satisfies a set condition, turning on an infrared camera included in the 3D depth camera, and capturing a facial image through the infrared camera and the ordinary camera.
  3. The method according to claim 2, after the capturing of the facial image through the infrared camera and the ordinary camera, further comprising: turning off the infrared camera.
  4. The method according to claim 2 or 3, after controlling the ordinary camera included in the 3D depth camera to acquire the two-dimensional images of the face at the set period, further comprising:
    determining facial features corresponding to the two-dimensional image; and
    judging, according to the facial features, whether the two-dimensional image satisfies the set condition.
  5. The method according to claim 4, wherein judging, according to the facial features, whether the two-dimensional image satisfies the set condition comprises:
    determining a face-area difference between a first image and a second image, wherein the first image is a two-dimensional image captured at a moment head motion starts, and the second image is a two-dimensional image captured at a moment the head motion stops; and
    comparing the face-area difference with a set threshold, and judging, according to a result of the comparison, whether the two-dimensional image satisfies the set condition.
  6. The method according to claim 5, wherein capturing the facial image through the infrared camera and the ordinary camera comprises:
    photographing, through the infrared camera, facial information at the moment the head motion stops to obtain a depth image, the depth image and the second image forming the facial image.
  7. The method according to any one of claims 1 to 6, wherein determining the user state according to the facial information comprises:
    determining an offset angle of the face according to the depth-of-field information of the facial image, and recording a time the head stays at a position corresponding to the offset angle.
  8. The method according to any one of claims 1 to 7, wherein determining the control indication according to the user state and controlling the target application according to the control indication comprises:
    querying a preset whitelist according to the user state, and determining the control indication corresponding to the user state, wherein the control indication comprises fast forward, rewind, switching to a next file, switching to a previous file, and page turning; and
    sending an instruction corresponding to the control indication to the target application, wherein the instruction is used to instruct the target application to respond to the control indication, and the target application comprises a video application, an audio application, and an e-book.
  9. The method according to any one of claims 2 to 6, wherein controlling the ordinary camera included in the 3D depth camera to acquire the two-dimensional images of the face at the set period comprises:
    turning on the ordinary camera included in the 3D depth camera, and detecting whether a preview image contains a face; and
    in a case where the preview image is detected to contain a face, controlling the ordinary camera to acquire the two-dimensional images of the face at the set period.
  10. A human-computer interaction apparatus, comprising:
    an information acquisition module, configured to control a three-dimensional (3D) depth camera to acquire facial information in a case where a target application is detected to have been started, wherein the facial information comprises a facial image carrying depth-of-field information;
    a state determination module, configured to determine a user state according to the facial information; and
    an application control module, configured to determine a control indication according to the user state, and control the target application according to the control indication.
  11. The apparatus according to claim 10, wherein the information acquisition module comprises:
    a two-dimensional image acquisition sub-module, configured to control an ordinary camera included in the 3D depth camera to acquire two-dimensional images of a face at a set period in a case where the target application is detected to have been started; and
    a facial image capture sub-module, configured to turn on an infrared camera included in the 3D depth camera in a case where a two-dimensional image satisfies a set condition, and capture a facial image through the infrared camera and the ordinary camera.
  12. The apparatus according to claim 11, wherein the facial image capture sub-module is further configured to turn off the infrared camera after the facial image has been captured through the infrared camera and the ordinary camera.
  13. The apparatus according to claim 11 or 12, further comprising:
    a feature determination module, configured to determine facial features corresponding to the two-dimensional image after the ordinary camera included in the 3D depth camera has been controlled to acquire the two-dimensional images of the face at the set period; and
    a condition judgment module, configured to judge, according to the facial features, whether the two-dimensional image satisfies the set condition.
  14. The apparatus according to claim 13, wherein the condition judgment module is configured to:
    determine a face-area difference between a first image and a second image, wherein the first image is a two-dimensional image captured at a moment head motion starts, and the second image is a two-dimensional image captured at a moment the head motion stops; and
    compare the face-area difference with a set threshold, and judge, according to a result of the comparison, whether the two-dimensional image satisfies the set condition.
  15. The apparatus according to claim 14, wherein the facial image capture sub-module is configured to capture the facial image through the infrared camera and the ordinary camera by:
    photographing, through the infrared camera, facial information at the moment the head motion stops to obtain a depth image, the depth image and the second image forming the facial image.
  16. The apparatus according to any one of claims 10 to 15, wherein the state determination module is configured to determine an offset angle of the face according to the depth-of-field information of the facial image, and record a time the head stays at a position corresponding to the offset angle.
  17. The apparatus according to any one of claims 10 to 16, wherein the application control module is configured to:
    query a preset whitelist according to the user state, and determine the control indication corresponding to the user state, wherein the control indication comprises fast forward, rewind, switching to a next file, switching to a previous file, and page turning; and
    send an instruction corresponding to the control indication to the target application, wherein the instruction is used to instruct the target application to respond to the control indication, and the target application comprises a video application, an audio application, and an e-book.
  18. The apparatus according to any one of claims 11 to 15, wherein the two-dimensional image acquisition sub-module is configured to:
    turn on the ordinary camera included in the 3D depth camera, and detect whether a preview image contains a face; and
    in a case where the preview image is detected to contain a face, control the ordinary camera to acquire the two-dimensional images of the face at the set period.
  19. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the human-computer interaction method according to any one of claims 1 to 9.
  20. A mobile terminal, comprising a three-dimensional (3D) depth camera, a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the 3D depth camera comprises an ordinary camera and an infrared camera and is configured to capture facial images carrying depth-of-field information, and the processor, when executing the computer program, implements the human-computer interaction method according to any one of claims 1 to 9.
PCT/CN2018/122308 2018-01-03 2018-12-20 Method and device for man-machine interaction, medium, and mobile terminal WO2019134527A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810005036.7 2018-01-03
CN201810005036.7A CN108241434B (en) 2018-01-03 2018-01-03 Man-machine interaction method, device and medium based on depth of field information and mobile terminal

Publications (1)

Publication Number Publication Date
WO2019134527A1 (en)

Family

ID=62699338

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/122308 WO2019134527A1 (en) 2018-01-03 2018-12-20 Method and device for man-machine interaction, medium, and mobile terminal

Country Status (2)

Country Link
CN (1) CN108241434B (en)
WO (1) WO2019134527A1 (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108241434B (en) * 2018-01-03 2020-01-14 Oppo广东移动通信有限公司 Man-machine interaction method, device and medium based on depth of field information and mobile terminal
CN109240570A (en) * 2018-08-29 2019-01-18 维沃移动通信有限公司 A kind of page turning method, device and terminal
US11048375B2 (en) * 2018-09-18 2021-06-29 Alibaba Group Holding Limited Multimodal 3D object interaction system
CN110956603B (en) * 2018-09-25 2023-04-21 Oppo广东移动通信有限公司 Detection method and device for edge flying spot of depth image and electronic equipment
CN111367598B (en) * 2018-12-26 2023-11-10 三六零科技集团有限公司 Method and device for processing action instruction, electronic equipment and computer readable storage medium
CN110502110B (en) * 2019-08-07 2023-08-11 北京达佳互联信息技术有限公司 Method and device for generating feedback information of interactive application program
CN110662129A (en) * 2019-09-26 2020-01-07 联想(北京)有限公司 Control method and electronic equipment
CN111126163A (en) * 2019-11-28 2020-05-08 星络智能科技有限公司 Intelligent panel, interaction method based on face angle detection and storage medium
CN113091227B (en) * 2020-01-08 2022-11-01 佛山市云米电器科技有限公司 Air conditioner control method, cloud server, air conditioner control system and storage medium
CN111327888B (en) * 2020-03-04 2022-09-30 广州腾讯科技有限公司 Camera control method and device, computer equipment and storage medium
CN111583355B (en) * 2020-05-09 2024-01-23 维沃移动通信有限公司 Face image generation method and device, electronic equipment and readable storage medium
CN112529770B (en) * 2020-12-07 2024-01-26 维沃移动通信有限公司 Image processing method, device, electronic equipment and readable storage medium
CN115086095A (en) * 2021-03-10 2022-09-20 Oppo广东移动通信有限公司 Equipment control method and related device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9628843B2 (en) * 2011-11-21 2017-04-18 Microsoft Technology Licensing, Llc Methods for controlling electronic devices using gestures
CN103268153B (en) * 2013-05-31 2016-07-06 南京大学 Based on the man-machine interactive system of computer vision and exchange method under demo environment
CN107479801B (en) * 2017-07-31 2020-06-02 Oppo广东移动通信有限公司 Terminal display method and device based on user expression and terminal

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20120132930A (en) * 2011-05-30 2012-12-10 김호진 Display device and display method based on user motion
US20130044135A1 (en) * 2011-08-19 2013-02-21 Hon Hai Precision Industry Co., Ltd. Electronic book and method for controlling display of files
CN103218124A (en) * 2013-04-12 2013-07-24 北京国铁华晨通信信息技术有限公司 Depth-camera-based menu control method and system
CN106648042A (en) * 2015-11-04 2017-05-10 重庆邮电大学 Identification control method and apparatus
CN107506752A (en) * 2017-09-18 2017-12-22 艾普柯微电子(上海)有限公司 Face identification device and method
CN108241434A (en) * 2018-01-03 2018-07-03 广东欧珀移动通信有限公司 Man-machine interaction method, device, medium and mobile terminal based on depth of view information

Also Published As

Publication number Publication date
CN108241434B (en) 2020-01-14
CN108241434A (en) 2018-07-03

Similar Documents

Publication Publication Date Title
WO2019134527A1 (en) Method and device for man-machine interaction, medium, and mobile terminal
US11640235B2 (en) Additional object display method and apparatus, computer device, and storage medium
US9361512B2 (en) Identification of a gesture
TWI564791B (en) Broadcast control system, method, computer program product and computer readable medium
WO2020103526A1 (en) Photographing method and device, storage medium and terminal device
US20200293754A1 (en) Task execution method, terminal device, and computer readable storage medium
WO2019218880A1 (en) Interaction recognition method and apparatus, storage medium, and terminal device
US10062393B2 (en) Method for recording sound of video-recorded object and mobile terminal
US20170192500A1 (en) Method and electronic device for controlling terminal according to eye action
JP6100286B2 (en) Gesture detection based on information from multiple types of sensors
US11276183B2 (en) Relocalization method and apparatus in camera pose tracking process, device, and storage medium
US11785331B2 (en) Shooting control method and terminal
WO2013000381A1 (en) Method for controlling state of mobile terminal and mobile terminal
CN108766438B (en) Man-machine interaction method and device, storage medium and intelligent terminal
US9275275B2 (en) Object tracking in a video stream
WO2019183784A1 (en) Method and electronic device for video recording
CN108616775B (en) Method and device for intelligently capturing picture during video playing, storage medium and intelligent terminal
KR20140104753A (en) Image preview using detection of body parts
US11102409B2 (en) Electronic device and method for obtaining images
KR20170098102A (en) Method, storage medium and electronic device for providing a plurality of images
KR20210124313A (en) Interactive object driving method, apparatus, device and recording medium
US9250723B2 (en) Method and apparatus for stroke acquisition and ultrasonic electronic stylus
CN112650405A (en) Electronic equipment interaction method and electronic equipment
WO2019218879A1 (en) Photographing interaction method and apparatus, storage medium and terminal device
US20170168582A1 (en) Click response processing method, electronic device and system for motion sensing control

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 18898162
    Country of ref document: EP
    Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
122 Ep: pct application non-entry in european phase
    Ref document number: 18898162
    Country of ref document: EP
    Kind code of ref document: A1