WO2019134527A1 - Method and device for man-machine interaction, medium, and mobile terminal - Google Patents

Method and device for man-machine interaction, medium, and mobile terminal Download PDF

Info

Publication number
WO2019134527A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
camera
face
depth
facial
Application number
PCT/CN2018/122308
Other languages
French (fr)
Chinese (zh)
Inventor
陈岩 (CHEN Yan)
刘耀勇 (LIU Yaoyong)
Original Assignee
Oppo广东移动通信有限公司
Priority date: 2018-01-03 (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Oppo广东移动通信有限公司
Publication of WO2019134527A1 publication Critical patent/WO2019134527A1/en


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/012Head tracking input arrangements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • G06T2207/30201Face

Definitions

  • The embodiments of the present application relate to the technical field of mobile terminals, for example, to a human-computer interaction method, device, medium, and mobile terminal.
  • With the development of mobile terminal technology, the use of mobile terminals is no longer limited to making calls and sending messages; more and more users install applications such as video players, music players, and e-readers in their mobile terminals for convenience.
  • In the related art, applications are usually controlled manually. During use, the user often needs to repeatedly input simple operations, which affects the convenience of human-computer interaction and easily leads to accidental touches.
  • the embodiment of the present application provides a human-computer interaction method, device, medium, and mobile terminal, which can optimize a human-computer interaction solution and improve the convenience and accuracy of application control.
  • The embodiment of the present application provides a human-computer interaction method, including: controlling a three-dimensional (3D) depth camera to acquire facial information when it is detected that a target application is started, where the facial information includes a facial image having depth of field information; determining a user state according to the facial information; and determining a control indication according to the user state, and controlling the target application according to the control indication.
  • the embodiment of the present application further provides a human-machine interaction device, and the device includes:
  • An information acquiring module configured to control a 3D depth camera to acquire facial information when detecting that the target application is activated, wherein the facial information includes a facial image having depth information;
  • a state determining module configured to determine a user state according to the face information
  • the application control module is configured to determine a control indication according to the user state, and control the target application according to the control indication.
  • The embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the human-computer interaction method described above.
  • The embodiment of the present application further provides a mobile terminal, including a 3D depth camera, a memory, a processor, and a computer program stored in the memory and runnable on the processor; the 3D depth camera includes a normal camera and an infrared camera and is configured to capture a facial image having depth of field information, and the processor, when executing the computer program, implements the human-computer interaction method described above.
  • The human-computer interaction solution provided by the embodiments of the present application tracks the user's face based on a facial image carrying depth of field information, thereby obtaining the motion state of the user's head; the corresponding control indication is determined from the preset correspondence between control indications and user states, and the target application is then controlled according to that indication. Because the user image carries depth information, more detail can be detected, which improves the accuracy of motion detection, avoids incorrect application responses caused by accidental touches, improves the accuracy and convenience of human-computer interaction, enables the mobile terminal to "see" the user, makes human-computer interaction more intelligent, and enriches the application scenarios of the human-computer interaction function.
  • FIG. 1 is a flowchart of a human-computer interaction method according to an embodiment
  • FIG. 2 is a flowchart of another human-computer interaction method according to an embodiment
  • FIG. 3 is a schematic diagram of a solution for calculating a reference offset angle according to an embodiment
  • FIG. 4 is a structural block diagram of a human-machine interaction apparatus according to an embodiment
  • FIG. 5 is a structural block diagram of a mobile terminal according to an embodiment
  • FIG. 6 is a structural block diagram of a smart phone according to an embodiment.
  • FIG. 1 is a flowchart of a human-computer interaction method according to an embodiment.
  • the method can be performed by a human-machine interaction device, wherein the device can be implemented by software and/or hardware, and can generally be integrated in a mobile terminal, such as a mobile terminal having a 3D depth camera.
  • the method includes the following steps.
  • Step 110 Control the 3D depth camera to acquire facial information when detecting that the target application is started.
  • When the human-computer interaction function is initialized, the user is prompted to input the applications to be controlled through facial information; these are recorded as target applications and stored in a whitelist. Target applications include video applications, audio applications, and e-books.
  • The target application may also be a system-default application that can be controlled through facial information; it is configured in the mobile terminal in the form of a configuration file before the mobile terminal leaves the factory.
  • the 3D depth camera can be used to capture an image with depth of field information, can detect a variety of user actions, and provides various control actions for the target application, enriching the types of control actions.
  • the 3D depth camera includes a depth camera based on structured light depth ranging and a depth camera based on Time Of Flight (TOF) ranging.
  • A depth camera based on structured light depth ranging includes an ordinary camera (for example, a Red Green Blue (RGB) camera) and an infrared camera. The infrared camera projects a light pattern of a certain mode onto the scene to be photographed; the persons or objects in the scene modulate the light strips, forming a three-dimensional light-strip image on their surfaces, and the ordinary camera then captures that image to obtain a two-dimensional distorted image of the light strips.
  • The degree of distortion of the light strips depends on the relative position between the ordinary camera and the infrared camera and on the surface profile or height of the persons or objects in the scene. Since that relative position is fixed within the depth camera, the image coordinates of the two-dimensional distorted light-strip image can reproduce the three-dimensional contour of the surface of the persons or objects in the scene, thereby yielding the depth information.
  • Structured light depth ranging has high resolution and measurement accuracy, which can improve the accuracy of acquired depth information.
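  • The application gives no explicit ranging formula for this step; as a hedged illustration, a textbook triangulation relation for structured-light systems (an assumption here, not text from this application) links the observed strip displacement to depth as follows:

```latex
% Illustrative structured-light triangulation (assumed, not from the application):
% Z : depth of a surface point
% f : focal length of the observing (ordinary) camera, in pixels
% b : baseline between the infrared projector and the ordinary camera
% d : disparity, i.e., lateral displacement of the light strip in the image
Z = \frac{f\,b}{d}
```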
  • In an embodiment, the 3D depth camera may instead be a depth camera based on TOF ranging: a sensor records the phase change of modulated infrared light emitted from a light-emitting unit, reflected by the object, and received back; within a certain wavelength range, the depth of the entire scene can be obtained in real time from the speed of light.
  • Persons or objects in the scene sit at different depths, so the time taken by the modulated infrared light from emission to reception differs; in this way, the depth information of the scene can be obtained.
  • A depth camera based on TOF depth ranging is unaffected by the gray level and surface features of the object when calculating depth information, can compute depth quickly, and offers high real-time performance.
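  • Again as a hedged illustration (the application states the principle but no formula), the standard phase-shift TOF relation recovers distance from the measured phase change:

```latex
% Illustrative phase-based TOF relation (assumed, not from the application):
% d : distance to the object,  c : speed of light
% \Delta t : round-trip time of the modulated infrared light
% \Delta\varphi : measured phase change,  f_{mod} : modulation frequency
d = \frac{c\,\Delta t}{2} = \frac{c\,\Delta\varphi}{4\pi f_{\mathrm{mod}}}
```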
  • the face information includes a face image having depth information.
  • the state of the target application is monitored by the mobile terminal. If it is detected that the target application is started, the operation of opening the 3D depth camera is performed in parallel with the startup operation of the target application. After the 3D depth camera is turned on, the 3D depth camera is controlled to shoot the user.
  • a face image is obtained by photographing a face of the user through a 3D depth camera. If it is detected that the user's complete facial image is not acquired by the 3D depth camera, the user is prompted to adjust the facial gesture.
  • a prompt box may be displayed in the preview interface of the camera to prompt the user to align the face with the prompt box.
  • the method of controlling the 3D depth camera to capture the user may be to control the 3D depth camera to capture the face of the user according to the set period to obtain a multi-frame facial image.
  • Step 120 Determine a user status according to the facial information.
  • In an embodiment, the user states corresponding to preset control indications are configured in advance, including but not limited to: swinging the head left or right corresponding to control indications such as page turning or song switching; turning the head to a set position and staying there for a set time corresponding to the video fast-forward control indication; and a head offset angle exceeding a set angle threshold corresponding to the video-switching control indication.
  • the user swings the head to the left and right, corresponding to the page turning instruction of the electronic book, that is, the user swings the head to the right corresponding to the control instruction "next page", and the user swings the head to the left corresponding to the control instruction "previous page”.
  • As another example, if the head deflects rightward to a set position with an offset angle smaller than the set angle threshold, and the dwell time at that position falls within a set time interval, the video playing in the target video application is fast-forwarded by a first time length. If the offset angle is smaller than the set angle threshold and the dwell time at that position exceeds a set time threshold, the video keeps fast-forwarding until a change in the user state is detected, at which point the fast-forward operation stops.
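  • A minimal sketch of such a state-to-indication mapping is shown below; all threshold values, the time interval, and the action names are hypothetical placeholders chosen for illustration, not values from the application.

```python
# Hypothetical thresholds for mapping a rightward head deflection to an action.
ANGLE_THRESHOLD = 30.0               # deg; the "set angle threshold"
FAST_FORWARD_INTERVAL = (1.0, 3.0)   # s; the "set time interval"
DWELL_THRESHOLD = 3.0                # s; the "set time threshold"

def control_indication(offset_angle_deg: float, dwell_time_s: float) -> str:
    """Map a user state (offset angle, dwell time) to a control indication."""
    if offset_angle_deg >= ANGLE_THRESHOLD:
        return "switch_video"                     # large deflection: switch videos
    lo, hi = FAST_FORWARD_INTERVAL
    if lo <= dwell_time_s <= hi:
        return "fast_forward_first_time_length"   # short dwell: bounded fast-forward
    if dwell_time_s > DWELL_THRESHOLD:
        return "fast_forward_until_state_change"  # long dwell: keep fast-forwarding
    return "none"

print(control_indication(15.0, 2.0))  # -> fast_forward_first_time_length
```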
  • In this embodiment, the offset angle of the face is determined from the depth of field information of the facial image. Since the depth of field information reflects the spatial positional relationship of the facial pixels, the offset angle of the face can be calculated from it. In an embodiment, the positions of the two eyes in the facial image are identified, and the facial symmetry axis is determined from them.
  • When the face directly faces the 3D depth camera, the left face region and the right face region are at substantially the same distance from the camera, so the set sampling points extracted from the two regions carry substantially the same depth of field information. If the user's head deflects, the depth of field information of the left and right face regions changes accordingly, placing the two regions in different depth planes, and the depth of field information of the set sampling points is no longer the same. The offset angle of the face can then be calculated from the triangular relationship between the depth of field information of the left face region and that of the right face region.
  • In an embodiment, a set number of sampling points are selected from the left face region, and the same number from the corresponding positions of the right face region, forming set sampling point pairs. Based on the depth of field information of each pair, the arctangent function is used to calculate a reference offset angle for each pair; the average of the reference offset angles is then taken as the offset angle of the face.
  • In an embodiment, the pixel points at the left and right inner eye corners (on either side of the nose bridge) may be paired to form a set sampling point pair, or sampling points may be selected correspondingly on the set lines passing through the inner eye corners and perpendicular to the line connecting the two eyes, and so on.
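  • A minimal sketch of this computation, assuming each set sampling point pair is given as the depth values of the left and right points together with their spatial separation (the data layout and names are illustrative):

```python
import math

def face_offset_angle(pairs) -> float:
    """pairs: iterable of (depth_left, depth_right, separation) tuples, one per
    set sampling point pair. Returns the mean of the per-pair reference offset
    angles (degrees), used as the offset angle of the face."""
    angles = [
        math.degrees(math.atan(abs(d_left - d_right) / separation))
        for d_left, d_right, separation in pairs
    ]
    return sum(angles) / len(angles)

# Three illustrative pairs; the depth difference grows as the head rotates.
print(face_offset_angle([(52.0, 50.0, 6.0), (53.1, 50.2, 8.0), (51.5, 50.1, 4.0)]))
```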
  • There are many ways to determine the user state from the facial information, and this application does not limit them; for example, facial images of the user's face at multiple preset angles may be captured in advance and stored as image templates. When the user state needs to be determined from the facial information, the captured facial image can be matched against the image templates to determine the offset angle of the face.
  • the start time of the user's head rotation and the time when the user's head stops rotating can be determined by comparing the face images corresponding to the two adjacent shooting times.
  • When it is detected that the user's head has stopped rotating, the offset angle of the face is determined from the depth of field information of the facial image at that moment. In addition, a timer is triggered to start timing at that moment and stops when the head is detected to move again, so as to record how long the head stays at the position corresponding to the offset angle.
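  • The timer logic can be pictured as below (a sketch; the motion-detection callbacks are assumed to be driven by the frame-comparison step above):

```python
import time

class DwellTimer:
    """Records how long the head stays at the position reached when
    rotation stops, per the timing logic described above."""
    def __init__(self):
        self._start = None

    def on_head_stopped(self) -> None:
        self._start = time.monotonic()          # head stopped: start timing

    def on_head_moved_again(self) -> float:
        if self._start is None:
            return 0.0
        dwell = time.monotonic() - self._start  # head moved again: stop timing
        self._start = None
        return dwell                            # dwell time at the offset position
```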
  • Step 130 Determine a control indication according to the user state, and control the target application according to the control indication.
  • The control indication is an operation indication corresponding to a control instruction of the target application, including but not limited to fast forward, backward, switching to the next file, switching to the previous file, and page turning.
  • The user states corresponding to the preset control indications are configured in advance, and each control indication is stored in the whitelist in association with its user state.
  • After determining the user state, the mobile terminal queries the preset whitelist according to the user state to determine the control indication corresponding to that state, determines the instruction corresponding to the control indication (an instruction the target application can recognize and execute), and sends the instruction to the target application. Upon receiving the instruction, the target application performs the corresponding operation in response to the control indication.
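  • The whitelist lookup and dispatch can be sketched as below; the state keys, instruction names, and the `send_to_app` hook are hypothetical illustrations, not identifiers from the application.

```python
# Hypothetical whitelist associating user states with app instructions.
WHITELIST = {
    ("swing_right", "none"): "next_page",
    ("swing_left", "none"): "previous_page",
    ("turn_right", "short_dwell"): "fast_forward",
}

def dispatch(user_state, send_to_app) -> None:
    """Query the preset whitelist and send the matching instruction, if any,
    to the target application, which recognizes and executes it."""
    instruction = WHITELIST.get(user_state)
    if instruction is not None:
        send_to_app(instruction)

dispatch(("swing_right", "none"), print)  # -> next_page
```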
  • For example, if the determined control indication is to fast-forward the video by 5 minutes (the duration is not limited to 5 minutes; a system default may be used, or it may be set by the user), the corresponding instruction is sent to the target application.
  • In the technical solution of this embodiment, the 3D depth camera is controlled to acquire facial information when the start of the target application is detected; a user state is determined from the facial information; a control indication is determined from the user state; and the target application is controlled accordingly. The user's face is thus tracked based on a facial image carrying depth of field information, yielding the motion state of the user's head, and the corresponding control indication is found through the preset correspondence between control indications and user states. Because the user image carries depth information, more detail can be detected, which improves the accuracy of motion detection, avoids incorrect application responses caused by accidental touches, improves the accuracy and convenience of human-computer interaction, enables the mobile terminal to "see" the user, makes human-computer interaction more intelligent, and enriches the application scenarios of the human-computer interaction function.
  • In an embodiment, the correspondence between user states and control indications is displayed in a guide interface to prompt the user on how to input control actions.
  • FIG. 2 is a flowchart of another human-computer interaction method according to an embodiment. As shown in FIG. 2, the method includes the following steps.
  • Step 210 Control a normal camera included in the 3D depth camera to acquire a two-dimensional image corresponding to the face according to a set period.
  • the 3D depth camera includes a normal camera and an infrared camera.
  • When an application is detected to be started, its application identifier (which may be a package name, a process name, or the like) is obtained, and the preset whitelist is queried with the identifier to determine whether the application is a target application. If it is, the normal camera is controlled to turn on and to capture the two-dimensional image corresponding to the face according to the set period.
  • After the normal camera is turned on, whether a human face is included in the preview image is detected. If the preview image includes a human face, the normal camera is controlled to capture the two-dimensional image corresponding to the face according to the set period; if no face is detected in the preview image, the user is prompted to adjust the facial posture until a face is detected. Whether the user turns the head is determined by comparing the two-dimensional images at adjacent shooting moments; when a head turn is detected, the corresponding one-frame two-dimensional facial image is taken as the first image, i.e., the image at the starting moment.
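  • Comparing adjacent frames can be as simple as thresholding a mean absolute pixel difference, as in this sketch (grayscale NumPy frames assumed; the threshold value is a hypothetical placeholder):

```python
import numpy as np

TURN_THRESHOLD = 12.0  # hypothetical mean-absolute-difference threshold

def head_turned(prev_frame: np.ndarray, curr_frame: np.ndarray) -> bool:
    """Detect a head turn by comparing the two-dimensional images taken
    at two adjacent shooting moments."""
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    return float(diff.mean()) > TURN_THRESHOLD
```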
  • Step 220 Determine facial features corresponding to the two-dimensional image.
  • In an embodiment, contour detection is used to find the face region included in the two-dimensional image and determine the contour of the face; the face area is then determined from the contour.
  • The facial feature may also be the proportion of face pixels in the preview image. For example, the face region included in the two-dimensional image is determined; the maximum vertical resolution of the face region along the long side of the mobile terminal's touch screen and the maximum horizontal resolution along the short side are obtained; the size of the face region is derived from these two values; and this size is divided by the size of the touch screen to obtain the proportion of face pixels in the preview image.
  • Step 230 Determine whether the two-dimensional image satisfies the setting condition according to the facial feature. If the setting condition is satisfied, step 240 is performed; otherwise, the process returns to step 210.
  • If the face area difference is smaller than a set threshold, it is determined that the two-dimensional image does not satisfy the setting condition. This prevents small head movements from triggering false controls and improves the control accuracy of the mobile terminal; for example, it can prevent a sneeze while watching a video or reading an e-book from triggering a false response. If the face area difference exceeds the set threshold, it is determined that the two-dimensional image satisfies the setting condition.
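  • The setting-condition test then reduces to an area comparison between the first image (motion start) and the second image (motion stop); the bounding-box layout and threshold below are illustrative assumptions.

```python
AREA_DIFF_THRESHOLD = 2000  # hypothetical threshold, in pixels

def face_area(region) -> int:
    """region: (max_horizontal_resolution, max_vertical_resolution)
    of the detected face region in a two-dimensional image."""
    width, height = region
    return width * height

def satisfies_setting_condition(first_region, second_region) -> bool:
    """True only if the face area changed enough between the start and stop
    frames, filtering out small movements such as a sneeze."""
    return abs(face_area(first_region) - face_area(second_region)) > AREA_DIFF_THRESHOLD
```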
  • Step 240 Turn on the infrared camera included in the 3D depth camera, capture a facial image through the infrared camera and the normal camera, and then turn off the infrared camera.
  • In an embodiment, the infrared camera included in the 3D depth camera is turned on, the facial information at the moment the head motion stops is captured by the infrared camera to obtain a depth image, and at least one frame of the two-dimensional facial image is re-captured by the ordinary camera; the depth image and the re-captured two-dimensional image form a three-dimensional facial image.
  • A single facial motion and its end point are usually detected first by the ordinary camera. The single facial motion may be the motion process from the above-mentioned starting moment to the moment the head motion stops, and the end point of the single facial motion is the head-motion stop moment.
  • When the end point is detected, the infrared camera is turned on to capture the three-dimensional facial image; after the depth image is captured by the infrared camera, the infrared camera is turned off, which reduces the power consumption of the mobile terminal.
  • In an embodiment, the second image captured by the normal camera at the moment the head motion stops and the depth image captured by the infrared camera may also form the three-dimensional facial image.
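  • Putting steps 210 through 240 together, the power-saving capture flow looks roughly like this; the camera objects and their methods are hypothetical stand-ins for platform camera APIs.

```python
def capture_facial_image(normal_cam, ir_cam, motion_detector):
    """Sketch of the power-saving flow: track motion with the normal camera
    only, and power the infrared camera just long enough to grab one depth
    image at the moment the head motion stops."""
    second_image = None
    while not motion_detector.motion_stopped():
        second_image = normal_cam.capture()   # periodic two-dimensional frames

    ir_cam.turn_on()
    depth_image = ir_cam.capture()            # depth at the stop moment
    ir_cam.turn_off()                         # minimize power consumption

    return depth_image, second_image          # together they form the 3D facial image
```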
  • Step 250 Determine a user state according to the three-dimensional facial image.
  • In an embodiment, the three-dimensional facial image is recognized, and the positions of the facial features in the three-dimensional image are determined, thereby determining the face region and the symmetry axis of the face region. The face region is divided by the symmetry axis into a left face region and a right face region; a set number of feature points are extracted from set positions in the left face region, the mirrored feature points of those feature points in the right face region are determined based on the symmetry axis, and each feature point and its mirrored feature point form a sampling point pair.
  • FIG. 3 is a schematic diagram of a solution for calculating a reference offset angle according to an embodiment.
  • As shown in FIG. 3, L1 and L2 are the distances from the feature point 320 and the mirrored feature point 330, respectively, to the 3D depth camera 310, i.e., the depth of field information corresponding to the feature point 320 and the mirrored feature point 330, and W is the distance between the feature point 320 and the mirrored feature point 330. When the head deflects, the axis of symmetry AB changes from the first position 340 to the corresponding second position 350, and the feature point 320 and the mirrored feature point 330 remain symmetric about the axis of symmetry AB at the second position. From the triangular relationship among L1, L2, and W, the reference offset angle of each set sampling point pair can be calculated with the arctangent function, e.g., θ = arctan(|L1 − L2| / W), so that the offset angle of the face is determined according to the reference offset angles.
  • In an embodiment, the average of the reference offset angles can be calculated and used as the offset angle of the face. Alternatively, the reference offset angles may be arranged in descending order and the maximum reference offset angle used as the offset angle of the face; the minimum reference offset angle, or the reference offset angle in the middle of the queue, may also be used.
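  • Whichever statistic is chosen, it is a one-line aggregation over the per-pair reference angles from the arctangent step (a sketch; `angles` is assumed non-empty):

```python
import statistics

def aggregate(angles, mode="mean") -> float:
    """Pick the face offset angle from the per-pair reference offset angles."""
    if mode == "mean":
        return statistics.mean(angles)
    if mode == "max":
        return max(angles)                # head of the descending queue
    if mode == "min":
        return min(angles)                # tail of the descending queue
    return statistics.median(angles)      # middle of the queue
```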
  • Step 260 Query a preset whitelist according to the user status, and determine a control indication corresponding to the user status.
  • the user state includes an offset angle of the face and a time at which the head stays at the position corresponding to the offset angle.
  • Step 270 Send the instruction corresponding to the control indication to the target application.
  • In the technical solution of this embodiment, the ordinary camera included in the 3D depth camera is controlled to acquire two-dimensional facial images according to a set period; if a two-dimensional image satisfies the setting condition, the infrared camera included in the 3D depth camera is turned on, and the facial information at the head-motion stop moment is captured by the infrared camera to obtain a depth image. Because a single facial motion and its end point are first detected by the ordinary camera, and the infrared camera is turned on to capture the three-dimensional facial image only when the end point is detected, the power consumption of the mobile terminal is reduced and battery life is extended. In addition, determining whether the two-dimensional image satisfies the setting condition effectively prevents erroneous detection from causing erroneous control of the target application, improving the control accuracy of the mobile terminal.
  • FIG. 4 is a structural block diagram of a human-machine interaction apparatus according to an embodiment.
  • the device may be implemented in software and/or hardware, and may be integrated into a mobile terminal, such as a mobile terminal having a 3D depth camera, configured to perform the human-computer interaction method provided by the embodiment.
  • the apparatus includes: an information acquisition module 410 configured to control a 3D depth camera to acquire facial information when detecting that the target application is activated, wherein the facial information includes a facial image having depth information;
  • the state determination module 420 is configured to determine a user state according to the face information;
  • the application control module 430 is configured to determine a control indication according to the user state, and control the target application according to the control indication.
  • The human-machine interaction device tracks the user's face based on the facial image having depth of field information, thereby obtaining the motion state of the user's head; the corresponding control indication is determined from the preset correspondence between control indications and user states, and the target application is controlled accordingly. Because the user image carries depth information, more detail can be detected, which improves the accuracy of motion detection, avoids incorrect application responses caused by accidental touches, improves the accuracy and convenience of human-computer interaction, enables the mobile terminal to "see" the user, makes human-computer interaction more intelligent, and enriches the application scenarios of the human-computer interaction function.
  • In an embodiment, the information acquisition module 410 includes: a two-dimensional image acquisition sub-module, configured to control the normal camera included in the 3D depth camera to acquire a two-dimensional image corresponding to the face according to a set period when it is detected that the target application is started; and a facial image capturing sub-module, configured to turn on the infrared camera included in the 3D depth camera and capture a facial image through the infrared camera and the normal camera if the two-dimensional image satisfies a setting condition.
  • the facial image capturing sub-module is further configured to turn off the infrared camera after the facial image is captured by the infrared camera and the normal camera.
  • In an embodiment, the apparatus further includes: a feature determining module, configured to determine the facial feature corresponding to the two-dimensional image after the normal camera included in the 3D depth camera is controlled to acquire the two-dimensional image corresponding to the face according to the set period; and a condition determining module, configured to determine whether the two-dimensional image satisfies the setting condition according to the facial feature.
  • In an embodiment, the condition determining module is configured to: determine the face area difference between a first image and a second image, where the first image is the two-dimensional image captured at the head-motion start moment and the second image is the two-dimensional image captured at the head-motion stop moment; compare the face area difference with a set threshold; and determine whether the two-dimensional image satisfies the setting condition according to the comparison result.
  • In an embodiment, the facial image capturing sub-module captures the facial image through the infrared camera and the ordinary camera by: capturing the facial information at the head-motion stop moment through the infrared camera to obtain a depth image; the depth image and the second image form the facial image.
  • the state determining module 420 is configured to determine an offset angle of the face according to the depth information of the face image, and record a time at which the head stays at a position corresponding to the offset angle.
  • In an embodiment, the application control module 430 is configured to: query the preset whitelist according to the user state and determine the control indication corresponding to the user state, where the control indication includes fast forward, backward, switching to the next file, switching to the previous file, and page turning; and send the instruction corresponding to the control indication to the target application, where the instruction is used to instruct the target application to respond to the control indication. Target applications include video applications, audio applications, and e-books.
  • In an embodiment, the two-dimensional image acquisition sub-module is configured to: control the normal camera included in the 3D depth camera to turn on and detect whether a human face is included in the preview image; and, if the preview image includes a human face, control the normal camera to acquire the two-dimensional image corresponding to the face according to the set period.
  • This embodiment further provides a storage medium containing computer-executable instructions which, when executed by a computer processor, perform a human-computer interaction method, the method including: controlling the 3D depth camera to acquire facial information when it is detected that the target application is started, where the facial information includes a facial image having depth of field information; determining a user state according to the facial information; and determining a control indication according to the user state and controlling the target application according to the control indication.
  • The storage medium may be any of various types of memory devices or storage devices.
  • the term "storage medium” is intended to include: a mounting medium such as a Compact Disc Read-Only Memory (CD-ROM), a floppy disk or a tape device; a computer system memory or a random access memory such as a dynamic random Random Access Memory (DRAM), Double Data Rate Random Access Memory (DDR RAM), Static Random Access Memory (SRAM), Extended Data Output Random Extended Data Output Random Access Memory (EDO RAM), Rambus Random Access Memory (RAM), etc.; non-volatile memory such as flash memory, magnetic media (such as hard disk or light) Storage); registers or other similar types of memory elements, etc.
  • the storage medium may also include other types of memory or multiple types of memory combinations.
  • The storage medium may be located in a first computer system, in which the program is executed, or in a different second computer system connected to the first computer system through a network, such as the Internet.
  • the second computer system can provide program instructions to the first computer for execution.
  • the term "storage medium" can include two or more storage media that can reside in different locations (eg, in different computer systems connected through a network).
  • a storage medium may store program instructions (eg, program instructions implemented as a computer program) executable by one or more processors.
  • The storage medium includes computer-executable instructions, and these instructions are not limited to the human-computer interaction operations described above; they may also perform related operations in the human-computer interaction method provided by any embodiment of the present application.
  • the embodiment provides a mobile terminal, and the mobile terminal has an operating system, and the human-machine interaction device provided in this embodiment can be integrated into the mobile terminal.
  • The mobile terminal may be a smart phone, a tablet computer (PAD), a handheld game console, or the like.
  • FIG. 5 is a structural block diagram of a mobile terminal according to an embodiment. As shown in FIG. 5, the mobile terminal includes a 3D depth camera 510, a memory 520, and a processor 530.
  • The 3D depth camera 510 includes a normal camera and an infrared camera and is configured to capture a facial image having depth of field information; the memory 520 is configured to store a computer program, facial images, the associations between user states and control indications, and the like; and the processor 530 is configured to read and execute the computer program stored in the memory 520.
  • When executing the computer program, the processor 530 implements the following steps: controlling the 3D depth camera to acquire facial information when it is detected that the target application is started, where the facial information includes a facial image having depth of field information; determining a user state according to the facial information; and determining a control indication according to the user state and controlling the target application according to the control indication.
  • FIG. 6 is a structural block diagram of a smart phone according to an embodiment.
  • As shown in FIG. 6, the smart phone may include: a memory 601, a central processing unit (CPU) 602 (also referred to as a processor, hereinafter CPU), a peripheral interface 603, a radio frequency (RF) circuit 605, an audio circuit 606, a speaker 611, a touch screen 612, a camera 613, a power management chip 608, an input/output (I/O) subsystem 609, and other input/control devices 610, and these components communicate through one or more communication buses or signal lines 607.
  • The smart phone 600 is merely an example of a mobile terminal; the smart phone 600 may have more or fewer components than shown in FIG. 6, may combine two or more components, or may have a different configuration of components.
  • the various components shown in FIG. 6 may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing and/or application specific integrated circuits.
  • The following describes a smart phone integrated with the human-machine interaction device provided in this embodiment.
  • The memory 601 can be accessed by the CPU 602, the peripheral interface 603, and the like. The memory 601 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic disk storage devices, flash memory devices, or other non-volatile solid-state storage devices.
  • The computer program is stored in the memory 601, which may also store facial information, a whitelist of the associations between user states and control indications, a whitelist of target applications, and the like.
  • Peripheral interface 603, which can connect the input and output peripherals of the device to CPU 602 and memory 601.
  • The I/O subsystem 609 can connect input and output peripherals on the device, such as the touch screen 612 and other input/control devices 610, to the peripheral interface 603.
  • I/O subsystem 609 can include display controller 6091 and one or more input controllers 6092 that are configured to control other input/control devices 610.
  • The one or more input controllers 6092 receive electrical signals from, or send electrical signals to, other input/control devices 610, which may include physical buttons (push buttons, rocker buttons, etc.), dials, slide switches, joysticks, and click wheels.
  • the input controller 6092 can be connected to any of the following: a keyboard, an infrared port, a Universal Serial Bus (USB) interface, and a pointing device such as a mouse.
  • the touch screen 612 is an input interface and an output interface between the user terminal and the user, and displays the visual output to the user.
  • the visual output may include graphics, text, icons, videos, and the like.
  • the camera 613 may be a 3D depth camera.
  • A three-dimensional facial image is acquired by the camera 613, converted into an electrical signal, and stored in the memory 601 through the peripheral interface 603.
  • The display controller 6091 in the I/O subsystem 609 receives electrical signals from the touch screen 612 or sends electrical signals to the touch screen 612.
  • The touch screen 612 detects contact on the touch screen, and the display controller 6091 converts the detected contact into interaction with the user interface objects displayed on the touch screen 612, i.e., realizes human-computer interaction. The user interface objects displayed on the touch screen 612 may include icons of running games, icons for connecting to corresponding networks, and the like.
  • the device may also include a light mouse, which is a touch sensitive surface that does not display a visual output, or an extension of a touch sensitive surface formed by the touch screen.
  • the RF circuit 605 is mainly configured to establish communication between the mobile phone and the wireless network (ie, the network side), and implement data reception and transmission between the mobile phone and the wireless network. For example, sending and receiving short messages, emails, and the like.
  • The RF circuit 605 receives and transmits RF signals, also referred to as electromagnetic signals; it converts electrical signals into electromagnetic signals or electromagnetic signals into electrical signals, and communicates with communication networks and other devices through the electromagnetic signals.
  • The RF circuitry 605 may include known circuitry for performing these functions, including but not limited to an antenna system, an RF transceiver, one or more amplifiers, a tuner, one or more oscillators, a digital signal processor, a coder-decoder (CODEC) chipset, a Subscriber Identity Module (SIM), and the like.
  • the audio circuit 606 is primarily configured to receive audio data from the peripheral interface 603, convert the audio data into an electrical signal, and transmit the electrical signal to the speaker 611.
  • the speaker 611 is arranged to restore the voice signal received by the mobile phone from the wireless network through the RF circuit 605 to sound and play the sound to the user.
  • the power management chip 608 is configured to provide power and power management for the hardware connected to the CPU 602, the I/O subsystem, and the peripheral interface.
  • The mobile terminal provided in this embodiment tracks the user's face based on the facial image having depth of field information, thereby obtaining the motion state of the user's head; the corresponding control indication is determined from the preset correspondence between control indications and user states, and the target application is controlled accordingly. Because the user image carries depth information, more detail can be detected, which improves the accuracy of motion detection, avoids incorrect application responses caused by accidental touches, improves the accuracy and convenience of human-computer interaction, enables the mobile terminal to "see" the user, makes human-computer interaction more intelligent, and enriches the application scenarios of the human-computer interaction function.
  • The human-machine interaction device, the storage medium, and the mobile terminal provided in the above embodiments can perform the human-computer interaction method provided by any embodiment of the present application and have the corresponding functional modules and beneficial effects for performing the method. For technical details not described in detail above, reference may be made to the human-computer interaction method provided by any embodiment of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • User Interface Of Digital Computer (AREA)
  • Telephone Function (AREA)

Abstract

Disclosed in an embodiment of the present invention are a method and device for man-machine interaction, a medium, and a mobile terminal. The method comprises: upon detection of activation of a target application, controlling a 3D depth camera to acquire facial information; determining a user state according to the facial information; and determining a control instruction according to the user state, and controlling the target application according to the control instruction.

Description

Human-computer interaction method, device, medium and mobile terminal
The present application claims priority to Chinese Patent Application No. 201810005036.7, filed with the China Patent Office on January 3, 2018, the entire contents of which are incorporated herein by reference.
Technical Field
The embodiments of the present application relate to the technical field of mobile terminals, for example, to a human-computer interaction method, device, medium, and mobile terminal.
Background

With the development of mobile terminal technology, the use of mobile terminals is no longer limited to making calls and sending messages; more and more users install applications such as video players, music players, and e-readers in their mobile terminals for convenience.

In the related art, applications are usually controlled manually. During use, the user often needs to repeatedly input simple operations, which affects the convenience of human-computer interaction and easily leads to accidental touches.
Summary

The embodiments of the present application provide a human-computer interaction method, device, medium, and mobile terminal, which can optimize the human-computer interaction solution and improve the convenience and accuracy of application control.

An embodiment of the present application provides a human-computer interaction method, including:

controlling a three-dimensional (3 Dimensions, 3D) depth camera to acquire facial information when it is detected that a target application is started, where the facial information includes a facial image having depth of field information;

determining a user state according to the facial information; and

determining a control indication according to the user state, and controlling the target application according to the control indication.

An embodiment of the present application further provides a human-machine interaction device, including:

an information acquisition module, configured to control a 3D depth camera to acquire facial information when it is detected that a target application is started, where the facial information includes a facial image having depth of field information;

a state determining module, configured to determine a user state according to the facial information; and

an application control module, configured to determine a control indication according to the user state and control the target application according to the control indication.

An embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the human-computer interaction method described above.

An embodiment of the present application further provides a mobile terminal, including a 3D depth camera, a memory, a processor, and a computer program stored in the memory and runnable on the processor; the 3D depth camera includes a normal camera and an infrared camera and is configured to capture a facial image having depth of field information, and the processor, when executing the computer program, implements the human-computer interaction method described above.

The human-computer interaction solution provided by the embodiments of the present application tracks the user's face based on a facial image carrying depth of field information, thereby obtaining the motion state of the user's head; the corresponding control indication is determined from the preset correspondence between control indications and user states, and the target application is then controlled according to that indication. Because the user image carries depth information, more detail can be detected, which improves the accuracy of motion detection, avoids incorrect application responses caused by accidental touches, improves the accuracy and convenience of human-computer interaction, enables the mobile terminal to "see" the user, makes human-computer interaction more intelligent, and enriches the application scenarios of the human-computer interaction function.
Brief Description of the Drawings

FIG. 1 is a flowchart of a human-computer interaction method according to an embodiment;

FIG. 2 is a flowchart of another human-computer interaction method according to an embodiment;

FIG. 3 is a schematic diagram of a solution for calculating a reference offset angle according to an embodiment;

FIG. 4 is a structural block diagram of a human-machine interaction apparatus according to an embodiment;

FIG. 5 is a structural block diagram of a mobile terminal according to an embodiment;

FIG. 6 is a structural block diagram of a smart phone according to an embodiment.
Detailed Description

The present application is described below with reference to the accompanying drawings and embodiments. The specific embodiments described herein merely illustrate the application and do not limit it. In addition, for ease of description, only the parts related to the present application, rather than the entire structure, are shown in the drawings.

FIG. 1 is a flowchart of a human-computer interaction method according to an embodiment. The method may be performed by a human-machine interaction device, which may be implemented in software and/or hardware and can generally be integrated in a mobile terminal, such as a mobile terminal having a 3D depth camera. As shown in FIG. 1, the method includes the following steps.

Step 110: Control the 3D depth camera to acquire facial information when it is detected that the target application is started.
When the human-computer interaction function is initialized, the user is prompted to input the applications to be controlled through facial information; these are recorded as target applications and stored in a whitelist. Target applications include video applications, audio applications, and e-books. In an embodiment, the target application may also be a system-default application that can be controlled through facial information, configured in the mobile terminal in the form of a configuration file before the mobile terminal leaves the factory.

In this embodiment, the 3D depth camera can capture images with depth of field information and detect a variety of user actions, providing multiple control actions for the target application and enriching the types of control actions. In an embodiment, the 3D depth camera includes a depth camera based on structured light depth ranging and a depth camera based on Time Of Flight (TOF) ranging.

For example, a depth camera based on structured light depth ranging includes an ordinary camera (for example, a Red Green Blue (RGB) camera) and an infrared camera. The infrared camera projects a light pattern of a certain mode onto the scene to be photographed; the persons or objects in the scene modulate the light strips, forming a three-dimensional light-strip image on their surfaces, and the ordinary camera then captures that image to obtain a two-dimensional distorted image of the light strips. The degree of distortion of the light strips depends on the relative position between the ordinary camera and the infrared camera and on the surface profile or height of the persons or objects in the scene. Since that relative position is fixed within the depth camera, the image coordinates of the two-dimensional distorted light-strip image can reproduce the three-dimensional contour of the surface of the persons or objects in the scene, thereby yielding the depth information. Structured light depth ranging has high resolution and measurement accuracy, which can improve the accuracy of the acquired depth information.

In an embodiment, the 3D depth camera may instead be a depth camera based on TOF ranging: a sensor records the phase change of modulated infrared light emitted from a light-emitting unit, reflected by the object, and received back; within a certain wavelength range, the depth of the entire scene can be obtained in real time from the speed of light. Persons or objects in the scene sit at different depths, so the time taken by the modulated infrared light from emission to reception differs; in this way, the depth information of the scene can be obtained. A depth camera based on TOF depth ranging is unaffected by the gray level and surface features of the object when calculating depth information, can compute depth quickly, and offers high real-time performance.

In this embodiment, the facial information includes a facial image having depth of field information. The state of the target application is monitored by the mobile terminal; if it is detected that the target application is started, the operation of opening the 3D depth camera is performed in parallel with the startup of the target application. After the 3D depth camera is turned on, it is controlled to photograph the user, and a facial image is obtained by photographing the user's face. If it is detected that the user's complete facial image has not been acquired by the 3D depth camera, the user is prompted to adjust the facial posture. In an embodiment, a prompt box may be displayed in the preview interface of the camera to prompt the user to align the face with the prompt box.

In an embodiment, the 3D depth camera may be controlled to photograph the user's face according to a set period, obtaining multiple frames of facial images.
步骤120、根据所述面部信息确定用户状态。Step 120: Determine a user status according to the facial information.
在一实施例中,预先设定与预设的控制指示对应的用户状态,包括但不限于用户左右摆动头部对应翻页或切歌等控制指示,用户将头部转至设定位置并停留设定时间的状态与视频快进的控制指示对应,以及,用户的头部偏移角度超过设定角度阈值与视频切换的控制指示对应。例如,用户左右摆动头部,对应于电子书的翻页指令,即用户向右摆动头部与控制指示“下一页”对应,用户向左摆动头部与控制指示“上一页”对应。又如,头部向右偏转至设定位置的偏移角度小于设定角度阈值,如果在该设置位置的停留时间属于设定时间区间,则控制目标视频应用中播放的视频快进第一时间长度。如果用户头部向右偏转至设定位置的偏移角度小于设定角度阈值,且在该设置位置的停留时间超过设定时间阈值,则控制视频持续快进,直至检测到用户状态发生变化,才停止对该视频的快进操作。In an embodiment, the user state corresponding to the preset control indication is preset, including but not limited to a control instruction such as turning the page or cutting the song by the user's left and right swinging heads, and the user turns the head to the set position and stays. The state of the set time corresponds to the control instruction of the video fast forward, and the head offset angle of the user exceeds the set angle threshold corresponding to the control instruction of the video switching. For example, the user swings the head to the left and right, corresponding to the page turning instruction of the electronic book, that is, the user swings the head to the right corresponding to the control instruction "next page", and the user swings the head to the left corresponding to the control instruction "previous page". For another example, the offset angle of the head to the right to the set position is less than the set angle threshold, and if the dwell time at the set position belongs to the set time interval, the video played in the target video application is fast forwarded for the first time. length. If the offset angle of the user's head to the right to the set position is less than the set angle threshold, and the dwell time at the set position exceeds the set time threshold, the control video continues to fast forward until the user state is detected to change. The fast forward operation of the video is stopped.
In this embodiment, the offset angle of the face is determined from the depth-of-field information of the facial image. Because the depth information reflects the spatial positional relationship of the facial pixels, the face's offset angle can be computed from it. In an embodiment, the positions of the two eyes in the facial image are identified, and the face's axis of symmetry is determined from them. When the face squarely faces the 3D depth camera, the left-face region and the right-face region are at essentially the same distance from the camera, so set sampling points extracted from the left-face region and the right-face region carry essentially the same depth information. If the user's head deflects, the depth information of the left-face and right-face regions changes accordingly: the two regions fall into different depth planes, and the depth values at the set sampling points are no longer the same. The face's offset angle can then be computed from the triangular relationship of the depth information of the left-face and right-face regions. In an embodiment, a set number of sampling points is selected from the left-face region and the same number of sampling points is selected from the corresponding positions of the right-face region, forming set sampling-point pairs. From the depth information of each pair, the arctangent function is used to compute a reference offset angle per pair; the average of the reference offset angles is computed and taken as the face's offset angle. In an embodiment, the pixels at the inner corners of the left and right eyes (the corners nearest the bridge of the nose) may be selected to form a sampling-point pair, or sampling points may be selected at corresponding positions on the set lines through the inner corners of the left and right eyes (set lines perpendicular to the line connecting the eyes), and so on.

In an embodiment, there are many ways to determine the user state from the facial information, and this application does not specifically limit them. For example, facial images of the user's face oriented at multiple preset angles may be captured in advance and stored as image templates. When the user state needs to be determined from the facial information, the captured facial image can be matched against the image templates to determine the face's offset angle.
In this embodiment, the moment the user's head starts turning and the moment it stops turning can be determined by comparing the facial images captured at two adjacent shooting times. When the head is detected to have stopped turning, the face's offset angle is determined from the depth-of-field information of the facial image at that moment. In addition, when the head is detected to have stopped turning, a timer is triggered to start counting; it stops when the head is detected to move again, thereby recording how long the head stayed at the position corresponding to that offset angle.
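A sketch of this start/stop detection and dwell timing follows, assuming periodic grayscale frames and using a crude mean-absolute-difference change test in place of whatever frame comparison the terminal actually uses:

```python
import time
import numpy as np

def frames_differ(prev_gray, cur_gray, eps=8.0):
    # Crude change test: mean absolute pixel difference between frames.
    diff = np.abs(cur_gray.astype(np.int16) - prev_gray.astype(np.int16))
    return float(diff.mean()) > eps

class HeadMotionTimer:
    """Detects motion start/stop across periodic captures and times how
    long the head dwells once it stops (thresholds are assumptions)."""
    def __init__(self):
        self.moving = False
        self.stopped_at = None

    def update(self, prev_frame, cur_frame):
        # Returns the dwell time in seconds when motion resumes, else None.
        if frames_differ(prev_frame, cur_frame):
            dwell = None
            if not self.moving and self.stopped_at is not None:
                dwell = time.monotonic() - self.stopped_at  # timer stops
                self.stopped_at = None
            self.moving = True
            return dwell
        if self.moving:
            self.moving = False
            self.stopped_at = time.monotonic()  # timer starts at motion stop
        return None
```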
Step 130: Determine a control indication according to the user state, and control the target application according to the control indication.

In an embodiment, a control indication is an operation indication corresponding to a control instruction of the target application, including but not limited to fast forward, rewind, switching to the next file, switching to the previous file, and page turning. The user states corresponding to the preset control indications are set in advance, and each control indication is stored in a whitelist in association with its user state.
In this embodiment, after determining the user state, the mobile terminal queries the preset whitelist with that state to determine the corresponding control indication and the instruction to which it maps; the instruction can be recognized and executed by the target application, and is sent to it. Upon receiving the instruction, the target application performs the corresponding operation in response to the control indication. For example, while the target video application is running, suppose the user's head is detected to deflect to the right to a set angle and to stay at the corresponding position for 3 seconds (s); if the set angle is smaller than the set angle threshold and the dwell time falls within the set time interval, the control indication is determined to be fast-forwarding the video by 5 minutes (this is not limited to 5 minutes and may be a system default or set by the user), and the instruction corresponding to that control indication is sent to the target video application to fast-forward the currently playing video file by 5 minutes. As another example, while the target video application is running, if the user's head is detected to deflect to the right to a set angle that exceeds the set angle threshold, the control indication is determined to be switching the video (that is, playing the next episode), and the corresponding instruction is sent to the target video application to play the next episode of the current video.
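A sketch of the whitelist lookup and dispatch, reusing the hypothetical state names from the earlier sketch; the command names and the target_app.send() interface are assumptions, not an API defined by this application:

```python
# Hypothetical whitelist associating user states with app-level commands.
CONTROL_WHITELIST = {
    "fast_forward_fixed_length":        ("seek_relative", {"seconds": 300}),
    "fast_forward_until_state_changes": ("fast_forward", {}),
    "switch_next":                      ("play_next_episode", {}),
    "switch_previous":                  ("play_previous_episode", {}),
}

def dispatch_control(user_state, target_app):
    entry = CONTROL_WHITELIST.get(user_state)
    if entry is None:
        return  # state not whitelisted: do nothing
    command, args = entry
    # The instruction must be one the target application can recognise and
    # execute (in practice delivered via an intent/IPC mechanism).
    target_app.send(command, **args)
```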
In the technical solution of this embodiment, when the target application is detected to have been started, the 3D depth camera is controlled to acquire facial information; the user state is determined according to that facial information; a control indication is determined according to the user state; and the target application is controlled according to the control indication. The user's face is thus tracked on the basis of facial images carrying depth-of-field information, yielding the motion state of the user's head, and the corresponding control indication is determined from the preset correspondence between control indications and user states. Because the user images carry depth information, more detail can be detected and the accuracy of motion detection is improved, avoiding mis-responses of the application caused by accidental touches. This improves the accuracy and convenience of human-computer interaction, enables the mobile terminal to "see" the user, makes human-computer interaction more intelligent, and enriches the application scenarios of the human-computer interaction function.

In an embodiment, when the user is detected to be using this human-computer interaction function for the first time, the correspondence between user states and control indications is displayed in a guide interface to indicate the control actions the user can enter.

FIG. 2 is a flowchart of another human-computer interaction method provided by an embodiment. As shown in FIG. 2, the method includes the following steps.

Step 210: Control the ordinary camera included in the 3D depth camera to acquire two-dimensional images of the face at a set period.
In this embodiment, the 3D depth camera includes an ordinary camera and an infrared camera. When the user is detected to have started an application, the application identifier of that application (which may be a package name, a process name, or the like) is obtained, the preset whitelist is queried with that identifier, and it is judged whether the application is a target application. If the application is a target application, the ordinary camera is controlled to turn on and photographs two-dimensional images of the face at a set period. In an embodiment, after the ordinary camera is turned on, it is detected whether the preview image contains a face; if the preview image is detected to contain a face, the ordinary camera is controlled to capture two-dimensional images of the face at the set period, and if it is not, the user is prompted to adjust the facial pose until a face is detected in the preview. Whether the user turns the head is determined by comparing the two-dimensional images at adjacent shooting times. When the user is detected to turn the head, one two-dimensional frame of the face is captured as the first image, at the start moment. The currently captured two-dimensional images are then acquired in sequence, and each is compared with the image from the previous shooting time to determine the moment the head motion stops; when the head motion is detected to have stopped, one two-dimensional frame of the face is captured and recorded as the second image.
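For illustration, the application-identifier check and the preview face check might look as follows; the package names are hypothetical, and a stock OpenCV Haar cascade stands in for whatever face detector the terminal actually uses on the preview stream:

```python
import cv2

# Hypothetical whitelist of target-application package names.
TARGET_APP_WHITELIST = {"com.example.videoplayer", "com.example.ebook"}

def is_target_app(package_name):
    return package_name in TARGET_APP_WHITELIST

_face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def preview_contains_face(frame_bgr):
    # Detect at least one frontal face in the current preview frame.
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = _face_cascade.detectMultiScale(gray, scaleFactor=1.1,
                                           minNeighbors=5)
    return len(faces) > 0
```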
Step 220: Determine the facial features corresponding to the two-dimensional image.

In this embodiment, contour detection is applied to the face region contained in the two-dimensional image to determine the face contour, and the face area is then determined from the face contour.
This embodiment does not limit the meaning of the facial features; a facial feature may also be the proportion of face pixels in the preview image. For example, determine the face region contained in the two-dimensional image; obtain the maximum vertical resolution of the face region along the direction parallel to the long side of the mobile terminal's touch screen, and the maximum horizontal resolution along the direction parallel to its short side; obtain the size of the face region from the maximum vertical resolution and the maximum horizontal resolution; and divide the size of the face region by the size of the touch screen to obtain the proportion of face pixels in the preview image.
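A minimal sketch of this proportion computation, with illustrative resolutions:

```python
def face_to_screen_ratio(face_w_px, face_h_px, screen_w_px, screen_h_px):
    # Face size = max horizontal extent x max vertical extent of the
    # detected contour; the ratio divides it by the touch-screen size.
    return (face_w_px * face_h_px) / float(screen_w_px * screen_h_px)

# Illustrative numbers: a 480x640 face box on a 1080x2160 screen.
ratio = face_to_screen_ratio(480, 640, 1080, 2160)  # ~0.13
```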
Step 230: Judge, according to the facial features, whether the two-dimensional image satisfies a set condition; if it does, perform step 240, and if it does not, return to step 210.
Determine the face-area difference between the first image and the second image described above, compare the difference with a set threshold, and judge from the comparison whether the two-dimensional image satisfies the set condition. In an embodiment, when the face-area difference is smaller than the set threshold, the two-dimensional image is determined not to satisfy the set condition; this prevents small head movements of the user from being detected and triggering mis-control, improving the control accuracy of the mobile terminal. For example, it prevents a sneeze from triggering mis-control while the user is watching a video or reading an e-book. When the face-area difference exceeds the set threshold, the two-dimensional image is determined to satisfy the set condition.
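The set-condition check itself reduces to a threshold comparison; the threshold value below is an assumption:

```python
def satisfies_set_condition(first_area_px, second_area_px,
                            set_threshold_px=5000):
    # Only a face-area change larger than the threshold counts as an
    # intentional head movement; smaller changes (e.g. a sneeze) are ignored.
    return abs(second_area_px - first_area_px) > set_threshold_px
```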
Step 240: Turn on the infrared camera included in the 3D depth camera, capture a facial image through the infrared camera and the ordinary camera, and turn off the infrared camera.
When the two-dimensional image satisfies the set condition, the infrared camera included in the 3D depth camera is turned on; the facial information at the moment the head motion stops is photographed through the infrared camera to obtain a depth image, and at least one more two-dimensional frame of the face is captured through the ordinary camera. The depth image and the re-captured two-dimensional image together form a three-dimensional facial image.
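A sketch of composing the three-dimensional facial image, assuming the depth image and the two-dimensional image are already registered to the same viewpoint and resolution (a real pipeline must guarantee this via calibration between the two cameras):

```python
import numpy as np

def compose_3d_face_image(rgb_frame, depth_map):
    # Attach the infrared camera's depth map to the ordinary camera's RGB
    # frame as a fourth channel, yielding an (H, W, 4) array.
    assert rgb_frame.shape[:2] == depth_map.shape
    return np.dstack([rgb_frame, depth_map])
```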
In this embodiment, during user-state detection, facial motion and the endpoint of a single facial motion are usually detected with the ordinary camera. A single facial motion may comprise the movement from the start moment described above to the moment the head motion stops, and the endpoint of a single facial motion is the moment the head motion stops. When that endpoint is detected, the infrared camera is turned on to capture the three-dimensional facial image; after the depth image has been captured through the infrared camera, the infrared camera is turned off, which reduces the power consumption of the mobile terminal.

In an embodiment, the three-dimensional facial image may instead be formed from the second image, captured by the ordinary camera at the moment the head motion stops, and the depth image captured by the infrared camera.

Step 250: Determine the user state according to the three-dimensional facial image.

The face's offset angle is determined from the depth-of-field information corresponding to the three-dimensional facial image, and the time the head stays at the position where the head motion stopped is recorded; the user state includes the offset angle and the time the head stays at the head-motion stop position.

The three-dimensional facial image is recognized and the positions of the facial features in it are determined, from which the face region and the axis of symmetry of the face region are determined. The axis of symmetry divides the face region into a left-face region and a right-face region. A set number of feature points is extracted from set positions in the left-face region, and for each feature point its mirrored feature point in the right-face region is determined with respect to the axis of symmetry; a feature point and its mirrored feature point form a set sampling-point pair. The depth-of-field information of each pair is obtained, together with the distance between the feature point and the mirrored feature point in the pair, and the arctangent function is used to compute a reference offset angle for each pair. Taking one sampling-point pair as an example, the computation of the reference offset angle is as follows. FIG. 3 is a schematic diagram of a scheme for computing the reference offset angle provided by an embodiment. As shown in FIG. 3, L1 and L2 are the distances from feature point 320 and mirrored feature point 330 to the 3D depth camera 310, that is, the depth-of-field information of feature point 320 and mirrored feature point 330, and W is the distance between feature point 320 and mirrored feature point 330. Suppose the user's head deflects to the left; the axis of symmetry AB then moves from the first position 340 to the corresponding second position 350, and feature point 320 and mirrored feature point 330 are symmetric about the axis AB at the second position. Taking the offset angle of the axis of symmetry AB as the reference offset angle α corresponding to feature point 320 and mirrored feature point 330, α can be computed with the formula α = arctan((L2 − L1) / W).
In this embodiment, the formula above yields the reference offset angle of each set sampling-point pair, and the face's offset angle is then determined from these reference offset angles. For example, the average of the reference offset angles may be taken as the face's offset angle. Alternatively, the reference offset angles may be sorted in descending order and the maximum reference offset angle taken as the face's offset angle; the minimum reference offset angle, or the reference offset angle in the middle of the sorted sequence, may also be used.
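A sketch of the per-pair angle computation and the aggregation choices described above, with illustrative depths and point separations:

```python
import numpy as np

def reference_offset_angle_deg(l1, l2, w):
    # alpha = arctan((L2 - L1) / W): L1, L2 are the pair's distances to the
    # depth camera, W the distance between the two points (see Figure 3).
    return np.degrees(np.arctan2(l2 - l1, w))

def face_offset_angle_deg(sampling_point_pairs, reduce="mean"):
    # Each pair is an (L1, L2, W) tuple; any of the aggregations named
    # above (mean, maximum, minimum, median) may be used.
    angles = sorted(reference_offset_angle_deg(l1, l2, w)
                    for l1, l2, w in sampling_point_pairs)
    if reduce == "mean":
        return float(np.mean(angles))
    if reduce == "max":
        return angles[-1]
    if reduce == "min":
        return angles[0]
    return angles[len(angles) // 2]  # middle of the sorted sequence

# Illustrative values in metres: depths ~0.5 m, point separations 6-8 cm.
angle = face_offset_angle_deg([(0.52, 0.55, 0.06), (0.51, 0.55, 0.08)])
```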
Step 260: Query a preset whitelist according to the user state, and determine the control indication corresponding to the user state.

In this embodiment, the user state includes the face's offset angle and the time the head stays at the position corresponding to that offset angle.

Step 270: Send the instruction corresponding to the control indication to the target application.

In the technical solution of this embodiment, the ordinary camera included in the 3D depth camera is controlled to acquire two-dimensional images of the face at a set period, and when a two-dimensional image satisfies the set condition, the infrared camera included in the 3D depth camera is turned on and the facial information at the moment the head motion stops is photographed through it to obtain a depth image. Facial motion and the endpoint of a single facial motion are thereby detected first with the ordinary camera, and the infrared camera is turned on to capture the three-dimensional facial image only when that endpoint is detected, which reduces the power consumption of the mobile terminal and extends its battery life. In addition, judging whether the two-dimensional image satisfies the set condition effectively prevents false detections from mis-controlling the target application, improving the control accuracy of the mobile terminal.

FIG. 4 is a structural block diagram of a human-computer interaction apparatus provided by an embodiment. The apparatus may be implemented in software and/or hardware and may be integrated in a mobile terminal, for example a mobile terminal with a 3D depth camera, configured to perform the human-computer interaction method provided by this embodiment. As shown in FIG. 4, the apparatus includes: an information acquisition module 410, configured to control the 3D depth camera to acquire facial information when the target application is detected to have been started, where the facial information includes a facial image carrying depth-of-field information; a state determination module 420, configured to determine the user state according to the facial information; and an application control module 430, configured to determine a control indication according to the user state and to control the target application according to the control indication.

The human-computer interaction apparatus provided by this embodiment tracks the user's face on the basis of facial images carrying depth-of-field information, obtaining the motion state of the user's head; the corresponding control indication is determined from the preset correspondence between control indications and user states, and the target application is then controlled according to that control indication. Because the user images carry depth information, more detail can be detected and the accuracy of motion detection is improved, avoiding mis-responses of the application caused by accidental touches. This improves the accuracy and convenience of human-computer interaction, enables the mobile terminal to "see" the user, makes human-computer interaction more intelligent, and enriches the application scenarios of the human-computer interaction function.
In an embodiment, the information acquisition module 410 includes: a two-dimensional image acquisition sub-module, configured to control the ordinary camera included in the 3D depth camera to acquire two-dimensional images of the face at a set period when the target application is detected to have been started; and a facial image capture sub-module, configured to turn on the infrared camera included in the 3D depth camera when a two-dimensional image satisfies the set condition, and to capture a facial image through the infrared camera and the ordinary camera.

In an embodiment, the facial image capture sub-module is further configured to turn off the infrared camera after the facial image has been captured through the infrared camera and the ordinary camera.

In an embodiment, the apparatus further includes: a feature determination module, configured to determine the facial features corresponding to the two-dimensional image after the ordinary camera included in the 3D depth camera has been controlled to acquire the two-dimensional images of the face at the set period; and a condition judgment module, configured to judge, according to the facial features, whether the two-dimensional image satisfies the set condition.

In an embodiment, the condition judgment module is configured to: determine the face-area difference between a first image and a second image, where the first image is a two-dimensional image captured at the moment the head motion starts and the second image is a two-dimensional image captured at the moment the head motion stops; and compare the face-area difference with a set threshold, judging from the comparison whether the two-dimensional image satisfies the set condition.

In an embodiment, the facial image capture sub-module is configured to capture the facial image through the infrared camera and the ordinary camera as follows: the facial information at the moment the head motion stops is photographed through the infrared camera to obtain a depth image, and the depth image and the second image form the facial image.

In an embodiment, the state determination module 420 is configured to determine the face's offset angle from the depth-of-field information of the facial image and record the time the head stays at the position corresponding to that offset angle.

In an embodiment, the application control module 430 is configured to: query a preset whitelist according to the user state and determine the control indication corresponding to the user state, where the control indications include fast forward, rewind, switching to the next file, switching to the previous file, and page turning; and send the instruction corresponding to the control indication to the target application, where the instruction instructs the target application to respond to the control indication, and the target applications include video applications, audio applications, and e-books.

In an embodiment, the two-dimensional image acquisition sub-module is configured to: turn on the ordinary camera included in the 3D depth camera and detect whether the preview image contains a face; and, if the preview image is detected to contain a face, control the ordinary camera to acquire two-dimensional images of the face at the set period.

This embodiment further provides a storage medium containing computer-executable instructions which, when executed by a computer processor, perform a human-computer interaction method. The method includes: when the target application is detected to have been started, controlling the 3D depth camera to acquire facial information, where the facial information includes a facial image carrying depth-of-field information; determining the user state according to the facial information; and determining a control indication according to the user state and controlling the target application according to the control indication.

Storage medium: any of at least one type of memory device or storage device. The term "storage medium" is intended to include: installation media, such as a Compact Disc Read-Only Memory (CD-ROM), a floppy disk, or a tape device; computer-system memory or random-access memory, such as Dynamic Random Access Memory (DRAM), Double Data Rate Random Access Memory (DDR RAM), Static Random Access Memory (SRAM), Extended Data Output Random Access Memory (EDO RAM), or Rambus Random Access Memory (RAM); non-volatile memory, such as flash memory or magnetic media (for example, a hard disk or optical storage); and registers or other similar types of memory elements. The storage medium may further include other types of memory or combinations of multiple types of memory. In addition, the storage medium may be located in the first computer system in which the program is executed, or in a different, second computer system connected to the first computer system over a network (such as the Internet); the second computer system may provide the program instructions to the first computer for execution. The term "storage medium" may include two or more storage media that may reside in different locations (for example, in different computer systems connected through a network). The storage medium may store program instructions executable by one or more processors (for example, program instructions implemented as a computer program).

In the storage medium containing computer-executable instructions provided by this embodiment, the computer-executable instructions are not limited to the human-computer interaction operations described above, and may also perform the relevant operations of the human-computer interaction method provided by any embodiment of this application.
This embodiment provides a mobile terminal with an operating system, in which the human-computer interaction apparatus provided by this embodiment may be integrated. The mobile terminal may be a smartphone, a tablet computer (PAD), a handheld game console, or the like. FIG. 5 is a structural block diagram of a mobile terminal provided by an embodiment. As shown in FIG. 5, the mobile terminal includes a 3D depth camera 510, a memory 520, and a processor 530. The 3D depth camera 510 includes an ordinary camera and an infrared camera and is configured to capture facial images carrying depth-of-field information; the memory 520 is configured to store the computer program, the facial images, the associations between user states and control indications, and the like; and the processor 530 is configured to read and execute the computer program stored in the memory 520. When executing the computer program, the processor 530 implements the following steps: when the target application is detected to have been started, controlling the 3D depth camera to acquire facial information, where the facial information includes a facial image carrying depth-of-field information; determining the user state according to the facial information; and determining a control indication according to the user state and controlling the target application according to the control indication. The 3D depth camera, memory, and processor listed in the example above are only some of the components of the mobile terminal, which may also include other components. Taking a smartphone as an example, a possible structure of the mobile terminal is described below. FIG. 6 is a structural block diagram of a smartphone provided by an embodiment. As shown in FIG. 6, the smartphone may include: a memory 601, a Central Processing Unit (CPU) 602 (also called a processor, hereinafter CPU), a peripheral interface 603, a Radio Frequency (RF) circuit 605, an audio circuit 606, a speaker 611, a touch screen 612, a camera 613, a power management chip 608, an input/output (I/O) subsystem 609, other input/control devices 610, and an external port 604. These components communicate through one or more communication buses or signal lines 607.

The smartphone 600 shown in FIG. 6 is only one example of a mobile terminal; the smartphone 600 may have more or fewer components than shown in FIG. 6, may combine two or more components, or may have a different configuration of components. The components shown in FIG. 6 may be implemented in hardware, software, or a combination of hardware and software, including one or more signal-processing and/or application-specific integrated circuits.

The smartphone in which the human-computer interaction apparatus provided by this embodiment is integrated is described below.
Memory 601: the memory 601 may be accessed by the CPU 602, the peripheral interface 603, and the like. The memory 601 may include high-speed random-access memory, and may also include non-volatile memory, such as one or more magnetic disk storage devices, flash memory devices, or other non-volatile solid-state storage devices. The memory 601 stores the computer program, and may also store facial information, the whitelist recording the associations between user states and control indications, the whitelist of target applications, and the like.
Peripheral interface 603: the peripheral interface 603 may connect the input and output peripherals of the device to the CPU 602 and the memory 601.

I/O subsystem 609: the I/O subsystem 609 may connect the input/output peripherals on the device, such as the touch screen 612 and the other input/control devices 610, to the peripheral interface 603. The I/O subsystem 609 may include a display controller 6091 and one or more input controllers 6092 configured to control the other input/control devices 610. The one or more input controllers 6092 receive electrical signals from, or send electrical signals to, the other input/control devices 610, which may include physical buttons (press buttons, rocker buttons, etc.), dials, slide switches, joysticks, and click wheels. It is worth noting that an input controller 6092 may be connected to any of the following: a keyboard, an infrared port, a Universal Serial Bus (USB) interface, or a pointing device such as a mouse.

Touch screen 612: the touch screen 612 is the input and output interface between the user terminal and the user, displaying visual output to the user; the visual output may include graphics, text, icons, video, and the like.

Camera 613: this may be a 3D depth camera. Through the camera 613, a three-dimensional image of the user's face is acquired and converted into an electrical signal, which is stored in the memory 601 through the peripheral interface 603.

The display controller 6091 in the I/O subsystem 609 receives electrical signals from, or sends electrical signals to, the touch screen 612. The touch screen 612 detects contact on the screen, and the display controller 6091 converts the detected contact into interaction with the user interface objects displayed on the touch screen 612, thereby implementing human-computer interaction; the user interface objects displayed on the touch screen 612 may be icons of running games, icons for connecting to the corresponding networks, and the like. In an embodiment, the device may also include an optical mouse, which is a touch-sensitive surface that does not display visual output, or an extension of the touch-sensitive surface formed by the touch screen.

RF circuit 605: mainly configured to establish communication between the handset and the wireless network (that is, the network side), implementing data reception and transmission between the handset and the wireless network, for example sending and receiving short messages and e-mail. In an embodiment, the RF circuit 605 receives and sends RF signals, also called electromagnetic signals: the RF circuit 605 converts electrical signals into electromagnetic signals or electromagnetic signals into electrical signals, and communicates with communication networks and other devices through the electromagnetic signals. The RF circuit 605 may include known circuits for performing these functions, including but not limited to an antenna system, an RF transceiver, one or more amplifiers, a tuner, one or more oscillators, a digital signal processor, a COder-DECoder (CODEC) chipset, a Subscriber Identity Module (SIM), and so on.

Audio circuit 606: mainly configured to receive audio data from the peripheral interface 603, convert the audio data into an electrical signal, and send the electrical signal to the speaker 611.

Speaker 611: configured to restore the voice signals the handset receives from the wireless network through the RF circuit 605 into sound and play the sound to the user.

Power management chip 608: configured to supply power to, and manage the power of, the hardware connected to the CPU 602, the I/O subsystem, and the peripheral interface.

The mobile terminal provided by this embodiment tracks the user's face on the basis of facial images carrying depth-of-field information, obtaining the motion state of the user's head; the corresponding control indication is determined from the preset correspondence between control indications and user states, and the target application is then controlled according to that control indication. Because the user images carry depth information, more detail can be detected and the accuracy of motion detection is improved, avoiding mis-responses of the application caused by accidental touches. This improves the accuracy and convenience of human-computer interaction, enables the mobile terminal to "see" the user, makes human-computer interaction more intelligent, and enriches the application scenarios of the human-computer interaction function.

The human-computer interaction apparatus, storage medium, and mobile terminal provided in the embodiments above can perform the human-computer interaction method provided by any embodiment of this application, and have the corresponding functional modules and beneficial effects for executing that method. For technical details not described in detail in the embodiments above, see the human-computer interaction method provided by any embodiment of this application.

Claims (20)

  1. A human-computer interaction method, comprising:
    in a case where a target application is detected to have been started, controlling a three-dimensional (3D) depth camera to acquire facial information, wherein the facial information comprises a facial image carrying depth-of-field information;
    determining a user state according to the facial information; and
    determining a control indication according to the user state, and controlling the target application according to the control indication.
  2. The method according to claim 1, wherein controlling the 3D depth camera to acquire facial information comprises:
    controlling an ordinary camera included in the 3D depth camera to acquire two-dimensional images of a face at a set period; and
    in a case where a two-dimensional image satisfies a set condition, turning on an infrared camera included in the 3D depth camera, and capturing a facial image through the infrared camera and the ordinary camera.
  3. The method according to claim 2, after the capturing of the facial image through the infrared camera and the ordinary camera, further comprising: turning off the infrared camera.
  4. The method according to claim 2 or 3, after controlling the ordinary camera included in the 3D depth camera to acquire the two-dimensional images of the face at the set period, further comprising:
    determining facial features corresponding to the two-dimensional image; and
    judging, according to the facial features, whether the two-dimensional image satisfies the set condition.
  5. The method according to claim 4, wherein judging, according to the facial features, whether the two-dimensional image satisfies the set condition comprises:
    determining a face-area difference between a first image and a second image, wherein the first image is a two-dimensional image captured at a moment head motion starts, and the second image is a two-dimensional image captured at a moment the head motion stops; and
    comparing the face-area difference with a set threshold, and judging, according to a result of the comparison, whether the two-dimensional image satisfies the set condition.
  6. The method according to claim 5, wherein capturing the facial image through the infrared camera and the ordinary camera comprises:
    photographing, through the infrared camera, facial information at the moment the head motion stops to obtain a depth image, the depth image and the second image forming the facial image.
  7. The method according to any one of claims 1 to 6, wherein determining the user state according to the facial information comprises:
    determining an offset angle of the face according to the depth-of-field information of the facial image, and recording a time the head stays at a position corresponding to the offset angle.
  8. The method according to any one of claims 1 to 7, wherein determining the control indication according to the user state and controlling the target application according to the control indication comprises:
    querying a preset whitelist according to the user state, and determining the control indication corresponding to the user state, wherein the control indication comprises fast forward, rewind, switching to a next file, switching to a previous file, and page turning; and
    sending an instruction corresponding to the control indication to the target application, wherein the instruction is used to instruct the target application to respond to the control indication, and the target application comprises a video application, an audio application, and an e-book.
  9. The method according to any one of claims 2 to 6, wherein controlling the ordinary camera included in the 3D depth camera to acquire the two-dimensional images of the face at the set period comprises:
    turning on the ordinary camera included in the 3D depth camera, and detecting whether a preview image contains a face; and
    in a case where the preview image is detected to contain a face, controlling the ordinary camera to acquire the two-dimensional images of the face at the set period.
  10. A human-computer interaction apparatus, comprising:
    an information acquisition module, configured to control a three-dimensional (3D) depth camera to acquire facial information in a case where a target application is detected to have been started, wherein the facial information comprises a facial image carrying depth-of-field information;
    a state determination module, configured to determine a user state according to the facial information; and
    an application control module, configured to determine a control indication according to the user state, and control the target application according to the control indication.
  11. The apparatus according to claim 10, wherein the information acquisition module comprises:
    a two-dimensional image acquisition sub-module, configured to control an ordinary camera included in the 3D depth camera to acquire two-dimensional images of a face at a set period in a case where the target application is detected to have been started; and
    a facial image capture sub-module, configured to turn on an infrared camera included in the 3D depth camera in a case where a two-dimensional image satisfies a set condition, and capture a facial image through the infrared camera and the ordinary camera.
  12. The apparatus according to claim 11, wherein the facial image capture sub-module is further configured to turn off the infrared camera after the facial image has been captured through the infrared camera and the ordinary camera.
  13. The apparatus according to claim 11 or 12, further comprising:
    a feature determination module, configured to determine facial features corresponding to the two-dimensional image after the ordinary camera included in the 3D depth camera has been controlled to acquire the two-dimensional images of the face at the set period; and
    a condition judgment module, configured to judge, according to the facial features, whether the two-dimensional image satisfies the set condition.
  14. The apparatus according to claim 13, wherein the condition judgment module is configured to:
    determine a face-area difference between a first image and a second image, wherein the first image is a two-dimensional image captured at a moment head motion starts, and the second image is a two-dimensional image captured at a moment the head motion stops; and
    compare the face-area difference with a set threshold, and judge, according to a result of the comparison, whether the two-dimensional image satisfies the set condition.
  15. The apparatus according to claim 14, wherein the facial image capture sub-module is configured to capture the facial image through the infrared camera and the ordinary camera by:
    photographing, through the infrared camera, facial information at the moment the head motion stops to obtain a depth image, the depth image and the second image forming the facial image.
  16. The apparatus according to any one of claims 10 to 15, wherein the state determination module is configured to determine an offset angle of the face according to the depth-of-field information of the facial image, and record a time the head stays at a position corresponding to the offset angle.
  17. The apparatus according to any one of claims 10 to 16, wherein the application control module is configured to:
    query a preset whitelist according to the user state, and determine the control indication corresponding to the user state, wherein the control indication comprises fast forward, rewind, switching to a next file, switching to a previous file, and page turning; and
    send an instruction corresponding to the control indication to the target application, wherein the instruction is used to instruct the target application to respond to the control indication, and the target application comprises a video application, an audio application, and an e-book.
  18. The apparatus according to any one of claims 11 to 15, wherein the two-dimensional image acquisition sub-module is configured to:
    turn on the ordinary camera included in the 3D depth camera, and detect whether a preview image contains a face; and
    in a case where the preview image is detected to contain a face, control the ordinary camera to acquire the two-dimensional images of the face at the set period.
  19. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the human-computer interaction method according to any one of claims 1 to 9.
  20. A mobile terminal, comprising a three-dimensional (3D) depth camera, a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the 3D depth camera comprises an ordinary camera and an infrared camera and is configured to capture facial images carrying depth-of-field information, and the processor, when executing the computer program, implements the human-computer interaction method according to any one of claims 1 to 9.
PCT/CN2018/122308 2018-01-03 2018-12-20 Method and device for man-machine interaction, medium, and mobile terminal WO2019134527A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810005036.7 2018-01-03
CN201810005036.7A CN108241434B (en) 2018-01-03 2018-01-03 Man-machine interaction method, device and medium based on depth of field information and mobile terminal

Publications (1)

Publication Number Publication Date
WO2019134527A1 (en)

Family

ID=62699338

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/122308 WO2019134527A1 (en) 2018-01-03 2018-12-20 Method and device for man-machine interaction, medium, and mobile terminal

Country Status (2)

Country Link
CN (1) CN108241434B (en)
WO (1) WO2019134527A1 (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108241434B (en) * 2018-01-03 2020-01-14 Oppo广东移动通信有限公司 Man-machine interaction method, device and medium based on depth of field information and mobile terminal
CN109240570A (en) * 2018-08-29 2019-01-18 维沃移动通信有限公司 A kind of page turning method, device and terminal
US11048375B2 (en) * 2018-09-18 2021-06-29 Alibaba Group Holding Limited Multimodal 3D object interaction system
CN110956603B (en) * 2018-09-25 2023-04-21 Oppo广东移动通信有限公司 Detection method and device for edge flying spot of depth image and electronic equipment
CN111367598B (en) * 2018-12-26 2023-11-10 三六零科技集团有限公司 Method and device for processing action instruction, electronic equipment and computer readable storage medium
CN110502110B (en) * 2019-08-07 2023-08-11 北京达佳互联信息技术有限公司 Method and device for generating feedback information of interactive application program
CN110662129A (en) * 2019-09-26 2020-01-07 联想(北京)有限公司 Control method and electronic equipment
CN111126163A (en) * 2019-11-28 2020-05-08 星络智能科技有限公司 Intelligent panel, interaction method based on face angle detection and storage medium
CN113091227B (en) * 2020-01-08 2022-11-01 佛山市云米电器科技有限公司 Air conditioner control method, cloud server, air conditioner control system and storage medium
CN111327888B (en) * 2020-03-04 2022-09-30 广州腾讯科技有限公司 Camera control method and device, computer equipment and storage medium
CN111583355B (en) * 2020-05-09 2024-01-23 维沃移动通信有限公司 Face image generation method and device, electronic equipment and readable storage medium
CN112529770B (en) * 2020-12-07 2024-01-26 维沃移动通信有限公司 Image processing method, device, electronic equipment and readable storage medium
CN115086095A (en) * 2021-03-10 2022-09-20 Oppo广东移动通信有限公司 Equipment control method and related device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9628843B2 (en) * 2011-11-21 2017-04-18 Microsoft Technology Licensing, Llc Methods for controlling electronic devices using gestures
CN103268153B (en) * 2013-05-31 2016-07-06 南京大学 Based on the man-machine interactive system of computer vision and exchange method under demo environment
CN107479801B (en) * 2017-07-31 2020-06-02 Oppo广东移动通信有限公司 Terminal display method and device based on user expression and terminal

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20120132930A (en) * 2011-05-30 2012-12-10 김호진 Display device and display method based on user motion
US20130044135A1 (en) * 2011-08-19 2013-02-21 Hon Hai Precision Industry Co., Ltd. Electronic book and method for controlling display of files
CN103218124A (en) * 2013-04-12 2013-07-24 北京国铁华晨通信信息技术有限公司 Depth-camera-based menu control method and system
CN106648042A (en) * 2015-11-04 2017-05-10 重庆邮电大学 Identification control method and apparatus
CN107506752A (en) * 2017-09-18 2017-12-22 艾普柯微电子(上海)有限公司 Face identification device and method
CN108241434A (en) * 2018-01-03 2018-07-03 广东欧珀移动通信有限公司 Man-machine interaction method, device, medium and mobile terminal based on depth of view information

Also Published As

Publication number Publication date
CN108241434B (en) 2020-01-14
CN108241434A (en) 2018-07-03

Similar Documents

Publication Publication Date Title
WO2019134527A1 (en) Method and device for man-machine interaction, medium, and mobile terminal
US11640235B2 (en) Additional object display method and apparatus, computer device, and storage medium
US9361512B2 (en) Identification of a gesture
TWI564791B (en) Broadcast control system, method, computer program product and computer readable medium
WO2020103526A1 (en) Photographing method and device, storage medium and terminal device
US20200293754A1 (en) Task execution method, terminal device, and computer readable storage medium
WO2019218880A1 (en) Interaction recognition method and apparatus, storage medium, and terminal device
US10062393B2 (en) Method for recording sound of video-recorded object and mobile terminal
US20170192500A1 (en) Method and electronic device for controlling terminal according to eye action
JP6100286B2 (en) Gesture detection based on information from multiple types of sensors
US11276183B2 (en) Relocalization method and apparatus in camera pose tracking process, device, and storage medium
US11785331B2 (en) Shooting control method and terminal
WO2013000381A1 (en) Method for controlling state of mobile terminal and mobile terminal
CN108766438B (en) Man-machine interaction method and device, storage medium and intelligent terminal
US9275275B2 (en) Object tracking in a video stream
WO2019183784A1 (en) Method and electronic device for video recording
CN108616775B (en) Method and device for intelligently capturing picture during video playing, storage medium and intelligent terminal
KR20140104753A (en) Image preview using detection of body parts
US11102409B2 (en) Electronic device and method for obtaining images
KR20170098102A (en) Method, storage medium and electronic device for providing a plurality of images
KR20210124313A (en) Interactive object driving method, apparatus, device and recording medium
US9250723B2 (en) Method and apparatus for stroke acquisition and ultrasonic electronic stylus
CN112650405A (en) Electronic equipment interaction method and electronic equipment
WO2019218879A1 (en) Photographing interaction method and apparatus, storage medium and terminal device
US20170168582A1 (en) Click response processing method, electronic device and system for motion sensing control

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 18898162
    Country of ref document: EP
    Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
122 Ep: pct application non-entry in european phase
    Ref document number: 18898162
    Country of ref document: EP
    Kind code of ref document: A1