WO2020244160A1 - Terminal device control method and apparatus, computer device, and readable storage medium - Google Patents

Terminal device control method and apparatus, computer device, and readable storage medium

Info

Publication number
WO2020244160A1
WO2020244160A1 (PCT/CN2019/118974)
Authority
WO
WIPO (PCT)
Prior art keywords
state change
change information
preset
information
key feature
Prior art date
Application number
PCT/CN2019/118974
Other languages
English (en)
French (fr)
Inventor
车宏伟
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2020244160A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/012Head tracking input arrangements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0484Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition

Definitions

  • This application relates to the field of electronic communication technology, and in particular to a terminal device control method and apparatus, a computer device, and a non-volatile readable storage medium.
  • An embodiment of the present application provides a terminal device control method. The method includes: acquiring an image to be recognized and performing face detection on it; determining whether a face image is detected; if a face image is detected, acquiring initial state information of preset key feature points of the face image; determining state change information of the preset key feature points based on the initial state information; and, when the state change information of the preset key feature points is the first preset state change information in a preset state change information database, triggering the control instruction corresponding to the first preset state change information to execute the corresponding control operation.
  • An embodiment of the present application provides a terminal device control apparatus. The apparatus includes: a detection module for acquiring an image to be recognized and performing face detection on it; a judgment module for determining whether a face image is detected; an acquisition module for acquiring initial state information of preset key feature points of the face image when a face image is detected; a determining module for determining state change information of the preset key feature points based on the initial state information; and a control module for triggering, when the state change information of the preset key feature points is the first preset state change information in a preset state change information database, the control instruction corresponding to the first preset state change information to execute the corresponding control operation.
  • An embodiment of the present application provides a computer device that includes a processor and a memory on which computer-readable instructions are stored; the processor executes the computer-readable instructions stored in the memory to implement the steps of the terminal device control method described above.
  • An embodiment of the present application provides a non-volatile readable storage medium having computer readable instructions stored thereon, and when the computer readable instructions are executed by a processor, the steps of the terminal device control method as described above are realized.
  • The above terminal device control method, apparatus, computer device, and non-volatile readable storage medium control the terminal device by recognizing the user's facial expression changes or head deflection, freeing the user's hands. Compared with traditional manual operation, interaction with the terminal device is more vivid and engaging, which improves the user experience.
  • Fig. 1 is a flow chart of the steps of a terminal device control method in an embodiment of the application.
  • Fig. 2 is a flowchart of steps of a terminal device control method in another embodiment of the application.
  • Fig. 3 is a functional module diagram of a terminal device control apparatus in an embodiment of the application.
  • Figure 4 is a schematic diagram of a computer device in an embodiment of the application.
  • the expression interaction method of the present application is applied to one or more computer devices.
  • the computer device is a device that can automatically perform numerical calculation and/or information processing in accordance with pre-set or stored instructions.
  • Its hardware includes, but is not limited to, a microprocessor, an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), embedded devices, and so on.
  • the computer device may be a computing device such as a desktop computer, a notebook computer, a tablet computer, a server, and a mobile phone.
  • the computer device can interact with the user through a keyboard, a mouse, a remote control, a touch panel, or a voice control device.
  • FIG. 1 is a flowchart of the steps of a preferred embodiment of the terminal device control method of the present application. According to different needs, the order of the steps in the flowchart can be changed, and some steps can be omitted.
  • the terminal device control method specifically includes the following steps.
  • Step S11 Obtain an image to be recognized, and perform face detection on the image to be recognized.
  • the image to be recognized may be obtained by communicating with a camera (such as the camera of the computer device).
  • The image to be recognized may contain non-face content, so face detection needs to be performed on the image to be recognized in order to identify a face image containing a human face.
  • a convolutional neural network model can be established and trained to realize face detection on the image to be recognized.
  • Specifically, face detection on the image to be recognized can be implemented as follows. A face sample database is first constructed and a convolutional neural network model for face detection is established; the face sample database contains face information of multiple people, each person's face information can cover multiple angles, and each angle can have multiple pictures. The face images in the face sample database are input to the convolutional neural network model, and the network is trained starting from the model's default parameters. According to the intermediate training results, the initial weights, training rate, number of iterations, and other network parameters are continuously adjusted until the optimal network parameters are obtained, and the convolutional neural network model with the optimal network parameters is used as the final recognition model. After training, the resulting model can be used for face detection: the image to be recognized is input to the model, and the model's output is the face detection result.
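  • For illustration, a minimal sketch of such a face / non-face classifier is given below in Keras. The dataset handling, image size, and hyper-parameters are assumptions made for the sketch, not values specified in this application.

```python
# A minimal face / non-face classifier sketching the training setup described above.
import tensorflow as tf

def build_model(input_shape=(64, 64, 1)):
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=input_shape),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # probability that the input is a face
    ])

model = build_model()
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="binary_crossentropy",
              metrics=["accuracy"])

# train_images / train_labels would come from the face sample database
# (multiple people, multiple viewing angles, several pictures per angle):
# model.fit(train_images, train_labels, epochs=20, validation_split=0.1)
# face_scores = model.predict(images_to_recognize)
```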
  • Step S12 Determine whether a face image is detected. In one embodiment, whether a face image is detected can be judged according to the output of the convolutional neural network model; if a face image is detected, the method proceeds to step S13, otherwise it returns to step S11.
  • Step S13 If a face image is detected, the initial state information of the preset key feature points of the face image is acquired.
  • the preset key feature points of the face image may be composed of parts such as eyes, nose, and mouth.
  • The initial state information may include initial position information or initial expression information. When the initial state information is the initial position information of the preset key feature points, control operations can be executed according to the motion state information of the face image; when the initial state information is the initial expression information of the preset key feature points, control operations can be executed according to the expression change information of the face image.
  • In one embodiment, when the initial state information is initial position information, the initial state information of the preset key feature points of the face image is the position information of those key feature points in their initial state.
  • the position information of the preset key feature points of the face image can be determined from the face image by an integral projection method or a face alignment algorithm (for example: ASM algorithm, AAM algorithm, STASM algorithm, etc.). Since eyes are the more prominent facial features in human faces, the eyes can be accurately located first, and other organs of the face, such as eyebrows, mouth, nose, etc., can be more accurately located based on the potential distribution relationship.
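  • As one concrete possibility for the face-alignment route mentioned above, the sketch below uses dlib's pretrained 68-point landmark model to locate key feature points; the model file name is an assumption, and the ASM/AAM/STASM algorithms named above could be substituted for it.

```python
# A practical stand-in for the face-alignment step using dlib.
import cv2
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")  # assumed local model file

def key_feature_points(image_bgr):
    """Return (x, y) positions of the eyes, nose, mouth, etc. for the first face found."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    faces = detector(gray, 1)
    if not faces:
        return None
    shape = predictor(gray, faces[0])
    return [(shape.part(i).x, shape.part(i).y) for i in range(shape.num_parts)]
```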
  • Taking the integral projection approach as an example, the preset key feature points are located by the peaks or troughs produced under the different integral projection methods.
  • Integral projection is divided into vertical projection and horizontal projection. Let f(x, y) denote the gray value of the image at (x, y); over the image region bounded by [y1, y2] and [x1, x2], the horizontal integral projection M_h(y) and the vertical integral projection M_v(x) are expressed as M_h(y) = Σ_{x=x1}^{x2} f(x, y) and M_v(x) = Σ_{y=y1}^{y2} f(x, y). The horizontal integral projection accumulates the gray values of all pixels in a row before display, and the vertical integral projection accumulates the gray values of all pixels in a column before display.
  • By locating the two trough points x1 and x2 and cropping out the region [x1, x2] along the horizontal axis, the left and right boundaries of the face image can be located. After locating the left and right boundaries, the face image to be recognized is binarized, and horizontal and vertical integral projections are performed respectively. Further, from prior knowledge of face images, the eyebrows and eyes are relatively dark regions in the face image, and they correspond to the first two minimum points on the horizontal integral projection curve.
  • The first minimum point corresponds to the position of the eyebrows on the vertical axis, denoted y_brow; the second minimum point corresponds to the position of the eyes, denoted y_eye; the third minimum point corresponds to the position of the nose, denoted y_nose; and the fourth minimum point corresponds to the position of the mouth, denoted y_mouth. Similarly, two minimum points appear on either side of the central symmetry axis of the face image, corresponding to the positions of the left and right eyes on the horizontal axis, denoted x_left-eye and x_right-eye; the eyebrows share the same horizontal position as the eyes, and the horizontal position of the mouth and nose is (x_left-eye + x_right-eye)/2.
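  • The projections and minima described above can be computed directly, for example with NumPy; the sketch below assumes a 2-D grayscale array and already-located region boundaries.

```python
import numpy as np

def integral_projections(gray, x1, x2, y1, y2):
    """Horizontal and vertical integral projections of the region [y1:y2, x1:x2]."""
    region = gray[y1:y2, x1:x2].astype(np.float64)
    m_h = region.sum(axis=1)  # M_h(y): accumulate grey values along each row
    m_v = region.sum(axis=0)  # M_v(x): accumulate grey values along each column
    return m_h, m_v

def local_minima(curve):
    """Indices of simple local minima of a 1-D projection curve; on M_h the
    first four minima correspond to y_brow, y_eye, y_nose and y_mouth."""
    return [i for i in range(1, len(curve) - 1)
            if curve[i] < curve[i - 1] and curve[i] < curve[i + 1]]
```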
  • In one embodiment, when the initial state information is initial expression information, the initial state information of the preset key feature points of the face image is the expression information of those key feature points in their initial state. The facial expression information may, for example, take the following forms. Facial movements when happy: the corners of the mouth are raised, the cheeks wrinkle, the eyelids contract, and "crow's feet" form at the outer corners of the eyes. Facial features when sad: squinting, tightened eyebrows, downturned corners of the mouth, chin raised or tightened. Facial features when afraid: mouth and eyes open, eyebrows raised, nostrils flared. Facial features when angry: drooping eyebrows, furrowed forehead, tense eyelids and lips. Facial features when disgusted: sniffing, raised upper lip, drooping eyebrows, squinting. Facial features when surprised: dropped jaw, relaxed lips and mouth, widened eyes, slightly raised eyelids and eyebrows. Facial features when contemptuous: one side of the mouth raised, a sneer or smug smile, and so on.
  • The expression information can be obtained by extracting a feature vector to be recognized from the preset key feature points, determining, from this feature vector and the preset feature vector of each preset expression in a preset expression library, the probability that the face image and each preset expression are similar, and then deriving the facial expression information from the computed similarity probabilities. The feature vector to be recognized may include a shape feature vector and/or a texture feature vector. When the feature vector to be recognized is a shape feature vector, the shape feature vector of the preset key feature points is extracted; when it is a texture feature vector, the texture feature vector is extracted; and when it is both, the shape and texture feature vectors of the preset key feature points are both extracted.
  • In one embodiment, the probability of similarity between the face image and each preset expression may be determined as follows: obtain the distance value between the feature vector to be recognized and the preset feature vector of each preset expression; then, according to the distance value, determine the similarity probability that the face image and the preset expression corresponding to that distance value belong to the same kind of expression.
  • the distance value may be a generalized Mahalanobis distance.
  • The distance value between the feature vector to be recognized and the preset feature vector of a preset expression can be determined by the following formula: d_M(y, x_j) = (y - x_j)^T * M * (y - x_j), where y is the feature vector to be recognized; x_j is the preset feature vector of the j-th preset expression in the preset expression library; M is the target metric matrix; j is an integer greater than or equal to 1; d_M(y, x_j) is the distance value between the feature vector to be recognized and the preset feature vector of the j-th preset expression; (y - x_j) is the difference between the feature vector to be recognized and the preset feature vector of the j-th preset expression; and (y - x_j)^T is the transpose of that difference.
  • In one embodiment, the similarity probability that the face image and the preset expression corresponding to the distance value belong to the same kind of expression can be determined by the following formula: p = {1 + exp[D - b]}^(-1), where p is the similarity probability that the face image and the preset expression corresponding to the distance value belong to the same kind of expression, D is the distance value, and b is the offset.
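  • The two formulas above translate directly into code. The sketch below assumes the preset expression library, the target metric matrix M, and the offset b are given.

```python
import numpy as np

def mahalanobis_like(y, x_j, M):
    """d_M(y, x_j) = (y - x_j)^T * M * (y - x_j)."""
    d = y - x_j
    return float(d @ M @ d)

def similarity_probability(distance, b):
    """p = {1 + exp[D - b]}^(-1): larger distances give lower similarity."""
    return 1.0 / (1.0 + np.exp(distance - b))

def recognize_expression(y, expression_library, M, b):
    """Return the most similar preset expression and all similarity probabilities."""
    probs = {name: similarity_probability(mahalanobis_like(y, x_j, M), b)
             for name, x_j in expression_library.items()}
    return max(probs, key=probs.get), probs
```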
  • Step S14 Determine state change information of preset key feature points of the face image based on the initial state information.
  • In one embodiment, after the initial state information of the preset key feature points of the face image is acquired, the state change information of those key feature points can be determined based on it. The state change information is measured against the initial state information as a baseline, for example by starting timing from the initial state and recording the state change within a preset time.
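  • A minimal sketch of taking the initial state as the baseline and recording the change within a preset time window is shown below; the read_state callback (returning, say, a head-yaw angle or an expression label for the current camera frame) is an assumption of the sketch.

```python
import time

def observe_state_change(read_state, window_seconds=1.0, sample_interval=0.05):
    """Record the state at the start of the window and at its end."""
    start = time.time()
    initial_state = read_state()
    current_state = initial_state
    while time.time() - start < window_seconds:
        current_state = read_state()
        time.sleep(sample_interval)
    return initial_state, current_state  # compare the two to derive the state change
```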
  • Step S15 When the state change information of the preset key feature points is the first preset state change information in the preset state change information database, trigger the control instruction corresponding to the first preset state change information to execute the corresponding control operation.
  • In one embodiment, when the state change information of the preset key feature points is the first preset state change information in the preset state change information database, the control instruction corresponding to that first preset state change information is triggered, and the terminal device then executes the corresponding control operation according to the control instruction. For example, when the acquired state change information of the preset key feature points is a head turn to the left, the terminal device executes the previous-page control instruction; when it is a head turn to the right, the terminal device executes the next-page control instruction; and when it is a nod, the terminal device executes the play or pause instruction.
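  • A minimal dispatch table in the spirit of step S15 might look as follows; issuing the page and play/pause commands through pyautogui key presses is only one possible realization and is an assumption, not something specified above.

```python
import pyautogui

CONTROL_INSTRUCTIONS = {
    "head_left":  lambda: pyautogui.press("pageup"),    # previous page
    "head_right": lambda: pyautogui.press("pagedown"),  # next page
    "nod":        lambda: pyautogui.press("space"),     # play / pause
}

def trigger_control(state_change):
    """Trigger the control instruction associated with a recognized state change."""
    action = CONTROL_INSTRUCTIONS.get(state_change)
    if action is not None:
        action()
```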
  • In one embodiment, in order to improve operation accuracy, step S15 may further include: when the state change information of the preset key feature points is the first preset state change information in the preset state change information database, judging whether that state change information is valid state change information; and, when the state change information of the preset key feature points is valid state change information, triggering the control instruction corresponding to the first preset state change information to execute the corresponding control operation.
  • In one embodiment, when the initial state information is the initial position information of the preset key feature points, the average deflection speed and/or deflection angle of the face image during the state change is acquired, so that whether the state change information of the preset key feature points is valid can be judged from the average deflection speed and/or the deflection angle. For example, if the state change is a head movement, the average head deflection speed and/or deflection angle during the state change can be obtained to judge whether the current state change information of the preset key feature points is valid state change information.
  • For example, under normal circumstances, when a user tilts the head to talk to someone, turns to look at something, or nods in confirmation, the head generally moves fairly quickly. To avoid such movements erroneously controlling the terminal device, a preset speed value can be set. For instance, it can be judged whether the average head movement speed during the current state change is less than a first preset speed value; if it is, the current state change information of the preset key feature points is determined to be valid state change information and the corresponding control instruction is generated from it; if the average head movement speed is not less than the first preset speed value, the current state change information is determined to be invalid state change information and no control instruction is generated. The preset speed value may have a positive or negative deviation of 30%.
  • In one embodiment, whether the current state change information of the preset key feature points is valid can also be judged by whether the deflection angle of the head is greater than or equal to a first angle threshold: if the deflection angle is greater than or equal to the first angle threshold, the current state change information is determined to be valid and the corresponding control instruction is generated from it; otherwise it is determined to be invalid. The first angle threshold may be set to a larger angle than the deflection typically produced during the user's ordinary communication. It can be understood that whether the current state change information of the preset key feature points is valid can also be judged by simultaneously determining whether the average head movement speed is less than the first preset speed value and whether the head deflection angle is greater than or equal to the first angle threshold.
  • In one embodiment, when the initial state information is the initial expression information of the preset key feature points, the duration of the facial expression during the state change is acquired, so that whether the state change information of the preset key feature points is valid can be judged from that duration. For example, whether the current state change information is valid can be judged by whether the duration of the facial expression during the state change is greater than or equal to a preset time: if it is, the current state change information is determined to be valid and the corresponding control instruction is generated from it; if the duration is less than the preset time, the current state change information is determined to be invalid and no control instruction is generated.
  • In one embodiment, whether the current state change information of the preset key feature points is valid can also be judged from the difference between the time of the current state change and the time at which a control instruction was last generated from the preset key feature points. For example, the occurrence time of the state change information is acquired, and it is judged whether the difference between this occurrence time and the occurrence time of the previous state change information is greater than or equal to a preset time. If the difference is greater than or equal to the preset time, the current state change information is determined to be valid and the corresponding control instruction is generated from it; if the difference is less than the preset time, the current state change information is determined to be invalid and no control instruction is generated.
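  • The validity checks described above can be sketched as simple threshold tests. Only the existence of such preset thresholds follows from the text; the concrete numbers below are assumptions.

```python
# Illustrative thresholds for the validity checks (assumed values).
MAX_AVG_DEFLECTION_SPEED = 90.0      # first preset speed value, degrees/second
MIN_DEFLECTION_ANGLE = 20.0          # first angle threshold, degrees
MIN_EXPRESSION_DURATION = 0.8        # preset time for expressions, seconds
MIN_INTERVAL_BETWEEN_COMMANDS = 1.5  # preset time between two commands, seconds

def head_change_is_valid(avg_speed, deflection_angle):
    # a deliberate head turn: slower than a casual glance and far enough
    return (avg_speed < MAX_AVG_DEFLECTION_SPEED
            and deflection_angle >= MIN_DEFLECTION_ANGLE)

def expression_change_is_valid(expression_duration):
    return expression_duration >= MIN_EXPRESSION_DURATION

def interval_is_valid(occurrence_time, previous_occurrence_time):
    return occurrence_time - previous_occurrence_time >= MIN_INTERVAL_BETWEEN_COMMANDS
```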
  • Referring to FIG. 2, compared with the terminal device control method shown in FIG. 1, the method shown in FIG. 2 further includes step S16 and step S17.
  • Step S16 Configure up, down, left, and right detection boundary information of the preset key feature points to establish a feature point detection frame.
  • In one embodiment, the feature point detection frame is used to detect the state change information of the preset key feature points. By configuring the upper, lower, left, and right detection boundary information of the key feature points, a feature point detection frame can be established. When detecting facial behavior information (head deflection, facial expressions), it is necessary to ensure that the key feature points of the face always fall within the feature point detection frame, so as not to affect detection accuracy.
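  • A minimal sketch of such a detection frame, with an assumed pixel margin, follows.

```python
def build_detection_frame(points, margin=40):
    """Configure up/down/left/right boundaries around the initial key feature points (step S16)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs) - margin, min(ys) - margin, max(xs) + margin, max(ys) + margin)

def points_inside(points, frame):
    """Detection is only trusted while the key feature points stay inside the frame."""
    left, top, right, bottom = frame
    return all(left <= x <= right and top <= y <= bottom for x, y in points)
```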
  • Step S17 Associate multiple preset state change information of the preset key feature points with multiple preset control instructions.
  • In one embodiment, a mapping relationship between multiple preset state change information and multiple preset control instructions of the terminal device may be established in advance. For example, first preset state change information is associated in advance with a first preset control instruction of the terminal device, second preset state change information with a second preset control instruction, and third preset state change information with a third preset control instruction. The preset control instructions may be common instructions of the terminal device, such as next page, previous page, play, pause, left mouse button, right mouse button, and so on. For instance, the first preset state change information may be a head turn to the right, corresponding to the next-page control instruction; the second preset state change information a head turn to the left, corresponding to the previous-page control instruction; and the third preset state change information a nod, corresponding to the play or pause control instruction. As another example, the first preset state change information may be the facial expression changing from neutral to a happy expression, corresponding to the next-page control instruction, and the second preset state change information the facial expression changing from neutral to a sad expression, corresponding to the previous-page control instruction.
  • The above terminal device control method controls the terminal device by recognizing the user's facial expression changes or head deflection, freeing the user's hands. Compared with traditional manual operation, interaction with the terminal device is more vivid and engaging, which improves the user experience.
  • FIG. 3 is a functional module diagram of a preferred embodiment of a terminal device control apparatus of this application.
  • the terminal device control apparatus 10 may include a configuration module 101, an association module 102, a detection module 103, a judgment module 104, an acquisition module 105, a determination module 106, and a control module 107.
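  • One illustrative way to organize these seven modules in code is a single controller class with one method per module; the class and method names below are assumptions made for the sketch.

```python
class TerminalDeviceController:
    """Skeleton mirroring the module decomposition of apparatus 10."""
    def configure_detection_frame(self, key_points):  # configuration module 101
        ...
    def associate_commands(self, mapping):            # association module 102
        ...
    def detect_face(self, image):                     # detection module 103
        ...
    def face_detected(self, detection_result):        # judgment module 104
        ...
    def acquire_initial_state(self, face_image):      # acquisition module 105
        ...
    def determine_state_change(self, initial_state):  # determination module 106
        ...
    def trigger_control(self, state_change):          # control module 107
        ...
```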
  • the configuration module 101 is configured to configure the upper, lower, left, and right detection boundary information of the preset key feature points to establish a feature point detection frame.
  • the feature point detection frame is used to detect state change information of the preset key feature points.
  • The configuration module 101 can establish a feature point detection frame by configuring the upper, lower, left, and right detection boundary information of the key feature points.
  • When detecting facial behavior information (head deflection, facial expressions), it is necessary to ensure that the key feature points of the face always fall within the feature point detection frame, so as not to affect detection accuracy.
  • the associating module 102 is configured to associate a plurality of preset state change information of the preset key feature points with a plurality of preset control instructions.
  • In one embodiment, the association module 102 may establish in advance a mapping relationship between multiple preset state change information and multiple preset control instructions of the terminal device. For example, the association module 102 associates first preset state change information with a first preset control instruction of the terminal device, second preset state change information with a second preset control instruction, and third preset state change information with a third preset control instruction. The preset control instructions may be common instructions of the terminal device, such as next page, previous page, play, pause, left mouse button, right mouse button, and so on. For instance, the first preset state change information may be a head turn to the right, corresponding to the next-page control instruction; the second preset state change information a head turn to the left, corresponding to the previous-page control instruction; and the third preset state change information a nod, corresponding to the play or pause control instruction. As another example, the first preset state change information may be the facial expression changing from neutral to a happy expression, corresponding to the next-page control instruction, and the second preset state change information the facial expression changing from neutral to a sad expression, corresponding to the previous-page control instruction.
  • the detection module 103 is configured to obtain an image to be recognized, and perform face detection on the image to be recognized.
  • the detection module 103 may obtain the image to be recognized by communicating with a camera (such as the camera of the computer device).
  • The image to be recognized may contain non-face content, so face detection is performed on it in order to identify a face image containing a human face.
  • the detection module 103 may implement face detection on the image to be recognized by establishing and training a convolutional neural network model.
  • Specifically, face detection on the image to be recognized can be implemented as follows. A face sample database is first constructed and a convolutional neural network model for face detection is established; the face sample database contains face information of multiple people, each person's face information can cover multiple angles, and each angle can have multiple pictures. The face images in the face sample database are input to the convolutional neural network model, and the network is trained starting from the model's default parameters. According to the intermediate training results, the initial weights, training rate, number of iterations, and other network parameters are continuously adjusted until the optimal network parameters are obtained, and the convolutional neural network model with the optimal network parameters is used as the final recognition model. After training, the resulting model can be used for face detection.
  • the detection module 103 can input the image to be recognized into the finally obtained convolutional neural network model, and the output of the model is the face detection result.
  • the judgment module 104 is used to judge whether a face image is detected.
  • the judgment module 104 may judge whether a face image is detected according to the output of the convolutional neural network model. If a face image is detected, subsequent key feature point recognition is performed. If no face image is detected, face detection is performed on the image to be recognized again.
  • the acquiring module 105 is configured to acquire initial state information of preset key feature points of the face image when the face image is detected.
  • the preset key feature points of the face image may be composed of parts such as eyes, nose, and mouth.
  • The initial state information may include initial position information or initial expression information. When the initial state information is the initial position information of the preset key feature points, control operations can be executed according to the motion state information of the face image; when the initial state information is the initial expression information of the preset key feature points, control operations can be executed according to the expression change information of the face image.
  • In one embodiment, when the initial state information is initial position information, the initial state information of the preset key feature points of the face image is the position information of those key feature points in their initial state.
  • the position information of the preset key feature points of the face image can be determined from the face image by an integral projection method or a face alignment algorithm (for example, ASM algorithm, AAM algorithm, STASM algorithm, etc.). Since eyes are the more prominent facial features in human faces, the eyes can be accurately located first, and other organs of the face, such as eyebrows, mouth, nose, etc., can be more accurately located based on the potential distribution relationship.
  • For example, the preset key feature points are located by the peaks or troughs produced under the different integral projection methods.
  • Integral projection is divided into vertical projection and horizontal projection. Let f(x, y) denote the gray value of the image at (x, y); over the image region bounded by [y1, y2] and [x1, x2], the horizontal integral projection M_h(y) and the vertical integral projection M_v(x) are expressed as M_h(y) = Σ_{x=x1}^{x2} f(x, y) and M_v(x) = Σ_{y=y1}^{y2} f(x, y). The horizontal integral projection accumulates the gray values of all pixels in a row before display, and the vertical integral projection accumulates the gray values of all pixels in a column before display.
  • By locating the two trough points x1 and x2 and cropping out the region [x1, x2] along the horizontal axis, the left and right boundaries of the face image can be located. After locating the left and right boundaries, the face image to be recognized is binarized, and horizontal and vertical integral projections are performed respectively. Further, from prior knowledge of face images, the eyebrows and eyes are relatively dark regions in the face image, and they correspond to the first two minimum points on the horizontal integral projection curve.
  • The first minimum point corresponds to the position of the eyebrows on the vertical axis, denoted y_brow; the second minimum point corresponds to the position of the eyes, denoted y_eye; the third minimum point corresponds to the position of the nose, denoted y_nose; and the fourth minimum point corresponds to the position of the mouth, denoted y_mouth. Similarly, two minimum points appear on either side of the central symmetry axis of the face image, corresponding to the positions of the left and right eyes on the horizontal axis, denoted x_left-eye and x_right-eye; the eyebrows share the same horizontal position as the eyes, and the horizontal position of the mouth and nose is (x_left-eye + x_right-eye)/2.
  • In one embodiment, when the initial state information is initial expression information, the initial state information of the preset key feature points of the face image is the expression information of those key feature points in their initial state. The facial expression information may, for example, take the following forms. Facial movements when happy: the corners of the mouth are raised, the cheeks wrinkle, the eyelids contract, and "crow's feet" form at the outer corners of the eyes. Facial features when sad: squinting, tightened eyebrows, downturned corners of the mouth, chin raised or tightened. Facial features when afraid: mouth and eyes open, eyebrows raised, nostrils flared. Facial features when angry: drooping eyebrows, furrowed forehead, tense eyelids and lips. Facial features when disgusted: sniffing, raised upper lip, drooping eyebrows, squinting. Facial features when surprised: dropped jaw, relaxed lips and mouth, widened eyes, slightly raised eyelids and eyebrows. Facial features when contemptuous: one side of the mouth raised, a sneer or smug smile, and so on.
  • The acquiring module 105 may extract a feature vector to be recognized from the preset key feature points and, according to this feature vector and the preset feature vector of each preset expression in a preset expression library, determine the probability that the face image and each preset expression are similar, and then obtain the facial expression information from the computed similarity probabilities.
  • the feature vector to be recognized may include a shape feature vector and/or a texture feature vector.
  • When the feature vector to be recognized is a shape feature vector, the shape feature vector of the preset key feature points is extracted; when it is a texture feature vector, the texture feature vector is extracted; and when it is both, the shape and texture feature vectors of the preset key feature points are both extracted.
  • In one embodiment, the acquiring module 105 may determine the probability of similarity between the face image and each preset expression as follows: obtain the distance value between the feature vector to be recognized and the preset feature vector of each preset expression; then, according to the distance value, determine the similarity probability that the face image and the preset expression corresponding to that distance value belong to the same kind of expression.
  • the distance value may be a generalized Mahalanobis distance.
  • The distance value between the feature vector to be recognized and the preset feature vector of a preset expression can be determined by the following formula: d_M(y, x_j) = (y - x_j)^T * M * (y - x_j), where y is the feature vector to be recognized; x_j is the preset feature vector of the j-th preset expression in the preset expression library; M is the target metric matrix; j is an integer greater than or equal to 1; d_M(y, x_j) is the distance value between the feature vector to be recognized and the preset feature vector of the j-th preset expression; (y - x_j) is the difference between the feature vector to be recognized and the preset feature vector of the j-th preset expression; and (y - x_j)^T is the transpose of that difference.
  • The acquiring module 105 may determine the similarity probability that the face image and the preset expression corresponding to the distance value belong to the same kind of expression by the following formula: p = {1 + exp[D - b]}^(-1), where p is the similarity probability that the face image and the preset expression corresponding to the distance value belong to the same kind of expression, D is the distance value, and b is the offset.
  • the determining module 106 is configured to determine the state change information of the preset key feature points of the face image based on the initial state information.
  • In one embodiment, after the initial state information of the preset key feature points of the face image is acquired, the determining module 106 may determine the state change information of those key feature points based on it. The state change information is measured against the initial state information as a baseline, for example by starting timing from the initial state and recording the state change within a preset time.
  • The control module 107 is configured to trigger, when the state change information of the preset key feature points is the first preset state change information in the preset state change information database, the control instruction corresponding to that first preset state change information to execute the corresponding control operation. In one embodiment, when the state change information of the preset key feature points is the first preset state change information in the preset state change information database, the control module 107 triggers the control instruction corresponding to that information, and the terminal device then performs the corresponding control operation according to the control instruction. For example, when the acquired state change information of the preset key feature points is a head turn to the left, the terminal device executes the previous-page control instruction; when it is a head turn to the right, the terminal device executes the next-page control instruction; and when it is a nod, the terminal device executes the play or pause instruction.
  • In one embodiment, in order to improve operation accuracy, the determining module 106 also needs to judge whether the state change information of the preset key feature points is valid state change information. Specifically, when the state change information of the preset key feature points is the first preset state change information in the preset state change information database, it judges whether that state change information is valid; and when the state change information of the preset key feature points is valid, the control instruction corresponding to the first preset state change information is triggered to execute the corresponding control operation.
  • In one embodiment, when the initial state information is the initial position information of the preset key feature points, the average deflection speed and/or deflection angle of the face image during the state change is acquired, so that whether the state change information of the preset key feature points is valid can be judged from the average deflection speed and/or the deflection angle. For example, if the state change is a head movement, the average head deflection speed and/or deflection angle during the state change can be obtained to judge whether the current state change information of the preset key feature points is valid state change information.
  • For example, under normal circumstances, when a user tilts the head to talk to someone, turns to look at something, or nods in confirmation, the head generally moves fairly quickly. To avoid such movements erroneously controlling the terminal device, a preset speed value can be set. For instance, it can be judged whether the average head movement speed during the current state change is less than a first preset speed value; if it is, the current state change information of the preset key feature points is determined to be valid state change information and the corresponding control instruction is generated from it; if the average head movement speed is not less than the first preset speed value, the current state change information is determined to be invalid state change information and no control instruction is generated. The preset speed value may have a positive or negative deviation of 30%.
  • In one embodiment, the determining module 106 may also judge whether the current state change information of the preset key feature points is valid by determining whether the deflection angle of the head is greater than or equal to a first angle threshold. If the deflection angle is greater than or equal to the first angle threshold, the current state change information is determined to be valid and the corresponding control instruction is generated from it; if the deflection angle is less than the first angle threshold, the current state change information is determined to be invalid.
  • The first angle threshold may be set to a larger angle than the deflection typically produced during the user's ordinary communication. It can be understood that whether the current state change information of the preset key feature points is valid can also be judged by simultaneously determining whether the average head movement speed is less than the first preset speed value and whether the head deflection angle is greater than or equal to the first angle threshold.
  • In one embodiment, when the initial state information is the initial expression information of the preset key feature points, the duration of the facial expression during the state change can be acquired, so that whether the state change information of the preset key feature points is valid can be judged from that duration. For example, whether the current state change information is valid can be judged by whether the duration of the facial expression during the state change is greater than or equal to a preset time: if it is, the current state change information is determined to be valid and the corresponding control instruction is generated from it; if the duration is less than the preset time, the current state change information is determined to be invalid and no control instruction is generated.
  • In one embodiment, the determining module 106 may also judge whether the current state change information of the preset key feature points is valid from the difference between the time of the current state change and the time at which a control instruction was last generated from the preset key feature points. For example, the occurrence time of the state change information is acquired, and it is judged whether the difference between this occurrence time and the occurrence time of the previous state change information is greater than or equal to a preset time. If the difference is greater than or equal to the preset time, the current state change information is determined to be valid and the corresponding control instruction is generated from it; if the difference is less than the preset time, the current state change information is determined to be invalid and no control instruction is generated.
  • The above terminal device control apparatus controls the terminal device by recognizing the user's facial expression changes or head deflection, freeing the user's hands. Compared with traditional manual operation, interaction with the terminal device is more vivid and engaging, which improves the user experience.
  • Fig. 4 is a schematic diagram of a preferred embodiment of the computer equipment of this application.
  • the computer device 1 includes a memory 20, a processor 30, and computer readable instructions 40 stored in the memory 20 and running on the processor 30, such as a terminal device control program.
  • When the processor 30 executes the computer-readable instructions 40, the steps in the above embodiments of the terminal device control method are implemented, such as steps S11 to S15 shown in FIG. 1 or steps S11 to S17 shown in FIG. 2. Alternatively, when the processor 30 executes the computer-readable instructions 40, the functions of the modules in the foregoing terminal device control apparatus embodiments, such as modules 101 to 107 in FIG. 3, are realized.
  • the computer-readable instructions 40 may be divided into one or more modules/units, and the one or more modules/units are stored in the memory 20 and executed by the processor 30, To complete this application.
  • the one or more modules/units may be a series of computer-readable instruction segments capable of completing specific functions, and the instruction segments are used to describe the execution process of the computer-readable instructions 40 in the computer device 1.
  • For example, the computer-readable instructions 40 may be divided into the configuration module 101, the association module 102, the detection module 103, the judgment module 104, the acquisition module 105, the determination module 106, and the control module 107 in FIG. 3. Refer to the second embodiment for the specific functions of each module.
  • the computer device 1 may be a computing device such as a desktop computer, a notebook, a palmtop computer, a mobile phone, a tablet computer, and a cloud server.
  • The schematic diagram is only an example of the computer device 1 and does not constitute a limitation on it; the computer device 1 may include more or fewer components than shown, combine certain components, or have different components. For example, the computer device 1 may also include input and output devices, network access devices, buses, and so on.
  • The processor 30 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • the general-purpose processor may be a microprocessor, or the processor 30 may also be any conventional processor, etc.
  • The processor 30 is the control center of the computer device 1, using various interfaces and lines to connect the various parts of the entire computer device 1.
  • The memory 20 may be used to store the computer-readable instructions 40 and/or modules/units. The processor 30 realizes the various functions of the computer device 1 by running or executing the computer-readable instructions and/or modules/units stored in the memory 20 and by calling the data stored in the memory 20.
  • the memory 20 may mainly include a program storage area and a data storage area.
  • The program storage area may store an operating system, application programs required by at least one function (such as a sound playback function or an image playback function), and so on; the data storage area may store data created according to the use of the computer device 1 (such as audio data), and the like.
  • In addition, the memory 20 may include non-volatile memory, such as a hard disk, internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash memory card (Flash Card), at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
  • non-volatile memory such as a hard disk, a memory, a plug-in hard disk, a smart memory card (Smart Media Card, SMC), a Secure Digital (SD) card, a flash memory card (Flash Card), At least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device.
  • If the integrated modules/units of the computer device 1 are implemented in the form of software functional units and sold or used as independent products, they can be stored in a non-volatile readable storage medium.
  • All or part of the processes in the methods of the above embodiments of this application can also be completed by instructing the relevant hardware through computer-readable instructions. The computer-readable instructions can be stored in a non-volatile readable storage medium, and when executed by a processor, they can implement the steps of the foregoing method embodiments.
  • the computer-readable instruction code may be in the form of source code, object code, executable file, or some intermediate forms.
  • The non-volatile readable medium may include any entity or device capable of carrying the computer-readable instruction code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), and so on.
  • the functional units in the various embodiments of the present application may be integrated in the same processing unit, or each unit may exist alone physically, or two or more units may be integrated in the same unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or in the form of hardware plus software functional modules.

Abstract

A terminal device control method, relating to the field of face recognition. The method includes: acquiring an image to be recognized and performing face detection on it (S11); determining whether a face image is detected (S12); if a face image is detected, acquiring initial state information of preset key feature points of the face image (S13); determining state change information of the preset key feature points of the face image based on the initial state information (S14); and, when the state change information of the preset key feature points is the first preset state change information in a preset state change information database, triggering the control instruction corresponding to the first preset state change information to execute the corresponding control operation (S15). A terminal device control apparatus, a computer device, and a non-volatile readable storage medium are also provided. Interaction with the terminal device is made more vivid and engaging, improving the user experience.

Description

Terminal device control method and apparatus, computer device, and readable storage medium
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on June 5, 2019, with application number 201910487841.2 and the invention title "Terminal device control method and apparatus, computer apparatus, and readable storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of electronic communication technology, and in particular to a terminal device control method and apparatus, a computer device, and a non-volatile readable storage medium.
Background
With the development of communication technology, the use of devices such as computers and mobile phones has become more and more widespread. At present, computers and mobile phones are operated through key presses or touch operations; however, both key presses and touch operations have to be completed by hand. This manual mode of operation is too limited, may cause inconvenience to the user, and has a limited range of application, which affects the user experience.
Summary
In view of the above, it is necessary to provide a terminal device control method and apparatus, a computer device, and a non-volatile readable storage medium that can control a terminal device without manual operation, improving the user experience.
An embodiment of this application provides a terminal device control method. The method includes: obtaining an image to be recognized and performing face detection on it; determining whether a face image is detected; if a face image is detected, obtaining initial state information of preset key feature points of the face image; determining state change information of the preset key feature points of the face image based on the initial state information; and, when the state change information of the preset key feature points is the first preset state change information in a preset state change information database, triggering the control instruction corresponding to the first preset state change information to execute the corresponding control operation.
An embodiment of this application provides a terminal device control apparatus. The apparatus includes: a detection module for obtaining an image to be recognized and performing face detection on it; a judgment module for determining whether a face image is detected; an acquisition module for obtaining initial state information of preset key feature points of the face image when a face image is detected; a determining module for determining state change information of the preset key feature points of the face image based on the initial state information; and a control module for triggering, when the state change information of the preset key feature points is the first preset state change information in a preset state change information database, the control instruction corresponding to the first preset state change information to execute the corresponding control operation.
An embodiment of this application provides a computer device. The computer device includes a processor and a memory on which a number of computer-readable instructions are stored; when the processor executes the computer-readable instructions stored in the memory, the steps of the terminal device control method described above are implemented.
An embodiment of this application provides a non-volatile readable storage medium on which computer-readable instructions are stored; when the computer-readable instructions are executed by a processor, the steps of the terminal device control method described above are implemented.
The above terminal device control method and apparatus, computer device, and non-volatile readable storage medium control the terminal device by recognizing the user's facial expression changes or head deflection, freeing the user's hands. Compared with traditional manual operation, interaction with the terminal device is more vivid and engaging, which improves the user experience.
Brief Description of the Drawings
Fig. 1 is a flowchart of the steps of a terminal device control method in an embodiment of this application.
Fig. 2 is a flowchart of the steps of a terminal device control method in another embodiment of this application.
Fig. 3 is a functional module diagram of a terminal device control apparatus in an embodiment of this application.
Fig. 4 is a schematic diagram of a computer device in an embodiment of this application.
The following detailed description will further illustrate this application in conjunction with the above drawings.
Detailed Description
In order to understand the above objectives, features, and advantages of this application more clearly, this application is described in detail below with reference to the drawings and specific embodiments. It should be noted that, where there is no conflict, the embodiments of this application and the features in the embodiments may be combined with one another.
Many specific details are set forth in the following description to facilitate a full understanding of this application; the described embodiments are only some of the embodiments of this application, not all of them. Based on the embodiments of this application, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of this application.
Unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by those skilled in the technical field of this application. The terms used in the specification of this application are only for the purpose of describing specific embodiments and are not intended to limit this application.
Preferably, the expression interaction method of this application is applied in one or more computer devices. A computer device is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions; its hardware includes, but is not limited to, a microprocessor, an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), embedded devices, and so on.
The computer device may be a computing device such as a desktop computer, a notebook computer, a tablet computer, a server, or a mobile phone. The computer device can interact with the user through a keyboard, a mouse, a remote control, a touch panel, a voice control device, or the like.
Embodiment 1:
Fig. 1 is a flowchart of the steps of a preferred embodiment of the terminal device control method of this application. According to different needs, the order of the steps in the flowchart can be changed, and some steps can be omitted.
Referring to Fig. 1, the terminal device control method specifically includes the following steps.
Step S11: obtain an image to be recognized, and perform face detection on the image to be recognized.
In one embodiment, the image to be recognized may be obtained by communicating with a camera (for example, the camera of the computer device). The image to be recognized may contain non-face content, so face detection needs to be performed on it to identify a face image containing a human face in the image to be recognized.
In one embodiment, face detection on the image to be recognized can be implemented by establishing and training a convolutional neural network model. Specifically, this can be done as follows: a face sample database is first constructed and a convolutional neural network model for face detection is established, where the face sample database contains face information of multiple people, each person's face information can cover multiple angles, and each angle can have multiple pictures; the face images in the face sample database are input to the convolutional neural network model, and the network is trained starting from the model's default parameters; according to the intermediate training results, the initial weights, training rate, number of iterations, and other parameters are continuously adjusted until the optimal network parameters are obtained, and the convolutional neural network model with the optimal network parameters is taken as the final recognition model. After training is completed, the finally obtained convolutional neural network model can be used for face detection.
It can be understood that the image to be recognized can be input to the finally obtained convolutional neural network model, and the output of the model is the face detection result.
Step S12: determine whether a face image is detected.
In one embodiment, whether a face image is detected can be judged according to the output of the convolutional neural network model. If a face image is detected, the method proceeds to step S13; if no face image is detected, it returns to step S11.
Step S13: if a face image is detected, obtain the initial state information of the preset key feature points of the face image.
In one embodiment, the preset key feature points of the face image may consist of parts such as the eyes, nose, and mouth. The initial state information may include initial position information or initial expression information. When the initial state information is the initial position information of the preset key feature points, control operations can be executed according to the motion state information of the face image; when the initial state information is the initial expression information of the preset key feature points, control operations can be executed according to the expression change information of the face image.
In one embodiment, when the initial state information is initial position information, the initial state information of the preset key feature points of the face image is the position information of those key feature points in their initial state. The position information of the preset key feature points can be determined from the face image by an integral projection method or a face alignment algorithm (for example, the ASM, AAM, or STASM algorithm). Since the eyes are relatively prominent facial features, they can be accurately located first, and the other facial organs, such as the eyebrows, mouth, and nose, can then be located fairly accurately from their underlying distribution relationship.
For example, the preset key feature points are located by the peaks or troughs produced under the different integral projection methods. Integral projection is divided into vertical projection and horizontal projection. Let f(x, y) denote the gray value of the image at (x, y); over the image region bounded by [y1, y2] and [x1, x2], the horizontal integral projection M_h(y) and the vertical integral projection M_v(x) are expressed as:
M_h(y) = Σ_{x=x1}^{x2} f(x, y),  M_v(x) = Σ_{y=y1}^{y2} f(x, y)
The horizontal integral projection accumulates the gray values of all pixels in a row before display, and the vertical integral projection accumulates the gray values of all pixels in a column before display. By locating the two trough points x1 and x2 and cropping out the region [x1, x2] along the horizontal axis from the face image, the left and right boundaries of the face image can be located. After locating the left and right boundaries, the face image to be recognized is binarized, and horizontal and vertical integral projections are performed respectively.
Further, from prior knowledge of face images, the eyebrows and eyes are relatively dark regions in the face image, and they correspond to the first two minimum points on the horizontal integral projection curve. The first minimum point corresponds to the position of the eyebrows on the vertical axis, denoted y_brow; the second to the position of the eyes, denoted y_eye; the third to the position of the nose, denoted y_nose; and the fourth to the position of the mouth, denoted y_mouth. Similarly, two minimum points appear on either side of the central symmetry axis of the face image, corresponding to the positions of the left and right eyes on the horizontal axis, denoted x_left-eye and x_right-eye; the eyebrows share the same horizontal position as the eyes, and the horizontal position of the mouth and nose is (x_left-eye + x_right-eye)/2.
In one embodiment, when the initial state information is initial expression information, the initial state information of the preset key feature points of the face image is the expression information of those key feature points in their initial state. The facial expression information may, for example, take the following forms. Facial movements when happy: the corners of the mouth are raised, the cheeks wrinkle, the eyelids contract, and "crow's feet" form at the outer corners of the eyes. Facial features when sad: squinting, tightened eyebrows, downturned corners of the mouth, chin raised or tightened. Facial features when afraid: mouth and eyes open, eyebrows raised, nostrils flared. Facial features when angry: drooping eyebrows, furrowed forehead, tense eyelids and lips. Facial features when disgusted: sniffing, raised upper lip, drooping eyebrows, squinting. Facial features when surprised: dropped jaw, relaxed lips and mouth, widened eyes, slightly raised eyelids and eyebrows. Facial features when contemptuous: one side of the mouth raised, a sneer or smug smile, and so on.
The expression information can be obtained by extracting a feature vector to be recognized from the preset key feature points, determining, from this feature vector and the preset feature vector of each preset expression in a preset expression library, the probability that the face image and each preset expression are similar, and then deriving the facial expression information from the computed similarity probabilities. The feature vector to be recognized may include a shape feature vector and/or a texture feature vector.
In one embodiment, when the feature vector to be recognized is a shape feature vector, the shape feature vector of the preset key feature points is extracted; when it is a texture feature vector, the texture feature vector of the preset key feature points is extracted; and when it is both a shape feature vector and a texture feature vector, the shape and texture feature vectors of the preset key feature points are both extracted.
In one embodiment, the similarity probability between the face image and each preset expression can be determined as follows: obtain the distance value between the feature vector to be recognized and the preset feature vector of each preset expression; and determine, according to the distance value, the similarity probability that the face image and the preset expression corresponding to that distance value belong to the same kind of expression. The distance value may be a generalized Mahalanobis distance, and the distance value between the feature vector to be recognized and the preset feature vector of a preset expression can be determined by the following formula:
d_M(y, x_j) = (y - x_j)^T * M * (y - x_j);
where y is the feature vector to be recognized; x_j is the preset feature vector of the j-th preset expression in the preset expression library; M is the target metric matrix; j is an integer greater than or equal to 1; d_M(y, x_j) is the distance value between the feature vector to be recognized and the preset feature vector of the j-th preset expression in the preset expression library; (y - x_j) is the difference between the feature vector to be recognized and the preset feature vector of the j-th preset expression; and (y - x_j)^T is the transpose of that difference.
In one embodiment, the similarity probability that the face image and the preset expression corresponding to the distance value belong to the same kind of expression can be determined by the following formula:
p = {1 + exp[D - b]}^(-1);
where p is the similarity probability that the face image and the preset expression corresponding to the distance value belong to the same kind of expression, D is the distance value, and b is the offset.
步骤S14、基于所述初始状态信息确定所述人脸图像的预设关键特征点的状态变化信息。
在一实施方式中,当获取到所述人脸图像的预设关键特征点的初始状态信息后,可以基于所述初始状态信息确定所述人脸图像的预设关键特征点的状态变化信息。所述状态变化信息是以所述初始状态信息为基准,比如从所述初始状态信息开始计时,预设时间内的状态变化信息。
步骤S15、当所述预设关键特征点的状态变化信息为预设状态变化信息库中的第一预设 状态变化信息时,触发所述第一预设状态变化信息对应的控制指令执行相应的控制操作。
在一实施方式中,当所述预设关键特征点的状态变化信息为预设状态变化信息库中的第一预设状态变化信息时,触发所述第一预设状态变化信息对应的控制指令,进而所述终端设备会根据所述控制指令执行相应的控制操作。比如,当获取到的所述预设关键特征点的状态变化信息为头部向左偏转的动作时,所述终端设备执行上一页控制指令,当获取到的所述预设关键特征点的状态变化信息为头部向右偏转的动作时,所述终端设备执行下一页控制指令,当获取到的所述预设关键特征点的状态变化信息为点头时,所述终端设备执行播放或者暂停指令。
在一实施方式中,为了提高操作准确性,所述步骤S15可进一步包括:当所述预设关键特征点的状态变化信息为所述预设状态变化信息库中的第一预设状态变化信息时,判断所述预设关键特征点的状态变化信息是否为有效状态变化信息;及当所述预设关键特征点的状态变化信息为有效状态变化信息时,触发所述第一预设状态变化信息对应的控制指令执行相应的控制操作。
在一实施方式中,当所述初始状态信息为所述预设关键特征点的初始位置信息时,获取所述人脸图像在状态变化过程中的平均偏转速度和/或偏转角度,以根据所述平均偏转速度和/或所述偏转角度判断所述预设关键特征点的状态变化信息是否为有效状态变化信息。比如,若所述初始状态信息为头部运动动作,则可以通过获取在状态变化过程中的头部平均偏转速度和/或偏转角度,来判断所述预设关键特征点的本次状态变化信息是否为有效状态变化信息。
举例而言,正常情况下,当一用户偏头与人沟通、偏头观看事务或者沟通点头进行确认时,其头部运动速度一般比较快,为了避免发生误控制,可以设定一预设速度值来避免发生误控制终端设备。比如,可以判断本次状态变化过程中头部运动平均速度是否小于第一预设速度值,若头部运动平均速度小于第一预设速度值,则判定所述预设关键特征点的本次状态变化信息为有效状态变化信息,基于该有效状态变化信息生成对应的控制指令,若头部运动平均速度不小于第一预设速度值,则判定所述预设关键特征点的本次状态变化信息为无效状态变化信息,不产生对应的控制指令。所述预设速度值可以具有30%的正负偏差。
在一实施方式中,还可以通过判断头部的偏转角度是否大于等于第一角度阈值来判断所述预设关键特征点的本次状态变化信息是否为有效状态变化信息,若头部的偏转角度大于等于第一角度阈值,则判定所述预设关键特征点的本次状态变化信息为有效状态变化信息,基于该有效状态变化信息生成对应的控制指令,若头部的偏转角度小于第一角度阈值,则判定所述预设关键特征点的本次状态变化信息为无效状态变化信息。所述第一角度阈值 可以设置成比用户平常沟通所产生的偏转角度更大的角度值。
可以理解的,还可以通过同时判断头部运动平均速度是否小于第一预设速度值且头部的偏转角度是否大于等于第一角度阈值来判断所述预设关键特征点的本次状态变化信息是否为有效状态变化信息。
在一实施方式中,当所述初始状态信息为所述预设关键特征点的初始表情信息时,获取所述人脸图像在状态变化过程中的表情持续时间,以根据所述表情持续时间判断所述预设关键特征点的状态变化信息是否为有效状态变化信息。比如,可以通过获取在状态变化过程中的脸部表情的持续时间是否大于等于预设时间,来判断所述预设关键特征点的本次状态变化信息是否为有效状态变化信息。若在本次状态变化过程中的脸部表情的持续时间大于等于预设时间,则判断本次状态变化信息为有效状态变化信息,基于该有效状态变化信息生成对应的控制指令,若在本次状态变化过程中的脸部表情的持续时间小于预设时间,则判断本次状态变化信息为无效状态变化信息,不产生对应的控制指令。
在一实施方式中,还可以基于所述预设关键特征点的本次状态变化信息的时间节点与上一次通过所述预设关键特征点而产生的控制指令的时间节点之间的差值,来判断所述预设关键特征点的本次状态变化信息是否为有效状态变化信息。比如,获取所述预设关键特征点的状态变化信息的发生时刻,判断所述预设关键特征点的状态变化信息的发生时间与上一状态变化信息的发生时间的差值是否大于等于预设时间。若本次状态变化信息的发生时间与上一状态变化信息的发生时间的差值大于等于所述预设时间,则判定本次状态变化信息为有效状态变化信息,并基于该有效状态变化信息生成对应的控制指令;若本次状态变化信息的发生时间与上一状态变化信息的发生时间的差值小于所述预设时间,则判定本次状态变化信息为无效状态变化信息,不产生对应的控制指令。
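下面给出一段结合表情持续时间与两次状态变化时间间隔进行有效性过滤的示意性Python代码（min_duration、min_interval等预设时间均为假设值，仅用于演示判断逻辑）：

```python
import time

class ValidityChecker:
    """示意性草图：按表情持续时间和与上一次控制指令的时间间隔过滤无效状态变化。
    min_duration、min_interval 为假设的预设时间，单位为秒。"""

    def __init__(self, min_duration=1.0, min_interval=2.0):
        self.min_duration = min_duration
        self.min_interval = min_interval
        self.last_trigger = None          # 上一次产生控制指令的时间节点

    def check_expression(self, duration):
        # 表情持续时间大于等于预设时间才视为有效状态变化
        return duration >= self.min_duration

    def check_interval(self, now=None):
        # 与上一次产生控制指令的时间差大于等于预设时间才视为有效状态变化
        now = now if now is not None else time.time()
        if self.last_trigger is not None and now - self.last_trigger < self.min_interval:
            return False
        self.last_trigger = now
        return True
```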
请参阅图2，与图1示出的终端设备控制方法相比，图2示出的终端设备控制方法还包括步骤S16及步骤S17。
步骤S16、配置所述预设关键特征点的上下左右检测边界信息,以建立得到一特征点检测框。
在一实施方式中，所述特征点检测框即用于检测所述预设关键特征点的状态变化信息。通过配置所述预设关键特征点的上下左右检测边界信息，可以建立得到一特征点检测框，在进行人脸行为信息(头部偏转、面部表情)检测时，需确保人脸的关键特征点始终落入所述特征点检测框内，以避免影响检测准确性。
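下面给出一段判断预设关键特征点是否全部落入特征点检测框内的示意性Python代码（边界坐标数值均为假设，仅用于演示判断方式）：

```python
def in_detection_box(landmarks, top, bottom, left, right):
    """判断所有预设关键特征点 (x, y) 是否均落入特征点检测框内。
    top/bottom/left/right 为配置的上下左右检测边界（像素坐标，数值为假设）。"""
    return all(left <= x <= right and top <= y <= bottom for x, y in landmarks)

# 使用示例：
# box_ok = in_detection_box(landmarks, top=50, bottom=430, left=100, right=540)
# if not box_ok:
#     pass  # 关键特征点越出检测框时可跳过本帧，避免影响检测准确性
```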
步骤S17、将所述预设关键特征点的多个预设状态变化信息与多个预设控制指令相关联。
在一实施方式中,可以预先建立多个预设状态变化信息与终端设备的多个预设控制指令的映射关系。比如,预先将预设第一状态变化信息与终端设备的第一预设控制指令相关联,将第二预设状态变化信息与终端设备的第二预设控制指令相关联,将第三预设状态变化信息与终端设备的第三预设控制指令相关联。所述预设控制指令可以为终端设备中的一些常用指令,例如下一页、上一页、播放、暂停、鼠标左键、鼠标右键等。比如预设第一状态变化信息为头部向右偏转对应下一页控制指令,预设第二状态变化信息为头部向左偏转对应上一页控制指令,预设第三状态变化信息为点头对应播放或者暂停控制指令;再比如预设第一状态变化信息为面部无表情转变到高兴表情对应下一页控制指令,预设第二状态变化信息为面部无表情转变到伤心表情对应上一页控制指令。
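下面给出一段将预设状态变化信息映射到预设控制指令并进行分发的示意性Python代码（映射关系与指令名称仅为示例，executor为假设的指令执行接口）：

```python
# 预设状态变化信息与预设控制指令的映射关系（名称仅为示例）
COMMAND_MAP = {
    "head_turn_right":  "next_page",
    "head_turn_left":   "previous_page",
    "nod":              "play_pause",
    "neutral_to_happy": "next_page",
    "neutral_to_sad":   "previous_page",
}

def dispatch(state_change, executor):
    """若状态变化信息命中预设状态变化信息库，则触发对应的控制指令。
    executor 为假设的指令执行接口，例如自定义的翻页/播放控制器。"""
    command = COMMAND_MAP.get(state_change)
    if command is not None:
        executor(command)

# 使用示例：
# dispatch("head_turn_right", executor=print)   # 仅打印指令名，便于演示
```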
上述终端设备控制方法,通过识别用户的表情变化或者头部偏转状态来实现控制终端设备,解放用户的双手,相对传统的手动操作方式,与终端设备交互更生动有趣,提高了用户使用体验。
实施例二:
图3为本申请终端设备控制装置较佳实施例的功能模块图。
参阅图3所示,所述终端设备控制装置10可以包括配置模块101、关联模块102、检测模块103、判断模块104、获取模块105、确定模块106及控制模块107。
所述配置模块101用于配置所述预设关键特征点的上下左右检测边界信息,以建立得到一特征点检测框。
在一实施方式中，所述特征点检测框即用于检测所述预设关键特征点的状态变化信息。所述配置模块101通过配置所述预设关键特征点的上下左右检测边界信息，可以建立得到一特征点检测框，在进行人脸行为信息(头部偏转、面部表情)检测时，需确保人脸的关键特征点始终落入所述特征点检测框内，以避免影响检测准确性。
所述关联模块102用于将所述预设关键特征点的多个预设状态变化信息与多个预设控制指令相关联。
在一实施方式中,所述关联模块102可以预先建立多个预设状态变化信息与终端设备的多个预设控制指令的映射关系。比如,所述关联模块102预先将预设第一状态变化信息与终端设备的第一预设控制指令相关联,将第二预设状态变化信息与终端设备的第二预设控制指令相关联,将第三预设状态变化信息与终端设备的第三预设控制指令相关联。所述预设控制指令可以为终端设备中的一些常用指令,例如下一页、上一页、播放、暂停、鼠标左键、鼠标右键等。比如预设第一状态变化信息为头部向右偏转对应下一页控制指令,预设第二状态变化信息为头部向左偏转对应上一页控制指令,预设第三状态变化信息为点头对 应播放或者暂停控制指令;再比如预设第一状态变化信息为面部无表情转变到高兴表情对应下一页控制指令,预设第二状态变化信息为面部无表情转变到伤心表情对应上一页控制指令。
所述检测模块103用于获取待识别图像,并对所述待识别图像进行人脸检测。
在一实施方式中,所述检测模块103可以通过与摄像头(比如所述计算机设备的摄像头)进行通信来获取待识别图像,所述待识别图像可能包含有非人脸图像,故需要对所述待识别图像进行人脸检测,以识别出所述待识别图像中包含有人脸的人脸图像。
在一实施方式中,所述检测模块103可以通过建立并训练一卷积神经网络模型来实现对所述待识别图像进行人脸检测。具体地,可以通过以下方式来实现对所述待识别图像进行人脸检测:可以先构建人脸样本数据库并建立一用于进行人脸检测的卷积神经网络模型,所述人脸样本数据库包含多个人的人脸信息,每个人的人脸信息可以包括多种角度,每种角度的人脸信息可以有多张图片;将人脸样本数据库中的人脸图像输入至所述卷积神经网络模型,使用卷积神经网络模型的默认参数进行卷积神经网络训练;根据训练中间结果,对默认参数的初始权值、训练速率、迭代次数等进行不断调整,直到得到最优的卷积神经网络模型的网络参数,最后将具有最优网络参数的卷积神经网络模型作为最终的识别模型,训练完成后,即可利用该最终得到的卷积神经网络模型进行人脸检测。
可以理解的,所述检测模块103可以将所述待识别图像输入至该最终得到的卷积神经网络模型,模型的输出即为人脸检测结果。
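下面给出一段用于人脸检测的卷积神经网络模型的示意性训练代码（基于PyTorch，网络结构、输入尺寸与超参数均为假设，仅用于说明训练流程，并非本申请所采用的具体模型）：

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

class FaceDetectNet(nn.Module):
    """极简的二分类卷积网络（图像中是否包含人脸），结构与超参数均为假设。"""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 16 * 16, 2)   # 假设输入为 64x64 的RGB图像

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

def train(model, loader: DataLoader, epochs=10, lr=1e-3):
    """按默认参数训练；可根据训练中间结果调整学习率、迭代次数等。"""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
    return model
```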
所述判断模块104用于判断是否检测到人脸图像。
在一实施方式中,所述判断模块104可以根据所述卷积神经网络模型的输出来判断是否检测到人脸图像。若检测到人脸图像,则进行后续关键特征点识别。若未检测到人脸图像,则重新对所述待识别图像进行人脸检测。
所述获取模块105用于在检测到人脸图像时,获取所述人脸图像的预设关键特征点的初始状态信息。
在一实施方式中,所述人脸图像的预设关键特征点可以由眼睛、鼻子、嘴巴等部分构成。所述初始状态信息可以包括初始位置信息或者初始表情信息;当所述初始状态信息为所述预设关键特征点的初始位置信息时,可以实现根据所述人脸图像的运动状态信息执行相应的控制操作;当所述初始状态信息为所述预设关键特征点的初始表情信息时,可以实现根据所述人脸图像的表情变化信息执行相应的控制操作。
在一实施方式中,当所述初始状态信息为初始位置信息时,所述人脸图像的预设关键特征点的初始状态信息即为所述人脸图像的预设关键特征点的初始状态的位置信息。所述人 脸图像的预设关键特征点的位置信息可以通过积分投影方式或者人脸对齐算法(比如:ASM算法、AAM算法、STASM算法等)从人脸图像中确定出。由于眼睛是人脸当中比较突出的人脸特征,可以先对眼睛进行精确定位,则脸部其他器官,如:眼眉、嘴巴、鼻子等,可以由潜在的分布关系得出比较准确的定位。
举例而言，预设关键特征点的位置定位可通过不同积分投影方式下产生的波峰或波谷来进行。其中，积分投影分为垂直投影和水平投影，设f(x,y)表示图像(x,y)处的灰度值，在图像[y1,y2]和[x1,x2]区域的水平积分投影M_h(y)和垂直积分投影M_v(x)分别表示为：
M_h(y) = ∫_{x1}^{x2} f(x, y)dx；M_v(x) = ∫_{y1}^{y2} f(x, y)dy
其中，水平积分投影即将一行所有像素点的灰度值进行累加后再显示，而垂直积分投影即将一列所有像素点的灰度值进行累加后再显示。通过定位两个波谷点x1、x2，从人脸图像中把横轴[x1,x2]区域的图像截取出来，即可实现人脸图像左右边界的定位。完成左右边界定位后，对待识别人脸图像进行二值化，再分别进行水平积分投影和垂直积分投影。
进一步的，利用对人脸图像的先验知识可知，眉毛和眼睛是人脸图像中距离较近的黑色区域，其对应着水平积分投影曲线上的前两个极小值点。第一个极小值点对应眉毛在纵轴上的位置，记做y_brow；第二个极小值点对应眼睛在纵轴上的位置，记做y_eye；第三个极小值点对应鼻子在纵轴上的位置，记做y_nose；第四个极小值点对应嘴巴在纵轴上的位置，记做y_mouth。同样，人脸图像中心对称轴两侧出现两个极小值点，分别对应左右眼在横轴上的位置，记做x_left-eye、x_right-eye；眉毛在横轴上的位置和眼睛相同；嘴巴和鼻子在横轴上的位置为(x_left-eye+x_right-eye)/2。
在一实施方式中，当所述初始状态信息为初始表情信息时，所述人脸图像的预设关键特征点的初始状态信息即为所述人脸图像的预设关键特征点的初始状态的表情信息。所述人脸表情信息比如可以具有以下表现形式：高兴时的面部特征：嘴角翘起、面颊上抬并起皱、眼睑收缩、眼睛尾部形成“鱼尾纹”。伤心时的面部特征：眯眼、眉毛收紧、嘴角下拉、下巴抬起或收紧。害怕时的面部特征：嘴巴和眼睛张开、眉毛上扬、鼻孔张大。愤怒时的面部特征：眉毛下垂、前额紧皱、眼睑和嘴唇紧张。厌恶时的面部特征：嗤鼻、上嘴唇上抬、眉毛下垂、眯眼。惊讶时的面部特征：下颚下垂、嘴唇和嘴巴放松、眼睛张大、眼睑和眉毛微抬。轻蔑时的面部特征：嘴角一侧抬起、作讥笑或得意笑状等。
所述获取模块105可以通过提取所述预设关键特征点的待识别特征向量，并根据所述待识别特征向量和预设表情库中的每个预设表情的预设特征向量，确定所述人脸图像与每个所述预设表情属于同种表情的相似概率，进而根据计算得到的相似概率来得到人脸表情信息。其中，所述待识别特征向量可以包括形状特征向量和/或纹理特征向量。
在一实施方式中,当所述待识别特征向量为形状特征向量时,则提取所述预设关键特征点中的形状特征向量;当所述待识别特征向量为纹理特征向量时,则提取所述预设关键特征点中的纹理特征向量;当所述待识别特征向量为形状特征向量和纹理特征向量时,则提取所述预设关键特征点中的形状特征向量和纹理特征向量。
在一实施方式中,所述获取模块105可以通过以下方式来确定所述人脸图像与每个所述预设表情的相似概率:获取待识别特征向量和每个预设表情的预设特征向量之间的距离值;根据距离值确定所述人脸图像与距离值对应的预设表情属于同种表情的相似概率。其中,所述距离值可以为广义马氏距离。可以通过如下公式确定待识别特征向量和预设表情的预设特征向量之间的距离值:
d_M(y, x_j) = (y - x_j)^T * M * (y - x_j)；
其中，y为待识别特征向量，x_j为预设表情库中的第j个预设表情的预设特征向量，M为目标度量矩阵；j为大于或者等于1的整数；d_M(y, x_j)为待识别特征向量和预设表情库中的第j个预设表情的预设特征向量之间的距离值；(y - x_j)为待识别特征向量与第j个预设表情的预设特征向量的差值；(y - x_j)^T为待识别特征向量与第j个预设表情的预设特征向量的差值的转置。
在一个实施例中,所述获取模块105可以通过如下公式确定所述人脸图像与距离值对应的预设表情属于同种表情的相似概率:
p = {1 + exp[D - b]}^(-1)
其中,p为所述人脸图像与距离值对应的预设表情属于同种表情的相似概率;D为距离值;b为偏置量。
所述确定模块106用于基于所述初始状态信息确定所述人脸图像的预设关键特征点的状态变化信息。
在一实施方式中，当获取到所述人脸图像的预设关键特征点的初始状态信息后，所述确定模块106可以基于所述初始状态信息确定所述人脸图像的预设关键特征点的状态变化信息。所述状态变化信息以所述初始状态信息为基准，比如从获取到所述初始状态信息的时刻开始计时，在预设时间内检测到的状态变化信息。
当所述预设关键特征点的状态变化信息为预设状态变化信息库中的第一预设状态变化信息时,所述控制模块107触发所述第一预设状态变化信息对应的控制指令执行相应的控制操作。
在一实施方式中,当所述预设关键特征点的状态变化信息为预设状态变化信息库中的 第一预设状态变化信息时,所述控制模块107触发所述第一预设状态变化信息对应的控制指令,进而所述终端设备会根据所述控制指令执行相应的控制操作。比如,当获取到的所述预设关键特征点的状态变化信息为头部向左偏转的动作时,所述终端设备执行上一页控制指令,当获取到的所述预设关键特征点的状态变化信息为头部向右偏转的动作时,所述终端设备执行下一页控制指令,当获取到的所述预设关键特征点的状态变化信息为点头时,所述终端设备执行播放或者暂停指令。
在一实施方式中，为了提高操作准确性，所述确定模块106需确定所述预设关键特征点的状态变化信息是否为有效状态变化信息，具体可以通过以下方式实现：当所述预设关键特征点的状态变化信息为所述预设状态变化信息库中的第一预设状态变化信息时，判断所述预设关键特征点的状态变化信息是否为有效状态变化信息；及当所述预设关键特征点的状态变化信息为有效状态变化信息时，触发所述第一预设状态变化信息对应的控制指令执行相应的控制操作。
在一实施方式中,当所述初始状态信息为所述预设关键特征点的初始位置信息时,获取所述人脸图像在状态变化过程中的平均偏转速度和/或偏转角度,以根据所述平均偏转速度和/或所述偏转角度判断所述预设关键特征点的状态变化信息是否为有效状态变化信息。比如,若所述初始状态信息为头部运动动作,则可以通过获取在状态变化过程中的头部平均偏转速度和/或偏转角度,来判断所述预设关键特征点的本次状态变化信息是否为有效状态变化信息。
举例而言，正常情况下，当一用户偏头与人沟通、偏头观看事物或者在沟通中点头进行确认时，其头部运动速度一般比较快，为了避免误控制终端设备，可以设定一预设速度值。比如，可以判断本次状态变化过程中头部运动平均速度是否小于第一预设速度值，若头部运动平均速度小于第一预设速度值，则判定所述预设关键特征点的本次状态变化信息为有效状态变化信息，基于该有效状态变化信息生成对应的控制指令，若头部运动平均速度不小于第一预设速度值，则判定所述预设关键特征点的本次状态变化信息为无效状态变化信息，不产生对应的控制指令。所述预设速度值可以具有30%的正负偏差。
在一实施方式中,所述确定模块106还可以通过判断头部的偏转角度是否大于等于第一角度阈值来判断所述预设关键特征点的本次状态变化信息是否为有效状态变化信息,若头部的偏转角度大于等于第一角度阈值,则判定所述预设关键特征点的本次状态变化信息为有效状态变化信息,基于该有效状态变化信息生成对应的控制指令,若头部的偏转角度小于第一角度阈值,则判定所述预设关键特征点的本次状态变化信息为无效状态变化信息。所述第一角度阈值可以设置成比用户平常沟通所产生的偏转角度更大的角度值。
可以理解的,还可以通过同时判断头部运动平均速度是否小于第一预设速度值且头部的偏转角度是否大于等于第一角度阈值来判断所述预设关键特征点的本次状态变化信息是否为有效状态变化信息。
在一实施方式中，当所述初始状态信息为所述预设关键特征点的初始表情信息时，可以获取所述人脸图像在状态变化过程中的表情持续时间，以根据所述表情持续时间判断所述预设关键特征点的状态变化信息是否为有效状态变化信息。比如，可以通过判断在状态变化过程中脸部表情的持续时间是否大于等于预设时间，来判断所述预设关键特征点的本次状态变化信息是否为有效状态变化信息。若在本次状态变化过程中的脸部表情的持续时间大于等于预设时间，则判定本次状态变化信息为有效状态变化信息，基于该有效状态变化信息生成对应的控制指令，若在本次状态变化过程中的脸部表情的持续时间小于预设时间，则判定本次状态变化信息为无效状态变化信息，不产生对应的控制指令。
在一实施方式中,所述确定模块106还可以基于所述预设关键特征点的本次状态变化信息的时间节点与上一次通过所述预设关键特征点而产生的控制指令的时间节点之间的差值,来判断所述预设关键特征点的本次状态变化信息是否为有效状态变化信息。比如,获取所述预设关键特征点的状态变化信息的发生时刻,判断所述预设关键特征点的状态变化信息的发生时间与上一状态变化信息的发生时间的差值是否大于等于预设时间。若本次状态变化信息的发生时间与上一状态变化信息的发生时间的差值大于等于所述预设时间,则判定本次状态变化信息为有效状态变化信息,并基于该有效状态变化信息生成对应的控制指令;若本次状态变化信息的发生时间与上一状态变化信息的发生时间的差值小于所述预设时间,则判定本次状态变化信息为无效状态变化信息,不产生对应的控制指令。
上述终端设备控制装置,通过识别用户的表情变化或者头部偏转状态来实现控制终端设备,解放用户的双手,相对传统的手动操作方式,与终端设备交互更生动有趣,提高了用户使用体验。
图4为本申请计算机设备较佳实施例的示意图。
所述计算机设备1包括存储器20、处理器30以及存储在所述存储器20中并可在所述处理器30上运行的计算机可读指令40,例如终端设备控制程序。所述处理器30执行所述计算机可读指令40时实现上述终端设备控制方法实施例中的步骤,例如图1所示的步骤S11~S15或图2所示的步骤S11~S17。或者,所述处理器30执行所述计算机可读指令40时实现上述终端设备控制装置实施例中各模块的功能,例如图3中的模块101~107。
示例性的,所述计算机可读指令40可以被分割成一个或多个模块/单元,所述一个或者多个模块/单元被存储在所述存储器20中,并由所述处理器30执行,以完成本申请。所述 一个或多个模块/单元可以是能够完成特定功能的一系列计算机可读指令段,所述指令段用于描述所述计算机可读指令40在所述计算机设备1中的执行过程。例如,所述计算机可读指令40可以被分割成图3中的配置模块101、关联模块102、检测模块103、判断模块104、获取模块105、确定模块106及控制模块107。各模块具体功能参见实施例二。
所述计算机设备1可以是桌上型计算机、笔记本、掌上电脑、手机、平板电脑及云端服务器等计算设备。本领域技术人员可以理解,所述示意图仅仅是计算机设备1的示例,并不构成对计算机设备1的限定,可以包括比图示更多或更少的部件,或者组合某些部件,或者不同的部件,例如所述计算机设备1还可以包括输入输出设备、网络接入设备、总线等。
所称处理器30可以是中央处理单元(Central Processing Unit,CPU),还可以是其他通用处理器、数字信号处理器(Digital Signal Processor,DSP)、专用集成电路(Application Specific Integrated Circuit,ASIC)、现场可编程门阵列(Field-Programmable Gate Array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。通用处理器可以是微处理器或者所述处理器30也可以是任何常规的处理器等,所述处理器30是所述计算机设备1的控制中心,利用各种接口和线路连接整个计算机设备1的各个部分。
所述存储器20可用于存储所述计算机可读指令40和/或模块/单元,所述处理器30通过运行或执行存储在所述存储器20内的计算机可读指令和/或模块/单元,以及调用存储在存储器20内的数据,实现所述计算机设备1的各种功能。所述存储器20可主要包括存储程序区和存储数据区,其中,存储程序区可存储操作系统、至少一个功能所需的应用程序(比如声音播放功能、图像播放功能等)等;存储数据区可存储根据计算机设备1的使用所创建的数据(比如音频数据)等。此外,存储器20可以包括非易失性存储器,例如硬盘、内存、插接式硬盘,智能存储卡(Smart Media Card,SMC),安全数字(Secure Digital,SD)卡,闪存卡(Flash Card)、至少一个磁盘存储器件、闪存器件、或其他非易失性固态存储器件。
所述计算机设备1集成的模块/单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个非易失性可读取存储介质中。基于这样的理解,本申请实现上述实施例方法中的全部或部分流程,也可以通过计算机可读指令来指令相关的硬件来完成,所述的计算机可读指令可存储于一非易失性可读存储介质中,所述计算机可读指令在被处理器执行时,可实现上述各个方法实施例的步骤。其中,所述计算机可读指令代码可以为源代码形式、对象代码形式、可执行文件或某些中间形式等。所述非易失性可读介质可以包括:能够携带所述计算机可读指令代码的任何实体或装置、记录介质、U盘、移动硬盘、 磁碟、光盘、计算机存储器、只读存储器(ROM,Read-Only Memory)。
在本申请所提供的几个实施例中,应该理解到,所揭露的计算机设备和方法,可以通过其它的方式实现。例如,以上所描述的计算机设备实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式。
另外,在本申请各个实施例中的各功能单元可以集成在相同处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在相同单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用硬件加软件功能模块的形式实现。
对于本领域技术人员而言,显然本申请不限于上述示范性实施例的细节,而且在不背离本申请的精神或基本特征的情况下,能够以其他的具体形式实现本申请。因此,无论从哪一点来看,均应将实施例看作是示范性的,而且是非限制性的,本申请的范围由所附权利要求而不是上述说明限定,因此旨在将落在权利要求的等同要件的含义和范围内的所有变化涵括在本申请内。不应将权利要求中的任何附图标记视为限制所涉及的权利要求。此外,显然“包括”一词不排除其他单元或步骤,单数不排除复数。计算机设备权利要求中陈述的多个单元或计算机设备也可以由同一个单元或计算机设备通过软件或者硬件来实现。第一,第二等词语用来表示名称,而并不表示任何特定的顺序。
最后应说明的是,以上实施例仅用以说明本申请的技术方案而非限制,尽管参照较佳实施例对本申请进行了详细说明,本领域的普通技术人员应当理解,可以对本申请的技术方案进行修改或等同替换,而不脱离本申请技术方案的精神和范围。

Claims (20)

  1. 一种终端设备控制方法,其特征在于,所述方法包括:
    获取待识别图像,并对所述待识别图像进行人脸检测;
    判断是否检测到人脸图像;
    若检测到人脸图像,则获取所述人脸图像的预设关键特征点的初始状态信息;
    基于所述初始状态信息确定所述人脸图像的预设关键特征点的状态变化信息;及
    当所述预设关键特征点的状态变化信息为预设状态变化信息库中的第一预设状态变化信息时,触发所述第一预设状态变化信息对应的控制指令执行相应的控制操作。
  2. 如权利要求1所述的终端设备控制方法,其特征在于,所述获取待识别图像的步骤之前还包括:
    配置所述预设关键特征点的上下左右检测边界信息,以建立得到一特征点检测框;及
    将所述预设关键特征点的多个预设状态变化信息与多个预设控制指令相关联。
  3. 如权利要求1或2所述的终端设备控制方法,其特征在于,所述对所述待识别图像进行人脸检测的步骤包括:
    根据预设多个人脸样本训练得到用于进行人脸检测的卷积神经网络模型;及
    利用所述卷积神经网络模型对所述待识别图像进行人脸检测。
  4. 如权利要求1或2所述的终端设备控制方法,其特征在于,所述初始状态信息包括初始位置信息或者初始表情信息;当所述初始状态信息为所述预设关键特征点的初始位置信息时,以根据所述人脸图像的运动状态信息执行相应的控制操作;当所述初始状态信息为所述预设关键特征点的初始表情信息时,以根据所述人脸图像的表情变化信息执行相应的控制操作。
  5. 如权利要求4所述的终端设备控制方法,其特征在于,所述当所述预设关键特征点的状态变化信息为预设状态变化信息库中的第一预设状态变化信息时,触发所述第一预设状态变化信息对应的控制指令执行相应的控制操作的步骤包括:
    当所述预设关键特征点的状态变化信息为所述预设状态变化信息库中的第一预设状态变化信息时,判断所述预设关键特征点的状态变化信息是否为有效状态变化信息;及
    当所述预设关键特征点的状态变化信息为有效状态变化信息时,触发所述第一预设状态变化信息对应的控制指令执行相应的控制操作。
  6. 如权利要求5所述的终端设备控制方法,其特征在于,所述判断所述预设关键特征 点的状态变化信息是否为有效状态变化信息的步骤包括:
    当所述初始状态信息为所述预设关键特征点的初始位置信息时,获取所述人脸图像在状态变化过程中的平均偏转速度和/或偏转角度,以根据所述平均偏转速度和/或所述偏转角度判断所述预设关键特征点的状态变化信息是否为有效状态变化信息;及
    当所述初始状态信息为所述预设关键特征点的初始表情信息时,获取所述人脸图像在状态变化过程中的表情持续时间,以根据所述表情持续时间判断所述预设关键特征点的状态变化信息是否为有效状态变化信息。
  7. 如权利要求5所述的终端设备控制方法,其特征在于,所述判断所述预设关键特征点的状态变化信息是否为有效状态变化信息的步骤包括:
    获取所述预设关键特征点的状态变化信息的发生时刻;
    判断所述预设关键特征点的状态变化信息的发生时刻与上一状态变化信息的发生时刻的差值是否大于等于预设时间;及
    根据所述判断结果确定所述预设关键特征点的状态变化信息是否为有效状态变化信息。
  8. 一种终端设备控制装置,其特征在于,所述装置包括:
    检测模块,用于获取待识别图像,并对所述待识别图像进行人脸检测;
    判断模块,用于判断是否检测到人脸图像;
    获取模块,用于在检测到人脸图像时,获取所述人脸图像的预设关键特征点的初始状态信息;
    确定模块,用于基于所述初始状态信息确定所述人脸图像的预设关键特征点的状态变化信息;及
    控制模块,用于在所述预设关键特征点的状态变化信息为预设状态变化信息库中的第一预设状态变化信息时,触发所述第一预设状态变化信息对应的控制指令执行相应的控制操作。
  9. 一种计算机设备,其特征在于,所述计算机设备包括处理器和存储器,所述存储器用于存储计算机可读指令,所述处理器执行所述计算机可读指令以实现以下步骤:
    获取待识别图像,并对所述待识别图像进行人脸检测;
    判断是否检测到人脸图像;
    若检测到人脸图像,则获取所述人脸图像的预设关键特征点的初始状态信息;
    基于所述初始状态信息确定所述人脸图像的预设关键特征点的状态变化信息;及
    当所述预设关键特征点的状态变化信息为预设状态变化信息库中的第一预设状态变化信息时,触发所述第一预设状态变化信息对应的控制指令执行相应的控制操作。
  10. 如权利要求9所述的计算机设备,其特征在于,在所述获取待识别图像的步骤之前,所述处理器执行所述计算机可读指令还用以实现以下步骤:
    配置所述预设关键特征点的上下左右检测边界信息,以建立得到一特征点检测框;及
    将所述预设关键特征点的多个预设状态变化信息与多个预设控制指令相关联。
  11. 如权利要求9或10所述的计算机设备,其特征在于,所述初始状态信息包括初始位置信息或者初始表情信息;当所述初始状态信息为所述预设关键特征点的初始位置信息时,以根据所述人脸图像的运动状态信息执行相应的控制操作;当所述初始状态信息为所述预设关键特征点的初始表情信息时,以根据所述人脸图像的表情变化信息执行相应的控制操作。
  12. 如权利要求11所述的计算机设备,其特征在于,所述处理器执行所述计算机可读指令以实现所述当所述预设关键特征点的状态变化信息为预设状态变化信息库中的第一预设状态变化信息时,触发所述第一预设状态变化信息对应的控制指令执行相应的控制操作时,包括以下步骤:
    当所述预设关键特征点的状态变化信息为所述预设状态变化信息库中的第一预设状态变化信息时,判断所述预设关键特征点的状态变化信息是否为有效状态变化信息;及
    当所述预设关键特征点的状态变化信息为有效状态变化信息时,触发所述第一预设状态变化信息对应的控制指令执行相应的控制操作。
  13. 如权利要求12所述的计算机设备,其特征在于,所述处理器执行所述计算机可读指令以实现所述判断所述预设关键特征点的状态变化信息是否为有效状态变化信息时,包括以下步骤:
    当所述初始状态信息为所述预设关键特征点的初始位置信息时,获取所述人脸图像在状态变化过程中的平均偏转速度和/或偏转角度,以根据所述平均偏转速度和/或所述偏转角度判断所述预设关键特征点的状态变化信息是否为有效状态变化信息;及
    当所述初始状态信息为所述预设关键特征点的初始表情信息时,获取所述人脸图像在状态变化过程中的表情持续时间,以根据所述表情持续时间判断所述预设关键特征点的状态变化信息是否为有效状态变化信息。
  14. 如权利要求12所述的计算机设备,其特征在于,所述处理器执行所述计算机可读指令以实现所述判断所述预设关键特征点的状态变化信息是否为有效状态变化信息时,包括以下步骤:
    获取所述预设关键特征点的状态变化信息的发生时刻;
    判断所述预设关键特征点的状态变化信息的发生时刻与上一状态变化信息的发生时刻 的差值是否大于等于预设时间;及
    根据所述判断结果确定所述预设关键特征点的状态变化信息是否为有效状态变化信息。
  15. 一种非易失性可读存储介质,其上存储有计算机可读指令,其特征在于,所述计算机可读指令被处理器执行时实现以下步骤:
    获取待识别图像,并对所述待识别图像进行人脸检测;
    判断是否检测到人脸图像;
    若检测到人脸图像,则获取所述人脸图像的预设关键特征点的初始状态信息;
    基于所述初始状态信息确定所述人脸图像的预设关键特征点的状态变化信息;及
    当所述预设关键特征点的状态变化信息为预设状态变化信息库中的第一预设状态变化信息时,触发所述第一预设状态变化信息对应的控制指令执行相应的控制操作。
  16. 如权利要求15所述的存储介质,其特征在于,在所述获取待识别图像的步骤之前,所述计算机可读指令被所述处理器执行还用以实现以下步骤:
    配置所述预设关键特征点的上下左右检测边界信息,以建立得到一特征点检测框;及
    将所述预设关键特征点的多个预设状态变化信息与多个预设控制指令相关联。
  17. 如权利要求15或16所述的存储介质,其特征在于,所述初始状态信息包括初始位置信息或者初始表情信息;当所述初始状态信息为所述预设关键特征点的初始位置信息时,以根据所述人脸图像的运动状态信息执行相应的控制操作;当所述初始状态信息为所述预设关键特征点的初始表情信息时,以根据所述人脸图像的表情变化信息执行相应的控制操作。
  18. 如权利要求17所述的存储介质,其特征在于,所述计算机可读指令被所述处理器执行以实现所述当所述预设关键特征点的状态变化信息为预设状态变化信息库中的第一预设状态变化信息时,触发所述第一预设状态变化信息对应的控制指令执行相应的控制操作时,包括以下步骤:
    当所述预设关键特征点的状态变化信息为所述预设状态变化信息库中的第一预设状态变化信息时,判断所述预设关键特征点的状态变化信息是否为有效状态变化信息;及
    当所述预设关键特征点的状态变化信息为有效状态变化信息时,触发所述第一预设状态变化信息对应的控制指令执行相应的控制操作。
  19. 如权利要求18所述的存储介质,其特征在于,所述计算机可读指令被所述处理器执行以实现所述判断所述预设关键特征点的状态变化信息是否为有效状态变化信息时,包括以下步骤:
    当所述初始状态信息为所述预设关键特征点的初始位置信息时,获取所述人脸图像在 状态变化过程中的平均偏转速度和/或偏转角度,以根据所述平均偏转速度和/或所述偏转角度判断所述预设关键特征点的状态变化信息是否为有效状态变化信息;及
    当所述初始状态信息为所述预设关键特征点的初始表情信息时,获取所述人脸图像在状态变化过程中的表情持续时间,以根据所述表情持续时间判断所述预设关键特征点的状态变化信息是否为有效状态变化信息。
  20. 如权利要求18所述的存储介质,其特征在于,所述计算机可读指令被所述处理器执行以实现所述判断所述预设关键特征点的状态变化信息是否为有效状态变化信息的步骤包括:
    获取所述预设关键特征点的状态变化信息的发生时刻;
    判断所述预设关键特征点的状态变化信息的发生时刻与上一状态变化信息的发生时刻的差值是否大于等于预设时间;及
    根据所述判断结果确定所述预设关键特征点的状态变化信息是否为有效状态变化信息。