WO2022222705A1 - Device control method and electronic device - Google Patents

Device control method and electronic device

Info

Publication number
WO2022222705A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
gesture
electronic device
image
face image
Application number
PCT/CN2022/083654
Inventor
杨吉年
郭昊帅
李宏禹
刘宏马
张雅琪
Original Assignee
华为技术有限公司
Application filed by 华为技术有限公司
Priority to EP22790814.2A (published as EP4310725A1)
Publication of WO2022222705A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017 Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41 Structure of client; Structure of client peripherals
    • H04N21/422 Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/4223 Cameras
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/442 Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
    • H04N21/44213 Monitoring of end-user related data
    • H04N21/44218 Detecting physical presence or behaviour of the user, e.g. using sensors to detect if the user is leaving the room or changes his face expression during a TV program
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/482 End-user interface for program selection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/482 End-user interface for program selection
    • H04N21/4826 End-user interface for program selection using recommendation lists, e.g. of programs or channels sorted out according to their score
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2203/00 Indexing scheme relating to G06F3/00 - G06F3/048
    • G06F2203/01 Indexing scheme relating to G06F3/01
    • G06F2203/011 Emotion or mood input determined on the basis of sensed human body parameters such as pulse, heart rate or beat, temperature of skin, facial expressions, iris, voice pitch, brain activity patterns

Definitions

  • the present application relates to the technical field of intelligent terminals, and in particular, to a device control method and an electronic device.
  • when a user controls an electronic device such as a smart TV, the user can use gestures in addition to a remote control.
  • the electronic device captures a video stream through its built-in camera, detects the user's gesture in the video frames of the stream, recognizes different gestures as different commands, and responds accordingly, so that the user can control the electronic device with gestures.
  • this method of controlling the electronic device through user gestures has a high probability of false triggering.
  • the present application provides a device control method and an electronic device, which can reduce false triggering of the electronic device due to user gestures.
  • the present application provides a device control method, applied to an electronic device, including: collecting a video stream; when a frontal face image of a first user is detected in the video stream, detecting a gesture of the first user in the video stream; identifying the operation corresponding to the gesture; and performing the operation corresponding to the gesture.
  • in this method, before recognizing and responding to a user's gesture, the electronic device also needs to detect the frontal face image of the first user, and it responds only to the gesture of a first user whose frontal face image appears, which reduces false triggering of the electronic device due to user gestures.
  • detecting that the frontal face image of the first user appears in the video stream includes: detecting that consecutive target video frames of the video stream include the frontal face image of the first user.
  • detecting that consecutive target video frames of the video stream include the frontal face image of the first user includes: for each target video frame in the consecutive target video frames, obtaining the face image of the first user from the target video frame, calculating the yaw angle of the face image, and determining that the yaw angle is smaller than a preset first threshold.
  • detecting that consecutive target video frames of the video stream include the frontal face image of the first user further includes: calculating the pitch angle and/or roll angle of the face image; and determining that the pitch angle is smaller than a preset second threshold and/or that the roll angle is smaller than a preset third threshold.
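
The frontal-face test described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `FacePose` type, the threshold values, the use of absolute angle values, and the `min_frames` parameter are all assumptions introduced here, since the text only states that each angle is compared with a preset threshold over consecutive target video frames. The pitch and roll checks are optional in the text ("and/or"); this sketch applies all three.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class FacePose:
    """Head-pose angles of one detected face, in degrees (hypothetical type)."""
    yaw: float
    pitch: float
    roll: float


# Hypothetical values; the source only says the thresholds are preset.
YAW_MAX = 20.0    # "first threshold"
PITCH_MAX = 20.0  # "second threshold"
ROLL_MAX = 20.0   # "third threshold"


def is_frontal(pose: FacePose) -> bool:
    """A face counts as frontal when every angle is below its threshold.

    Assumption: angles are signed, so their magnitudes are compared.
    """
    return (abs(pose.yaw) < YAW_MAX
            and abs(pose.pitch) < PITCH_MAX
            and abs(pose.roll) < ROLL_MAX)


def frontal_face_present(poses: Iterable[Optional[FacePose]],
                         min_frames: int) -> bool:
    """Return True once `min_frames` consecutive frames contain a frontal face.

    Each element is the first user's pose in one target video frame,
    or None when that user's face was not found in the frame.
    """
    run = 0
    for pose in poses:
        run = run + 1 if pose is not None and is_frontal(pose) else 0
        if run >= min_frames:
            return True
    return False
```

Requiring a run of consecutive frontal frames, rather than a single frame, is what keeps a brief glance toward the screen from enabling gesture control.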
  • before detecting that the frontal face image of the first user appears in the video stream, the method further includes: collecting a first image in response to a face setting instruction; displaying the first image, and indicating the face images in the first image on the displayed first image; and, in response to a face selection instruction, setting the face image indicated by the face selection instruction as the face image of the first user.
  • the method further includes: when it is detected that a frontal face image of a second user appears in the video stream, detecting a gesture of the second user in the video stream; identifying the operation corresponding to the gesture of the second user; and performing the operation corresponding to the gesture of the second user.
  • the method further includes: determining that the gesture of the first user and the gesture of the second user are detected simultaneously in the video stream, and selecting one gesture from the gesture of the first user and the gesture of the second user; identifying the operation corresponding to the selected gesture; and performing the operation corresponding to the selected gesture.
  • detecting the gesture of the first user includes: detecting, from the consecutive target video frames that include the frontal face image of the first user, consecutive target video frames that include a hand image of the first user; determining the movement trajectory of the hand image of the first user according to the consecutive target video frames that include the hand image of the first user; and determining the gesture of the first user according to the movement trajectory of the hand image of the first user.
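
As a rough illustration of turning a hand-image movement trajectory into a gesture and then into an operation, the sketch below classifies a trajectory by its net displacement. Everything specific here is an assumption made for the example: the swipe gesture names, the `MIN_SWIPE_PX` value, the pixel coordinate convention (x to the right, y downward), and the `OPERATIONS` table. The source does not state which gestures are recognized or how the trajectory is evaluated.

```python
from typing import Dict, List, Optional, Tuple

# Centre of the first user's hand box in each consecutive target frame,
# in pixel coordinates (assumed: x grows to the right, y grows downward).
Point = Tuple[float, float]

MIN_SWIPE_PX = 100.0  # hypothetical minimum displacement for a swipe

# Hypothetical mapping from recognized gesture to device operation.
OPERATIONS: Dict[str, str] = {
    "swipe_left": "previous_item",
    "swipe_right": "next_item",
    "swipe_up": "volume_up",
    "swipe_down": "volume_down",
}


def classify_swipe(trajectory: List[Point]) -> Optional[str]:
    """Classify a trajectory by the displacement between its end points."""
    if len(trajectory) < 2:
        return None
    dx = trajectory[-1][0] - trajectory[0][0]
    dy = trajectory[-1][1] - trajectory[0][1]
    if max(abs(dx), abs(dy)) < MIN_SWIPE_PX:
        return None  # movement too small to count as a gesture
    if abs(dx) >= abs(dy):
        return "swipe_right" if dx > 0 else "swipe_left"
    return "swipe_down" if dy > 0 else "swipe_up"


def operation_for(trajectory: List[Point]) -> Optional[str]:
    """Identify the operation corresponding to the detected gesture, if any."""
    gesture = classify_swipe(trajectory)
    return OPERATIONS.get(gesture) if gesture is not None else None
```

For instance, `operation_for([(200, 50), (20, 60)])` yields `"previous_item"` under these assumed mappings, while a trajectory that moves only a few pixels yields `None` and triggers nothing.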
  • an embodiment of the present application provides a device control method, applied to an electronic device, including: collecting a video stream; when it is detected that a frontal face image of a first user appears in the video stream, detecting a gesture of the first user in the video stream, and determining attribute information and/or emotion information of the first user according to the frontal face image of the first user, where the attribute information records the attributes of the first user and the emotion information records the emotion of the first user; identifying the operation corresponding to the gesture; and, if the operation is a content recommendation operation, displaying an interface corresponding to the content recommendation operation and displaying recommended content on the interface, where the recommended content is obtained according to the attribute information and/or emotion information of the first user.
  • in this method, before recognizing and responding to a user's gesture, the electronic device also needs to detect the frontal face image of the first user, and it responds only to the gesture of a first user whose frontal face image appears, which reduces false triggering of the electronic device due to user gestures; moreover, content is screened according to the attribute information and/or emotion information obtained from the frontal face image, and recommended content for the corresponding user is obtained and displayed, which provides personalized recommendation, makes the displayed recommended content more targeted, improves the interaction between the electronic device and the user, and improves the user experience.
  • before displaying the interface corresponding to the content recommendation operation, the method further includes: sending the attribute information and/or emotion information of the first user to a server; and receiving first information sent by the server in response to the attribute information and/or emotion information, where the first information includes recommended content matching the attribute information and/or the emotion information.
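
The matching step on the server side might look like the sketch below. It is only an illustration under stated assumptions: the `ContentItem` type, the idea of tagging each item with audience and mood labels, and the `recommend` function are hypothetical, since the source says only that the recommended content matches the attribute information and/or emotion information.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class ContentItem:
    """One catalog entry with hypothetical matching labels."""
    title: str
    audiences: Set[str]  # attribute labels the item suits, e.g. {"child"}
    moods: Set[str]      # emotion labels the item suits, e.g. {"happy"}


def recommend(catalog: List[ContentItem],
              attribute: Optional[str] = None,
              emotion: Optional[str] = None) -> List[str]:
    """Return titles matching whichever of attribute/emotion was supplied.

    Either argument may be None, mirroring the "and/or" in the text:
    a missing value places no constraint on the result.
    """
    titles = []
    for item in catalog:
        if attribute is not None and attribute not in item.audiences:
            continue
        if emotion is not None and emotion not in item.moods:
            continue
        titles.append(item.title)
    return titles
```

In this sketch the device would send the two labels to the server and display the returned titles on the content recommendation interface.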
  • detecting that the frontal face image of the first user appears in the video stream includes: detecting that consecutive target video frames of the video stream include the frontal face image of the first user.
  • detecting that consecutive target video frames of the video stream include the frontal face image of the first user includes: for each target video frame in the consecutive target video frames, obtaining the face image of the first user from the target video frame, calculating the yaw angle of the face image, and determining that the yaw angle is smaller than a preset first threshold.
  • detecting that consecutive target video frames of the video stream include the frontal face image of the first user further includes: calculating the pitch angle and/or roll angle of the face image; and determining that the pitch angle is smaller than a preset second threshold and/or that the roll angle is smaller than a preset third threshold.
  • before detecting that the frontal face image of the first user appears in the video stream, the method further includes: collecting a first image in response to a face setting instruction; displaying the first image, and indicating the face images in the first image on the displayed first image; and, in response to a face selection instruction, setting the face image indicated by the face selection instruction as the face image of the first user.
  • the method further includes: when it is detected that a frontal face image of a second user appears in the video stream, detecting a gesture of the second user in the video stream; identifying the operation corresponding to the gesture of the second user; and performing the operation corresponding to the gesture of the second user.
  • the method further includes: determining that the gesture of the first user and the gesture of the second user are detected simultaneously in the video stream, and selecting one gesture from the gesture of the first user and the gesture of the second user; identifying the operation corresponding to the selected gesture; and performing the operation corresponding to the selected gesture.
  • detecting the gesture of the first user includes: detecting, from the consecutive target video frames that include the frontal face image of the first user, consecutive target video frames that include a hand image of the first user; determining the movement trajectory of the hand image of the first user according to the consecutive target video frames that include the hand image of the first user; and determining the gesture of the first user according to the movement trajectory of the hand image of the first user.
  • an embodiment of the present application provides an electronic device, including a processor and a memory, where the memory is used to store a computer program, and when the processor executes the computer program, the electronic device performs the method of any one of the first aspect.
  • an embodiment of the present application provides an electronic device, including a processor and a memory, where the memory is used to store a computer program, and when the processor executes the computer program, the electronic device performs the method of any one of the second aspect.
  • embodiments of the present application provide a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and the computer program, when run on a computer, causes the computer to execute the method of any one of the first aspect.
  • embodiments of the present application provide a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and the computer program, when run on a computer, causes the computer to execute the method of any one of the second aspect.
  • the present application provides a computer program for performing the method of the first aspect when the computer program is executed by a computer.
  • the program in the seventh aspect may be stored in whole or in part on a storage medium packaged with the processor, or may be stored in whole or in part in a memory not packaged with the processor.
  • FIG. 1 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
  • FIG. 2 is a block diagram of a software structure of an electronic device according to an embodiment of the present application.
  • FIG. 3A is a schematic interface diagram of the smart TV of the present application.
  • FIG. 3B is a schematic diagram of a setting interface in the smart TV of the present application.
  • FIG. 3C is a schematic interface diagram of the smart TV of the present application.
  • FIG. 4 is a schematic diagram of an applicable scenario of the device control method of the present application.
  • FIG. 5 is a flowchart of an embodiment of the device control method of the present application.
  • FIG. 6A is a schematic diagram of the standby state of the smart TV of the present application.
  • FIG. 6B is a schematic diagram of the interface after the smart TV of the present application is turned on.
  • FIG. 7A is a schematic diagram of a playback screen of Video 1 of the application.
  • FIG. 7B is a schematic diagram of a video selection interface of the application.
  • FIG. 9A is a schematic diagram of the human body frame, face frame and human hand frame of the present application.
  • FIG. 9B is a schematic diagram of a method for establishing a pixel coordinate system of the present application.
  • FIG. 10A is a schematic diagram of a preset face image setting interface of the present application.
  • FIG. 10B is a schematic diagram of a preset face image selection interface of the present application.
  • FIG. 11A is a schematic diagram of a method for establishing a camera coordinate system of the present application.
  • FIG. 11B is a schematic diagram of a method for establishing a face angle coordinate system of the present application.
  • FIG. 12 is a flowchart of another embodiment of the device control method of the present application.
  • FIGS. 13A to 13B are exemplary diagrams of the display mode of the recommended content of the present application.
  • FIGS. 14A to 14E are exemplary diagrams of the display mode of the recommended content of the present application.
  • FIG. 15 is a structural diagram of another embodiment of the electronic device of the present application.
  • the method of controlling an electronic device through user gestures has a high probability of false triggering. Specifically, a user may make a gesture in daily life that is not intended to control the electronic device, yet the electronic device recognizes it as an instruction issued to it and responds, resulting in false triggering of the electronic device.
  • embodiments of the present application provide a device control method and an electronic device, which can reduce false triggering of the electronic device due to user gestures.
  • FIG. 1 is a schematic structural diagram of an electronic device 100.
  • the electronic device 100 may be at least one of a cell phone, a foldable electronic device, a tablet computer, a desktop computer, a laptop computer, a handheld computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), an augmented reality (AR) device, a virtual reality (VR) device, an artificial intelligence (AI) device, a wearable device, an in-vehicle device, a smart home device, or a smart city device.
  • the electronic device 100 may include a processor 110, an internal memory 121, a camera module 193, and a display screen 194.
  • the electronic device 100 may further include: an external memory interface 120, a universal serial bus (USB) connector 130, a charging management module 140, a power management module 141, a battery 142, an antenna 2, a wireless communication module 160, an audio module 170, a speaker 170A, a microphone 170C, a headphone jack 170D, a sensor module 180, buttons 190, a motor 191, an indicator 192, a subscriber identification module (SIM) card interface 195, etc.
  • the electronic device 100 may further include: an antenna 1, a mobile communication module 150, a receiver 170B, and the like.
  • the sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, and an ambient light sensor.
  • the structures illustrated in the embodiments of the present application do not constitute a specific limitation on the electronic device 100.
  • the electronic device 100 may include more or fewer components than shown, or combine some components, or separate some components, or arrange the components differently.
  • the illustrated components may be implemented in hardware, software, or a combination of software and hardware.
  • the processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU). Different processing units may be independent devices or may be integrated in one or more processors.
  • the processor can generate an operation control signal according to the instruction operation code and timing signal, and complete the control of fetching and executing instructions.
  • a memory may also be provided in the processor 110 for storing instructions and data.
  • the memory in the processor 110 may be a cache memory.
  • the memory may store instructions or data that the processor 110 has recently used or uses frequently. If the processor 110 needs the instructions or data again, it can fetch them directly from this memory, which avoids repeated accesses, reduces the waiting time of the processor 110, and thereby increases the efficiency of the system.
  • the processor 110 may include one or more interfaces.
  • the interface may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, etc.
  • the processor 110 may be connected to modules such as a touch sensor, an audio module, a wireless communication module, a display, a camera module, and the like through at least one of the above interfaces.
  • the interface connection relationship between the modules illustrated in the embodiments of the present application is only a schematic illustration, and does not constitute a structural limitation of the electronic device 100.
  • the electronic device 100 may also adopt interface connection manners different from those in the foregoing embodiments, or a combination of multiple interface connection manners.
  • the USB connector 130 is an interface conforming to the USB standard specification, which can be used to connect the electronic device 100 and peripheral devices, and specifically can be a Mini USB connector, a Micro USB connector, a USB Type C connector, and the like.
  • the USB connector 130 can be used to connect a charger to charge the electronic device 100, and can also be used to connect other electronic devices to transmit data between the electronic device 100 and those devices. It can also be used to connect headphones, so that audio stored in the electronic device is output through the headphones.
  • This connector can also be used to connect other electronic devices, such as VR devices.
  • the standard specifications of the Universal Serial Bus may be USB1.x, USB2.0, USB3.x, and USB4.
  • the charging management module 140 is used for receiving charging input from the charger.
  • the charger may be a wireless charger or a wired charger.
  • the charging management module 140 may receive charging input from the wired charger through the USB connector 130.
  • the charging management module 140 may receive wireless charging input through a wireless charging coil of the electronic device 100. While the charging management module 140 charges the battery 142, it can also supply power to the electronic device through the power management module 141.
  • the power management module 141 is used for connecting the battery 142, the charging management module 140 and the processor 110.
  • the power management module 141 receives input from the battery 142 and/or the charging management module 140, and supplies power to the processor 110, the internal memory 121, the display screen 194, the camera module 193, and the wireless communication module 160.
  • the power management module 141 can also be used to monitor parameters such as battery capacity, battery cycle count, and battery health status (leakage, impedance).
  • the power management module 141 may also be provided in the processor 110 .
  • the power management module 141 and the charging management module 140 may also be provided in the same device.
  • the wireless communication function of the electronic device 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, the baseband processor, and the like.
  • Antenna 1 and Antenna 2 are used to transmit and receive electromagnetic wave signals.
  • Each antenna in electronic device 100 may be used to cover a single or multiple communication frequency bands. Different antennas can also be reused to improve antenna utilization.
  • the antenna 1 can be multiplexed as a diversity antenna of the wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
  • the mobile communication module 150 may provide wireless communication solutions applied on the electronic device 100, including 2G/3G/4G/5G.
  • the mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (LNA) and the like.
  • the mobile communication module 150 can receive electromagnetic waves from the antenna 1, filter and amplify the received electromagnetic waves, and transmit them to the modem processor for demodulation.
  • the mobile communication module 150 can also amplify the signal modulated by the modem processor, and then convert it into an electromagnetic wave for radiation through the antenna 1.
  • at least part of the functional modules of the mobile communication module 150 may be provided in the processor 110 .
  • at least part of the functional modules of the mobile communication module 150 may be provided in the same device as at least part of the modules of the processor 110 .
  • the modem processor may include a modulator and a demodulator.
  • the modulator is used to modulate the low-frequency baseband signal to be sent into a medium- or high-frequency signal.
  • the demodulator is used to demodulate the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then transmits the demodulated low-frequency baseband signal to the baseband processor for processing.
  • the low-frequency baseband signal is processed by the baseband processor and passed to the application processor.
  • the application processor outputs sound signals through audio devices (including but not limited to the speaker 170A and the receiver 170B), or displays images or videos through the display screen 194.
  • the modem processor may be a stand-alone device.
  • the modem processor may be independent of the processor 110, and may be provided in the same device as the mobile communication module 150 or other functional modules.
  • the wireless communication module 160 can provide wireless communication solutions applied on the electronic device 100, including wireless local area network (WLAN) (such as a wireless fidelity (Wi-Fi) network), Bluetooth (BT), Bluetooth low energy (BLE), ultra-wideband (UWB), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR) technology, and the like.
  • the wireless communication module 160 may be one or more devices integrating at least one communication processing module.
  • the wireless communication module 160 receives electromagnetic waves via the antenna 2, frequency-modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110.
  • the wireless communication module 160 can also receive the signal to be sent from the processor 110, perform frequency modulation on it, amplify it, and convert it into electromagnetic waves for radiation through the antenna 2.
  • the antenna 1 of the electronic device 100 is coupled with the mobile communication module 150, and the antenna 2 is coupled with the wireless communication module 160, so that the electronic device 100 can communicate with the network and other electronic devices through wireless communication technology.
  • the wireless communication technology may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology, etc.
  • the GNSS may include a global positioning system (GPS), a global navigation satellite system (GLONASS), a BeiDou navigation satellite system (BDS), a quasi-zenith satellite system (QZSS), and/or satellite based augmentation systems (SBAS).
  • the electronic device 100 may implement a display function through a GPU, a display screen 194, an application processor, and the like.
  • the GPU is a microprocessor for image processing, and is connected to the display screen 194 and the application processor.
  • the GPU is used to perform mathematical and geometric calculations for graphics rendering.
  • Processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
  • Display screen 194 is used to display images, videos, and the like.
  • Display screen 194 includes a display panel.
  • the display panel can be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), Mini LED, Micro LED, Micro OLED, quantum dot light-emitting diodes (QLED), and so on.
  • electronic device 100 may include one or more display screens 194 .
  • the electronic device 100 may implement a camera function through a camera module 193, an ISP, a video codec, a GPU, a display screen 194, an application processor AP, a neural network processor NPU, and the like.
  • the camera module 193 can be used to collect color image data and depth data of the photographed object.
  • the ISP can be used to process the color image data collected by the camera module 193 .
  • when shooting, the shutter is opened and light is transmitted through the lens to the camera photosensitive element, which converts the light signal into an electrical signal and transmits it to the ISP, where it is processed and converted into an image visible to the naked eye.
  • ISP can also perform algorithm optimization on image noise, brightness, and skin tone. ISP can also optimize the exposure, color temperature and other parameters of the shooting scene.
  • the ISP may be provided in the camera module 193 .
  • the camera module 193 may be composed of a color camera module and a 3D sensing module.
  • the photosensitive element of the camera of the color camera module may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the optical signal into an electrical signal, and then transmits the electrical signal to the ISP to convert it into a digital image signal.
  • the ISP outputs the digital image signal to the DSP for processing.
  • DSP converts digital image signals into standard RGB, YUV and other formats of image signals.
  • the 3D sensing module may be a time of flight (TOF) 3D sensing module or a structured light (structured light) 3D sensing module.
  • structured light 3D sensing is an active depth sensing technology, and the basic components of the structured light 3D sensing module may include an infrared (IR) emitter, an IR camera module, and the like.
  • the working principle of the structured light 3D sensing module is to first project a light spot of a specific pattern onto the object to be photographed, then receive the coded light spot pattern on the surface of the object, compare it with the originally projected light spot, and use the principle of triangulation to calculate the three-dimensional coordinates of the object.
  • the three-dimensional coordinates include the distance between the electronic device 100 and the object to be photographed.
  • TOF 3D sensing is also an active depth sensing technology, and the basic components of the TOF 3D sensing module can include an infrared (IR) emitter, an IR camera module, and the like.
  • the working principle of the TOF 3D sensing module is to calculate the distance (i.e., depth) between the TOF 3D sensing module and the object to be photographed from the round-trip time of the emitted infrared light, so as to obtain a 3D depth map.
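The TOF principle described above can be illustrated with a minimal sketch. It assumes the module reports a per-pixel round-trip time in seconds; the function names `tof_depth_m` and `depth_map` are hypothetical and are not taken from the application. Because the light travels to the object and back, the one-way distance is half of the path covered in the measured time.

```python
# Speed of light in vacuum, in metres per second.
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def tof_depth_m(round_trip_time_s: float) -> float:
    """Return the depth in metres for one measured round-trip time.

    The emitted infrared light travels to the object and back,
    so the one-way distance is half of the total path length.
    """
    if round_trip_time_s < 0:
        raise ValueError("round-trip time must be non-negative")
    return SPEED_OF_LIGHT_M_PER_S * round_trip_time_s / 2.0


def depth_map(round_trip_times_s):
    """Convert a 2D grid (list of rows) of round-trip times into depths."""
    return [[tof_depth_m(t) for t in row] for row in round_trip_times_s]
```

A real module typically measures phase shift or uses gated exposure rather than timing individual photons directly, so this only illustrates the distance relation itself.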
  • Structured light 3D sensing modules can also be used in face recognition, somatosensory game consoles, industrial machine vision detection and other fields.
  • TOF 3D sensing modules can also be applied to game consoles, augmented reality (AR)/virtual reality (VR) and other fields.
  • the camera module 193 may also be composed of two or more cameras.
  • the two or more cameras may include color cameras, and the color cameras may be used to collect color image data of the photographed object.
  • the two or more cameras may use stereo vision technology to collect depth data of the photographed object.
  • Stereo vision technology is based on the principle of human eye parallax. Under natural light, two or more cameras capture images of the same object from different angles, and operations such as triangulation are then performed to obtain the distance between the electronic device 100 and the object, that is, the depth information.
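As an illustration of the triangulation step, the following sketch uses the standard relation for a rectified two-camera setup, depth = focal length × baseline / disparity. The application does not state which formula is used, so the rectified-camera assumption and the name `stereo_depth_m` are illustrative only.

```python
def stereo_depth_m(focal_length_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth of a point seen by two rectified cameras.

    focal_length_px: focal length expressed in pixels
    baseline_m:      distance between the two camera centres, in metres
    disparity_px:    horizontal shift of the point between the two images
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_length_px * baseline_m / disparity_px
```

Nearby objects produce a large disparity and therefore a small depth; distant objects produce a small disparity, which is why depth accuracy drops with distance.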
  • the electronic device 100 may include one or more camera modules 193 .
  • the electronic device 100 may include a front camera module 193 and a rear camera module 193 .
  • the front camera module 193 can usually be used to collect the color image data and depth data of the photographer facing the display screen 194, and the rear camera module can be used to collect the color image data and depth data of the objects (such as people, landscapes, etc.) that the photographer faces.
  • the CPU, GPU or NPU in the processor 110 may process the color image data and depth data collected by the camera module 193 .
  • the NPU can recognize the color image data collected by the camera module 193 (specifically, the color camera module) through a neural network algorithm based on skeleton point recognition technology, such as a convolutional neural network (CNN) algorithm, to determine the skeleton points of the person being photographed.
  • the CPU or GPU can also run the neural network algorithm to realize the determination of the skeletal points of the photographed person according to the color image data.
  • the CPU, GPU or NPU can also be used to determine the figure of the person being photographed (such as the body proportions and the fatness or thinness of the body parts between the skeletal points) according to the depth data collected by the camera module 193 (which may be a 3D sensing module) and the identified skeletal points, further determine body beautification parameters for the photographed person, and finally process the photographed image according to the body beautification parameters, so that the body shape of the person in the photographed image is beautified.
  • Subsequent embodiments will introduce in detail how to perform body beautification processing on the image of the person being photographed based on the color image data and depth data collected by the camera module 193, which will not be described here.
  • Digital signal processors are used to process digital signals; in addition to digital image signals, they can also process other digital signals. For example, when the electronic device 100 selects a frequency point, the digital signal processor is used to perform a Fourier transform on the frequency point energy, and so on.
  • Video codecs are used to compress or decompress digital video.
  • the electronic device 100 may support one or more video codecs.
  • the electronic device 100 can play or record videos in various encoding formats, such as moving picture experts group (MPEG) 1, MPEG2, MPEG3, MPEG4, and so on.
  • the NPU is a neural-network (NN) computing processor.
  • Applications such as intelligent cognition of the electronic device 100 can be implemented through the NPU, such as image recognition, face recognition, speech recognition, text understanding, and the like.
  • the external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100 .
  • the external memory card communicates with the processor 110 through the external memory interface 120 to realize the data storage function, for example, to save files such as music and videos in the external memory card, or to transfer such files from the electronic device to the external memory card.
  • Internal memory 121 may be used to store computer executable program code, which includes instructions.
  • the internal memory 121 may include a storage program area and a storage data area.
  • the storage program area can store an operating system, an application program required for at least one function (such as a sound playback function, an image playback function, etc.), and the like.
  • the storage data area may store data (such as audio data, phone book, etc.) created during the use of the electronic device 100 and the like.
  • the internal memory 121 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, universal flash storage (UFS), and the like.
  • the processor 110 executes various functional methods or data processing of the electronic device 100 by executing instructions stored in the internal memory 121 and/or instructions stored in a memory provided in the processor.
  • the electronic device 100 may implement audio functions, such as music playback and recording, through an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, an application processor, and the like.
  • the audio module 170 is used for converting digital audio information into analog audio signal output, and also for converting analog audio input into digital audio signal. Audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be provided in the processor 110 , or some functional modules of the audio module 170 may be provided in the processor 110 .
  • Speaker 170A, also referred to as a "loudspeaker", is used to convert audio electrical signals into sound signals.
  • the electronic device 100 may listen to music through the speaker 170A, or output an audio signal for a hands-free call.
  • the receiver 170B, also referred to as an "earpiece", is used to convert audio electrical signals into sound signals.
  • the voice can be heard by placing the receiver 170B close to the human ear.
  • the microphone 170C, also called a "mic", is used to convert sound signals into electrical signals.
  • the user can speak with the mouth close to the microphone 170C to input a sound signal into the microphone 170C.
  • the electronic device 100 may be provided with at least one microphone 170C. In other embodiments, the electronic device 100 may be provided with two microphones 170C, which can implement a noise reduction function in addition to collecting sound signals. In other embodiments, the electronic device 100 may further be provided with three, four or more microphones 170C to collect sound signals, reduce noise, identify sound sources, and implement directional recording functions.
  • the earphone interface 170D is used to connect wired earphones.
  • the earphone interface 170D may be a USB connector 130, a 3.5mm open mobile terminal platform (OMTP) standard interface, or a cellular telecommunications industry association of the USA (CTIA) standard interface.
  • the pressure sensor 180A is used to sense pressure signals, and can convert the pressure signals into electrical signals.
  • the pressure sensor 180A may be provided on the display screen 194 .
  • the capacitive pressure sensor may be comprised of at least two parallel plates of conductive material. When a force is applied to the pressure sensor 180A, the capacitance between the electrodes changes.
  • the electronic device 100 determines the intensity of the pressure according to the change in capacitance. When a touch operation acts on the display screen 194, the electronic device 100 detects the intensity of the touch operation according to the pressure sensor 180A.
  • the electronic device 100 may also calculate the touched position according to the detection signal of the pressure sensor 180A.
  • touch operations acting on the same touch position but with different touch operation intensities may correspond to different operation instructions. For example, when a touch operation whose intensity is less than the first pressure threshold acts on the short message application icon, the instruction for viewing the short message is executed. When a touch operation with a touch operation intensity greater than or equal to the first pressure threshold acts on the short message application icon, the instruction to create a new short message is executed.
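The short message example above can be written as a small decision function. This is only a sketch: the threshold value, the normalized intensity scale, and the instruction names are assumptions made for illustration and do not come from the application.

```python
# Hypothetical value on an assumed normalized 0.0-1.0 intensity scale.
FIRST_PRESSURE_THRESHOLD = 0.5


def instruction_for_sms_icon_touch(intensity: float,
                                   threshold: float = FIRST_PRESSURE_THRESHOLD) -> str:
    """Map the touch intensity on the short message icon to an instruction.

    Below the first pressure threshold: view the short message.
    At or above the threshold: create a new short message.
    """
    if intensity < threshold:
        return "view_short_message"
    return "create_short_message"
```

The same touch position therefore yields two different instructions, distinguished only by the intensity reported by the pressure sensor 180A.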
  • the gyro sensor 180B may be used to determine the motion attitude of the electronic device 100 .
  • the angular velocity of the electronic device 100 about three axes (i.e., the x, y, and z axes) may be determined by the gyro sensor 180B.
  • the gyro sensor 180B can be used for image stabilization.
  • the gyroscope sensor 180B detects the shaking angle of the electronic device 100, calculates the distance to be compensated by the lens module according to the angle, and controls the reverse movement of the lens to offset the shaking of the electronic device 100 to achieve anti-shake.
  • the gyro sensor 180B can also be used for navigation and somatosensory game scenarios.
  • the air pressure sensor 180C is used to measure air pressure.
  • the electronic device 100 calculates the altitude based on the air pressure value measured by the air pressure sensor 180C to assist in positioning and navigation.
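The application does not say how altitude is derived from the air pressure value. One widely used approximation is the international barometric formula, sketched below; the choice of this formula, the standard sea-level pressure of 1013.25 hPa, and the name `altitude_m` are assumptions for illustration.

```python
# Standard sea-level pressure in hectopascals (an assumed reference value).
SEA_LEVEL_PRESSURE_HPA = 1013.25


def altitude_m(pressure_hpa: float,
               sea_level_hpa: float = SEA_LEVEL_PRESSURE_HPA) -> float:
    """Approximate altitude in metres from measured air pressure.

    Uses the international barometric formula, which assumes a
    standard atmosphere and is only an approximation in practice.
    """
    if pressure_hpa <= 0 or sea_level_hpa <= 0:
        raise ValueError("pressures must be positive")
    return 44330.0 * (1.0 - (pressure_hpa / sea_level_hpa) ** (1.0 / 5.255))
```

Because weather changes the local sea-level pressure, a device would normally calibrate `sea_level_hpa` from another source before relying on the result for positioning.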
  • the magnetic sensor 180D includes a Hall sensor.
  • the electronic device 100 can detect the opening and closing of the flip holster using the magnetic sensor 180D.
  • the magnetic sensor 180D can be used to detect the folding or unfolding of the electronic device, or the folding angle.
  • when the electronic device 100 is a flip phone, the electronic device 100 can detect the opening and closing of the flip cover according to the magnetic sensor 180D. Further, features such as automatic unlocking upon opening the flip cover can be set according to the detected opening and closing state of the holster or of the flip cover.
  • the acceleration sensor 180E can detect the magnitude of the acceleration of the electronic device 100 in various directions (generally three axes).
  • the magnitude and direction of gravity can be detected when the electronic device 100 is stationary. It can also be used to identify the posture of electronic devices, and can be used in applications such as horizontal and vertical screen switching, pedometers, etc.
  • the electronic device 100 can measure the distance through infrared or laser. In some embodiments, when shooting a scene, the electronic device 100 can use the distance sensor 180F to measure the distance to achieve fast focusing.
  • Proximity light sensor 180G may include, for example, light emitting diodes (LEDs) and light detectors, such as photodiodes.
  • the light emitting diodes may be infrared light emitting diodes.
  • the electronic device 100 emits infrared light to the outside through the light emitting diode.
  • Electronic device 100 uses photodiodes to detect infrared reflected light from nearby objects. When the intensity of the detected reflected light is greater than the threshold, it may be determined that there is an object near the electronic device 100 . When the intensity of the detected reflected light is less than the threshold, the electronic device 100 may determine that there is no object near the electronic device 100.
  • the electronic device 100 can use the proximity light sensor 180G to detect that the user holds the electronic device 100 close to the ear to talk, so as to automatically turn off the screen to save power.
  • Proximity light sensor 180G can also be used in holster mode and pocket mode to automatically unlock and lock the screen.
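The threshold comparison used by the proximity light sensor 180G, and its use during a call, can be sketched as follows. The function names and the unit-free intensity values are hypothetical; the application does not specify what happens when the intensity exactly equals the threshold, so this sketch treats that case as "no object".

```python
def object_nearby(reflected_intensity: float, threshold: float) -> bool:
    """True when the detected infrared reflection exceeds the threshold."""
    return reflected_intensity > threshold


def should_turn_off_screen(in_call: bool, reflected_intensity: float,
                           threshold: float) -> bool:
    """Turn the screen off only while a call is active and an object
    (for example, the user's ear) is detected near the device."""
    return in_call and object_nearby(reflected_intensity, threshold)
```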
  • the ambient light sensor 180L may be used to sense ambient light brightness.
  • the electronic device 100 can adaptively adjust the brightness of the display screen 194 according to the perceived ambient light brightness.
  • the ambient light sensor 180L can also be used to automatically adjust the white balance when taking pictures.
  • the ambient light sensor 180L may also cooperate with the proximity light sensor 180G to detect whether the electronic device 100 is blocked, eg, the electronic device is in a pocket. When it is detected that the electronic device is blocked or in a pocket, some functions (such as touch functions) can be disabled to prevent misuse.
  • the fingerprint sensor 180H is used to collect fingerprints.
  • the electronic device 100 can use the collected fingerprint characteristics to realize fingerprint unlocking, accessing application locks, taking pictures with fingerprints, answering incoming calls with fingerprints, and the like.
  • the temperature sensor 180J is used to detect the temperature.
  • the electronic device 100 uses the temperature detected by the temperature sensor 180J to execute a temperature processing strategy. For example, when the temperature detected by the temperature sensor 180J exceeds a threshold, the electronic device 100 reduces the performance of the processor so as to lower power consumption and implement thermal protection.
  • the electronic device 100 heats the battery 142 when the temperature detected by the temperature sensor 180J is below another threshold. In other embodiments, the electronic device 100 may boost the output voltage of the battery 142 when the temperature is below yet another threshold.
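The temperature processing strategy can be sketched as a function that maps a temperature reading to a list of actions. The three threshold values below are invented for illustration, and the application presents battery heating and voltage boosting as alternative embodiments; combining them in one function is a simplification.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ThermalThresholds:
    # All values are hypothetical, in degrees Celsius.
    overheat_c: float = 45.0
    heat_battery_below_c: float = 0.0
    boost_voltage_below_c: float = -10.0


def thermal_actions(temperature_c: float,
                    thresholds: ThermalThresholds = ThermalThresholds()) -> List[str]:
    """Return the actions to take for one temperature reading."""
    actions = []
    if temperature_c > thresholds.overheat_c:
        # Lower processor performance to reduce power consumption.
        actions.append("reduce_processor_performance")
    if temperature_c < thresholds.heat_battery_below_c:
        actions.append("heat_battery")
    if temperature_c < thresholds.boost_voltage_below_c:
        actions.append("boost_battery_output_voltage")
    return actions
```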
  • Touch sensor 180K is also called a "touch device".
  • the touch sensor 180K may be disposed on the display screen 194, and the touch sensor 180K and the display screen 194 together form a touch screen, also called a "touch-control screen".
  • the touch sensor 180K is used to detect a touch operation on or near it.
  • the touch sensor can pass the detected touch operation to the application processor to determine the type of touch event.
  • Visual output related to touch operations may be provided through display screen 194 .
  • the touch sensor 180K may also be disposed on the surface of the electronic device 100 , which is different from the location where the display screen 194 is located.
  • the bone conduction sensor 180M can acquire vibration signals.
  • the bone conduction sensor 180M can acquire the vibration signal of the bone that vibrates when the human body produces voice.
  • the bone conduction sensor 180M can also contact the pulse of the human body and receive the blood pressure beating signal.
  • the bone conduction sensor 180M can also be disposed in the earphone, combined with the bone conduction earphone.
  • the audio module 170 can parse out the voice signal based on the vibration signal of the voice-vibrated bone obtained by the bone conduction sensor 180M, so as to realize the voice function.
  • the application processor can analyze the heart rate information based on the blood pressure beat signal obtained by the bone conduction sensor 180M, and realize the function of heart rate detection.
  • the keys 190 may include a power-on key, a volume key, and the like. The keys 190 may be mechanical keys or touch keys.
  • the electronic device 100 may receive key inputs and generate key signal inputs related to user settings and function control of the electronic device 100 .
  • Motor 191 can generate vibrating cues.
  • the motor 191 can be used for vibrating alerts for incoming calls, and can also be used for touch vibration feedback.
  • touch operations acting on different applications can correspond to different vibration feedback effects.
  • the motor 191 can also correspond to different vibration feedback effects for touch operations on different areas of the display screen 194 .
  • Different application scenarios (for example, time reminders, receiving information, alarm clocks, games, etc.) can also correspond to different vibration feedback effects.
  • the touch vibration feedback effect can also support customization.
  • the indicator 192 may be an indicator light, which may be used to indicate the charging status and changes in battery level, and may also be used to indicate messages, missed calls, notifications, and the like.
  • the SIM card interface 195 is used to connect a SIM card.
  • the SIM card can be brought into contact with or separated from the electronic device 100 by being inserted into or pulled out of the SIM card interface 195.
  • the electronic device 100 may support one or more SIM card interfaces.
  • the SIM card interface 195 can support Nano SIM card, Micro SIM card, SIM card and so on. Multiple cards can be inserted into the same SIM card interface 195 at the same time. Multiple cards can be of the same type or different.
  • the SIM card interface 195 can also be compatible with different types of SIM cards.
  • the SIM card interface 195 is also compatible with external memory cards.
  • the electronic device 100 interacts with the network through the SIM card to implement functions such as call and data communication.
  • the electronic device 100 employs an eSIM, ie: an embedded SIM card.
  • the eSIM card can be embedded in the electronic device 100 and cannot be separated from the electronic device 100 .
  • the software system of the electronic device 100 may adopt a layered architecture, an event-driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture.
  • the embodiments of the present application take an Android system with a layered architecture as an example to exemplarily describe the software structure of the electronic device 100 .
  • FIG. 2 is a block diagram of the software structure of the electronic device 100 according to the embodiment of the present application.
  • the layered architecture divides the software into several layers, and each layer has a clear role and division of labor. Layers communicate with each other through software interfaces.
  • the Android system is divided into five layers, which are, from top to bottom, the application layer, the application framework layer, the Android runtime (ART) and native C/C++ libraries, the hardware abstraction layer (HAL), and the kernel layer.
  • the application layer can include a series of application packages.
  • the application package can include applications such as camera, gallery, calendar, call, map, navigation, WLAN, Bluetooth, music, video, short message and so on.
  • the application framework layer provides an application programming interface (API) and a programming framework for applications in the application layer.
  • the application framework layer includes some predefined functions.
  • the application framework layer can include window managers, content providers, view systems, resource managers, notification managers, activity managers, input managers, and so on.
  • the window manager provides window management services (Window Manager Service, WMS).
  • WMS can be used for window management, window animation management, surface management, and as a transfer station for the input system.
  • Content providers are used to store and retrieve data and make these data accessible to applications.
  • This data can include video, images, audio, calls made and received, browsing history and bookmarks, phone book, etc.
  • the view system includes visual controls, such as controls for displaying text, controls for displaying pictures, and so on. View systems can be used to build applications.
  • a display interface can consist of one or more views.
  • the display interface including the short message notification icon may include a view for displaying text and a view for displaying pictures.
  • the resource manager provides various resources for the application, such as localization strings, icons, pictures, layout files, video files and so on.
  • the notification manager enables applications to display notification information in the status bar, which can be used to convey notification-type messages, and can disappear automatically after a brief pause without user interaction. For example, the notification manager is used to notify download completion, message reminders, etc.
  • the notification manager can also display notifications in the status bar at the top of the system in the form of graphs or scroll bar text, such as notifications of applications running in the background, and notifications on the screen in the form of dialog windows. For example, text information is prompted in the status bar, a prompt sound is issued, the electronic device vibrates, and the indicator light flashes.
  • the activity manager can provide an activity management service (Activity Manager Service, AMS). The AMS can be used for starting, switching, and scheduling system components (such as activities, services, content providers, and broadcast receivers), and for managing and scheduling application processes.
  • the input manager can provide an input management service (Input Manager Service, IMS), and the IMS can be used to manage the input of the system, such as touch screen input, key input, sensor input and so on.
  • IMS fetches events from input device nodes, and distributes events to appropriate windows through interaction with WMS.
  • the Android runtime includes the core library and the Android runtime.
  • the Android runtime is responsible for converting source code to machine code.
  • the Android runtime mainly uses ahead of time (AOT) compilation technology and just in time (JIT) compilation technology.
  • the core library is mainly used to provide the functions of basic Java class libraries, such as basic data structures, mathematics, IO, tools, databases, networks and other libraries.
  • the core library provides an API for users to develop Android applications.
  • a native C/C++ library can include multiple functional modules. For example: surface manager, Media Framework, libc, OpenGL ES, SQLite, Webkit, etc.
  • the surface manager is used to manage the display subsystem and provides the fusion of 2D and 3D layers for multiple applications.
  • the media framework supports playback and recording of many common audio and video formats, as well as still image files.
  • the media library can support a variety of audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
  • OpenGL ES provides the drawing and manipulation of 2D graphics and 3D graphics in applications. SQLite provides a lightweight relational database for applications of the electronic device 100 .
  • the hardware abstraction layer runs in user space, encapsulates the kernel layer driver, and provides a calling interface to the upper layer.
  • the kernel layer is the layer between hardware and software.
  • the kernel layer contains at least display drivers, camera drivers, audio drivers, and sensor drivers.
  • when the touch sensor 180K receives a touch operation, a corresponding hardware interrupt is sent to the kernel layer.
  • the kernel layer processes touch operations into raw input events (including touch coordinates, timestamps of touch operations, etc.). Raw input events are stored at the kernel layer.
  • the application framework layer obtains the raw input event from the kernel layer and identifies the control corresponding to the input event. For example, if the touch operation is a click operation and the control corresponding to the click operation is the camera application icon, the camera application calls the interface of the application framework layer to start the camera application, and then starts the camera driver by calling the kernel layer.
  • the camera module 193 captures still images or video.
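The flow from a raw input event to the control it targets can be sketched with a simple hit test. This is not Android framework code: the `RawInputEvent` and `Control` classes, the screen coordinates, and `find_control` are hypothetical stand-ins for the kernel-layer event and the framework-layer lookup described above.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class RawInputEvent:
    """Raw input event as produced by the kernel layer in this sketch."""
    x: int
    y: int
    timestamp_ms: int


@dataclass(frozen=True)
class Control:
    """An on-screen control with a rectangular touch area."""
    name: str
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, x: int, y: int) -> bool:
        return self.left <= x < self.right and self.top <= y < self.bottom


def find_control(event: RawInputEvent,
                 controls: Sequence[Control]) -> Optional[Control]:
    """Return the first control whose area contains the touch point."""
    for control in controls:
        if control.contains(event.x, event.y):
            return control
    return None
```

In this sketch, a click that resolves to a control named `camera_icon` would be the point at which the camera application is started and the camera driver is invoked.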
  • the device control method of the present application will be described by taking the electronic device being a smart TV as an example.
  • the device control method of the present application can be provided as a function of the operating system of the electronic device, or as a function of a third-party application (App) of the electronic device. Assuming that this function is called the intelligent interaction function, the user can open the interface where the intelligent interaction function is located from the operating system or from the third-party application.
  • the user selects the “setting” control displayed on the smart TV to enter the setting function.
  • in the setting function, the user can find a setting interface.
  • the setting interface includes an "intelligent interaction function".
  • the smart TV can trigger the execution of the device control method of the present application after receiving the user's intelligent interaction trigger instruction.
  • the user selects the "App X" application icon displayed on the smart TV to open App X.
  • the user can find a setting interface provided in Application X.
  • the setting interface includes an "intelligent interaction function", and the user selects the "Enable" control corresponding to the "intelligent interaction function" to activate the intelligent interaction function.
  • after the smart TV receives the user's intelligent interaction trigger instruction, it can trigger the execution of the device control method of the present application.
  • the above is only one possible implementation of the smart TV triggering the execution of the device control method of the present application. If the intelligent interaction function is enabled by default in the setting function of the smart TV or in the above-mentioned App X, the user does not need to perform the above process, and the smart TV can still trigger the execution of the device control method of the present application.
  • the smart TV triggers the execution of the device control method of the present application, starts the camera to begin video shooting, and obtains a video stream. If the user is located within the shooting range of the camera of the smart TV, the user faces the smart TV with the front of the face (preferably facing the camera of the smart TV) and makes the gesture corresponding to the function that the user wants the smart TV to perform.
  • correspondingly, the video stream captured by the camera includes the user's body image, and the smart TV can determine whether the face image in the body image is a frontal face image, so as to determine whether the user's face is facing the smart TV.
  • after determining that the face image is a frontal face image, the smart TV recognizes the user's gesture according to the motion trajectory of the hand image in the body image.
  • the camera that shoots the video stream may be a camera built into the electronic device, or may be a camera independent of the electronic device, which is not limited in this application.
  • the camera that shoots the video stream can be set in the middle position directly above the electronic device. For example, FIG. 4 shows a possible position of the camera built into the smart TV, namely the upper middle of the smart TV, and the shooting direction of the camera may be horizontally forward.
  • the correspondence between gestures and operations that can be triggered by the smart TV can be preset in the smart TV, and the operations that can be triggered by the smart TV may include but are not limited to changing channels, adjusting volume, or switching interfaces.
  • the smart TV recognizes the operation corresponding to the gesture, and executes the operation corresponding to the gesture. For example, if the smart TV recognizes that the operation corresponding to the gesture is to switch to the main interface, the smart TV switches the displayed interface to the main interface.
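  • The preset correspondence between gestures and operations can be pictured as a lookup table. The following is only a minimal sketch: the gesture names, the operation functions and the `execute_gesture` helper are hypothetical and are not defined in this application.

```python
from typing import Callable, Dict, Optional


# Hypothetical operations; the application only states that a
# gesture-to-operation correspondence is preset in the device.
def switch_to_main_interface() -> str:
    return "main interface"


def return_to_previous_page() -> str:
    return "previous page"


# Assumed gesture names, used purely for illustration.
GESTURE_TO_OPERATION: Dict[str, Callable[[], str]] = {
    "wave_left_right": return_to_previous_page,
    "wave_up_down": switch_to_main_interface,
}


def execute_gesture(gesture: str) -> Optional[str]:
    """Look up the operation preset for a gesture and execute it.

    Returns None when no operation is preset for the gesture.
    """
    operation = GESTURE_TO_OPERATION.get(gesture)
    return operation() if operation is not None else None
```

  • In this sketch an unrecognized gesture simply triggers nothing, which mirrors the idea that only preset gestures lead to an operation.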
  • In this method, the electronic device responds only to gestures made while the user's face is facing the electronic device. Compared with the prior art, in which the electronic device responds as soon as it recognizes a user gesture, the added determination of whether the user's face is facing the electronic device reduces false triggering of the electronic device by user gestures.
  • As shown in FIG. 5, which is a schematic flowchart of the device control method of the present application, the method includes:
  • Step 501 The electronic device enables intelligent interaction.
  • Step 501 is an optional step.
  • Step 502 The electronic device collects the video stream.
  • the electronic device can activate the camera to shoot and obtain a video stream.
  • Step 503 The electronic device performs gesture recognition from the video stream.
  • This step can include:
  • performing frontal-face determination on the face image of the first user in the target video frame, and recognizing the gesture of the first user according to several consecutive target video frames that pass the frontal-face determination.
  • the first user may be a user corresponding to any human body image included in the target video frame, or may be a pre-designated user, such as a user or owner of an electronic device.
  • Step 504 The electronic device recognizes the operation corresponding to the gesture, and executes the operation corresponding to the gesture.
  • The correspondence between gestures and operations of the electronic device can be preset in the electronic device. For example, assuming that the electronic device is a smart TV, that the first gesture corresponds to the smart TV operation "return to the previous page", and that the second gesture corresponds to the smart TV operation "turn on the smart TV", then:
  • if the second gesture is detected in step 503, then, referring to FIG. 6B, the smart TV is turned on and displays the main interface after it is turned on;
  • if the first gesture is detected in step 503 while the smart TV is playing video 1, the smart TV returns to the interface that preceded video playback, such as the video selection interface shown in FIG. 7B, which displays selection controls corresponding to several videos including video 1.
  • In step 503, the electronic device performs frontal-face determination on the face image of the first user in the target video frame, recognizes the gesture of the first user according to several consecutive target video frames that pass the frontal-face determination, and then performs the operation matching the gesture. Compared with the prior art, in which the electronic device responds as soon as it recognizes the user's gesture, this adds a determination of whether the user's face is facing the electronic device, thereby reducing false triggering of the electronic device by user gestures.
  • step 503 and step 504 will be described in more detail below with reference to FIG. 8 .
  • As shown in FIG. 8, the implementation includes:
  • Step 801 The electronic device performs multi-task detection on each target video frame in the video stream.
  • the multi-task detection in this step may include: detecting a human body image, a human face image and a human hand image from the target video frame, and obtaining a human body frame, and a face frame and a human hand frame in each human body frame.
  • the target video frame is the video frame in the video stream that the electronic device needs to process.
  • The electronic device can use every video frame in the video stream as a target video frame. Alternatively, the electronic device can use only some of the video frames in the video stream as target video frames according to a preset rule. For example, if the preset rule is an interval of 1 video frame, the electronic device uses the 1st, 3rd, 5th... video frames in the video stream as target video frames.
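  • The interval rule above can be sketched as follows. This is an illustrative example only; the function name `select_target_frames` and the `interval` parameter are assumptions and do not appear in this application.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def select_target_frames(frames: Iterable[T], interval: int = 0) -> Iterator[T]:
    """Yield target video frames, skipping `interval` frames between them.

    interval=0 keeps every frame; interval=1 keeps the 1st, 3rd, 5th... frames.
    """
    if interval < 0:
        raise ValueError("interval must be non-negative")
    step = interval + 1
    for index, frame in enumerate(frames):
        if index % step == 0:
            yield frame
```

  • With frames numbered 1 to 6 and an interval of 1, the sketch keeps frames 1, 3 and 5, which matches the example in the text.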
  • the above-mentioned human frame is used to identify the human body image detected from the target video frame
  • the face frame is used to identify the human face image detected from the target video frame
  • the human hand frame is used to identify the human hand image detected from the target video frame.
  • The human body frame may be the circumscribing frame of the human body contour detected in the target video frame, the face frame may be the circumscribing frame of the face contour within the human body frame, and the human hand frame may be the circumscribing frame of the human hand contour within the human body frame.
  • For example, FIG. 9A shows a human body frame 710, a face frame 711, and a human hand frame 712 corresponding to a certain human body image in the target video frame.
  • A pixel coordinate system may be established based on the target video frame. Viewed from the angle at which the user views the target video frame, the vertex of the upper left corner of the target video frame is taken as the origin of the pixel coordinate system, the rightward direction along the upper border of the target video frame is the positive direction of the x-axis, and the downward direction along the left border of the target video frame is the positive direction of the y-axis. The electronic device can then determine the coordinates of each pixel of the target video frame in the pixel coordinate system, hereinafter referred to as pixel coordinates.
  • the above-mentioned human body frame, face frame and human hand frame may be rectangular frames, and the electronic device may record a rectangular frame by using the pixel coordinates of the diagonal vertices of the rectangular frame.
  • For example, the rectangular frame ABCD shown in FIG. 9B can be recorded by the pixel coordinates of the diagonal vertices A and C, or of the diagonal vertices B and D.
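  • A rectangular frame recorded by two diagonal vertices can be sketched as a small data structure. The class name `Box` and its fields are assumptions made for illustration; coordinates follow the pixel coordinate system described above (x to the right, y downward).

```python
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]  # (x, y) pixel coordinates


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangular frame in pixel coordinates."""
    left: float
    top: float
    right: float
    bottom: float

    @classmethod
    def from_diagonal(cls, p: Point, q: Point) -> "Box":
        # Either diagonal pair (A, C) or (B, D) determines the same rectangle.
        return cls(min(p[0], q[0]), min(p[1], q[1]),
                   max(p[0], q[0]), max(p[1], q[1]))

    @property
    def width(self) -> float:
        return self.right - self.left

    @property
    def height(self) -> float:
        return self.bottom - self.top

    @property
    def center(self) -> Point:
        return ((self.left + self.right) / 2, (self.top + self.bottom) / 2)
```

  • Normalizing to left/top/right/bottom means the frame is the same regardless of which diagonal pair was recorded.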
  • A pre-trained first model for detecting the human body frame, face frame and human hand frame can be preset in the electronic device. The input of the first model is the target video frame, and the output is the human body frame, face frame and human hand frame corresponding to each human body image included in the target video frame.
  • When training the first model, the training samples can be several pictures that include human body images, in which the human body frames are marked, together with the face frame and the human hand frame in each human body frame. A deep learning network, such as a Convolutional Neural Network (CNN), is preset, and the training samples are input into the deep learning network for model training to obtain the above-mentioned first model.
  • the number of human body frames that the electronic device can detect from a target video frame is related to the number of human body images actually included in the target video frame, which can be zero, one or more.
  • the following steps 802 and 803 are used to realize the front face determination of the target face frame by the electronic device.
  • the above-mentioned frontal face determination is used to determine that the face in the target face frame is facing the electronic device, that is, to determine that the face image in the target face frame is a frontal image.
  • the above-mentioned target face frame may be a face frame including a preset face image in the face frame detected in the target video frame, or may be each face frame detected in the target video frame.
  • If the target face frame is a face frame that includes a preset face image, a certain face image can be preset in the electronic device, so that in this step the electronic device first determines whether the face frames detected in the target video frame include a face frame containing the preset face image, obtains the target face frame, and then performs frontal-face determination on the target face frame.
  • If the target face frame is each face frame detected in the target video frame, the electronic device can use each face frame in turn as the target face frame and perform frontal-face determination on it.
  • the process of presetting a certain face image in the electronic device may be completed before executing this step.
  • For example, the user may preset a target face image for the electronic device before triggering the intelligent interaction function. There may be one or more target face images, and the specific number is not limited in this application.
  • the target face image may be the face image of the owner of the electronic device, or the face image of the user who frequently uses the electronic device.
  • The electronic device can provide the user with a setting interface for the target face image. For example, as shown in FIG. 10A, the setting interface displays a "target face image setting" control. When the user clicks the "target face image setting" control, the electronic device starts the camera and displays the selection interface of the target face image.
  • The electronic device displays the image captured by the camera in the selection interface of the target face image and marks the face images detected in that image; the face images can be marked by face frames. The user can select a face image as the target face image. Correspondingly, the electronic device receives the user's face selection instruction and sets the face image indicated by the face selection instruction as the target face image.
  • the user can enter the setting interface of the target face image at any time, and perform editing operations such as adding, modifying, and deleting the target face image, which is not limited in this application.
  • Step 802 The electronic device performs state detection on the face image identified by the target face frame in the target video frame.
  • the above state detection is used to detect the degree to which the human face image is a frontal image, that is, the degree to which the frontal face of the human face in the human face image faces the electronic device.
  • The state detection result may include the yaw angle of the face image. Preferably, in order to improve the accuracy of the frontal-face determination in the subsequent steps, the state detection result may also include the pitch angle and/or the roll angle of the face image.
  • The following description takes as an example the case in which the state detection result includes the yaw angle, pitch angle, and roll angle of the face image.
  • The camera coordinate system can be established in advance. The physical center point of the camera is used as the origin o. Taking the direction that passes horizontally through the origin and points to the back of the camera as the viewing direction of the user, the horizontal leftward direction passing through the origin is used as the positive x-axis direction ox, the vertically upward direction passing through the origin is used as the positive y-axis direction oy, and the horizontal direction passing through the origin and pointing to the back of the camera is used as the positive z-axis direction oz.
  • A face angle coordinate system can be established in advance. The center point of the person's head is taken as the origin o. From the perspective of the person facing forward, the horizontal leftward direction passing through the origin is the positive x-axis direction ox', the vertically upward direction passing through the origin is the positive y-axis direction oy', and the horizontal forward direction passing through the origin is the positive z-axis direction oz'.
  • The pitch angle (Pitch) refers to the rotation angle about ox'; the roll angle (Roll) refers to the rotation angle about oz'; and the yaw angle (Yaw) refers to the rotation angle about oy'.
  • The pitch angle, roll angle and yaw angle of the face image calculated based on the face angle coordinate system keep the same values when converted to the camera coordinate system. Therefore, the angle values of the face image calculated based on the face angle coordinate system can represent the degree to which the face image in a video frame captured by the camera is a frontal face image.
  • An angle regression model can be preset in the electronic device. The angle regression model is used to detect the pitch angle, roll angle and yaw angle of a face image: its input is the face image, and its output is the pitch angle, roll angle and yaw angle of the face image.
  • The electronic device can crop the face image in the target face frame from the target video frame and input the face image into the preset angle regression model to obtain the pitch angle, roll angle and yaw angle of the face image.
  • The angle regression model can be pre-trained. The training method can include: taking face images marked with pitch angle, roll angle and yaw angle as samples, using a Convolutional Neural Network (CNN) as the initial model, and inputting the samples into the initial model for training.
  • During training, the initial model can learn key features of the face in the face image, such as the distance between the eyebrows, the position of the nose in the face image, and the position of the mouth in the face image, so as to obtain the angle regression model.
  • Step 803 The electronic device determines whether the state detection result of the target face frame is within a preset threshold.
  • This step may include: the electronic device determines whether the yaw angle of the face image in the target face frame is smaller than a preset first threshold.
  • If the state detection result also includes the pitch angle and/or the roll angle, this step may further include: the electronic device determines whether the pitch angle of the face image in the target face frame is smaller than a preset second threshold, and/or determines whether the roll angle of the face image in the target face frame is smaller than a preset third threshold.
  • the above-mentioned first threshold, second threshold and third threshold are conditions for determining whether the face image in the target face frame is a frontal face image.
  • the specific values of the above-mentioned first threshold, second threshold, and third threshold are not limited in this embodiment of the present application.
  • For example, the first threshold, the second threshold, and the third threshold may each be 15 degrees.
  • If there are multiple target face frames in the target video frame, the above judgment is performed on each target face frame in turn in this step. If the electronic device determines that the state detection result of at least one target face frame in the target video frame is within the corresponding thresholds, which indicates that the face image in that target face frame is a frontal face image, step 804 is performed; otherwise, the process returns to step 801, and the electronic device processes the next target video frame.
  • A target face frame determined in this step to contain a frontal face image is hereinafter referred to as a frontal face frame.
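  • The threshold check of step 803 can be sketched as below. The function name and parameters are hypothetical. Comparing the absolute value of each angle with its threshold is an assumption of this sketch (the text only says the angle is "smaller than" the threshold); the 15-degree defaults follow the example values given above.

```python
from typing import Optional


def is_frontal_face(
    yaw: float,
    pitch: Optional[float] = None,
    roll: Optional[float] = None,
    yaw_max: float = 15.0,    # first threshold, in degrees
    pitch_max: float = 15.0,  # second threshold, in degrees
    roll_max: float = 15.0,   # third threshold, in degrees
) -> bool:
    """Return True if every supplied angle is within its threshold.

    Yaw is always checked; pitch and roll are checked only when provided,
    mirroring the optional "and/or" checks in the text. Using abs() is an
    assumption so that turning left or right is treated symmetrically.
    """
    if abs(yaw) >= yaw_max:
        return False
    if pitch is not None and abs(pitch) >= pitch_max:
        return False
    if roll is not None and abs(roll) >= roll_max:
        return False
    return True
```

  • A target face frame for which this check returns True would correspond to a frontal face frame in the terminology of this application.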
  • Step 804 The electronic device determines, with respect to the frontal face frame in the target video frame, the human hand frame corresponding to the frontal face frame.
  • the human hand frame corresponding to the frontal face frame refers to the human hand frame that belongs to the same human frame as the frontal face frame.
  • For example, the face frame 711 and the human hand frame 712 in FIG. 9A belong to the same human body frame. Therefore, the human hand frame corresponding to the face frame 711 is the human hand frame 712.
  • Step 805 The electronic device determines the motion trajectory of the human hand frame corresponding to the front face frame.
  • Any frontal face frame in the target video frame in this step is referred to as the first frontal face frame, and the human hand frame corresponding to the first frontal face frame is referred to as the first human hand frame.
  • One method for determining the motion trajectory of the human hand frame is as follows. This step can include:
  • the electronic device obtains several consecutive first target video frames, which include a second target video frame and third target video frames; the second target video frame is the target video frame to which the first frontal face frame in this step belongs, and a third target video frame is a target video frame that is located before the second target video frame and includes a second frontal face frame, the second frontal face frame being a frontal face frame that matches the first frontal face frame;
  • the electronic device obtains several consecutive fourth target video frames from the third target video frames, where a fourth target video frame includes a second human hand frame, and the second human hand frame is a human hand frame that matches the first human hand frame;
  • the motion trajectory of the first human hand frame is determined according to the pixel coordinates of the first human hand frame in the second target video frame and the pixel coordinates of the second human hand frame in the fourth target video frames.
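  • An example of a method for determining that a certain target video frame includes a second frontal face frame is as follows:
  • obtain, from the frontal face frames of the target video frame, the frontal face frame whose designated point has the smallest distance to the designated point of the first frontal face frame; if that distance is less than a preset threshold, the frontal face frame is the frontal face frame that matches the first frontal face frame, that is, the second frontal face frame; otherwise, the target video frame does not include a second frontal face frame.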
  • the designated point of the frontal face frame of the target video frame and the designated point of the first frontal face frame are points in the same position in the face frame. The distance between two specified points can be calculated from the pixel coordinates of the two specified points.
  • An upper limit can be set on the number of first target video frames obtained by the electronic device, that is, the maximum number of first target video frames obtained each time. For example, if the maximum number is 5, the electronic device obtains at most 5 first target video frames.
  • An example of a method for determining that a certain target video frame includes a second human hand frame is as follows:
  • obtain, from the human hand frames of the target video frame, the human hand frame whose designated point has the smallest distance to the designated point of the first human hand frame; if that distance is less than a preset threshold, the human hand frame is the second human hand frame; otherwise, the target video frame does not include a second human hand frame.
  • the designated point of the human hand frame of the target video frame and the designated point of the first human hand frame are points in the same position in the human hand frame.
  • the distance between two specified points can be calculated from the pixel coordinates of the two specified points.
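  • The nearest-designated-point matching described above, which applies to both frontal face frames and human hand frames, can be sketched as follows. The function name and the choice of Euclidean distance are assumptions; the text only says the distance is calculated from the pixel coordinates of the two designated points.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) pixel coordinates of a designated point


def match_by_designated_point(
    reference: Point,
    candidates: Sequence[Point],
    max_distance: float,
) -> Optional[int]:
    """Return the index of the candidate frame that matches the reference.

    The candidate whose designated point is nearest to the reference point
    is chosen; it counts as a match only if that distance is less than
    `max_distance` (the preset threshold). Otherwise None is returned.
    """
    if not candidates:
        return None
    best_index = min(
        range(len(candidates)),
        key=lambda i: math.dist(reference, candidates[i]),
    )
    if math.dist(reference, candidates[best_index]) < max_distance:
        return best_index
    return None
```

  • Here `reference` would be the designated point of the first frame (face or hand) and `candidates` the designated points of the frames detected in an earlier target video frame.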
  • When determining the motion trajectory of the first human hand frame, the trajectory can be determined based on the pixel coordinates of the designated point of the first human hand frame and of the designated point of the second human hand frame in each target video frame. The designated point can be any point in the human hand frame, preferably a vertex or the center point of the human hand frame.
  • The designated point of the first human hand frame and the designated point of the second human hand frame are points at the same position in the human hand frame.
  • The first target video frame may be a target video frame that has not triggered an operation. That is, once the electronic device has recognized a gesture according to the face frame and the human hand frame in a target video frame and has triggered a certain operation, that target video frame is no longer used as a first target video frame corresponding to that face frame.
  • Step 806 The electronic device matches the motion trajectory of the human hand frame with a preset gesture. If the motion trajectory of a human hand frame is successfully matched with a preset gesture, step 807 is performed; otherwise, the process returns to step 801, and the electronic device processes the next target video frame.
  • Different gestures can be preset in the electronic device. The gestures can be embodied as the user waving left and right, waving up and down, and so on.
  • A gesture can be expressed as a feature of the motion trajectory. For example, the first gesture can be set as: the left-right width of the motion trajectory reaches a first multiple of the reference width; the second gesture can be set as: the up-down height of the motion trajectory reaches a second multiple of the reference height; and so on.
  • For the first gesture, the electronic device matching the motion trajectory of the first human hand frame with the preset gesture may include:
  • the electronic device calculates the reference width according to the widths of the first human hand frame.
  • The reference width may be calculated as the average or median of the widths of the first human hand frame.
  • The specific value of the first multiple is not limited in the embodiments of the present application; it depends on the range of left-and-right waving motion that the electronic device requires of the user.
  • A motion trajectory that matches the first gesture indicates that the user has made a gesture of waving left and right.
  • For the second gesture, the electronic device matching the motion trajectory of the first human hand frame with the preset gesture may include:
  • the electronic device calculates the reference height according to the heights of the first human hand frame.
  • The reference height may be calculated as the average or median of the heights of the first human hand frame.
  • The specific value of the second multiple is not limited in the embodiments of the present application; it depends on the range of up-and-down waving motion that the electronic device requires of the user.
  • A motion trajectory that matches the second gesture indicates that the user has made a gesture of waving up and down.
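  • The trajectory matching of step 806 can be sketched as follows. This is a hedged illustration, not the implementation of this application: the `HandSample` layout, the gesture labels, the default multiples of 2.0 and the order in which the two gestures are tested are all assumptions. The reference width and height are computed as averages, one of the two options mentioned above.

```python
from statistics import mean
from typing import Optional, Sequence, Tuple

# One sample per target video frame, assumed layout:
# (designated_point_x, designated_point_y, frame_width, frame_height)
HandSample = Tuple[float, float, float, float]


def match_gesture(
    trajectory: Sequence[HandSample],
    first_multiple: float = 2.0,   # assumed value; not specified in the text
    second_multiple: float = 2.0,  # assumed value; not specified in the text
) -> Optional[str]:
    """Match a hand-frame trajectory against the two preset gestures."""
    if len(trajectory) < 2:
        return None
    xs = [sample[0] for sample in trajectory]
    ys = [sample[1] for sample in trajectory]
    reference_width = mean(sample[2] for sample in trajectory)
    reference_height = mean(sample[3] for sample in trajectory)

    # First gesture: left-right extent reaches a multiple of the reference width.
    if max(xs) - min(xs) >= first_multiple * reference_width:
        return "first_gesture"
    # Second gesture: up-down extent reaches a multiple of the reference height.
    if max(ys) - min(ys) >= second_multiple * reference_height:
        return "second_gesture"
    return None
```

  • Scaling the required extent by the hand frame's own size makes the check roughly independent of how far the user stands from the camera, which is one plausible reason for using a reference width and height.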
  • Step 807 The electronic device recognizes the operation corresponding to the gesture, and executes the operation corresponding to the gesture.
  • If each face frame in the target video frame is taken as a target face frame in step 802, then multiple face frames may be determined to be frontal face frames in step 803, and correspondingly, gestures corresponding to multiple frontal face frames may be obtained in step 806.
  • If the electronic device performs the above steps sequentially for each face frame in the target video frame according to the flow shown in FIG. 8, and there are multiple frontal face frames among the face frames of the target video frame, the gesture corresponding to each frontal face frame can be recognized in turn and the operation corresponding to the gesture performed. For example, assuming that there are frontal face frame 1 and frontal face frame 2 in the target video frame, the gesture corresponding to frontal face frame 1 can be recognized first.
  • If the electronic device performs the above steps for multiple face frames in parallel, there are multiple frontal face frames among the face frames of the target video frame, and the electronic device recognizes the gestures corresponding to the multiple frontal face frames at the same time, then the gesture corresponding to one frontal face frame can be selected from them and the operation corresponding to that gesture performed. Which frontal face frame's gesture is selected, that is, which user's gesture is responded to, is not limited in this application.
  • In another embodiment, on the basis of the aforementioned method, the electronic device also detects the attribute information and/or emotional information of the frontal face frame. If the operation executed in response to the user's gesture displays an interface, and recommended content is displayed in that interface, the recommended content can be obtained according to the attribute information and/or emotional information of the frontal face frame. The recommended content is therefore more targeted, which improves the interaction between the electronic device and the user and improves the user experience.
  • the method may include:
  • Step 1201 The electronic device captures the video stream.
  • Step 1202 The electronic device performs multi-task detection on each target video frame in the video stream.
  • Step 1203 The electronic device performs state detection on the face image identified by the target face frame in the target video frame.
  • Step 1204 The electronic device determines whether the state detection result of the target face frame is within a preset threshold.
  • If the electronic device determines that the state detection result of the target face frame is within the corresponding thresholds, which indicates that the face image in the target face frame is a frontal face image, steps 1205 and 1208 are executed; otherwise, the process returns to step 1202, and the electronic device processes the next target video frame.
  • If there are multiple target face frames in the target video frame, the above judgment is performed on each target face frame in turn: if the judgment result for a target face frame is positive, step 1205 is executed; otherwise, the next target face frame is judged in this step. If the judgment results for all target face frames are negative, the process returns to step 1202, and the electronic device processes the next target video frame.
  • Step 1205 The electronic device determines, with respect to the frontal face frame in the target video frame, the human hand frame corresponding to the frontal face frame.
  • Step 1206 The electronic device determines the motion trajectory of the human hand frame corresponding to the frontal face frame.
  • Step 1207 The electronic device matches the motion trajectory of the human hand frame with a preset gesture. If the motion trajectory of a human hand frame is successfully matched with a preset gesture, step 1209 is executed; if no match succeeds, the process returns to step 1202, and the electronic device processes the next target video frame.
  • Step 1208 The electronic device detects the attribute information and/or emotional information of the frontal face frame, and executes step 1209.
  • Step 1209 The electronic device recognizes the operation corresponding to the gesture of the frontal face frame. If the operation is a content recommendation operation, the interface corresponding to the operation is displayed; the interface displays recommended content, and the recommended content is obtained according to the attribute information and/or emotional information of the frontal face frame.
  • The implementation of step 1208 is described as follows:
  • A second model for detecting the emotional information of a face can be preset in the electronic device. The input of the second model can be a face image, such as the face image in a frontal face frame, and the output is a probability value for each possible value of the emotional information. For example, if the preset values of the emotional information are happy, angry, sad, neutral, surprised, disgusted, and fearful, then the output is the probability value of each of these values.
  • The value with the highest probability is used as the emotional information of the face output by the second model. For example, suppose that after a certain face image is input into the second model, the probability value of "happy" is the highest among the probability values output by the second model for the values of the emotional information; the emotional information of the face image is then "happy".
  • the training samples may be: face images marked with values of emotional information, and input the training samples into a preset deep learning network for training to obtain the second model.
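  • Selecting the emotional information from the probability values output by the second model can be sketched as below. The ordering of the `EMOTIONS` tuple and the function name are assumptions; the model itself is not shown and the probabilities in the usage are made up for illustration.

```python
from typing import Sequence

# Emotion values listed in the text; the index order is an assumption.
EMOTIONS = ("happy", "angry", "sad", "neutral", "surprised", "disgusted", "fearful")


def pick_emotion(probabilities: Sequence[float]) -> str:
    """Return the emotion value with the highest probability."""
    if len(probabilities) != len(EMOTIONS):
        raise ValueError("expected one probability per emotion value")
    best_index = max(range(len(EMOTIONS)), key=lambda i: probabilities[i])
    return EMOTIONS[best_index]
```

  • The same highest-probability selection could plausibly be applied per attribute (for example gender and age range) for the third model described in the following paragraphs.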
  • A third model for detecting the attribute information of a face can be preset in the electronic device. The input of the third model can be a face image, such as the face image in a frontal face frame, and the output is a probability value for each possible value of the attribute information. For example, the preset attribute information may include gender and age range; the values of gender may be male and female, and the values of age range may be child, youth, middle-aged, and elderly.
  • the training samples may be: face images marked with values of attribute information, and input the training samples into a preset deep learning network for training to obtain the third model.
  • the face image in the front face frame is input into the second model and/or the third model, and the attribute information and/or emotion information of the front face frame can be obtained.
  • The implementation of step 1209 is described as follows:
  • the corresponding relationship between the gesture and the operation of the electronic device may be preset in the electronic device, and accordingly, in this step, the operation corresponding to the gesture may be recognized based on the corresponding relationship.
  • operations that can be triggered by the smart TV may include, but are not limited to, changing channels, adjusting volume, or switching interfaces. If the electronic device displays an interface after performing a certain operation, and the recommended content is displayed on the interface, such an operation is referred to as a content recommendation operation in this application. It should be noted that, in addition to displaying recommended content, the above-mentioned displayed interface may also display other content, which is not limited in this application.
  • In step 1209, if the operation corresponding to the gesture is a content recommendation operation, and the interface displayed after performing the operation has a control for displaying recommended content, the electronic device screens content according to the attribute information and/or emotional information of the frontal face frame to obtain the recommended content corresponding to the frontal face frame, and displays the recommended content in the interface.
  • In one possible implementation, the recommended content is obtained only according to the attribute information and/or emotional information of the frontal face frame corresponding to the operation. For example, if the electronic device recognizes the operation corresponding to the gesture of frontal face frame 1 and displays the interface corresponding to the operation, the recommended content displayed in the interface is obtained only according to the attribute information and/or emotional information of frontal face frame 1.
  • In another possible implementation, the recommended content is obtained according to the attribute information and/or emotional information of multiple frontal face frames in the same target video frame. For example, if the electronic device recognizes the operation corresponding to the gesture of frontal face frame 1 and displays the interface corresponding to the operation, the electronic device determines whether there are other frontal face frames in the target video frame in which frontal face frame 1 is located. If the target video frame contains frontal face frame 1 and frontal face frame 2, the recommended content displayed in the interface can be obtained according to the attribute information and/or emotional information of frontal face frame 1 and frontal face frame 2.
  • In this case, the other frontal face frames in the same target video frame need not have a recognized gesture: as long as a frontal face frame is located in the same target video frame as the frontal face frame corresponding to the operation, the recommended content can be obtained according to that frontal face frame.
  • the above recommended content may include, but is not limited to, the music and video shown in FIG. 12 , and may also include news, for example.
  • the following describes how the electronic device performs content screening according to the attribute information and/or emotional information of the face frame when the operation performed by the electronic device includes displaying the recommended content, and obtains the recommended content of the face frame:
  • a content library can be preset in the electronic device, the content library includes several pieces of content, and several tags can be set for each piece of content in the content library, and the attribute information or emotional information corresponding to the piece of content is recorded in the tag;
  • the electronic device can search, according to the attribute information and/or emotional information of the frontal face frame, for content that matches that information, and use the content found as the recommended content corresponding to the frontal face frame.
  • attribute labels and emotion labels are respectively set.
  • if the attribute information of the frontal face frame is female and child, and the emotional information is angry, content 2 is screened out as the recommended content of the frontal face frame; if the attribute information of the frontal face frame is male and youth, and the emotional information is happy, content 1 is screened out as the recommended content of the frontal face frame.
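The tag-based screening described above can be sketched as follows. This is a minimal illustration, not the application's implementation: the `Content` class, the `screen_content` function, and the two library entries are hypothetical, and the matching rule (all attributes and the emotion must appear in the content's tags) is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Content:
    """One piece of content with its attribute tags and emotion tags (hypothetical structure)."""
    name: str
    attribute_tags: set = field(default_factory=set)
    emotion_tags: set = field(default_factory=set)


def screen_content(library, attributes, emotion):
    """Return the items whose tags cover every given attribute and the given emotion."""
    wanted = set(attributes)
    return [
        item for item in library
        if wanted <= item.attribute_tags and emotion in item.emotion_tags
    ]


# Illustrative library only; the tag values mirror the example in the text.
LIBRARY = [
    Content("content 1", {"male", "youth"}, {"happy"}),
    Content("content 2", {"female", "child"}, {"angry"}),
]

recommended = screen_content(LIBRARY, ["female", "child"], "angry")
print([item.name for item in recommended])
```

The same function could run on a server instead of the device, since it only needs the attribute and emotional information as input.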
  • the content library can be set in the server, so that the above-mentioned screening of recommended content can be completed by the server.
  • the electronic device can connect to the server and send the attribute information and/or emotional information of the frontal face frame to the server; the server performs content screening to obtain the recommended content corresponding to the frontal face frame, and sends that recommended content to the electronic device.
  • the recommended content can be displayed per frontal face frame, for example by allocating a region of a certain area to the recommended content of each frontal face frame and displaying the corresponding recommended content in it: area 101 is used to display the recommended content of frontal face frame 1, and area 102 is used to display the recommended content of frontal face frame 2; or, as shown in FIG. 13B, the recommended content of each frontal face frame can also be combined and displayed in the same area 103.
  • the manner of displaying the recommended content in each area is not limited in this embodiment of the present application.
  • a list of names of the recommended content can be displayed, so that the user can select by name the content to be displayed by the electronic device; or, if the recommended content is the songs or the videos (such as cartoons and period dramas) in Table 1, part of each recommended song or video can be played in sequence; and so on.
  • the number of pieces of content displayed in each area is not limited in the present application, and one piece of content shown in FIGS. 14A to 14C is only an example.
  • the area in the interface only displays the recommended content corresponding to one frontal face frame.
  • if the electronic device detects the three frontal face frames exemplified in FIGS. 14A to 14C at the same time, the recommended content corresponding to each frontal face frame can be displayed in three separate areas, for example as shown in FIG. 14D, or the recommended content corresponding to the three frontal face frames can be displayed in one area, for example as shown in FIG. 14E.
  • the method shown in FIG. 12 performs content screening according to the attribute information and/or emotional information of the frontal face frame, obtains the recommended content of the frontal face frame and displays it, thereby realizing personalized recommendation for the user, making the displayed recommended content more targeted, and improving both the interaction between the electronic device and the user and the user experience.
  • FIG. 15 is a schematic structural diagram of an embodiment of the electronic device of the present application.
  • the electronic device 1500 may include: a processor 1510, a camera 1520, and a display screen 1530; wherein,
  • the camera 1520 can be used to capture video streams
  • the processor 1510 may be configured to: when it is detected that the frontal face image of the first user appears in the video stream, detect the gesture of the first user in the video stream; identify the operation corresponding to the gesture; and perform the operation corresponding to the gesture. If performing the operation corresponding to the gesture includes displaying an image, a video, or the like, the processor 1510 may also be configured to: control the display screen 1530 to display the image or video corresponding to the operation;
  • the display screen 1530 may be used to: display images or videos, and the like.
  • the processor 1510 may be configured to: when detecting that the frontal face image of the first user appears in the video stream, detect the gesture of the first user in the video stream and determine the attribute information and/or emotional information of the first user according to the frontal face image of the first user, where the attribute information is used to record the attributes of the first user and the emotional information is used to record the emotion of the first user; identify the operation corresponding to the gesture; and, if the operation is a content recommendation operation, display the interface corresponding to the content recommendation operation, where the recommended content is displayed on the interface and is obtained according to the attribute information and/or emotional information of the first user.
  • the display screen 1530 may be used to: display the interface corresponding to the gesture.
  • Embodiments of the present application further provide a computer-readable storage medium in which a computer program is stored; when the computer program runs on a computer, it causes the computer to execute the method provided by the embodiment shown in FIG. 5, FIG. 8, or FIG. 12 of the present application.
  • An embodiment of the present application further provides a computer program product, the computer program product includes a computer program, which, when running on a computer, causes the computer to execute the method provided by the embodiment shown in FIG. 5 , FIG. 8 , or FIG. 12 of the present application.
  • “at least one” refers to one or more, and “multiple” refers to two or more.
  • “And/or” describes the association relationship between associated objects and means that three kinds of relationships are possible; for example, A and/or B can indicate that A exists alone, that A and B exist at the same time, or that B exists alone, where A and B can be singular or plural.
  • the character “/” generally indicates that the associated objects are an “or” relationship.
  • “At least one of the following” and similar expressions refer to any combination of these items, including any combination of single or plural items.
  • For example, at least one of a, b, and c may represent: a; b; c; a and b; a and c; b and c; or a, b, and c, where each of a, b, and c may be single or multiple.
  • if any function is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • the technical solution of the present application in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions used to cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory; hereinafter referred to as ROM), a random access memory (Random Access Memory; hereinafter referred to as RAM), a magnetic disk, an optical disc, and various other media on which program code can be stored.

Abstract

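A device control method and an electronic device. In the method, a video stream is captured; when a frontal face image of a first user is detected in the video stream, a gesture of the first user in the video stream is detected; the operation corresponding to the gesture is identified; and the operation corresponding to the gesture is performed. The present application can reduce false triggering of the electronic device caused by user gestures.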

Description

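Device Control Method and Electronic Device
Technical Field
The present application relates to the technical field of smart terminals, and in particular to a device control method and an electronic device.
Background
At present, when a user controls an electronic device such as a smart TV, the user can control it with gestures in addition to using a remote control.
The electronic device captures a video stream through a camera provided in the electronic device, detects the user's gesture from the video frames of the video stream, recognizes different gestures of the user as different instructions, and responds accordingly, so that the user can control the electronic device with gestures. However, this way of controlling an electronic device through user gestures has a relatively high probability of false triggering.
Summary
The present application provides a device control method and an electronic device, which can reduce false triggering of the electronic device caused by user gestures.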
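In a first aspect, the present application provides a device control method, applied to an electronic device, including: capturing a video stream; when a frontal face image of a first user is detected in the video stream, detecting a gesture of the first user in the video stream; identifying the operation corresponding to the gesture; and performing the operation corresponding to the gesture. Compared with the prior art, in which the electronic device responds as soon as it recognizes a user's gesture, this method additionally detects the frontal face image of the first user and responds only to the gesture of a first user whose frontal face image appears, so that false triggering of the electronic device caused by user gestures can be reduced.
In a possible implementation, detecting that the frontal face image of the first user appears in the video stream includes: detecting that consecutive target video frames of the video stream include the frontal face image of the first user.
In a possible implementation, detecting that consecutive target video frames of the video stream include the frontal face image of the first user includes: for each target video frame in the consecutive target video frames, obtaining the face image of the first user from the target video frame, calculating the yaw angle of the face image, and determining that the yaw angle is less than a preset first threshold.
In a possible implementation, detecting that consecutive target video frames of the video stream include the frontal face image of the first user further includes: calculating the pitch angle and/or roll angle of the face image; and determining that the pitch angle is less than a preset second threshold, and/or determining that the roll angle is less than a preset third threshold.
In a possible implementation, before detecting that the frontal face image of the first user appears in the video stream, the method further includes: capturing a first image in response to a face setting instruction; displaying the first image, and indicating the face images in the first image on the displayed first image; and, in response to a face selection instruction, setting the face image indicated by the face selection instruction as the face image of the first user.
In a possible implementation, the method further includes: when a frontal face image of a second user is detected in the video stream, detecting a gesture of the second user in the video stream; identifying the operation corresponding to the gesture of the second user; and performing the operation corresponding to the gesture of the second user.
In a possible implementation, the method further includes: determining that the gesture of the first user and the gesture of the second user are detected in the video stream at the same time, and selecting one gesture from the gesture of the first user and the gesture of the second user; identifying the operation corresponding to the selected gesture; and performing the operation corresponding to the selected gesture.
In a possible implementation, detecting the gesture of the first user includes: detecting, from the consecutive target video frames that include the frontal face image of the first user, consecutive target video frames that include a hand image of the first user; determining the motion trajectory of the hand image of the first user according to the consecutive target video frames that include the hand image of the first user; and determining the gesture of the first user according to the motion trajectory of the hand image of the first user.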
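In a second aspect, an embodiment of the present application provides a device control method, applied to an electronic device, including: capturing a video stream; when a frontal face image of a first user is detected in the video stream, detecting a gesture of the first user in the video stream, and determining attribute information and/or emotional information of the first user according to the frontal face image of the first user, where the attribute information is used to record the attributes of the first user and the emotional information is used to record the emotion of the first user; identifying the operation corresponding to the gesture; and, if the operation is a content recommendation operation, displaying the interface corresponding to the content recommendation operation, where recommended content is displayed on the interface and is obtained according to the attribute information and/or emotional information of the first user. Compared with the prior art, in which the electronic device responds as soon as it recognizes a user's gesture, this method additionally detects the frontal face image of the first user and responds only to the gesture of a first user whose frontal face image appears, so that false triggering of the electronic device caused by user gestures can be reduced. Moreover, content screening is performed according to the attribute information and/or emotional information of the frontal face frame, and the recommended content of the frontal face frame is obtained and displayed, thereby realizing personalized recommendation for the user, making the displayed recommended content more targeted, and improving both the interaction between the electronic device and the user and the user experience.
In a possible implementation, before displaying the interface corresponding to the content recommendation operation, the method further includes: sending the attribute information and/or emotional information of the first user to a server; and receiving first information sent by the server in response to the attribute information and/or emotional information, where the first information includes recommended content matching the attribute information and/or emotional information.
In a possible implementation, detecting that the frontal face image of the first user appears in the video stream includes: detecting that consecutive target video frames of the video stream include the frontal face image of the first user.
In a possible implementation, detecting that consecutive target video frames of the video stream include the frontal face image of the first user includes: for each target video frame in the consecutive target video frames, obtaining the face image of the first user from the target video frame, calculating the yaw angle of the face image, and determining that the yaw angle is less than a preset first threshold.
In a possible implementation, detecting that consecutive target video frames of the video stream include the frontal face image of the first user further includes: calculating the pitch angle and/or roll angle of the face image; and determining that the pitch angle is less than a preset second threshold, and/or determining that the roll angle is less than a preset third threshold.
In a possible implementation, before detecting that the frontal face image of the first user appears in the video stream, the method further includes: capturing a first image in response to a face setting instruction; displaying the first image, and indicating the face images in the first image on the displayed first image; and, in response to a face selection instruction, setting the face image indicated by the face selection instruction as the face image of the first user.
In a possible implementation, the method further includes: when a frontal face image of a second user is detected in the video stream, detecting a gesture of the second user in the video stream; identifying the operation corresponding to the gesture of the second user; and performing the operation corresponding to the gesture of the second user.
In a possible implementation, the method further includes: determining that the gesture of the first user and the gesture of the second user are detected in the video stream at the same time, and selecting one gesture from the gesture of the first user and the gesture of the second user; identifying the operation corresponding to the selected gesture; and performing the operation corresponding to the selected gesture.
In a possible implementation, detecting the gesture of the first user includes: detecting, from the consecutive target video frames that include the frontal face image of the first user, consecutive target video frames that include a hand image of the first user; determining the motion trajectory of the hand image of the first user according to the consecutive target video frames that include the hand image of the first user; and determining the gesture of the first user according to the motion trajectory of the hand image of the first user.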
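In a third aspect, an embodiment of the present application provides an electronic device, including a processor and a memory, where the memory is used to store a computer program, and when the processor executes the computer program, the electronic device is caused to perform the method according to any one of the implementations of the first aspect.
In a fourth aspect, an embodiment of the present application provides an electronic device, including a processor and a memory, where the memory is used to store a computer program, and when the processor executes the computer program, the electronic device is caused to perform the method according to any one of the implementations of the second aspect.
In a fifth aspect, an embodiment of the present application provides a computer-readable storage medium in which a computer program is stored; when the computer program runs on a computer, it causes the computer to perform the method according to any one of the implementations of the first aspect.
In a sixth aspect, an embodiment of the present application provides a computer-readable storage medium in which a computer program is stored; when the computer program runs on a computer, it causes the computer to perform the method according to any one of the implementations of the second aspect.
In a seventh aspect, the present application provides a computer program which, when executed by a computer, is used to perform the method of the first aspect.
In a possible design, the program in the seventh aspect may be stored in whole or in part on a storage medium packaged together with the processor, or may be stored in part or in whole on a memory that is not packaged together with the processor.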
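Brief Description of the Drawings
In order to describe the technical solutions of the embodiments of the present invention more clearly, the drawings required in the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
FIG. 2 is a block diagram of the software structure of an electronic device according to an embodiment of the present application;
FIG. 3A is a schematic diagram of an interface of a smart TV of the present application;
FIG. 3B is a schematic diagram of a settings interface in a smart TV of the present application;
FIG. 3C is a schematic diagram of an interface of a smart TV of the present application;
FIG. 4 is a schematic diagram of a scenario to which the device control method of the present application is applicable;
FIG. 5 is a flowchart of an embodiment of the device control method of the present application;
FIG. 6A is a schematic diagram of a smart TV of the present application in the standby state;
FIG. 6B is a schematic diagram of the interface of a smart TV of the present application after it is turned on;
FIG. 7A is a schematic diagram of the playback screen of video 1 of the present application;
FIG. 7B is a schematic diagram of a video selection interface of the present application;
FIG. 8 is a flowchart of another embodiment of the device control method of the present application;
FIG. 9A is a schematic diagram of a body frame, a face frame, and a hand frame of the present application;
FIG. 9B is a schematic diagram of the method for establishing the pixel coordinate system of the present application;
FIG. 10A is a schematic diagram of a preset face image setting interface of the present application;
FIG. 10B is a schematic diagram of a preset face image selection interface of the present application;
FIG. 11A is a schematic diagram of the method for establishing the camera coordinate system of the present application;
FIG. 11B is a schematic diagram of the method for establishing the face angle coordinate system of the present application;
FIG. 12 is a flowchart of yet another embodiment of the device control method of the present application;
FIGS. 13A and 13B are example diagrams of ways of displaying recommended content of the present application;
FIGS. 14A to 14E are example diagrams of ways of displaying recommended content of the present application;
FIG. 15 is a structural diagram of another embodiment of the electronic device of the present application.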
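Detailed Description
The terms used in the embodiments section of the present application are only used to explain specific embodiments of the present application and are not intended to limit the present application.
At present, the way of controlling an electronic device through user gestures has a relatively high probability of false triggering. Specifically, a user makes a certain gesture in daily life that is not intended to control the electronic device, but the electronic device recognizes it as an instruction issued by the user to the electronic device and responds to it, which results in false triggering of the electronic device.
To this end, the embodiments of the present application provide a device control method and an electronic device, which can reduce false triggering of the electronic device caused by user gestures.
FIG. 1 shows a schematic structural diagram of an electronic device 100.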
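The electronic device 100 may include at least one of a mobile phone, a foldable electronic device, a tablet computer, a desktop computer, a laptop computer, a handheld computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a cellular phone, a personal digital assistant (PDA), an augmented reality (AR) device, a virtual reality (VR) device, an artificial intelligence (AI) device, a wearable device, a vehicle-mounted device, a smart home device, or a smart city device. The embodiments of the present application place no special limitation on the specific type of the electronic device 100.
The electronic device 100 may include a processor 110, an internal memory 121, a camera module 193, and a display screen 194. Optionally, if the electronic device 100 is an electronic device without a call function, such as a smart TV, the electronic device 100 may further include: an external memory interface 120, a universal serial bus (USB) connector 130, a charging management module 140, a power management module 141, a battery 142, an antenna 2, a wireless communication module 160, an audio module 170, a speaker 170A, a microphone 170C, a headset jack 170D, a sensor module 180, keys 190, a motor 191, an indicator 192, a subscriber identification module (SIM) card interface 195, and the like. If the electronic device 100 is a device that can make calls, such as a mobile phone, the electronic device 100 may further include: an antenna 1, a mobile communication module 150, a receiver 170B, and the like. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, a barometric pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
It can be understood that the structure illustrated in the embodiments of the present application does not constitute a specific limitation on the electronic device 100. In other embodiments of the present application, the electronic device 100 may include more or fewer components than shown, or combine some components, or split some components, or have a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.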
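The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU). Different processing units may be independent devices or may be integrated in one or more processors.
The processor can generate operation control signals according to instruction opcodes and timing signals, to control the fetching and execution of instructions.
A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 may be a cache. The memory can hold instructions or data that the processor 110 has just used or uses frequently. If the processor 110 needs to use the instructions or data again, it can call them directly from this memory. This avoids repeated accesses and reduces the waiting time of the processor 110, thereby improving the efficiency of the system.
In some embodiments, the processor 110 may include one or more interfaces. The interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface. The processor 110 may be connected to modules such as the touch sensor, the audio module, the wireless communication module, the display, and the camera module through at least one of the above interfaces.
It can be understood that the interface connection relationships between the modules illustrated in the embodiments of the present application are only schematic and do not constitute a structural limitation on the electronic device 100. In other embodiments of the present application, the electronic device 100 may also use interface connection modes different from those in the above embodiments, or a combination of multiple interface connection modes.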
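The USB connector 130 is an interface that conforms to the USB standard specification and can be used to connect the electronic device 100 and peripheral devices; specifically, it may be a Mini USB connector, a Micro USB connector, a USB Type-C connector, or the like. The USB connector 130 can be used to connect a charger so that the charger charges the electronic device 100, and can also be used to connect other electronic devices so that data is transmitted between the electronic device 100 and the other electronic devices. It can also be used to connect a headset and output audio stored in the electronic device through the headset. The connector can also be used to connect other electronic devices, such as VR devices. In some embodiments, the standard specification of the universal serial bus may be USB 1.x, USB 2.0, USB 3.x, or USB4.
The charging management module 140 is used to receive charging input from a charger. The charger may be a wireless charger or a wired charger. In some wired charging embodiments, the charging management module 140 may receive the charging input of a wired charger through the USB connector 130. In some wireless charging embodiments, the charging management module 140 may receive wireless charging input through a wireless charging coil of the electronic device 100. While charging the battery 142, the charging management module 140 can also supply power to the electronic device through the power management module 141.
The power management module 141 is used to connect the battery 142, the charging management module 140, and the processor 110. The power management module 141 receives input from the battery 142 and/or the charging management module 140 and supplies power to the processor 110, the internal memory 121, the display screen 194, the camera module 193, the wireless communication module 160, and the like. The power management module 141 can also be used to monitor parameters such as battery capacity, battery cycle count, and battery health status (leakage, impedance). In some other embodiments, the power management module 141 may also be provided in the processor 110. In still other embodiments, the power management module 141 and the charging management module 140 may also be provided in the same device.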
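The wireless communication function of the electronic device 100 can be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, the baseband processor, and the like.
The antenna 1 and the antenna 2 are used to transmit and receive electromagnetic wave signals. Each antenna in the electronic device 100 can be used to cover a single communication frequency band or multiple communication frequency bands. Different antennas can also be multiplexed to improve antenna utilization. For example, the antenna 1 can be multiplexed as a diversity antenna of the wireless local area network. In some other embodiments, an antenna can be used in combination with a tuning switch.
The mobile communication module 150 can provide wireless communication solutions applied to the electronic device 100, including 2G/3G/4G/5G. The mobile communication module 150 may include at least one filter, a switch, a power amplifier, a low noise amplifier (LNA), and the like. The mobile communication module 150 can receive electromagnetic waves through the antenna 1, perform processing such as filtering and amplification on the received electromagnetic waves, and transmit them to the modem processor for demodulation. The mobile communication module 150 can also amplify the signal modulated by the modem processor and convert it into electromagnetic waves radiated through the antenna 1. In some embodiments, at least some functional modules of the mobile communication module 150 may be provided in the processor 110. In some embodiments, at least some functional modules of the mobile communication module 150 may be provided in the same device as at least some modules of the processor 110.
The modem processor may include a modulator and a demodulator. The modulator is used to modulate the low-frequency baseband signal to be sent into a medium- or high-frequency signal. The demodulator is used to demodulate the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then transmits the demodulated low-frequency baseband signal to the baseband processor for processing. After being processed by the baseband processor, the low-frequency baseband signal is passed to the application processor. The application processor outputs sound signals through audio devices (not limited to the speaker 170A, the receiver 170B, and the like), or displays images or videos through the display screen 194. In some embodiments, the modem processor may be an independent device. In other embodiments, the modem processor may be independent of the processor 110 and provided in the same device as the mobile communication module 150 or other functional modules.
The wireless communication module 160 can provide wireless communication solutions applied to the electronic device 100, including wireless local area networks (WLAN) (such as wireless fidelity (Wi-Fi) networks), Bluetooth (BT), Bluetooth low energy (BLE), ultra wide band (UWB), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR) technology, and the like. The wireless communication module 160 may be one or more devices integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, performs frequency modulation and filtering on the electromagnetic wave signals, and sends the processed signals to the processor 110. The wireless communication module 160 can also receive signals to be sent from the processor 110, perform frequency modulation and amplification on them, and convert them into electromagnetic waves radiated through the antenna 2.
In some embodiments, the antenna 1 of the electronic device 100 is coupled to the mobile communication module 150, and the antenna 2 is coupled to the wireless communication module 160, so that the electronic device 100 can communicate with networks and other electronic devices through wireless communication technologies. The wireless communication technologies may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technologies. The GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the BeiDou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or satellite based augmentation systems (SBAS).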
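The electronic device 100 can implement the display function through the GPU, the display screen 194, the application processor, and the like. The GPU is a microprocessor for image processing and is connected to the display screen 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
The display screen 194 is used to display images, videos, and the like. The display screen 194 includes a display panel. The display panel may use a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), Mini-LED, Micro-LED, Micro-OLED, quantum dot light-emitting diodes (QLED), or the like. In some embodiments, the electronic device 100 may include one or more display screens 194.
The electronic device 100 can implement the camera function through the camera module 193, the ISP, the video codec, the GPU, the display screen 194, the application processor (AP), the neural-network processing unit (NPU), and the like.
The camera module 193 can be used to collect color image data and depth data of the photographed object. The ISP can be used to process the color image data collected by the camera module 193. For example, when taking a photo, the shutter is opened, light is transmitted through the lens to the photosensitive element of the camera, the optical signal is converted into an electrical signal, and the photosensitive element transmits the electrical signal to the ISP for processing, where it is converted into an image visible to the naked eye. The ISP can also algorithmically optimize the noise, brightness, and skin tone of the image. The ISP can also optimize parameters such as the exposure and color temperature of the shooting scene. In some embodiments, the ISP may be provided in the camera module 193.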
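In some embodiments, the camera module 193 may consist of a color camera module and a 3D sensing module.
In some embodiments, the photosensitive element of the camera of the color camera module may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal and then transmits the electrical signal to the ISP, which converts it into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard format such as RGB or YUV.
In some embodiments, the 3D sensing module may be a time-of-flight (TOF) 3D sensing module or a structured-light 3D sensing module. Structured-light 3D sensing is an active depth sensing technology; the basic components of a structured-light 3D sensing module may include an infrared (IR) emitter, an IR camera module, and the like. The structured-light 3D sensing module works by first projecting a light spot pattern of a specific shape onto the photographed object, then receiving the light coding of the spot pattern on the surface of the object, comparing it with the originally projected pattern, and calculating the three-dimensional coordinates of the object using the principle of triangulation. The three-dimensional coordinates include the distance between the electronic device 100 and the photographed object. TOF 3D sensing may also be an active depth sensing technology; the basic components of a TOF 3D sensing module may include an infrared (IR) emitter, an IR camera module, and the like. The TOF 3D sensing module works by calculating the distance (that is, the depth) between the TOF 3D sensing module and the photographed object from the round-trip time of infrared light, so as to obtain a 3D depth map.
The structured-light 3D sensing module can also be applied to fields such as face recognition, motion-sensing game consoles, and industrial machine vision inspection. The TOF 3D sensing module can also be applied to fields such as game consoles and augmented reality (AR)/virtual reality (VR).
In other embodiments, the camera module 193 may also consist of two or more cameras. The two or more cameras may include a color camera, which can be used to collect color image data of the photographed object. The two or more cameras may use stereo vision technology to collect depth data of the photographed object. Stereo vision technology is based on the principle of human binocular parallax: under natural light, two or more cameras capture images of the same object from different angles, and operations such as triangulation are then performed to obtain the distance information, that is, the depth information, between the electronic device 100 and the photographed object.
In some embodiments, the electronic device 100 may include one or more camera modules 193. Specifically, the electronic device 100 may include one front camera module 193 and one rear camera module 193. The front camera module 193 can usually be used to collect color image data and depth data of the photographer facing the display screen 194, and the rear camera module can be used to collect color image data and depth data of the object (such as a person or scenery) that the photographer is facing.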
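In some embodiments, the CPU, GPU, or NPU in the processor 110 can process the color image data and depth data collected by the camera module 193. In some embodiments, the NPU can recognize the color image data collected by the camera module 193 (specifically, the color camera module) through a neural network algorithm on which skeleton point recognition technology is based, such as a convolutional neural network (CNN) algorithm, to determine the skeleton points of the photographed person. The CPU or GPU can also run the neural network algorithm to determine the skeleton points of the photographed person according to the color image data. In some embodiments, the CPU, GPU, or NPU can also be used to determine the figure of the photographed person (such as the body proportions and how fat or thin the body parts between skeleton points are) according to the depth data collected by the camera module 193 (which may be the 3D sensing module) and the recognized skeleton points, further determine body beautification parameters for the photographed person, and finally process the captured image of the photographed person according to the body beautification parameters so that the body shape of the photographed person in the captured image is beautified. Subsequent embodiments describe in detail how to perform body beautification on the image of the photographed person based on the color image data and depth data collected by the camera module 193, which is not repeated here.
The digital signal processor is used to process digital signals, and can process digital signals other than digital image signals. For example, when the electronic device 100 selects a frequency point, the digital signal processor is used to perform a Fourier transform or the like on the frequency point energy.
The video codec is used to compress or decompress digital video. The electronic device 100 may support one or more video codecs. In this way, the electronic device 100 can play or record videos in multiple encoding formats, for example: moving picture experts group (MPEG) 1, MPEG2, MPEG3, MPEG4, and the like.
The NPU is a neural-network (NN) computing processor. By drawing on the structure of biological neural networks, for example the transmission mode between neurons in the human brain, it processes input information quickly and can also learn continuously by itself. Applications such as intelligent cognition of the electronic device 100 can be implemented through the NPU, for example: image recognition, face recognition, speech recognition, and text understanding.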
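The external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100. The external memory card communicates with the processor 110 through the external memory interface 120 to implement the data storage function, for example saving files such as music and videos in the external memory card, or transferring files such as music and videos from the electronic device to the external memory card.
The internal memory 121 can be used to store computer-executable program code, which includes instructions. The internal memory 121 may include a program storage area and a data storage area. The program storage area can store the operating system, the application programs required by at least one function (such as a sound playback function or an image playback function), and the like. The data storage area can store data created during the use of the electronic device 100 (such as audio data and a phone book). In addition, the internal memory 121 may include a high-speed random access memory and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or a universal flash storage (UFS). The processor 110 executes the various functional methods or data processing of the electronic device 100 by running the instructions stored in the internal memory 121 and/or the instructions stored in the memory provided in the processor.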
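The electronic device 100 can implement audio functions, such as music playback and recording, through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headset jack 170D, the application processor, and the like.
The audio module 170 is used to convert digital audio information into an analog audio signal for output, and is also used to convert analog audio input into a digital audio signal. The audio module 170 can also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be provided in the processor 110, or some functional modules of the audio module 170 may be provided in the processor 110.
The speaker 170A, also called the "loudspeaker", is used to convert an audio electrical signal into a sound signal. The electronic device 100 can play music or output the audio signal of a hands-free call through the speaker 170A.
The receiver 170B, also called the "earpiece", is used to convert an audio electrical signal into a sound signal. When the electronic device 100 answers a call or receives a voice message, the voice can be heard by placing the receiver 170B close to the ear.
The microphone 170C, also called the "mic" or "mouthpiece", is used to convert a sound signal into an electrical signal. When making a call or sending a voice message, the user can speak with the mouth close to the microphone 170C to input the sound signal into the microphone 170C. The electronic device 100 may be provided with at least one microphone 170C. In other embodiments, the electronic device 100 may be provided with two microphones 170C, which can implement a noise reduction function in addition to collecting sound signals. In still other embodiments, the electronic device 100 may be provided with three, four, or more microphones 170C to collect sound signals, reduce noise, identify sound sources, implement a directional recording function, and so on.
The headset jack 170D is used to connect a wired headset. The headset jack 170D may be the USB connector 130, or may be a 3.5 mm open mobile terminal platform (OMTP) standard interface or a cellular telecommunications industry association of the USA (CTIA) standard interface.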
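The pressure sensor 180A is used to sense pressure signals and can convert a pressure signal into an electrical signal. In some embodiments, the pressure sensor 180A may be provided in the display screen 194. There are many types of pressure sensor 180A, such as resistive pressure sensors, inductive pressure sensors, and capacitive pressure sensors. A capacitive pressure sensor may include at least two parallel plates made of conductive material. When a force acts on the pressure sensor 180A, the capacitance between the electrodes changes. The electronic device 100 determines the strength of the pressure according to the change in capacitance. When a touch operation acts on the display screen 194, the electronic device 100 detects the intensity of the touch operation through the pressure sensor 180A. The electronic device 100 can also calculate the touch position according to the detection signal of the pressure sensor 180A. In some embodiments, touch operations that act on the same touch position but have different touch operation intensities may correspond to different operation instructions. For example, when a touch operation whose intensity is less than a first pressure threshold acts on the short message application icon, the instruction to view the short message is executed. When a touch operation whose intensity is greater than or equal to the first pressure threshold acts on the short message application icon, the instruction to create a new short message is executed.
The gyroscope sensor 180B can be used to determine the motion posture of the electronic device 100. In some embodiments, the angular velocity of the electronic device 100 around three axes (that is, the x, y, and z axes) can be determined through the gyroscope sensor 180B. The gyroscope sensor 180B can be used for image stabilization during shooting. For example, when the shutter is pressed, the gyroscope sensor 180B detects the angle at which the electronic device 100 shakes, calculates the distance that the lens module needs to compensate according to the angle, and controls the lens to move in the opposite direction to counteract the shake of the electronic device 100, thereby achieving image stabilization. The gyroscope sensor 180B can also be used in navigation and motion-sensing game scenarios.
The barometric pressure sensor 180C is used to measure air pressure. In some embodiments, the electronic device 100 calculates the altitude from the air pressure value measured by the barometric pressure sensor 180C to assist positioning and navigation.
The magnetic sensor 180D includes a Hall sensor. The electronic device 100 can use the magnetic sensor 180D to detect the opening and closing of a flip leather case. When the electronic device is a foldable electronic device, the magnetic sensor 180D can be used to detect the folding or unfolding of the electronic device, or the folding angle. In some embodiments, when the electronic device 100 is a flip phone, the electronic device 100 can detect the opening and closing of the flip cover according to the magnetic sensor 180D, and then set features such as automatic unlocking on flip-open according to the detected opening and closing state of the leather case or of the flip cover.
The acceleration sensor 180E can detect the magnitude of the acceleration of the electronic device 100 in various directions (generally three axes). When the electronic device 100 is stationary, the magnitude and direction of gravity can be detected. It can also be used to identify the posture of the electronic device, and is applied in landscape/portrait switching, pedometers, and similar applications.
The distance sensor 180F is used to measure distance. The electronic device 100 can measure distance by infrared or laser. In some embodiments, in a shooting scenario, the electronic device 100 can use the distance sensor 180F to measure distance to achieve fast focusing.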
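The proximity light sensor 180G may include, for example, a light-emitting diode (LED) and a light detector such as a photodiode. The light-emitting diode may be an infrared light-emitting diode. The electronic device 100 emits infrared light outward through the light-emitting diode. The electronic device 100 uses the photodiode to detect infrared light reflected from nearby objects. When the intensity of the detected reflected light is greater than a threshold, it can be determined that there is an object near the electronic device 100. When the intensity of the detected reflected light is less than the threshold, the electronic device 100 can determine that there is no object near the electronic device 100. The electronic device 100 can use the proximity light sensor 180G to detect that the user is holding the electronic device 100 close to the ear during a call, so as to automatically turn off the screen to save power. The proximity light sensor 180G can also be used for automatic unlocking and screen locking in leather case mode and pocket mode.
The ambient light sensor 180L can be used to sense the ambient light brightness. The electronic device 100 can adaptively adjust the brightness of the display screen 194 according to the sensed ambient light brightness. The ambient light sensor 180L can also be used to automatically adjust the white balance when taking photos. The ambient light sensor 180L can also cooperate with the proximity light sensor 180G to detect whether the electronic device 100 is covered, for example when the electronic device is in a pocket. When it is detected that the electronic device is covered or in a pocket, some functions (such as the touch function) can be disabled to prevent accidental operation.
The fingerprint sensor 180H is used to collect fingerprints. The electronic device 100 can use the collected fingerprint characteristics to implement fingerprint unlocking, application lock access, fingerprint photographing, fingerprint call answering, and the like.
The temperature sensor 180J is used to detect temperature. In some embodiments, the electronic device 100 uses the temperature detected by the temperature sensor 180J to execute a temperature handling strategy. For example, when the temperature detected by the temperature sensor 180J exceeds a threshold, the electronic device 100 reduces the performance of the processor so as to reduce the power consumption of the electronic device and implement thermal protection. In other embodiments, when the temperature detected by the temperature sensor 180J is lower than another threshold, the electronic device 100 heats the battery 142. In some other embodiments, when the temperature is lower than yet another threshold, the electronic device 100 can boost the output voltage of the battery 142.
The touch sensor 180K is also called a "touch device". The touch sensor 180K may be provided in the display screen 194, and the touch sensor 180K and the display screen 194 together form a touchscreen, also called a "touch-control screen". The touch sensor 180K is used to detect touch operations acting on or near it. The touch sensor can pass the detected touch operation to the application processor to determine the type of touch event. Visual output related to the touch operation can be provided through the display screen 194. In other embodiments, the touch sensor 180K may also be provided on the surface of the electronic device 100 at a position different from that of the display screen 194.
The bone conduction sensor 180M can acquire vibration signals. In some embodiments, the bone conduction sensor 180M can acquire the vibration signal of the bone mass that vibrates when a person speaks. The bone conduction sensor 180M can also be in contact with the human pulse to receive blood pressure pulsation signals. In some embodiments, the bone conduction sensor 180M may also be provided in a headset to form a bone conduction headset. The audio module 170 can parse out a voice signal based on the vibration signal of the vocal vibrating bone mass acquired by the bone conduction sensor 180M, to implement the voice function. The application processor can parse heart rate information based on the blood pressure pulsation signal acquired by the bone conduction sensor 180M, to implement the heart rate detection function.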
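The keys 190 may include a power key, volume keys, and the like. The keys 190 may be mechanical keys or touch keys. The electronic device 100 can receive key input and generate key signal input related to the user settings and function control of the electronic device 100.
The motor 191 can generate vibration prompts. The motor 191 can be used for incoming call vibration prompts and also for touch vibration feedback. For example, touch operations acting on different applications (such as photographing and audio playback) may correspond to different vibration feedback effects. For touch operations acting on different areas of the display screen 194, the motor 191 may also produce different vibration feedback effects. Different application scenarios (for example, time reminders, receiving messages, alarm clocks, and games) may also correspond to different vibration feedback effects. The touch vibration feedback effect can also be customized.
The indicator 192 may be an indicator light, which can be used to indicate the charging status and battery level changes, and can also be used to indicate messages, missed calls, notifications, and the like.
The SIM card interface 195 is used to connect a SIM card. A SIM card can be brought into contact with or separated from the electronic device 100 by being inserted into or pulled out of the SIM card interface 195. The electronic device 100 may support one or more SIM card interfaces. The SIM card interface 195 may support Nano SIM cards, Micro SIM cards, SIM cards, and the like. Multiple cards can be inserted into the same SIM card interface 195 at the same time. The types of the multiple cards may be the same or different. The SIM card interface 195 may also be compatible with different types of SIM cards. The SIM card interface 195 may also be compatible with external memory cards. The electronic device 100 interacts with the network through the SIM card to implement functions such as calls and data communication. In some embodiments, the electronic device 100 uses an eSIM, that is, an embedded SIM card. The eSIM card can be embedded in the electronic device 100 and cannot be separated from the electronic device 100.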
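The software system of the electronic device 100 may use a layered architecture, an event-driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture. The embodiments of the present application take the Android system with a layered architecture as an example to illustrate the software structure of the electronic device 100.
FIG. 2 is a block diagram of the software structure of the electronic device 100 according to an embodiment of the present application.
The layered architecture divides the software into several layers, each with a clear role and division of labor. The layers communicate with each other through software interfaces. In some embodiments, the Android system is divided into five layers, which are, from top to bottom, the application layer, the application framework layer, the Android runtime (ART) and native C/C++ libraries, the hardware abstraction layer (HAL), and the kernel layer.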
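The application layer may include a series of application packages.
As shown in FIG. 2, the application packages may include applications such as Camera, Gallery, Calendar, Phone, Maps, Navigation, WLAN, Bluetooth, Music, Video, and Messages.
The application framework layer provides an application programming interface (API) and a programming framework for the applications of the application layer. The application framework layer includes some predefined functions.
As shown in FIG. 2, the application framework layer may include a window manager, a content provider, a view system, a resource manager, a notification manager, an activity manager, an input manager, and the like.
The window manager provides the Window Manager Service (WMS). The WMS can be used for window management, window animation management, surface management, and as a relay station for the input system.
The content provider is used to store and retrieve data and make the data accessible to applications. The data may include videos, images, audio, calls made and received, browsing history and bookmarks, the phone book, and the like.
The view system includes visual controls, such as controls for displaying text and controls for displaying pictures. The view system can be used to build applications. A display interface may consist of one or more views. For example, a display interface that includes a short message notification icon may include a view for displaying text and a view for displaying pictures.
The resource manager provides applications with various resources, such as localized strings, icons, pictures, layout files, and video files.
The notification manager enables applications to display notification information in the status bar. It can be used to convey informational messages that disappear automatically after a short stay without user interaction. For example, the notification manager is used to announce that a download is complete or to give message reminders. The notification manager may also present notifications in the status bar at the top of the system in the form of a chart or scrolling text, such as notifications of applications running in the background, or notifications that appear on the screen in the form of a dialog window. Examples include prompting text information in the status bar, emitting a prompt tone, vibrating the electronic device, and flashing the indicator light.
The activity manager can provide the Activity Manager Service (AMS). The AMS can be used for the startup, switching, and scheduling of system components (such as activities, services, content providers, and broadcast receivers) and for the management and scheduling of application processes.
The input manager can provide the Input Manager Service (IMS). The IMS can be used to manage the input of the system, such as touchscreen input, key input, and sensor input. The IMS takes events from the input device nodes and, through interaction with the WMS, distributes the events to the appropriate window.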
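The Android runtime layer includes the core libraries and the Android runtime (ART). The Android runtime is responsible for converting source code into machine code, mainly using ahead-of-time (AOT) compilation and just-in-time (JIT) compilation.
The core libraries are mainly used to provide the functions of the basic Java class libraries, such as libraries for basic data structures, mathematics, I/O, tools, databases, and networking. The core libraries provide APIs for users to develop Android applications.
The native C/C++ libraries may include multiple functional modules, for example: a surface manager, a Media Framework, libc, OpenGL ES, SQLite, and Webkit.
The surface manager is used to manage the display subsystem and provides the blending of 2D and 3D layers for multiple applications. The media framework supports playback and recording of a variety of commonly used audio and video formats, as well as still image files. The media libraries can support a variety of audio and video encoding formats, for example: MPEG4, H.264, MP3, AAC, AMR, JPG, and PNG. OpenGL ES provides the drawing and manipulation of 2D and 3D graphics in applications. SQLite provides a lightweight relational database for the applications of the electronic device 100.
The hardware abstraction layer runs in user space, encapsulates the kernel layer drivers, and provides calling interfaces to the upper layers.
The kernel layer is the layer between hardware and software. The kernel layer contains at least a display driver, a camera driver, an audio driver, and a sensor driver.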
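The following illustrates the workflow of the software and hardware of the electronic device 100 with reference to a photo capture scenario.
When the touch sensor 180K receives a touch operation, a corresponding hardware interrupt is sent to the kernel layer. The kernel layer processes the touch operation into a raw input event (including information such as the touch coordinates and the timestamp of the touch operation). The raw input event is stored at the kernel layer. The application framework layer obtains the raw input event from the kernel layer and identifies the control corresponding to the input event. Taking as an example a touch operation that is a tap and whose corresponding control is the camera application icon, the camera application calls the interface of the application framework layer to start the camera application, which in turn starts the camera driver by calling the kernel layer and captures a still image or video through the camera module 193.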
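For ease of understanding, the following embodiments of the present application take an electronic device with the structure shown in FIG. 1 and FIG. 2 as an example and describe the method provided by the embodiments of the present application in detail with reference to the drawings and application scenarios.
First, the device control method of the present application is described by taking a smart TV as the electronic device.
The device control method of the present application can be provided as a function of the operating system of the electronic device, or as a function of a third-party application (App) on the electronic device. Assuming this function is called the smart interaction function, the user can open the interface where the smart interaction function is located from the operating system or the third-party application.
For example, FIG. 3A takes the case where the smart interaction function is provided by the operating system: the user selects the "Settings" control displayed by the smart TV and enters the settings function. In the settings function, the user can find a settings interface, for example as shown in FIG. 3B, which includes "Smart interaction function". The user selects the "On" control corresponding to "Smart interaction function" to start the smart interaction function; correspondingly, the smart TV receives the user's smart interaction trigger instruction and can be triggered to execute the device control method of the present application.
For example, FIG. 3C takes the case where the smart interaction function is provided by a third-party application "Application X": the user selects the application icon "Application X" displayed by the smart TV and opens Application X. In Application X, the user can find a settings interface provided by Application X, whose implementation can refer to FIG. 3B. The settings interface includes "Smart interaction function"; the user selects the "On" control corresponding to "Smart interaction function" to start the smart interaction function; correspondingly, the smart TV receives the user's smart interaction trigger instruction and can be triggered to execute the device control method of the present application.
It should be understood that the above is only one possible way for the smart TV to be triggered to execute the device control method of the present application. If the smart interaction function is enabled by default in the settings function of the smart TV or in the above Application X, the user does not need to perform the above process, and the smart TV can still be triggered to execute the device control method of the present application.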
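As shown in FIG. 4, the smart TV is triggered to execute the device control method of the present application and starts the camera to begin shooting video, obtaining a video stream. If the user is within the shooting range of the smart TV's camera, the user faces the smart TV with a frontal face, preferably facing the camera of the smart TV, and makes the gesture corresponding to the function that the user wants the smart TV to perform. Correspondingly, the video stream captured by the camera includes the body image of the user; the smart TV can determine whether the face image in the body image is a frontal face image, so as to determine whether the user's face is facing the smart TV, and after determining that the face image is a frontal face image, it recognizes the user's gesture according to the motion trajectory of the hand image in the body image.
It should be understood that the camera that captures the video stream may be a camera built into the electronic device or a camera independent of the electronic device, which is not limited in the present application. The camera that captures the video stream may be located at the top center of the electronic device. For example, FIG. 4 shows a possible position of the camera built into the smart TV, namely the top center of the smart TV, and the shooting direction of the camera may be horizontally forward.
The correspondence between gestures and the operations that the smart TV can be triggered to perform can be preset in the smart TV; these operations may include, but are not limited to, changing channels, adjusting the volume, or switching interfaces. The smart TV identifies the operation corresponding to the gesture and performs it. For example, if the smart TV recognizes that the operation corresponding to the gesture is switching to the home interface, the smart TV switches the displayed interface to the home interface.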
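A preset gesture-to-operation correspondence of this kind can be sketched as a lookup table. This is only an illustration under assumptions: the gesture names, the operation functions, and `build_gesture_handler` are hypothetical and do not come from the application.

```python
def build_gesture_handler(gesture_to_operation):
    """Return a function that looks up and performs the operation preset for a gesture."""
    def handle(gesture):
        operation = gesture_to_operation.get(gesture)
        if operation is None:
            # No operation is preset for this gesture, so nothing is triggered.
            return None
        return operation()
    return handle


# Hypothetical operations; a real device would change its displayed interface or volume.
def switch_to_home_interface():
    return "home interface displayed"


def increase_volume():
    return "volume increased"


handle_gesture = build_gesture_handler({
    "wave_left_right": switch_to_home_interface,
    "wave_up_down": increase_volume,
})

print(handle_gesture("wave_left_right"))
```

Keeping the mapping in a table means that gestures without a preset operation are simply ignored rather than triggering a default action.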
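In the device control method of the embodiments of the present application, the electronic device responds only to gestures made while the user's frontal face is facing the electronic device. Compared with the prior art, in which the electronic device responds as soon as it recognizes a user's gesture, a check that the user's frontal face is facing the electronic device is added, so that false triggering of the electronic device caused by user gestures can be reduced.
FIG. 5 is a schematic flowchart of the device control method of the present application, which includes:
Step 501: The electronic device enables smart interaction.
For the implementation of this step, refer to FIGS. 3A to 3C and their corresponding descriptions, which are not repeated here. Step 501 is optional.
Step 502: The electronic device captures a video stream.
The electronic device can start the camera to shoot and obtain the video stream.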
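Step 503: The electronic device performs gesture recognition on the video stream.
This step may include:
performing multi-task detection on a target video frame in the video stream to obtain the body image, face image, and hand image of the first user in the target video frame;
performing frontal face discrimination on the face image of the first user in the target video frame;
recognizing the gesture of the first user according to several consecutive target video frames that pass the frontal face discrimination.
The first user may be the user corresponding to any body image included in the target video frame, or may be a pre-designated user, such as a regular user or the owner of the electronic device.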
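Step 504: The electronic device identifies the operation corresponding to the gesture and performs the operation corresponding to the gesture.
The correspondence between gestures and operations of the electronic device can be preset in the electronic device. For example, assuming the electronic device is a smart TV, the first gesture corresponds to the smart TV operation "return to the previous page", and the second gesture corresponds to the smart TV operation "turn on the smart TV", then:
Referring to FIG. 6A, assuming the smart TV is in the standby state, if the second gesture is detected in step 503, then, referring to FIG. 6B, the smart TV is turned on and displays the home interface shown after startup;
Referring to FIG. 7A, assuming the smart TV is playing video 1, if the first gesture is detected in step 503, the smart TV returns to the interface preceding video playback, for example the video selection interface shown in FIG. 7B, which displays selection controls corresponding to several videos including video 1.
In step 503, the electronic device performs frontal face discrimination on the face image of the first user in the target video frame, recognizes the gesture of the first user according to several consecutive target video frames that pass the frontal face discrimination, and then performs the operation matching the gesture. Compared with the prior art, in which the electronic device responds as soon as it recognizes a user's gesture, a check that the user's frontal face is facing the electronic device is added, so that false triggering of the electronic device caused by user gestures can be reduced.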
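The implementation of step 503 and step 504 is described in more detail below with reference to FIG. 8. As shown in FIG. 8, it includes:
Step 801: The electronic device performs multi-task detection on each target video frame in the video stream.
The multi-task detection in this step may include: detecting body images, face images, and hand images from the target video frame to obtain body frames, as well as the face frame and hand frame in each body frame.
A target video frame is a video frame in the video stream that the electronic device needs to process. The electronic device can take every video frame in the video stream as a target video frame; alternatively, the electronic device can take some of the video frames in the video stream as target video frames according to a preset rule. For example, the preset rule may be an interval of one video frame, in which case the electronic device can take the 1st, 3rd, 5th, ... video frames in the video stream as target video frames.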
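The interval-based selection rule can be sketched as below. The function name `select_target_frames` and the `skip` parameter are hypothetical; the sketch only assumes that the frames are available as an indexable sequence.

```python
def select_target_frames(frames, skip=1):
    """Keep one frame, skip the next `skip` frames, and repeat.

    skip=0 keeps every frame; skip=1 keeps the 1st, 3rd, 5th, ... frames.
    """
    if skip < 0:
        raise ValueError("skip must be non-negative")
    return frames[::skip + 1]


# Frame numbers stand in for decoded video frames in this illustration.
print(select_target_frames([1, 2, 3, 4, 5, 6], skip=1))
```

A larger `skip` lowers the processing load at the cost of a coarser hand trajectory.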
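The body frame is used to mark a body image detected in the target video frame, the face frame is used to mark a face image detected in the target video frame, and the hand frame is used to mark a hand image detected in the target video frame.
The body frame may be the bounding box of a body contour detected in the target video frame, the face frame may be the bounding box of the face contour in the body frame, and the hand frame may be the bounding box of the hand contour in the body frame. For example, FIG. 9A shows the body frame 710, face frame 711, and hand frame 712 corresponding to one body image in the target video frame.
In the embodiments of the present application, a pixel coordinate system can be established based on the target video frame. For example, as shown in FIG. 9B, from the perspective of a user viewing the target video frame, the top-left vertex of the target video frame is taken as the origin of the pixel coordinate system, the direction to the right along the top border of the target video frame is the positive x-axis direction, and the direction downward along the left border of the target video frame is the positive y-axis direction. Based on this pixel coordinate system, the electronic device can determine the coordinates of each pixel of the target video frame in the pixel coordinate system, hereinafter referred to as pixel coordinates.
The body frame, face frame, and hand frame may be rectangular frames, and the electronic device can record a rectangular frame by the pixel coordinates of its diagonal vertices. For example, the rectangular frame ABCD shown in FIG. 9B can be recorded by the pixel coordinates of the diagonal vertices A and C, or B and D.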
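Recording a rectangular frame by two diagonal vertices can be sketched as follows. The `Box` class and its method names are hypothetical; the sketch assumes the pixel coordinate system described above, with the origin at the top-left corner, x increasing to the right, and y increasing downward.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle stored as its top-left and bottom-right pixel coordinates."""
    x1: float
    y1: float
    x2: float
    y2: float

    @classmethod
    def from_diagonal(cls, p, q):
        """Build the box from either pair of diagonal vertices (A and C, or B and D)."""
        (px, py), (qx, qy) = p, q
        return cls(min(px, qx), min(py, qy), max(px, qx), max(py, qy))

    @property
    def width(self):
        return self.x2 - self.x1

    @property
    def height(self):
        return self.y2 - self.y1

    @property
    def center(self):
        return ((self.x1 + self.x2) / 2, (self.y1 + self.y2) / 2)


# Both diagonals of the same rectangle give the same record.
box_ac = Box.from_diagonal((10, 20), (50, 80))
box_bd = Box.from_diagonal((50, 20), (10, 80))
print(box_ac == box_bd, box_ac.width, box_ac.height, box_ac.center)
```

Normalizing to top-left and bottom-right corners makes the record independent of which diagonal was supplied.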
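A pre-trained first model for detecting body frames, face frames, and hand frames can be preset in the electronic device. The input of the first model is the target video frame, and the output is the body frame, face frame, and hand frame corresponding to each body image included in the target video frame. When training the first model, the training samples may be several pictures that include body images, in which the body frames are annotated and the face frame and hand frame in each body frame are annotated; a deep learning network, such as a convolutional neural network (CNN), is preset, and the training samples are input into the deep learning network for model training to obtain the first model.
The number of body frames that the electronic device can detect in one target video frame depends on the number of body images actually included in the target video frame, and may be zero, one, or more.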
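The following steps 802 and 803 implement the frontal face determination of the target face frame by the electronic device.
The frontal face determination is used to determine that the face in the target face frame is facing the electronic device, that is, that the face image in the target face frame is a frontal face image.
The target face frame may be a face frame, among the face frames detected in the target video frame, that includes a certain preset face image, or it may be every face frame detected in the target video frame.
If the target face frame is a face frame that includes a certain preset face image, a face image can be preset in the electronic device, so that in this step the electronic device first determines that a face frame including the preset face image exists among the face frames detected in the target video frame, obtaining the target face frame, and then performs the frontal face determination on the target face frame.
If the target face frame is every face frame detected in the target video frame, the electronic device can take each face frame in turn as the target face frame and perform the frontal face determination.
The process of presetting a face image in the electronic device can be completed before this step is executed. In a possible implementation, the user can preset a target face image for the electronic device before triggering the smart interaction function. There may be one or more target face images, and the specific number is not limited in the present application. The target face image may be the face image of the owner of the electronic device or of a user who frequently uses the electronic device. The electronic device can provide the user with a setting interface for the target face image, for example as shown in FIG. 10A, in which a "Target face image setting" control is displayed. When the user clicks the "Target face image setting" control, the electronic device starts the camera and displays a selection interface for the target face image, for example as shown in FIG. 10B. In this selection interface, the electronic device displays the image captured by the camera and marks the detected face images in the image; the face images can be marked with face frames. The user can select one of the face images as the target face image; correspondingly, the electronic device receives the user's face selection instruction and sets the face image indicated by the face selection instruction as the target face image. It should be noted that the user can enter the setting interface of the target face image at any time to add, modify, delete, or otherwise edit target face images, which is not limited in the present application.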
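Step 802: The electronic device performs state detection on the face image marked by the target face frame in the target video frame.
The state detection is used to detect the degree to which the face image is a frontal face image, that is, the degree to which the face in the face image is facing the electronic device. The state detection result may include the yaw angle of the face image; preferably, in order to improve the accuracy of the frontal face determination in subsequent steps, the state detection result may also include the pitch angle and/or roll angle of the face image. For example, FIG. 8 takes the case where the state detection result includes the yaw angle, pitch angle, and roll angle of the face image.
Referring to FIG. 11A, a camera coordinate system can be established in advance. For example, in FIG. 11A, the physical center point of the camera is taken as the origin, and the direction passing through the origin and pointing horizontally toward the back of the camera is taken as the forward direction, which is consistent with the user's viewing direction. The direction passing through the origin and pointing horizontally to the left is the positive x-axis direction ox, the direction passing through the origin and pointing upward is the positive y-axis direction oy, and the direction passing through the origin and pointing horizontally toward the back of the camera is the positive z-axis direction oz.
Referring to FIG. 11B, a face angle coordinate system can be established in advance. In FIG. 11B, the center point of the person's head is the origin o; from the perspective of the person facing forward, the direction passing through the origin and pointing horizontally to the left is the positive x-axis direction ox', the direction passing through the origin and pointing vertically upward is the positive y-axis direction oy', and the direction passing through the origin and pointing horizontally forward is the positive z-axis direction oz'. The pitch angle is the rotation angle Pitch about ox', the roll angle is the rotation angle Roll about oz', and the yaw angle is the rotation angle Yaw about oy'.
Since the face angle coordinate system and the camera coordinate system differ only by a translation, the pitch angle, roll angle, and yaw angle of the face image calculated in the face angle coordinate system keep the same values when converted to the camera coordinate system. Therefore, these angle values calculated in the face angle coordinate system can be used to represent the degree to which the face image in the video frame captured by the camera is a frontal face image.
An angle regression model can be preset in the electronic device. The angle regression model is used to detect the pitch angle, roll angle, and yaw angle of a face image; its input is a face image, and its output is the pitch angle, roll angle, and yaw angle of the face image. The electronic device can crop the face image in the target face frame out of the target video frame and input the face image into the preset angle regression model to obtain the pitch angle, roll angle, and yaw angle of the face image. The angle regression model can be obtained by training in advance. The training method may include: taking face images annotated with pitch angle, roll angle, and yaw angle as samples, and using a convolutional neural network (CNN) as the initial model; the samples are input into the initial model for training, and the initial model can learn the key features of the face in the face image, such as the distance between the eyebrows, the position of the nose in the face image, and the position of the mouth in the face image, to obtain the angle regression model.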
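Step 803: The electronic device determines whether the state detection result of the target face frame is within the preset thresholds.
If the state detection result of the target face frame includes only the yaw angle, this step may include: the electronic device determines that the yaw angle of the face image in the target face frame is less than a preset first threshold;
If the state detection result of the target face frame also includes the pitch angle and/or roll angle of the face image, this step may further include: the electronic device determines that the pitch angle of the face image in the target face frame is less than a preset second threshold, and/or determines that the roll angle of the face image in the target face frame is less than a preset third threshold.
The first threshold, second threshold, and third threshold are the conditions for determining whether the face image in the target face frame is a frontal face image. Their specific values are not limited in the embodiments of the present application; the smaller the thresholds, the more directly the face in the target face frame must face the electronic device. For example, the first threshold, second threshold, and third threshold may each be 15 degrees.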
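The threshold check of step 803 can be sketched as below. The function name and parameters are hypothetical, and comparing the absolute value of each angle is an assumption (the text only states that each angle must be less than its threshold); the 15-degree defaults follow the example values given above.

```python
def is_frontal_face(yaw, pitch=None, roll=None,
                    yaw_max=15.0, pitch_max=15.0, roll_max=15.0):
    """Return True when every available angle (in degrees) is within its threshold.

    Only the yaw angle is mandatory; pitch and roll are checked when provided.
    Using abs() treats left/right and up/down deviations symmetrically (an assumption).
    """
    if abs(yaw) >= yaw_max:
        return False
    if pitch is not None and abs(pitch) >= pitch_max:
        return False
    if roll is not None and abs(roll) >= roll_max:
        return False
    return True


print(is_frontal_face(yaw=5.0, pitch=-3.0, roll=10.0))
print(is_frontal_face(yaw=40.0))
```

Smaller thresholds make the check stricter, which reduces false triggering but requires the user to face the device more directly.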
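If there are multiple target face frames in the target video frame, the above determination is performed on each target face frame in turn in this step. If the electronic device determines that the state detection result of at least one target face frame in the target video frame is within the corresponding thresholds, the face image in that target face frame is a frontal face image, and step 804 is performed; otherwise, the process returns to step 801 and the electronic device processes the next target video frame.
For ease of description, a target face frame whose face image is determined in this step to be a frontal face image is hereinafter referred to as a frontal face frame.
Step 804: For a frontal face frame in the target video frame, the electronic device determines the hand frame corresponding to the frontal face frame.
The hand frame corresponding to a frontal face frame is the hand frame that belongs to the same body frame as the frontal face frame. For example, the face frame 711 and the hand frame 712 in FIG. 9A are a face frame and a hand frame belonging to the same body frame; therefore, the hand frame corresponding to the face frame 711 is the hand frame 712.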
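Step 805: The electronic device determines the motion trajectory of the hand frame corresponding to the frontal face frame.
In the following, any frontal face frame in the target video frame in this step is called the first frontal face frame, and the hand frame corresponding to the first frontal face frame is called the first hand frame, to explain how the motion trajectory of the first hand frame is determined. This step may include:
The electronic device obtains several consecutive first target video frames, which include a second target video frame and third target video frames. The second target video frame is the target video frame to which the first frontal face frame in step 805 belongs, a third target video frame is a target video frame located before the second target video frame, and each third target video frame includes a second frontal face frame, which is a frontal face frame matching the first frontal face frame;
The electronic device obtains several consecutive fourth target video frames from the third target video frames, where each fourth target video frame includes a second hand frame, which is a hand frame matching the first hand frame;
The motion trajectory of the first hand frame is determined according to the pixel coordinates of the first hand frame in the second target video frame and the pixel coordinates of the second hand frame in the fourth target video frames.
An example of the method for determining that a target video frame includes the second frontal face frame is as follows:
Among the frontal face frames of the target video frame, obtain the frontal face frame whose designated point is closest to the designated point of the first frontal face frame. If that distance is less than a preset threshold, this frontal face frame is the frontal face frame matching the first frontal face frame, that is, the second frontal face frame; otherwise, the target video frame does not include the second frontal face frame. The designated point of the frontal face frame of the target video frame and the designated point of the first frontal face frame are points at the same position within the face frame. The distance between two designated points can be calculated from the pixel coordinates of the two designated points.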
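The nearest-designated-point matching can be sketched as follows. The function names are hypothetical, each frame is assumed to be a tuple `(x1, y1, x2, y2)` in pixel coordinates, and the center is used as the designated point, which is only one of the choices the text allows.

```python
import math


def box_center(box):
    """Center of a box given as (x1, y1, x2, y2); used here as the designated point."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)


def match_box(reference, candidates, max_distance):
    """Return the candidate closest to `reference`, or None if none is close enough."""
    ref_point = box_center(reference)
    best, best_distance = None, None
    for candidate in candidates:
        distance = math.dist(ref_point, box_center(candidate))
        if best_distance is None or distance < best_distance:
            best, best_distance = candidate, distance
    if best is not None and best_distance < max_distance:
        return best
    return None


first_face = (0, 0, 10, 10)
earlier_faces = [(100, 100, 110, 110), (2, 2, 12, 12)]
print(match_box(first_face, earlier_faces, max_distance=20))
```

The same function can be reused for matching hand frames between target video frames, since the rule described for them is identical.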
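An upper limit can be set on the number of first target video frames obtained by the electronic device, that is, the maximum number of first target video frames obtained each time. For example, if the maximum number is 5, the electronic device obtains at most 5 first target video frames.
An example of the method for determining that a target video frame includes the second hand frame is as follows:
Among the hand frames of the target video frame, obtain the hand frame whose designated point is closest to the designated point of the first hand frame. If that distance is less than a preset threshold, this hand frame is the second hand frame; otherwise, the target video frame does not include the second hand frame. The designated point of the hand frame of the target video frame and the designated point of the first hand frame are points at the same position within the hand frame. The distance between two designated points can be calculated from the pixel coordinates of the two designated points.
When determining the motion trajectory of the first hand frame, the trajectory can be determined based on the pixel coordinates of the designated point of the first hand frame and of the designated point of the second hand frame in each target video frame. The designated point can be any point in the hand frame, preferably a vertex or the center point of the hand frame. The designated point of the first hand frame and the designated point of the second hand frame are points at the same position within the hand frame.
It should be noted that the first target video frames may be target video frames that have not triggered a function. That is, if the electronic device has already recognized a gesture instruction from the face frame and hand frame in a target video frame and triggered an operation, that target video frame is no longer used as a first target video frame for that face frame.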
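Step 806: The electronic device matches the motion trajectory of the hand box against the preset gestures. If the motion trajectory of a hand box matches one of the preset gestures, step 807 is performed; otherwise, the flow returns to step 801 and the electronic device processes the next target video frame.
Different gestures may be preset in the electronic device. To the user, these gestures may appear as waving left and right, waving up and down, and so on; to the electronic device, a gesture may be represented as a feature of the motion trajectory. For example, a first gesture may be defined as: the horizontal width of the motion trajectory reaches a first multiple of a reference width; a second gesture may be defined as: the vertical height of the motion trajectory reaches a second multiple of a reference height; and so on.
Matching the motion trajectory of the first hand box against a preset gesture may include:
The electronic device calculates a reference width from the width of the first hand box;
If the width between the left and right boundary points of the motion trajectory of the first hand box reaches the first multiple of the reference width, the motion trajectory of the first hand box matches the first preset gesture.
The reference width may be calculated as, for example, the mean or the median of the widths of the first hand box.
The specific value of the first multiple is not limited in the embodiments of this application. It depends on how large a left-right waving motion the electronic device requires of the user: the larger the first multiple, the larger the left-right waving motion the user must make before the electronic device detects the gesture.
Alternatively, matching the motion trajectory of the first hand box against a preset gesture may include:
The electronic device calculates a reference height from the height of the first hand box;
If the height between the upper and lower boundary points of the motion trajectory of the first hand box reaches the second multiple of the reference height, the motion trajectory of the first hand box matches the second preset gesture.
The reference height may be calculated as, for example, the mean or the median of the heights of the first hand box.
The specific value of the second multiple is not limited in the embodiments of this application. It depends on how large an up-down waving motion the electronic device requires of the user: the larger the second multiple, the larger the up-down waving motion the user must make before the electronic device detects the gesture.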
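The two matching rules of step 806 can be sketched together. The function name `match_gesture`, the gesture labels, the `(x1, y1, x2, y2)` box layout, the use of box centers as trajectory points, the median as the reference size, and the default multiples of 2.0 are all assumptions for illustration; the text leaves the multiples unspecified and allows either mean or median.

```python
from statistics import median
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2) in pixels


def match_gesture(hand_boxes: List[Box],
                  width_multiple: float = 2.0,
                  height_multiple: float = 2.0) -> Optional[str]:
    """Classify a hand-box sequence as a left-right or up-down wave.

    Returns "wave_left_right", "wave_up_down", or None if neither
    trajectory extent reaches its multiple of the reference size.
    """
    if not hand_boxes:
        return None
    xs = [(x1 + x2) / 2.0 for x1, _, x2, _ in hand_boxes]
    ys = [(y1 + y2) / 2.0 for _, y1, _, y2 in hand_boxes]
    # Reference width/height: median box size over the sequence.
    reference_width = median([x2 - x1 for x1, _, x2, _ in hand_boxes])
    reference_height = median([y2 - y1 for _, y1, _, y2 in hand_boxes])
    # "Reaches" is read as greater than or equal to.
    if max(xs) - min(xs) >= width_multiple * reference_width:
        return "wave_left_right"
    if max(ys) - min(ys) >= height_multiple * reference_height:
        return "wave_up_down"
    return None
```

Scaling the threshold by the hand-box size makes the test roughly independent of how far the user stands from the camera, which appears to be the purpose of the reference width and height.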
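Step 807: The electronic device identifies the operation corresponding to the gesture and performs it.
Note that if every face box in the target video frame is determined to be a target face box in step 802, step 803 may find multiple frontal face boxes, and step 806 may accordingly obtain gestures for multiple frontal face boxes. If the electronic device performs the flow shown in FIG. 8 on each face box in the target video frame in turn and the frame contains multiple frontal face boxes, it may recognize the gesture of each frontal face box in turn and perform the corresponding operation. For example, if the target video frame contains frontal face box 1 and frontal face box 2, the device may first recognize gesture 1 of frontal face box 1 and perform operation 1 corresponding to gesture 1, and then recognize gesture 2 of frontal face box 2 and perform operation 2 corresponding to gesture 2. If the electronic device performs the above steps on multiple face boxes in parallel, the frame contains multiple frontal face boxes, and the device recognizes gestures for several of them at the same time, it may select the gesture of one frontal face box and perform the corresponding operation. This application does not limit which frontal face box's gesture is selected, that is, which user's gesture is responded to. For example, if one of the frontal face boxes contains a preset face image, the operation corresponding to that frontal face box's gesture is performed; if none does, the gesture with the earlier start time may be selected, that is, the gesture whose sequence of consecutive target video frames has the earlier starting frame.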
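The example selection policy for simultaneously recognized gestures can be sketched as below. `DetectedGesture` and `select_gesture` are hypothetical names, and breaking ties among several preset faces by earliest start frame is an assumption that the text does not address.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DetectedGesture:
    face_id: int                 # which frontal face box the gesture belongs to
    gesture: str                 # label of the recognized gesture
    start_frame: int             # index of the first frame of the gesture
    is_preset_face: bool = False # True if the face matches a preset face image


def select_gesture(candidates: List[DetectedGesture]) -> Optional[DetectedGesture]:
    """Pick one gesture to respond to.

    Gestures from a preset face take priority; otherwise the gesture
    that started earliest is chosen.
    """
    if not candidates:
        return None
    preset = [c for c in candidates if c.is_preset_face]
    pool = preset or candidates
    return min(pool, key=lambda c: c.start_frame)
```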
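In another embodiment provided by this application, on the basis of the foregoing method, the electronic device also detects attribute information and/or emotion information of the frontal face box. If the operation corresponding to the user's gesture displays an interface that shows recommended content, the recommended content may be obtained according to the attribute information and/or emotion information of the frontal face box. This makes the recommended content more targeted, improves interaction between the electronic device and the user, and improves user experience.
As shown in FIG. 12, the method may include:
Step 1201: The electronic device captures a video stream.
Step 1202: The electronic device performs multi-task detection on each target video frame in the video stream.
Step 1203: The electronic device performs state detection on the face image identified by the target face box in the target video frame.
Step 1204: The electronic device determines whether the state detection result of the target face box is within the preset thresholds.
If the target video frame contains one target face box, then in this step, if the electronic device determines that the state detection result of the target face box is within the corresponding thresholds, the face image in the target face box is a frontal face image and steps 1205 and 1208 are performed; otherwise, the flow returns to step 1202 and the electronic device processes the next target video frame.
If the target video frame contains multiple target face boxes, this step performs the above determination on each target face box in turn. For each target face box, if the electronic device determines that its state detection result is within the corresponding thresholds, the face image in that target face box is a frontal face image and step 1205 is performed; otherwise, the determination of this step is performed on the next target face box. If the result is negative for all target face boxes, the flow returns to step 1202 and the electronic device processes the next target video frame.
Step 1205: For each frontal face box in the target video frame, the electronic device determines the hand box corresponding to the frontal face box.
Step 1206: The electronic device determines the motion trajectory of the hand box corresponding to the frontal face box.
Step 1207: The electronic device matches the motion trajectory of the hand box against the preset gestures. If the motion trajectory of a hand box matches one of the preset gestures, step 1209 is performed; if none matches, the flow returns to step 1202 and the electronic device processes the next target video frame.
Step 1208: The electronic device detects the attribute information and/or emotion information of the frontal face box, and step 1209 is performed.
Step 1209: The electronic device identifies the operation corresponding to the gesture of the frontal face box. If the operation is a content recommendation operation, the interface corresponding to the operation is displayed; the interface shows recommended content, which is obtained according to the attribute information and/or emotion information of the frontal face box.
For the implementation of steps 1201 to 1207, refer to the corresponding descriptions in the foregoing embodiments; details are not repeated here.
Step 1208 is described as follows:
A second model for detecting the emotion information of a face may be preset in the electronic device. The input of the second model may be a face image, for example the face image in the frontal face box, and the output is the probability of each possible value of the emotion information. For example, if the preset values of the emotion information are happy, angry, sad, neutral, surprised, disgusted, and fearful, the output is the probability of each of these values, and the emotion value with the highest probability is taken as the emotion information of the face output by the second model. For instance, if a face image is input to the second model and, among the probabilities output for the emotion values, the probability of happy is the highest, the emotion information of that face image is happy. When the second model is trained, the training samples may be face images annotated with emotion values; the training samples are input to a preset deep learning network for training to obtain the second model.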
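Turning the second model's probability output into a single emotion label is an argmax over the preset values. The sketch below assumes the probabilities arrive in the order listed in the text; the English labels and the function name `top_emotion` are illustrative.

```python
from typing import Sequence

# Emotion values in the order given in the text (English labels are illustrative).
EMOTIONS = ("happy", "angry", "sad", "neutral", "surprised", "disgusted", "fearful")


def top_emotion(probabilities: Sequence[float]) -> str:
    """Return the emotion whose probability is highest."""
    if len(probabilities) != len(EMOTIONS):
        raise ValueError("expected one probability per emotion value")
    best = max(range(len(EMOTIONS)), key=lambda i: probabilities[i])
    return EMOTIONS[best]
```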
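A third model for detecting the attribute information of a face may be preset in the electronic device. The input of the third model may be a face image, for example the face image in the frontal face box, and the output is the probability of each possible value of the attribute information. For example, the preset attribute information may include gender and age range; the values of gender may be male and female, and the values of age range may be child, youth, middle-aged, and elderly. When the third model is trained, the training samples may be face images annotated with attribute values; the training samples are input to a preset deep learning network for training to obtain the third model.
Based on the preset models above, the face image in the frontal face box is input to the second model and/or the third model to obtain the attribute information and/or emotion information of the frontal face box.
Step 1209 is described as follows:
A correspondence between gestures and operations of the electronic device may be preset in the electronic device, and in this step the operation corresponding to a gesture may be identified from that correspondence. For example, if the electronic device is a smart TV, the operations that can be triggered may include, but are not limited to, changing the channel, adjusting the volume, or switching the interface. If the electronic device displays an interface after performing an operation and that interface shows recommended content, this application calls the operation a content recommendation operation. Note that the displayed interface may show other content in addition to the recommended content; this application imposes no limitation.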
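The preset correspondence between gestures and operations can be held in a lookup table. Every gesture label and operation name below is hypothetical; the text names channel change, volume adjustment, and interface switching only as examples for a smart TV and does not say which gesture maps to which.

```python
from typing import Optional, Tuple

# Hypothetical mapping for a smart TV.
GESTURE_TO_OPERATION = {
    "wave_left_right": "switch_channel",
    "wave_up_down": "adjust_volume",
    "open_palm": "open_recommendation_page",
}

# Operations whose resulting interface shows recommended content.
CONTENT_RECOMMENDATION_OPERATIONS = {"open_recommendation_page"}


def resolve_operation(gesture: str) -> Tuple[Optional[str], bool]:
    """Return (operation, is_content_recommendation) for a gesture.

    Unknown gestures resolve to (None, False).
    """
    operation = GESTURE_TO_OPERATION.get(gesture)
    return operation, operation in CONTENT_RECOMMENDATION_OPERATIONS
```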
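In step 1209, if the operation corresponding to the gesture is a content recommendation operation and the interface displayed after the operation has a control that shows recommended content, the electronic device filters content according to the attribute information and/or emotion information of the frontal face box, obtains the recommended content for the frontal face box, and shows it in the interface.
In one possible implementation, the recommended content may be obtained only from the attribute information and/or emotion information of the frontal face box that the operation corresponds to. For example, if the electronic device identifies the operation corresponding to the gesture of frontal face box 1 and displays the interface for that operation, the recommended content shown in the interface may be obtained only from the attribute information and/or emotion information of frontal face box 1;
In another possible implementation, the recommended content may be obtained from the attribute information and/or emotion information of multiple frontal face boxes in the same target video frame. For example, if the electronic device identifies the operation corresponding to the gesture of frontal face box 1 and displays the interface for that operation, the device determines whether the target video frame containing frontal face box 1 contains other frontal face boxes. If frontal face box 1 and frontal face box 2 are both recognized in that frame, the recommended content shown in the interface may be obtained from the attribute information and/or emotion information of both. In one possible implementation, the other frontal face boxes in the same target video frame as the frontal face box that the operation corresponds to may be ones for which no gesture has been recognized; as long as a frontal face box is in the same target video frame as the one the operation corresponds to, recommended content may be obtained according to it.
The recommended content may include, but is not limited to, the music and videos shown in FIG. 12; for example, it may also include news.
The following describes how, when the operation performed by the electronic device includes displaying recommended content, content is filtered according to the attribute information and/or emotion information of the frontal face box to obtain the recommended content for the frontal face box:
A content library may be preset in the electronic device. The content library contains several content items, and several tags may be set for each item; a tag records the attribute information or the emotion information that the item corresponds to;
Accordingly, the electronic device may use the attribute information and/or emotion information of the frontal face box to search for content whose tags match that information, and use the content found as the recommended content for the frontal face box.
For example, the electronic device stores n content items, each with an attribute tag and an emotion tag. The attribute tag records the face attributes that the content corresponds to, and the emotion tag records the face emotion that the content corresponds to, as shown in Table 1 below.
Figure PCTCN2022083654-appb-000001
Table 1
Thus, if the attribute information of the frontal face box is female, child and the emotion information is angry, content 2 is selected as the recommended content for the frontal face box; if the attribute information of the frontal face box is male, youth and the emotion information is happy, content 1 is selected as the recommended content for the frontal face box.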
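Tag-based filtering against a content library can be sketched as follows. The library entries here are invented placeholders and do not reproduce Table 1; `ContentItem`, `SAMPLE_LIBRARY`, and `recommend` are hypothetical names, and treating a missing attribute or emotion as "no constraint" is an assumption that reflects the "and/or" wording of the text.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class ContentItem:
    name: str
    genders: Set[str]   # attribute tag: genders the item is suited to
    ages: Set[str]      # attribute tag: age ranges the item is suited to
    emotions: Set[str]  # emotion tag: emotions the item is suited to


# Placeholder library; NOT the contents of Table 1.
SAMPLE_LIBRARY = [
    ContentItem("item_a", {"male"}, {"youth"}, {"happy"}),
    ContentItem("item_b", {"female"}, {"child"}, {"angry", "sad"}),
    ContentItem("item_c", {"male", "female"}, {"child"}, {"happy"}),
]


def recommend(library: List[ContentItem],
              gender: Optional[str] = None,
              age: Optional[str] = None,
              emotion: Optional[str] = None) -> List[str]:
    """Names of items whose tags match every provided value.

    A value of None means that piece of information is unavailable
    and is not used for filtering.
    """
    results = []
    for item in library:
        if gender is not None and gender not in item.genders:
            continue
        if age is not None and age not in item.ages:
            continue
        if emotion is not None and emotion not in item.emotions:
            continue
        results.append(item.name)
    return results
```

The same function could run on a server if the content library is hosted there, with the device sending only the attribute and emotion values.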
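In another possible implementation, the content library may be located on a server, so that the recommended content is filtered by the server. In this case, the electronic device may connect to the server and send it the attribute information and/or emotion information of the frontal face box; the server filters the content to obtain the recommended content for the frontal face box and sends it to the electronic device.
When recommended content is displayed on the interface and the target video frame contains multiple frontal face boxes, the recommended content may be displayed separately for each frontal face box, for example by allocating each one a region of a certain area in which its recommended content is shown. As shown in FIG. 13A, taking two frontal face boxes as an example, region 101 displays the recommended content for frontal face box 1 and region 102 displays the recommended content for frontal face box 2. Alternatively, as shown in FIG. 13B, the recommended content for all frontal face boxes may be combined and displayed in the same region 103.
The way recommended content is displayed in each region is not limited in the embodiments of this application. For example, a list of the names of the recommended content may be displayed so that the user can select by name the content the electronic device should display; or, if the recommended content is video such as the songs, cartoons, and period dramas in Table 1, partial clips of each recommended item may be played in turn; and so on.
For example:
Taking the correspondence shown in Table 1 preset in the electronic device as an example:
Referring to FIG. 14A, if the attribute information of the frontal face box is youth, female and the emotion information is happy, content 2 may be selected and displayed in the corresponding region;
Referring to FIG. 14B, if the attribute information of the frontal face box is middle-aged, male and the emotion information is sad, content 1 may be selected and displayed in the corresponding region;
Referring to FIG. 14C, if the attribute information of the frontal face box is child, female and the emotion information is happy, content 5 may be selected and displayed in the corresponding region.
Note that this application does not limit the number of content items displayed in each region; showing one item in FIG. 14A to FIG. 14C is only an example. In FIG. 14A to FIG. 14C, a region in the interface shows the recommended content for only one frontal face box. If the electronic device simultaneously detects the three frontal face boxes illustrated in FIG. 14A to FIG. 14C, it may follow the approach of FIG. 13A and show the recommended content for one frontal face box in each of three regions, as shown in FIG. 14D; alternatively, it may show the recommended content for all three frontal face boxes in one region, as shown in FIG. 14E.
The method shown in FIG. 12 filters content according to the attribute information and/or emotion information of the frontal face box, obtains the recommended content for the frontal face box, and displays it. This provides personalized recommendation for the user, makes the displayed recommended content more targeted, improves interaction between the electronic device and the user, and improves user experience.
It can be understood that some or all of the steps or operations in the foregoing embodiments are only examples; the embodiments of this application may also perform other operations or variations of the operations. In addition, the steps may be performed in an order different from that presented in the foregoing embodiments, and not all of the operations in the foregoing embodiments necessarily need to be performed.
FIG. 15 is a schematic structural diagram of an embodiment of the electronic device of this application. As shown in FIG. 15, the electronic device 1500 may include a processor 1510, a camera 1520, and a display 1530, where:
The camera 1520 may be used to capture a video stream;
In one possible implementation, the processor 1510 may be used to: when a frontal face image of a first user is detected in the video stream, detect a gesture of the first user in the video stream; identify the operation corresponding to the gesture; and perform the operation corresponding to the gesture. If performing the operation corresponding to the gesture includes displaying an image, a video, or the like, the processor 1510 may further be used to control the display 1530 to display the image or video corresponding to the operation;
The display 1530 may be used to display images, videos, and the like.
In another possible implementation, the processor 1510 may be used to: when a frontal face image of a first user is detected in the video stream, detect a gesture of the first user in the video stream and determine attribute information and/or emotion information of the first user from the frontal face image of the first user, where the attribute information records attributes of the first user and the emotion information records the emotion of the first user; identify that the operation corresponding to the gesture is a content recommendation operation; and display the interface corresponding to the content recommendation operation, where the interface shows recommended content obtained according to the attribute information and/or emotion information of the first user.
The display 1530 may be used to display the interface corresponding to the gesture.
An embodiment of this application further provides a computer-readable storage medium storing a computer program that, when run on a computer, causes the computer to perform the method provided by the embodiment shown in FIG. 5, FIG. 8, or FIG. 12 of this application.
An embodiment of this application further provides a computer program product including a computer program that, when run on a computer, causes the computer to perform the method provided by the embodiment shown in FIG. 5, FIG. 8, or FIG. 12 of this application.
In the embodiments of this application, "at least one" means one or more, and "multiple" means two or more. "And/or" describes an association between associated objects and indicates that three relationships are possible; for example, A and/or B may mean that A exists alone, that A and B both exist, or that B exists alone, where A and B may each be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects before and after it. "At least one of the following" and similar expressions refer to any combination of the listed items, including any combination of single or plural items. For example, at least one of a, b, and c may mean: a; b; c; a and b; a and c; b and c; or a, b, and c, where each of a, b, and c may be single or multiple.
A person of ordinary skill in the art will appreciate that the units and algorithm steps described in the embodiments disclosed herein can be implemented in electronic hardware or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.
A person skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above may refer to the corresponding processes in the foregoing method embodiments; details are not repeated here.
In the several embodiments provided in this application, any function that is implemented as a software functional unit and sold or used as an independent product may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied as a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory; ROM), a random access memory (Random Access Memory; RAM), a magnetic disk, or an optical disc.
The foregoing is only a description of specific implementations of this application. Any variation or replacement that a person skilled in the art could readily conceive within the technical scope disclosed in this application shall fall within the protection scope of this application. The protection scope of this application shall be subject to the protection scope of the claims.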

Claims (21)

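  1. A device control method, applied to an electronic device, characterized by comprising:
    capturing a video stream;
    when a frontal face image of a first user is detected in the video stream, detecting a gesture of the first user in the video stream;
    identifying an operation corresponding to the gesture; and
    performing the operation corresponding to the gesture.
  2. The method according to claim 1, characterized in that detecting that a frontal face image of the first user appears in the video stream comprises:
    detecting that consecutive target video frames of the video stream include the frontal face image of the first user.
  3. The method according to claim 2, characterized in that detecting that consecutive target video frames of the video stream include the frontal face image of the first user comprises:
    for each target video frame in the consecutive target video frames, obtaining a face image of the first user from the target video frame, calculating a yaw angle of the face image, and determining that the yaw angle is less than a preset first threshold.
  4. The method according to claim 3, characterized in that detecting that consecutive target video frames of the video stream include the frontal face image of the first user further comprises:
    calculating a pitch angle and/or a roll angle of the face image; and
    determining that the pitch angle is less than a preset second threshold, and/or determining that the roll angle is less than a preset third threshold.
  5. The method according to any one of claims 1 to 4, characterized in that, before detecting that a frontal face image of the first user appears in the video stream, the method further comprises:
    capturing a first image in response to a face setting instruction;
    displaying the first image, and indicating, on the displayed first image, the face images in the first image; and
    in response to a face selection instruction, setting the face image indicated by the face selection instruction as the face image of the first user.
  6. The method according to any one of claims 1 to 3, characterized by further comprising:
    when a frontal face image of a second user is detected in the video stream, detecting a gesture of the second user in the video stream;
    identifying an operation corresponding to the gesture of the second user; and
    performing the operation corresponding to the gesture of the second user.
  7. The method according to claim 6, characterized by further comprising:
    determining that the gesture of the first user and the gesture of the second user are detected in the video stream at the same time, and selecting one gesture from the gesture of the first user and the gesture of the second user;
    identifying an operation corresponding to the selected gesture; and
    performing the operation corresponding to the selected gesture.
  8. The method according to any one of claims 2 to 4, characterized in that detecting the gesture of the first user comprises:
    detecting, from the consecutive target video frames that include the frontal face image of the first user, consecutive target video frames that include a hand image of the first user;
    determining a motion trajectory of the hand image of the first user from the consecutive target video frames that include the hand image of the first user; and
    determining the gesture of the first user from the motion trajectory of the hand image of the first user.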
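  9. A device control method, applied to an electronic device, characterized by comprising:
    capturing a video stream;
    when a frontal face image of a first user is detected in the video stream, detecting a gesture of the first user in the video stream, and determining attribute information and/or emotion information of the first user from the frontal face image of the first user, wherein the attribute information records attributes of the first user and the emotion information records an emotion of the first user;
    identifying an operation corresponding to the gesture; and
    if the operation is a content recommendation operation, displaying an interface corresponding to the content recommendation operation, wherein the interface shows recommended content, and the recommended content is obtained according to the attribute information and/or emotion information of the first user.
  10. The method according to claim 9, characterized in that, before displaying the interface corresponding to the content recommendation operation, the method further comprises:
    sending the attribute information and/or emotion information of the first user to a server; and
    receiving first information sent by the server in response to the attribute information and/or emotion information, wherein the first information includes recommended content matching the attribute information and/or emotion information.
  11. The method according to claim 9, characterized in that detecting that a frontal face image of the first user appears in the video stream comprises:
    detecting that consecutive target video frames of the video stream include the frontal face image of the first user.
  12. The method according to claim 11, characterized in that detecting that consecutive target video frames of the video stream include the frontal face image of the first user comprises:
    for each target video frame in the consecutive target video frames, obtaining a face image of the first user from the target video frame, calculating a yaw angle of the face image, and determining that the yaw angle is less than a preset first threshold.
  13. The method according to claim 12, characterized in that detecting that consecutive target video frames of the video stream include the frontal face image of the first user further comprises:
    calculating a pitch angle and/or a roll angle of the face image; and
    determining that the pitch angle is less than a preset second threshold, and/or determining that the roll angle is less than a preset third threshold.
  14. The method according to any one of claims 9 to 13, characterized in that, before detecting that a frontal face image of the first user appears in the video stream, the method further comprises:
    capturing a first image in response to a face setting instruction;
    displaying the first image, and indicating, on the displayed first image, the face images in the first image; and
    in response to a face selection instruction, setting the face image indicated by the face selection instruction as the face image of the first user.
  15. The method according to any one of claims 9 to 13, characterized by further comprising:
    when a frontal face image of a second user is detected in the video stream, detecting a gesture of the second user in the video stream;
    identifying an operation corresponding to the gesture of the second user; and
    performing the operation corresponding to the gesture of the second user.
  16. The method according to claim 15, characterized by further comprising:
    determining that the gesture of the first user and the gesture of the second user are detected in the video stream at the same time, and selecting one gesture from the gesture of the first user and the gesture of the second user;
    identifying an operation corresponding to the selected gesture; and
    performing the operation corresponding to the selected gesture.
  17. The method according to any one of claims 11 to 13, characterized in that detecting the gesture of the first user comprises:
    detecting, from the consecutive target video frames that include the frontal face image of the first user, consecutive target video frames that include a hand image of the first user;
    determining a motion trajectory of the hand image of the first user from the consecutive target video frames that include the hand image of the first user; and
    determining the gesture of the first user from the motion trajectory of the hand image of the first user.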
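  18. An electronic device, characterized by comprising a processor and a memory,
    wherein the memory is configured to store a computer program, and when the processor executes the computer program, the electronic device is caused to perform the method according to any one of claims 1 to 8.
  19. An electronic device, characterized by comprising a processor and a memory,
    wherein the memory is configured to store a computer program, and when the processor executes the computer program, the electronic device is caused to perform the method according to any one of claims 9 to 17.
  20. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program that, when run on a computer, causes the computer to perform the method according to any one of claims 1 to 8.
  21. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program that, when run on a computer, causes the computer to perform the method according to any one of claims 9 to 17.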
PCT/CN2022/083654 2021-04-19 2022-03-29 Device control method and electronic device WO2022222705A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP22790814.2A EP4310725A1 (en) 2021-04-19 2022-03-29 Device control method and electronic device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110420844.1A CN115223236A (zh) 2021-04-19 2021-04-19 Device control method and electronic device
CN202110420844.1 2021-04-19

Publications (1)

Publication Number Publication Date
WO2022222705A1 true WO2022222705A1 (zh) 2022-10-27

Family

ID=83605633

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/083654 WO2022222705A1 (zh) 2021-04-19 2022-03-29 Device control method and electronic device

Country Status (3)

Country Link
EP (1) EP4310725A1 (zh)
CN (1) CN115223236A (zh)
WO (1) WO2022222705A1 (zh)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
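CN102523502A (zh) * 2011-12-15 2012-06-27 四川长虹电器股份有限公司 Smart TV interaction system and interaction method
CN105700683A (zh) * 2016-01-12 2016-06-22 厦门施米德智能科技有限公司 Smart window and control method thereof
CN105721936A (zh) * 2016-01-20 2016-06-29 中山大学 Context-aware smart TV program recommendation system
CN108960163A (zh) * 2018-07-10 2018-12-07 亮风台(上海)信息科技有限公司 Gesture recognition method, apparatus, device, and storage medium
CN110705356A (zh) * 2019-08-31 2020-01-17 深圳市大拿科技有限公司 Function control method and related device
CN111291671A (zh) * 2020-01-23 2020-06-16 深圳市大拿科技有限公司 Gesture control method and related device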

Also Published As

Publication number Publication date
CN115223236A (zh) 2022-10-21
EP4310725A1 (en) 2024-01-24

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22790814

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2022790814

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2022790814

Country of ref document: EP

Effective date: 20231019

NENP Non-entry into the national phase

Ref country code: DE