WO2023125514A1 - Device control method and related apparatus - Google Patents

Device control method and related apparatus

Info

Publication number
WO2023125514A1
WO2023125514A1 (application PCT/CN2022/142260)
Authority
WO
WIPO (PCT)
Prior art keywords
electronic device
user
hand
video
adjustment
Prior art date
Application number
PCT/CN2022/142260
Other languages
English (en)
French (fr)
Inventor
姚淅峰
陈开济
Original Assignee
华为技术有限公司
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2023125514A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017Gesture based interaction, e.g. based on a set of recognized hand gestures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output
    • G06F3/165Management of the audio stream, e.g. setting of volume, audio stream path
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output
    • G06F3/167Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue

Definitions

  • the present application relates to the field of electronic technology, in particular to a device control method and related devices.
  • the target element can be volume, brightness, etc.; the electronic device adjusts the magnitude of the target element by a preset ratio, or to the value specified in the speech. For example, if the preset ratio is 5%, when it is detected that the user says “turn up the volume”, the electronic device turns the volume up by the preset 5%; when it is detected that the user says “turn up the volume to 50%”, the electronic device turns the volume up to 50% of the maximum volume.
  • the user, however, cannot predict whether the output effect of the target element after one such magnitude adjustment will match the expected effect.
  • the user may therefore need several rounds of voice interaction with the voice assistant of the electronic device to adjust the magnitude of the target element to the desired effect. In this way, user operations are cumbersome and the user experience is poor.
  • the present application provides a device control method and related apparatus, which can improve the efficiency with which the user controls the magnitude of a target element in mid-air, and effectively improve the user experience.
  • the present application provides a device control method, including: the first electronic device receives first voice information; when the first electronic device determines that the intent corresponding to the first voice information is amplitude adjustment, the first electronic device determines, based on the first voice information, the target element whose amplitude is to be adjusted; the first electronic device obtains the user's hand movement parameter; the first electronic device determines a first amplitude adjustment parameter corresponding to the hand movement parameter; and the first electronic device adjusts the amplitude of the target element with the first amplitude adjustment parameter.
  • when it is detected that the intent corresponding to the voice spoken by the user is amplitude adjustment, the first electronic device can obtain the user's hand movement parameters; as the user's hand moves, the first electronic device can continuously adjust the magnitude of the target element with the amplitude adjustment parameter indicated by the hand movement parameter. In this way, the user does not need to make complicated gestures: the magnitude of the target element can be adjusted to the desired effect simply by moving the hand, which effectively improves the efficiency of the user's mid-air control of the magnitude of the target element and thereby effectively improves the user experience.
  • the above solution is applicable to any element whose magnitude can be adjusted, and the user does not need to set a specific gesture for each element; since the first electronic device does not need to recognize complicated gestures, the above solution can also appropriately reduce the performance requirements on the first electronic device. A minimal sketch of this control loop is given after this paragraph.
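  • the following is a hypothetical Python sketch of the control loop described above; the function and attribute names (recognize_intent, get_hand_movement, session_ended, device.set_element, device.sensitivity) are illustrative assumptions and not part of the application or of any real API.

```python
# Hypothetical control loop: a voice command selects the target element, after
# which hand movement continuously drives the amplitude adjustment.

def handle_voice(first_voice_info, device):
    intent, slots = recognize_intent(first_voice_info)   # NLU on the first voice information
    if intent != "ADJUST_AMPLITUDE":
        return
    target = slots.get("element", "volume")               # e.g. volume, brightness, curtain opening
    # Map hand movement to amplitude changes until the adjustment session ends
    # (end conditions are described later in this section).
    while not session_ended():
        movement = get_hand_movement()                     # hand movement speed or distance
        delta = movement * device.sensitivity              # first amplitude adjustment parameter
        device.set_element(target, device.get_element(target) + delta)
```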
  • before the first electronic device adjusts the amplitude of the target element with the first amplitude adjustment parameter, the method further includes: the first electronic device obtains the user's hand movement direction; the first electronic device determines that the amplitude adjustment direction corresponding to the hand movement direction is a first amplitude adjustment direction, where the amplitude adjustment direction includes increasing the amplitude and decreasing the amplitude; the first electronic device adjusting the amplitude of the target element with the first amplitude adjustment parameter then includes: the first electronic device adjusts the amplitude of the target element with the first amplitude adjustment parameter along the first amplitude adjustment direction.
  • when it is detected that the intent corresponding to the voice spoken by the user is amplitude adjustment, the first electronic device can also acquire the moving direction of the user's hand; as the user's hand moves, the first electronic device can continuously adjust the amplitude of the target element along the amplitude adjustment direction indicated by the hand movement direction. In this way, the user does not need to make complicated gestures: the adjustment direction of the amplitude of the target element can be determined simply by moving the hand, which effectively improves the efficiency of the user's remote control of the amplitude of the target element.
  • the method further includes: when the first electronic device determines that the intent corresponding to the first voice information is amplitude adjustment, the first electronic device further determines, based on the slot corresponding to the first voice information, that the amplitude adjustment direction is the first amplitude adjustment direction, where the amplitude adjustment direction includes increasing the amplitude and decreasing the amplitude; the first electronic device adjusting the amplitude of the target element with the first amplitude adjustment parameter then includes: the first electronic device adjusts the amplitude of the target element with the first amplitude adjustment parameter along the first amplitude adjustment direction.
  • in other words, the first electronic device may also acquire the amplitude adjustment direction indicated by the voice. In this way, after speaking the voice, the user can continuously control the amplitude adjustment of the target element by moving the hand in any direction until the output effect of the target element reaches the expected effect, which effectively improves the user experience.
  • in one implementation, the hand movement parameter is the hand movement speed and the first amplitude adjustment parameter is the amplitude adjustment speed; in another implementation, the hand movement parameter is the hand movement distance and the first amplitude adjustment parameter is a first amplitude adjustment value.
  • that is, the first electronic device may adjust the magnitude of the target element based on the amplitude adjustment speed indicated by the hand movement speed, or based on the first amplitude adjustment value indicated by the hand movement distance.
  • the first electronic device determining the first amplitude adjustment parameter corresponding to the hand movement parameter specifically includes: the first electronic device determines the first amplitude adjustment parameter based on the hand movement parameter and a first sensitivity. When the hand movement parameter is constant, the greater the first sensitivity, the greater the first amplitude adjustment parameter; when the first sensitivity is constant, the greater the hand movement parameter, the greater the first amplitude adjustment parameter. A simple mapping that satisfies this relationship is sketched below.
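  • one function that satisfies the monotonic relationship above is a linear product; this is only an illustrative assumption, as the application does not prescribe a concrete formula.

```python
# Illustrative mapping from hand movement to the first amplitude adjustment
# parameter: monotonically increasing in both the movement parameter and the
# sensitivity.

def amplitude_adjustment_parameter(hand_movement: float, sensitivity: float) -> float:
    """hand_movement is e.g. a distance in cm (or a speed in cm/s)."""
    return hand_movement * sensitivity

# Example: a 10 cm movement at sensitivity 0.5 gives a 5-step adjustment,
# while the same movement at sensitivity 1.0 gives a 10-step adjustment.
```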
  • the method further includes: the first electronic device receives second voice information; when the first electronic device determines that the intent corresponding to the second voice information is sensitivity adjustment, it determines a first sensitivity adjustment direction and a first sensitivity adjustment value based on the corresponding slot, and adjusts the first sensitivity along the first sensitivity adjustment direction with the first sensitivity adjustment value. A hypothetical handler for such a command is sketched below.
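  • a hypothetical handler for a sensitivity-adjustment voice command such as “increase the sensitivity by 0.2”; the slot names and the recognize_intent helper are illustrative assumptions.

```python
# Adjust the first sensitivity along the direction and by the value carried in
# the voice command's slots.

def handle_sensitivity_voice(second_voice_info, device):
    intent, slots = recognize_intent(second_voice_info)
    if intent != "ADJUST_SENSITIVITY":
        return
    direction = slots["direction"]        # first sensitivity adjustment direction
    value = float(slots["value"])         # first sensitivity adjustment value
    device.sensitivity += value if direction == "increase" else -value
```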
  • the method further includes: when the first electronic device detects a first preset condition, the current amplitude adjustment of the target element is ended; the first preset condition includes one or more of the following: the time elapsed since receiving the first voice information exceeds a first preset time; no effective movement of the user's hand is detected within a second preset time; a first preset gesture for stopping the amplitude adjustment is received; third voice information for stopping the amplitude adjustment is received.
  • effective movement of the hand means that the moving distance of the hand along a preset moving direction is greater than a distance threshold, or that the moving speed of the hand is greater than a speed threshold.
  • when the first preset condition is detected, the first electronic device can end the current amplitude adjustment of the target element and stop obtaining the user's hand movement direction and hand movement parameters; the amplitude adjustment of the target element then needs to be triggered again before it can continue. A minimal check of these end conditions is sketched below.
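  • the following is a minimal sketch of checking the end conditions listed above; all thresholds and the fields of the session state object are illustrative assumptions.

```python
import time

def session_ended(state, now=None) -> bool:
    if now is None:
        now = time.monotonic()
    if now - state.start_time > state.first_preset_time:
        return True                            # first voice information received too long ago
    if now - state.last_effective_move > state.second_preset_time:
        return True                            # no effective hand movement recently
    if state.stop_gesture_received or state.stop_voice_received:
        return True                            # first preset gesture / third voice information
    return False
```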
  • the first electronic device obtaining the user's hand movement parameters specifically includes: the first electronic device obtains a first image and a second image collected by the camera; the first electronic device obtains the positions of a first feature point of the hand in the first image and the second image; and the first electronic device determines the user's hand movement parameter based on the positions of the first feature point of the hand in the first image and the second image.
  • similarly, the moving direction of the hand is determined based on the positions of the first feature point of the hand in the first image and the second image; a sketch of both computations follows.
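  • the sketch below shows one way to derive the hand movement distance, speed and a coarse direction from the feature-point positions in two frames; the pixel-to-centimetre scale and the frame interval are assumptions made for illustration.

```python
import math

def hand_movement(p1, p2, frame_interval_s: float, cm_per_pixel: float):
    """p1, p2: (x, y) positions of the first feature point in the first and second image."""
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]              # displacement in pixels
    distance_cm = math.hypot(dx, dy) * cm_per_pixel
    speed_cm_s = distance_cm / frame_interval_s
    # Coarse direction: dominant axis of the displacement (image y grows downwards).
    if abs(dx) >= abs(dy):
        direction = "right" if dx > 0 else "left"
    else:
        direction = "down" if dy > 0 else "up"
    return distance_cm, speed_cm_s, direction
```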
  • in some embodiments, the second electronic device is worn on the user's hand, and the first electronic device acquiring the user's hand movement parameters includes: the first electronic device sends an acquisition request to the second electronic device; the first electronic device receives the moving speed and/or moving distance of the second electronic device in a first coordinate system sent by the second electronic device; and the first electronic device determines the user's hand movement parameter based on the moving speed and/or moving distance of the second electronic device in the first coordinate system.
  • the first electronic device may acquire the user's hand movement direction and hand movement parameters through the second electronic device. In this way, performance requirements and power consumption of the first electronic device are reduced.
  • the hand movement direction is determined based on the movement direction of the second electronic device in the first coordinate system.
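  • a hypothetical message exchange for the wearable-based acquisition described above; the request/response format and the band object's send/receive interface are assumptions for illustration only.

```python
# The first electronic device asks the worn device for its motion in the first
# coordinate system and uses the reply as the hand movement parameter/direction.

def get_hand_movement_from_band(band):
    band.send({"type": "acquisition_request"})
    report = band.receive()              # e.g. {"speed": 12.0, "direction": "up"}
    return report["speed"], report["direction"]
```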
  • the amplitude adjustment direction includes increasing the amplitude and decreasing the amplitude.
  • the first electronic device is preset with a first mapping relationship.
  • in the first mapping relationship, increasing the amplitude corresponds to at least one preset movement direction, decreasing the amplitude corresponds to at least one preset movement direction, and the preset movement directions corresponding to increasing the amplitude are different from the preset movement directions corresponding to decreasing the amplitude; when it is determined based on the first mapping relationship that the hand movement direction belongs to a preset movement direction corresponding to increasing the amplitude, the first amplitude adjustment direction is to increase the amplitude; when it is determined based on the first mapping relationship that the hand movement direction belongs to a preset movement direction corresponding to decreasing the amplitude, the first amplitude adjustment direction is to decrease the amplitude.
  • both increasing the amplitude and decreasing the amplitude may correspond to one or more preset moving directions; in this way, more choices can be given to the user. A sketch of such a mapping relationship follows.
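  • the concrete directions below are illustrative assumptions; the application only requires that the two sets of preset movement directions be different.

```python
# Illustrative first mapping relationship: each amplitude adjustment direction
# maps to one or more preset hand movement directions.

FIRST_MAPPING = {
    "increase": {"up", "right"},
    "decrease": {"down", "left"},
}

def amplitude_direction(hand_direction: str):
    for adjust_dir, movement_dirs in FIRST_MAPPING.items():
        if hand_direction in movement_dirs:
            return adjust_dir        # first amplitude adjustment direction
    return None                      # movement direction not in the mapping: ignore
```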
  • the method further includes: when the first electronic device determines that the intent corresponding to the first voice information is amplitude adjustment, it further determines, based on the slot corresponding to the first voice information, that the target device for the amplitude adjustment is a third electronic device; the first electronic device adjusting the amplitude of the target element with the first amplitude adjustment parameter then includes: the first electronic device sends an adjustment request to the third electronic device, so as to control the third electronic device to adjust the amplitude of the target element with the first amplitude adjustment parameter along the first amplitude adjustment direction, where the adjustment request carries the target element, the first amplitude adjustment parameter and the first amplitude adjustment direction. In this way, the adjustment is not limited to the target element of the first electronic device itself: the amplitude of a target element of other connected electronic devices can also be adjusted. A hypothetical adjustment request is sketched below.
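  • the JSON schema and the send_to_device transport below are assumptions made for illustration; the application only specifies what the adjustment request carries.

```python
import json

# Adjustment request from the first electronic device to a third electronic
# device (for example, a smart curtain).

def send_adjustment_request(third_device_id: str, element: str,
                            parameter: float, direction: str):
    request = {
        "type": "adjust_amplitude",
        "target_element": element,           # e.g. "curtain_opening"
        "adjustment_parameter": parameter,   # first amplitude adjustment parameter
        "adjustment_direction": direction,   # "increase" or "decrease"
    }
    send_to_device(third_device_id, json.dumps(request))
```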
  • the first electronic device obtaining the user's hand movement parameters includes: the first electronic device obtains the hand movement parameters of the user's second preset gesture.
  • the user indicates the magnitude adjustment direction and the magnitude adjustment speed of the target element through the movement of the second preset gesture, and the above target element may be any element that can be adjusted in magnitude.
  • the first electronic device only needs to recognize one gesture (that is, the second preset gesture), which places lower requirements on the system performance of the first electronic device; the user can adjust the amplitude of these elements through this single gesture, which effectively improves the user experience.
  • the first electronic device acquiring the user's hand movement parameters includes: the first electronic device acquires the first image and the second image collected by the camera; the first electronic device recognizes whether the first image and the second image contain the second preset gesture; and, when the first image and the second image contain the second preset gesture, the first electronic device determines the user's hand movement parameter based on the positions of the first feature point of the hand in the first image and the second image. A sketch of this gesture-gated acquisition follows.
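  • in the sketch below, contains_preset_gesture and locate_feature_point stand in for a real gesture/keypoint model and are assumptions; hand_movement is the helper from the earlier sketch.

```python
# Hand movement is only measured while both frames contain the second preset
# gesture; otherwise the frames are ignored.

def gated_hand_movement(frame1, frame2, frame_interval_s, cm_per_pixel):
    if not (contains_preset_gesture(frame1) and contains_preset_gesture(frame2)):
        return None                              # gesture absent: do not adjust
    p1 = locate_feature_point(frame1)            # e.g. a palm or finger joint
    p2 = locate_feature_point(frame2)
    return hand_movement(p1, p2, frame_interval_s, cm_per_pixel)
```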
  • the method further includes: the first electronic device plays a video; when the first electronic device plays the video to a first moment of the video and detects that the user's line of sight leaves the screen of the first electronic device, the first electronic device records the first moment; the first electronic device detects whether the user leaves the viewing range of the screen or whether the duration for which the user's line of sight leaves the screen exceeds a first threshold; when it is detected that the user leaves the viewing range of the screen or that the duration for which the user's line of sight leaves the screen exceeds the first threshold, the first electronic device rolls the video back to the first moment and pauses it.
  • that is, when it is detected that the user's gaze has left the screen for longer than the threshold, or that the user has left the viewing range of the screen, the video can be rolled back to the moment at which the user's gaze left the screen. In this way, the user is prevented from missing the video clip played after the user's line of sight leaves the screen, while frequent mistaken pauses caused by the user's line of sight briefly leaving the screen are avoided; this realizes intelligent control of video playback and effectively improves the user's video viewing experience.
  • the method further includes: after the first electronic device pauses the video, when the first electronic device detects that the user's gaze is fixed on the screen again, controlling the video to continue playing.
  • after controlling the pausing of the video based on the user's line of sight and the user's state (that is, whether the user has left the viewing range of the screen), when the user's line of sight returns to the screen, the video can automatically be controlled to continue playing without the user operating the first electronic device. In this way, user operations are reduced, intelligent control of video playback is further realized, and the user's video viewing experience is effectively improved. A minimal sketch of this behaviour is given below.
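  • the following is a minimal sketch of the gaze-based pause, rollback and resume behaviour; the player interface, the state object and the threshold value are illustrative assumptions.

```python
FIRST_THRESHOLD_S = 3.0     # assumed first threshold for gaze-away duration

def gaze_playback_step(player, state, gaze_on_screen: bool,
                       user_in_viewing_range: bool, now: float):
    if player.playing:
        if not gaze_on_screen and state.leave_time is None:
            state.leave_time = now
            state.first_moment = player.position         # record the first moment
        if state.leave_time is not None and (
            not user_in_viewing_range
            or now - state.leave_time > FIRST_THRESHOLD_S
        ):
            player.seek(state.first_moment)               # roll back to the first moment
            player.pause()
        if gaze_on_screen:
            state.leave_time = None                       # brief glance away: keep playing
    elif gaze_on_screen:
        player.play()                                     # gaze returned: resume playback
        state.leave_time = None
```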
  • the method further includes: when it is detected that the user has not left the viewing range of the screen or the duration for which the user's line of sight leaves the screen has not exceeded the first threshold, the first electronic device detects whether the currently played video segment is exciting; when it is detected that the currently played video segment is exciting, the first electronic device pauses the video.
  • in other words, the first electronic device controls the playing and pausing of the video according to how exciting the video segment is.
  • the first electronic device controls the video to pause, which can prevent the user from missing the exciting video segment.
  • the method further includes: when it is detected that the currently played video segment is not exciting, acquiring an interaction parameter between the user's line of sight and the screen, and controlling the playback speed of the video based on the interaction parameter.
  • when the video segment is not exciting, the user can control the playback speed of the video through the interaction parameters between the line of sight and the screen, which gives the user autonomous control over the video and improves the user's video viewing efficiency.
  • the interaction parameters include the interaction frequency and/or the gaze duration; the interaction frequency refers to the frequency at which the user's line of sight leaves the screen within a preset time, and the gaze duration refers to the duration of a single gaze of the user on the screen.
  • controlling the playback speed of the video based on the interaction parameter includes: when the interaction frequency is greater than a second threshold, controlling the video playback speed to be a first speed, where the first speed is greater than the normal playback speed; when the interaction frequency is less than or equal to the second threshold, controlling the video playback speed to be the normal playback speed; when the gaze duration is less than a third threshold, controlling the video playback speed to be a second speed, where the second speed is greater than the normal playback speed; when the gaze duration is greater than or equal to the third threshold, controlling the video playback speed to be the normal playback speed.
  • in this way, the playback speed of the video can be appropriately increased, thereby improving the user's video viewing efficiency; a sketch of this speed control follows.
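  • the threshold values and the 1.5x speeds in the sketch below are illustrative assumptions; only the comparison structure comes from the description above.

```python
from typing import Optional

SECOND_THRESHOLD = 3        # assumed: gaze-away events per preset time window
THIRD_THRESHOLD_S = 5.0     # assumed: seconds of a single gaze
NORMAL_SPEED, FIRST_SPEED, SECOND_SPEED = 1.0, 1.5, 1.5

def playback_speed(interaction_frequency: Optional[float] = None,
                   gaze_duration_s: Optional[float] = None) -> float:
    if interaction_frequency is not None:
        return FIRST_SPEED if interaction_frequency > SECOND_THRESHOLD else NORMAL_SPEED
    if gaze_duration_s is not None:
        return SECOND_SPEED if gaze_duration_s < THIRD_THRESHOLD_S else NORMAL_SPEED
    return NORMAL_SPEED
```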
  • the method further includes: when detecting that the currently played video segment is not exciting, the first electronic device also detects whether the user leaves the viewing range of the screen or whether the user's line of sight leaves the screen for longer than the first threshold.
  • before detecting that the user's line of sight leaves the screen of the first electronic device, the method further includes: the first electronic device judges whether the current environment is an elevator; if the current environment is an elevator, then after detecting that the user's line of sight leaves the screen of the first electronic device, the first electronic device detects whether the duration for which the user's line of sight leaves the screen exceeds the first threshold.
  • the video includes at least one pre-divided video segment, and whether the video segment is exciting or not may be preset by a video producer, supplier, or copyright owner.
  • in another aspect, the present application provides a device control method, including: the first electronic device plays a video; when the first electronic device plays the video to a first moment of the video and detects that the user's line of sight leaves the screen of the first electronic device, the first electronic device records the first moment; the first electronic device detects whether the user leaves the viewing range of the screen or whether the duration for which the user's line of sight leaves the screen exceeds a first threshold; when it is detected that the user leaves the viewing range of the screen or the duration exceeds the first threshold, the first electronic device rolls the video back to the first moment and pauses it.
  • as before, when it is detected that the user's gaze has left the screen for longer than the threshold, or that the user has left the viewing range of the screen, the video can be rolled back to the moment at which the user's gaze left the screen. In this way, the user is prevented from missing the video clip played after the user's line of sight leaves the screen, while frequent mistaken pauses caused by the user's line of sight briefly leaving the screen are avoided; this realizes intelligent control of video playback and effectively improves the user's video viewing experience.
  • the method further includes: after the first electronic device pauses the video, when the first electronic device detects that the user's gaze is fixed on the screen again, controlling the video to continue playing.
  • the video can be automatically controlled to continue playing without the user operating the first electronic device. In this way, user operations are reduced, intelligent control of video playback is further realized, and the user's video viewing experience is effectively improved.
  • the method further includes: when it is detected that the user has not left the viewing range of the screen or the duration for which the user's line of sight leaves the screen has not exceeded the first threshold, the first electronic device detects whether the currently played video segment is exciting; when it is detected that the currently played video segment is exciting, the first electronic device pauses the video.
  • in other words, the first electronic device controls the playing and pausing of the video according to how exciting the video segment is.
  • the first electronic device controls the video to pause, which can prevent the user from missing the exciting video segment.
  • the method further includes: when it is detected that the currently played video segment is not exciting, acquiring an interaction parameter between the user's line of sight and the screen, and controlling the playback speed of the video based on the interaction parameter.
  • when the video segment is not exciting, the user can control the playback speed of the video through the interaction parameters between the line of sight and the screen, which gives the user autonomous control over the video and improves the user's video viewing efficiency.
  • the interaction parameters include the interaction frequency and/or the gaze duration; the interaction frequency refers to the frequency at which the user's line of sight leaves the screen within a preset time, and the gaze duration refers to the duration of a single gaze of the user on the screen.
  • controlling the playback speed of the video based on the interaction parameter includes: when the interaction frequency is greater than a second threshold, controlling the video playback speed to be a first speed, where the first speed is greater than the normal playback speed; when the interaction frequency is less than or equal to the second threshold, controlling the video playback speed to be the normal playback speed; when the gaze duration is less than a third threshold, controlling the video playback speed to be a second speed, where the second speed is greater than the normal playback speed; when the gaze duration is greater than or equal to the third threshold, controlling the video playback speed to be the normal playback speed.
  • in this way, the playback speed of the video can be appropriately increased, thereby improving the user's video viewing efficiency.
  • the method further includes: when detecting that the currently played video segment is not exciting, the first electronic device also detects whether the user leaves the viewing range of the screen or whether the user's line of sight leaves the screen for longer than the first threshold.
  • before detecting that the user's line of sight leaves the screen of the first electronic device, the method further includes: the first electronic device judges whether the current environment is an elevator; if the current environment is an elevator, then after detecting that the user's line of sight leaves the screen of the first electronic device, the first electronic device detects whether the duration for which the user's line of sight leaves the screen exceeds the first threshold.
  • the video includes at least one pre-divided video segment, and whether the video segment is exciting or not may be preset by a video producer, supplier, or copyright owner.
  • the present application provides an electronic device, including one or more processors and one or more memories.
  • the one or more memories are coupled with the one or more processors, the one or more memories are used to store computer program code, the computer program code includes computer instructions, and when the one or more processors execute the computer instructions, the electronic device performs the device control method in any possible implementation manner of any one of the foregoing aspects.
  • an embodiment of the present application provides a computer storage medium, including computer instructions, which, when the computer instructions are run on the electronic device, cause the electronic device to execute the device control method in any possible implementation of any one of the above aspects.
  • an embodiment of the present application provides a computer program product, which, when the computer program product is run on a computer, causes the computer to execute the device control method in any possible implementation manner of any one of the above aspects.
  • FIG. 1 is a system architecture diagram of a communication system provided by an embodiment of the present application
  • FIG. 2 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.
  • FIG. 3A is a schematic structural diagram of a smart bracelet provided by an embodiment of the present application.
  • FIG. 3B is a schematic diagram of a ground coordinate system provided by the embodiment of the present application.
  • FIG. 3C is a schematic diagram of an electronic device coordinate system provided by an embodiment of the present application.
  • FIGS. 4A to 4C are schematic diagrams of application scenarios of volume adjustment provided by the embodiment of the present application.
  • FIGS. 5A to 5C are schematic diagrams of application scenarios of volume adjustment provided by the embodiment of the present application.
  • FIG. 6A and FIG. 6B are schematic diagrams of application scenarios of sensitivity adjustment provided by the embodiment of the present application.
  • FIG. 7A and FIG. 7B are schematic diagrams of application scenarios for stopping volume adjustment provided by the embodiment of the present application.
  • FIG. 8 is a schematic flowchart of a device control method provided in an embodiment of the present application.
  • FIG. 9 is a schematic diagram of a joint recognition model provided by an embodiment of the present application.
  • FIG. 10 is a schematic flow diagram of obtaining the hand movement direction and hand movement speed provided by the embodiment of the present application.
  • FIG. 11A and FIG. 11B are schematic diagrams of acquiring hand feature points provided by the embodiment of the present application.
  • FIG. 11C and FIG. 11D are schematic diagrams of hand movement in the image provided by the embodiment of the present application.
  • FIG. 11E and FIG. 11F are schematic diagrams of the direction of hand movement and the direction of amplitude adjustment provided by the embodiment of the present application.
  • FIG. 12 is a schematic flow diagram of another method for acquiring hand movement direction and hand movement speed provided by the embodiment of the present application.
  • FIG. 13 is a system architecture diagram of a dialogue system provided by an embodiment of the present application.
  • FIGS. 14A to 14J are schematic diagrams of application scenarios for controlling video playback provided by the embodiment of the present application.
  • FIG. 15 is a schematic flowchart of another device control method provided in the embodiment of the present application.
  • FIG. 16 is a schematic diagram of the user's line of sight provided by the embodiment of the present application.
  • FIG. 17 is a schematic flow chart of another device control method provided by the embodiment of the present application.
  • the terms “first” and “second” are used for descriptive purposes only, and cannot be understood as indicating or implying relative importance or implicitly specifying the quantity of the indicated technical features. Therefore, a feature defined as “first” or “second” may explicitly or implicitly include one or more of such features. In the description of the embodiments of the present application, unless otherwise specified, “multiple” means two or more.
  • FIG. 1 exemplarily shows a system architecture diagram of a communication system 10 provided by an embodiment of the present application.
  • the communication system 10 may include an electronic device 100 (the first electronic device involved in the embodiments of the present application may be the electronic device 100), and may also include at least one electronic device 200 connected to the electronic device 100 (the second electronic device involved in the embodiments of the present application may be the electronic device 200).
  • the user can interact with the electronic device 100 (such as a smart phone, a smart home device, etc.) by combining the two operation modes of voice and gesture, so as to control the electronic device 100, or to control other electronic devices through the electronic device 100, for example, adjusting the magnitude of a target element of the electronic device 100 or of the other electronic devices.
  • the target element may be volume, brightness, display brightness, curtain opening degree, fan speed, light brightness, air conditioner temperature and other elements that can be adjusted in magnitude.
  • the embodiment of the present application does not specifically limit the type of the target element.
  • the electronic device 100 is a smart phone, and the user can interact with the smart phone by combining voice and gesture operations, so as to control the smart phone to adjust the opening degree of the smart curtain.
  • the electronic device 100 may be equipped with a microphone and voice recognition capabilities, so as to perform voice recognition on the collected environmental sounds and then determine the target element to be adjusted based on the recognized voice instructions; the electronic device 100 may also be equipped with a camera and gesture recognition capabilities, so as to perform gesture recognition on the collected images and then determine the direction and speed of the user's gestures. By combining the voice command and the user gesture recognized by the electronic device 100, the electronic device 100 can adjust the magnitude of the target element.
  • the above camera may be a low power consumption camera.
  • the electronic device 200 may also receive and recognize the user's voice command and/or gesture, and then send it to the electronic device 100 .
  • the electronic device 200 is an electronic device that can be held or worn on the hand.
  • the electronic device 200 can be a wearable device (such as a smart bracelet), a remote control device (such as a TV remote control), etc., which are not specifically limited here. Subsequent embodiments will be described by taking the electronic device 200 as the smart bracelet 200 as an example.
  • the electronic device 100 can have a microphone and voice recognition capabilities, so that it can perform voice recognition on collected environmental sounds and then determine the target element to be adjusted based on the recognized voice commands; the smart bracelet 200 can have an acceleration sensor and a gyroscope sensor, which are used to obtain the moving direction and moving speed of the user's hand and send them to the electronic device 100. By combining the voice command recognized by the electronic device 100 with the moving direction and moving speed of the user's gesture acquired by the smart bracelet 200, the electronic device 100 can adjust the magnitude of the target element. The sensors are not limited to the acceleration sensor and the gyroscope sensor; in the embodiments of the present application, the electronic device 200 may also obtain the moving direction and moving speed of the user's hand through other sensors, which are not specifically limited here. A sketch of deriving these quantities from inertial sensor data is given below.
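  • the sketch below shows one simplified way a worn device could derive a hand movement speed and a coarse direction from accelerometer samples; real firmware would also fuse gyroscope and gravity data, so this is only an assumption-laden illustration.

```python
# Integrate linear acceleration (gravity already removed) to estimate velocity,
# then report its magnitude and dominant axis in the bracelet's coordinate system.

def estimate_motion(accel_samples, dt: float):
    vx = vy = vz = 0.0
    for ax, ay, az in accel_samples:
        vx += ax * dt
        vy += ay * dt
        vz += az * dt
    speed = (vx**2 + vy**2 + vz**2) ** 0.5          # m/s
    axis, value = max((("x", vx), ("y", vy), ("z", vz)), key=lambda t: abs(t[1]))
    direction = ("+" if value >= 0 else "-") + axis
    return speed, direction                          # sent to the electronic device 100
```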
  • the electronic device 100 can be directly connected to the smart bracelet 200 through a short-range wireless communication connection or a local wired connection.
  • the electronic device 100 and the smart bracelet 200 may have one or more short-distance communication modules among a wireless fidelity (WiFi) communication module, an ultra wide band (UWB) communication module, a Bluetooth communication module, a near field communication (NFC) communication module and a ZigBee communication module.
  • the electronic device 100 can detect and scan electronic devices near the electronic device 100 (such as the smart bracelet 200) by transmitting signals through a short-range communication module (such as a Bluetooth communication module), so that the electronic device 100 can discover the nearby smart bracelet 200 through a short-range wireless communication protocol, establish a wireless communication connection with the nearby smart bracelet 200, and transmit data to the nearby smart bracelet 200.
  • the electronic device 100 may also be indirectly connected to the smart bracelet 200 through the communication network 400 .
  • the communication system 10 may further include one or more application servers, such as the server 300 .
  • the server 300 can communicate with the electronic device 100 through the communication network 400 to provide services such as voice recognition and gesture recognition for the electronic device 100 .
  • the electronic device 100 can send the collected environmental sounds and images to the server 300, and the server performs speech recognition and gesture recognition to determine the target element to be adjusted, as well as the moving direction and moving speed of the user's gesture; the server then sends the above target element, moving direction and moving speed to the electronic device 100.
  • the communication network 400 may be a local area network (local area networks, LAN), or a wide area network (wide area networks, WAN), such as the Internet.
  • the communication network 400 can be implemented using any known network communication protocol, and the above-mentioned network communication protocol can be any of various wired or wireless communication protocols, such as Ethernet, universal serial bus (USB), FireWire, Global System for Mobile Communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Time-Division Synchronous Code Division Multiple Access (TD-SCDMA), Long Term Evolution (LTE), Bluetooth, wireless fidelity (Wi-Fi), NFC, voice over Internet protocol (VoIP), a communication protocol supporting a network slicing architecture, or any other suitable communication protocol.
  • the structure shown in this embodiment does not constitute a specific limitation on the communication system 10 .
  • the communication system 10 may include more or less devices than those shown.
  • FIG. 2 shows a schematic structural diagram of the electronic device 100 .
  • the electronic device 100 may be a cell phone, a tablet computer, a desktop computer, a laptop computer, a handheld computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a cellular phone, a personal digital assistant (PDA), an augmented reality (AR) device, a virtual reality (VR) device, an artificial intelligence (AI) device, a wearable device, a vehicle-mounted device, a smart home device, and the like.
  • the embodiment of the present application does not specifically limit the specific type of the electronic equipment.
  • the electronic device can be equipped with iOS, Android, Microsoft or other operating systems.
  • the electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone jack 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and the like.
  • the sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, bone conduction sensor 180M, etc.
  • the structure illustrated in the embodiment of the present application does not constitute a specific limitation on the electronic device 100 .
  • the electronic device 100 may include more or fewer components than shown in the figure, or combine certain components, or separate certain components, or arrange different components.
  • the illustrated components can be realized in hardware, software or a combination of software and hardware.
  • the processor 110 may include one or more processing units, for example: the processor 110 may include an application processor (application processor, AP), a modem processor, a graphics processing unit (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), controller, video codec, digital signal processor (digital signal processor, DSP), baseband processor, and/or neural network processor (neural-network processing unit, NPU), etc. Wherein, different processing units may be independent devices, or may be integrated in one or more processors.
  • the controller can generate an operation control signal according to the instruction opcode and timing signal, and complete the control of fetching and executing the instruction.
  • a memory may also be provided in the processor 110 for storing instructions and data.
  • the memory in processor 110 is a cache memory.
  • the memory may hold instructions or data that the processor 110 has just used or recycled. If the processor 110 needs to use the instruction or data again, it can be called directly from the memory. Repeated access is avoided, and the waiting time of the processor 110 is reduced, thereby improving the efficiency of the system.
  • processor 110 may include one or more interfaces.
  • the interface may include an integrated circuit (inter-integrated circuit, I2C) interface, an integrated circuit built-in audio (inter-integrated circuit sound, I2S) interface, a pulse code modulation (pulse code modulation, PCM) interface, a universal asynchronous transmitter (universal asynchronous receiver/transmitter, UART) interface, mobile industry processor interface (mobile industry processor interface, MIPI), general-purpose input and output (general-purpose input/output, GPIO) interface, subscriber identity module (subscriber identity module, SIM) interface, and /or universal serial bus (universal serial bus, USB) interface, etc.
  • the charging management module 140 is configured to receive a charging input from a charger.
  • the charger may be a wireless charger or a wired charger.
  • the charging management module 140 can receive charging input from the wired charger through the USB interface 130 .
  • the charging management module 140 may receive a wireless charging input through a wireless charging coil of the electronic device 100 . While the charging management module 140 is charging the battery 142 , it can also supply power to the electronic device through the power management module 141 .
  • the power management module 141 is used for connecting the battery 142 , the charging management module 140 and the processor 110 .
  • the power management module 141 receives the input from the battery 142 and/or the charging management module 140 to provide power for the processor 110 , the internal memory 121 , the display screen 194 , the camera 193 , and the wireless communication module 160 .
  • the power management module 141 can also be used to monitor parameters such as battery capacity, battery cycle times, and battery health status (leakage, impedance).
  • the power management module 141 may also be disposed in the processor 110 .
  • the power management module 141 and the charging management module 140 may also be set in the same device.
  • the wireless communication function of the electronic device 100 can be realized by the antenna 1 , the antenna 2 , the mobile communication module 150 , the wireless communication module 160 , a modem processor, a baseband processor, and the like.
  • Antenna 1 and Antenna 2 are used to transmit and receive electromagnetic wave signals.
  • Each antenna in electronic device 100 may be used to cover single or multiple communication frequency bands. Different antennas can also be multiplexed to improve antenna utilization.
  • Antenna 1 can be multiplexed as a diversity antenna of a wireless local area network.
  • the antenna may be used in conjunction with a tuning switch.
  • the mobile communication module 150 can provide wireless communication solutions including 2G/3G/4G/5G applied on the electronic device 100 .
  • the mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (low noise amplifier, LNA) and the like.
  • the mobile communication module 150 can receive electromagnetic waves through the antenna 1, filter and amplify the received electromagnetic waves, and send them to the modem processor for demodulation.
  • the mobile communication module 150 can also amplify the signals modulated by the modem processor, and convert them into electromagnetic waves and radiate them through the antenna 1 .
  • at least part of the functional modules of the mobile communication module 150 may be set in the processor 110 .
  • at least part of the functional modules of the mobile communication module 150 and at least part of the modules of the processor 110 may be set in the same device.
  • a modem processor may include a modulator and a demodulator.
  • the modulator is used for modulating the low-frequency baseband signal to be transmitted into a medium-high frequency signal.
  • the demodulator is used to demodulate the received electromagnetic wave signal into a low frequency baseband signal. Then the demodulator sends the demodulated low-frequency baseband signal to the baseband processor for processing.
  • the low-frequency baseband signal is passed to the application processor after being processed by the baseband processor.
  • the application processor outputs sound signals through audio equipment (not limited to speaker 170A, receiver 170B, etc.), or displays images or videos through display screen 194 .
  • the modem processor may be a stand-alone device.
  • the modem processor may be independent of the processor 110, and be set in the same device as the mobile communication module 150 or other functional modules.
  • the wireless communication module 160 can provide wireless communication solutions applied on the electronic device 100, including wireless local area networks (WLAN) (such as a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC) technology, infrared (IR) technology, and the like.
  • the wireless communication module 160 may be one or more devices integrating at least one communication processing module.
  • the wireless communication module 160 receives electromagnetic waves via the antenna 2 , demodulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110 .
  • the wireless communication module 160 can also receive the signal to be sent from the processor 110 , frequency-modulate it, amplify it, and convert it into electromagnetic waves through the antenna 2 for radiation.
  • the antenna 1 of the electronic device 100 is coupled to the mobile communication module 150, and the antenna 2 is coupled to the wireless communication module 160, so that the electronic device 100 can communicate with the network and other devices through wireless communication technology.
  • the wireless communication technology may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technologies, etc.
  • the GNSS may include a global positioning system (GPS), a global navigation satellite system (GLONASS), a BeiDou navigation satellite system (BDS), a quasi-zenith satellite system (QZSS) and/or a satellite based augmentation system (SBAS).
  • the electronic device 100 realizes the display function through the GPU, the display screen 194 , and the application processor.
  • the GPU is a microprocessor for image processing, and is connected to the display screen 194 and the application processor. GPUs are used to perform mathematical and geometric calculations for graphics rendering.
  • Processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
  • the display screen 194 is used to display images, videos and the like.
  • the display screen 194 includes a display panel.
  • the display panel can be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light emitting diode (AMOLED), a flexible light-emitting diode (FLED), a MiniLED, a MicroLED, a Micro-OLED, quantum dot light emitting diodes (QLED), etc.
  • the electronic device 100 may include 1 or N display screens 194 , where N is a positive integer greater than 1.
  • the electronic device 100 can realize the shooting function through the ISP, the camera 193 , the video codec, the GPU, the display screen 194 and the application processor.
  • the ISP is used for processing the data fed back by the camera 193 .
  • the light is transmitted to the photosensitive element of the camera through the lens, and the light signal is converted into an electrical signal, and the photosensitive element of the camera transmits the electrical signal to the ISP for processing, and converts it into an image visible to the naked eye.
  • ISP can also optimize the algorithm for image noise and brightness.
  • ISP can also optimize the exposure, color temperature and other parameters of the shooting scene.
  • the ISP may be located in the camera 193 .
  • Camera 193 is used to capture still images or video.
  • the object generates an optical image through the lens and projects it to the photosensitive element.
  • the photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the light signal into an electrical signal, and then transmits the electrical signal to the ISP to convert it into a digital image signal.
  • the ISP outputs the digital image signal to the DSP for processing.
  • DSP converts digital image signals into standard RGB, YUV and other image signals.
  • the electronic device 100 may include 1 or N cameras 193 , where N is a positive integer greater than 1.
  • the electronic device 100 may perform gesture recognition on images collected by the camera 193 to obtain the moving speed and moving direction of the user's hand.
  • Digital signal processors are used to process digital signals. In addition to digital image signals, they can also process other digital signals. For example, when the electronic device 100 selects a frequency point, the digital signal processor is used to perform Fourier transform on the energy of the frequency point.
  • Video codecs are used to compress or decompress digital video.
  • the electronic device 100 may support one or more video codecs.
  • the electronic device 100 can play or record videos in various encoding formats, for example: moving picture experts group (moving picture experts group, MPEG) 1, MPEG2, MPEG3, MPEG4 and so on.
  • the NPU is a neural-network (NN) computing processor.
  • Applications such as intelligent cognition of the electronic device 100 can be realized through the NPU, such as image recognition, face recognition, speech recognition, text understanding, and the like.
  • the internal memory 121 may include one or more random access memories (random access memory, RAM) and one or more non-volatile memories (non-volatile memory, NVM).
  • Random access memory can include static random-access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM; for example, the fifth-generation DDR SDRAM is generally called DDR5 SDRAM), and the like; non-volatile memory can include disk storage devices and flash memory.
  • flash memory can include NOR FLASH, NAND FLASH, 3D NAND FLASH, etc.
  • Flash memory may include single-level cells (single-level cell, SLC), multi-level cells (multi-level cell, MLC), triple-level cells (triple-level cell, TLC), quad-level cells (quad-level cell, QLC), etc.
  • According to storage specifications, flash memory may include universal flash storage (UFS), embedded multimedia memory card (embedded multi media Card, eMMC), and the like.
  • the random access memory can be directly read and written by the processor 110, and can be used to store executable programs (such as machine instructions) of an operating system or other running programs, and can also be used to store data of users and application programs.
  • the non-volatile memory can also store executable programs and data of users and application programs, etc., and can be loaded into the random access memory in advance for the processor 110 to directly read and write.
  • the external memory interface 120 can be used to connect an external non-volatile memory, so as to expand the storage capacity of the electronic device 100 .
  • the external non-volatile memory communicates with the processor 110 through the external memory interface 120 to implement a data storage function. For example, files such as music and video are stored in an external non-volatile memory.
  • the electronic device 100 can implement audio functions through the audio module 170 , the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, and the application processor. Such as music playback, recording, etc.
  • the audio module 170 is used to convert digital audio information into analog audio signal output, and is also used to convert analog audio input into digital audio signal.
  • the audio module 170 may also be used to encode and decode audio signals.
  • the audio module 170 may be set in the processor 110 , or some functional modules of the audio module 170 may be set in the processor 110 .
  • Speaker 170A, also referred to as a "horn", is used to convert audio electrical signals into sound signals.
  • Receiver 170B, also called "earpiece", is used to convert audio electrical signals into sound signals.
  • The microphone 170C, also called "mike" or "mic", is used to collect sounds (such as ambient sounds, including sounds from people and devices) and convert the sound signals into electrical signals.
  • the user can put his mouth close to the microphone 170C to make a sound, and input the sound signal to the microphone 170C.
  • the electronic device 100 may be provided with at least one microphone 170C. In some other embodiments, the electronic device 100 may be provided with two microphones 170C, which may also implement a noise reduction function in addition to collecting sound signals. In some other embodiments, the electronic device 100 can also be provided with three, four or more microphones 170C to collect sound signals, reduce noise, identify sound sources, and realize directional recording functions, etc.
  • When the user wants to control the electronic device 100 by voice, the electronic device 100 needs to be woken up by a preset wake-up word first. After detecting the preset wake-up word, the electronic device 100 can respond to the user's voice command and perform corresponding operations.
  • the microphone 170C can collect the ambient sound in real time and obtain audio data.
  • the sound collected by the microphone 170C is related to the environment. For example, when the surrounding environment is relatively noisy and the user speaks a wake-up word, the sound collected by the microphone 170C includes ambient noise and the sound of the user uttering the wake-up word.
  • the application processor of the electronic device 100 remains powered on, and the microphone 170C sends the collected voice information to the application processor.
  • the application processor recognizes the above-mentioned voice information, and can execute the operation corresponding to the above-mentioned voice information. For example, when the application processor recognizes that the above voice information includes a preset wake-up word, it may generate corresponding response information (for example, voice information "I am"), and respond to subsequent voice instructions.
  • the microphone 170C of the electronic device 100 is connected to the microprocessor, the microprocessor remains powered on, and the application processor of the electronic device 100 is not powered on.
  • the microphone 170C sends the collected voice information to the microprocessor, and the microprocessor recognizes the above voice information, and determines whether to wake up the application processor according to the above voice information, that is, powers on the application processor. For example, when the microprocessor recognizes that the voice information includes a preset wake-up word, it wakes up the application processor.
  • The preset wake-up word may be set by default by the electronic device 100 before leaving the factory, or may be preset in the electronic device 100 by the user according to his own needs, which is not specifically limited here.
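  • As an illustrative aside, the two wake-up implementations above can be summarized as a gating loop in which speech is only treated as a command after the wake-up word has been detected. The following Python sketch is hypothetical (the names detect_wake_word and handle_command are not from this application) and is not a description of the actual firmware.

```python
# Hypothetical sketch of wake-word gating before command handling.
WAKE_WORD = "xiaoyi xiaoyi"   # preset wake-up word (factory default or user-set)

def detect_wake_word(transcript: str) -> bool:
    """Return True if the recognised text contains the preset wake-up word."""
    return WAKE_WORD in transcript.lower()

def handle_command(transcript: str) -> None:
    """Placeholder for executing the operation corresponding to the voice information."""
    print(f"executing: {transcript}")

def on_utterance(transcript: str, awake: bool) -> bool:
    """Process one utterance and return the new awake state of the voice assistant."""
    if not awake:
        return detect_wake_word(transcript)   # asleep: only the wake-up word is acted on
    handle_command(transcript)                # awake: respond to the voice instruction
    return True
```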
  • the user can interact with the electronic device 100 in combination with voice and gesture operations, so as to adjust the magnitude of the specified element.
  • the earphone interface 170D is used for connecting wired earphones.
  • the pressure sensor 180A is used to sense the pressure signal and convert the pressure signal into an electrical signal.
  • the gyro sensor 180B can be used to determine the movement posture of the electronic device 100 .
  • The angular velocity of the electronic device 100 around three axes (i.e., the x, y and z axes) can be determined by the gyro sensor 180B.
  • the air pressure sensor 180C is used to measure air pressure.
  • the magnetic sensor 180D includes a Hall sensor.
  • the electronic device 100 may use the magnetic sensor 180D to detect the opening and closing of the flip leather case.
  • the acceleration sensor 180E can detect the acceleration of the electronic device 100 in various directions (generally three axes).
  • the magnitude and direction of gravity can be detected when the electronic device 100 is stationary. It can also be used to identify the posture of electronic devices, and can be used in applications such as horizontal and vertical screen switching, pedometers, etc.
  • the distance sensor 180F is used to measure the distance.
  • Proximity light sensor 180G may include, for example, light emitting diodes (LEDs) and light detectors, such as photodiodes.
  • the ambient light sensor 180L is used for sensing ambient light brightness.
  • the fingerprint sensor 180H is used to collect fingerprints.
  • the temperature sensor 180J is used to detect temperature.
  • the touch sensor 180K is also called “touch device”.
  • the touch sensor 180K can be disposed on the display screen 194, and the touch sensor 180K and the display screen 194 form a touch screen, also called a “touch screen”.
  • the touch sensor 180K is used to detect a touch operation on or near it.
  • the touch sensor can pass the detected touch operation to the application processor to determine the type of touch event.
  • the bone conduction sensor 180M can acquire vibration signals.
  • the keys 190 include a power key, a volume key and the like.
  • the key 190 may be a mechanical key. It can also be a touch button.
  • the motor 191 can generate a vibrating reminder.
  • the indicator 192 can be an indicator light, and can be used to indicate the charging status, the change of the battery capacity, and can also be used to indicate messages, missed calls, notifications and the like.
  • the SIM card interface 195 is used for connecting a SIM card.
  • FIG. 3A exemplarily shows a schematic structural diagram of a smart bracelet 200 provided by an embodiment of the present application.
  • The smart bracelet 200 may include: a processor 201, a memory 202, a wireless communication module 203, an antenna 204, a power switch 205, a wired LAN communication processing module 206, a USB communication processing module 207, an audio module 208, an acceleration sensor 209, and a gyro sensor 210. Among them:
  • Processor 201 may be used to read and execute computer readable instructions.
  • the processor 201 may mainly include a controller, an arithmetic unit, and a register.
  • the controller is mainly responsible for instruction decoding, and sends out control signals for the operations corresponding to the instructions.
  • The register is mainly responsible for saving the register operands and intermediate operation results temporarily stored during the execution of the instruction.
  • the hardware architecture of the processor 201 may be an application specific integrated circuit (ASIC) architecture, a MIPS architecture, an ARM architecture, or an NP architecture, and so on.
  • the processor 201 can be used to analyze signals received by the wireless communication module 203 and/or the wired LAN communication processing module 206, such as detection requests broadcast by the smart bracelet 200, and the like.
  • the processor 201 may be configured to perform corresponding processing operations according to the parsing result, such as generating a probe response, and so on.
  • the processor 201 may also be configured to generate a signal sent by the wireless communication module 203 and/or the wired LAN communication processing module 206 , such as a Bluetooth broadcast signal.
  • the memory 202 is coupled with the processor 201 for storing various software programs and/or sets of instructions.
  • the memory 202 may include a high-speed random access memory, and may also include a non-volatile memory, such as one or more magnetic disk storage devices, flash memory devices or other non-volatile solid-state storage devices.
  • the memory 202 can store operating systems, such as embedded operating systems such as uCOS, VxWorks, and RTLinux.
  • the memory 202 can also store a communication program that can be used by the smart bracelet 200, one or more servers, or accessory devices to communicate.
  • the wireless communication module 203 may include one or more of a UWB communication module 203A, a Bluetooth communication module 203B, a WLAN communication module 203C, and an infrared communication module 203D.
  • One or more of the UWB communication module 203A, the Bluetooth communication module 203B, the WLAN communication module 203C, and the infrared communication module 203D can monitor signals transmitted by other devices (such as the electronic device 100), such as measurement signals and scanning signals, and can send response signals, such as measurement responses and scanning responses, so that other devices (such as the electronic device 100) can discover the smart bracelet 200 and establish a wireless communication connection with it through one or more short-range wireless communication technologies among UWB, Bluetooth, WLAN or infrared for data transmission.
  • One or more of the UWB communication module 203A, the Bluetooth communication module 203B, the WLAN communication module 203C, and the infrared communication module 203D can also transmit signals, such as broadcast UWB measurement signals and beacon signals, so that other devices (such as the electronic device 100) can discover the smart bracelet 200 and establish a wireless communication connection with it through one or more short-range wireless communication technologies among UWB, Bluetooth, WLAN or infrared for data transmission.
  • the wireless communication module 203 may also include a cellular mobile communication module (not shown).
  • the cellular mobile communication processing module can communicate with other devices (such as servers) through cellular mobile communication technology.
  • the antenna 204 can be used to transmit and receive electromagnetic wave signals.
  • the antennas of different communication modules can be multiplexed or independent of each other, so as to improve the utilization rate of the antennas.
  • the power switch 205 can be used to control power supply to the smart bracelet 200 .
  • the wired LAN communication processing module 206 can be used to communicate with other devices in the same LAN through the wired LAN, and can also be used to connect to the WAN through the wired LAN, and can communicate with devices in the WAN.
  • the USB communication processing module 207 can be used to communicate with other devices through a USB interface (not shown).
  • the audio module 208 can be used to output audio signals through the audio output interface, so that the smart bracelet 200 can support audio playback.
  • the audio module can also be used to receive audio data through the audio input interface.
  • the gyro sensor 210 can be used to determine the pose of the smart bracelet 200 .
  • The angular velocity of the smart bracelet 200 around three axes (i.e., the x, y and z axes) can be determined by the gyro sensor 210, thereby determining the attitude of the smart bracelet 200.
  • the reference coordinate system of the gyro sensor 210 is usually a ground coordinate system.
  • The three-axis (Xg axis, Yg axis and Zg axis) coordinate system shown in Figure 3B is a ground coordinate system shown in the embodiment of the present application, wherein the Xg axis points east along the local latitude line, the Yg axis points north along the local meridian, and the Zg axis points up along the local geographic vertical and forms a right-handed Cartesian coordinate system with the Xg axis and the Yg axis.
  • The plane formed by the Xg axis and the Yg axis is the local horizontal plane, and the plane formed by the Yg axis and the Zg axis is the local meridian plane.
  • the first coordinate system involved in this embodiment of the present application may be a ground coordinate system.
  • The three-axis (X-axis, Y-axis and Z-axis) coordinate system shown in Figure 3C is an electronic device coordinate system of the smart bracelet 200 shown in the embodiment of the present application, wherein the origin of the electronic device coordinate system may be taken as the center of mass of the main body of the smart bracelet 200.
  • The X-axis points from the center of mass of the above-mentioned main body to the right side of the main body.
  • The Y-axis points from the center of mass of the above-mentioned main body to the top of the main body and is perpendicular to the X-axis.
  • The Z-axis points from the above-mentioned center of mass of the main body to the front of the main body and is perpendicular to the X-axis and the Y-axis.
  • The smart bracelet 200 may generally include a main body and a strap; the main body may be configured with a screen, and the XY plane formed by the above-mentioned X-axis and Y-axis may be parallel to the screen configured on the main body of the smart bracelet 200.
  • The attitude of the smart bracelet 200 can be determined by three attitude angles: the pitch angle (Pitch), the yaw angle (Yaw) and the roll angle (Roll), which usually refer to the rotation angles of the smart bracelet 200 around the three axes of the ground coordinate system.
  • The pitch angle may be the angle between the Y-axis of the electronic device coordinate system of the smart bracelet 200 and the local horizontal plane; the yaw angle may be the angle between the projection of the Y-axis of the above-mentioned electronic device coordinate system on the local horizontal plane and the Yg axis of the ground coordinate system; and the roll angle may be the angle between the XY plane of the electronic device coordinate system and the Zg axis of the ground coordinate system.
  • Based on the angular velocities around the three axes of the ground coordinate system collected by the gyro sensor, the smart bracelet 200 can determine its three attitude angles, and then determine the current attitude of the smart bracelet 200.
  • the smart bracelet 200 can obtain a transformation matrix between the coordinate system of the electronic device and the coordinate system of the ground.
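  • The attitude-to-transformation-matrix step can be illustrated with a short sketch. The rotation order (yaw about Zg, then pitch, then roll) and the function name below are assumptions for illustration only; the application does not specify a particular construction.

```python
# Illustrative sketch: build a device-to-ground rotation matrix from the three
# attitude angles (pitch, yaw, roll), assuming a Z-Y-X (yaw-pitch-roll) order.
import numpy as np

def device_to_ground_matrix(pitch: float, yaw: float, roll: float) -> np.ndarray:
    """Angles in radians; returns a 3x3 rotation matrix from device frame to ground frame."""
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    Rz = np.array([[cy, -sy, 0.0], [sy, cy, 0.0], [0.0, 0.0, 1.0]])   # yaw about the Zg axis
    Ry = np.array([[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]])   # pitch about the Y axis
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, cr, -sr], [0.0, sr, cr]])   # roll about the X axis
    return Rz @ Ry @ Rx
```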
  • the acceleration sensor 209 can detect the acceleration of the smart bracelet 200 in various directions (generally three axes). When the smart bracelet 200 is stationary, the magnitude and direction of gravity can be detected. It can also be used to recognize the posture of the device, and can be used in applications such as horizontal and vertical screen switching, pedometers, etc.
  • the acceleration sensor 209 may be a piezoresistive acceleration sensor or a capacitive acceleration sensor, and the embodiment of the present application does not specifically limit the type of the acceleration sensor.
  • the reference coordinate system of the acceleration sensor 209 is the electronic device coordinate system of the smart bracelet 200, and the acceleration sensor 209 can detect the acceleration of the smart bracelet 200 in the three-axis directions of the electronic coordinate system. Based on the acceleration data collected by the acceleration sensor 209 and the time stamp of the acceleration data, the smart bracelet 200 can also determine the speed of the smart bracelet 200 in the three-axis direction of the coordinate system of the electronic device.
  • the smart bracelet 200 can convert the acceleration in the three-axis direction of the electronic coordinate system of the smart bracelet 200 into the acceleration in the three-axis direction of the ground coordinate system based on the aforementioned transformation matrix.
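  • A minimal sketch of that conversion, assuming the transformation matrix R has already been obtained (for example, with the helper sketched above) and that the Zg axis points up so gravity can be subtracted; the gravity handling and the integration scheme are illustrative assumptions.

```python
# Illustrative sketch: rotate device-frame accelerations into the ground frame and
# integrate them over timestamps to estimate the hand velocity in the ground frame.
import numpy as np

def ground_velocity(accel_device: np.ndarray, timestamps: np.ndarray, R: np.ndarray) -> np.ndarray:
    """accel_device: Nx3 accelerations in the device frame (m/s^2);
    timestamps: N sample times (s); R: 3x3 device-to-ground rotation matrix."""
    accel_ground = accel_device @ R.T                          # rotate every sample
    accel_ground = accel_ground - np.array([0.0, 0.0, 9.81])   # remove gravity (assumed along +Zg)
    v = np.zeros(3)
    for i in range(1, len(timestamps)):
        dt = timestamps[i] - timestamps[i - 1]
        v = v + accel_ground[i] * dt                           # simple first-order integration
    return v
```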
  • The smart bracelet 200 includes an inertial measurement unit (Inertial Measurement Unit, IMU), and the IMU includes a three-axis acceleration sensor (such as the acceleration sensor 209) and a three-axis gyro sensor (such as the gyro sensor 210).
  • The above-mentioned acceleration sensor can detect the three-axis acceleration signal of the smart bracelet 200 in the electronic device coordinate system, and the above-mentioned gyro sensor can detect the angular velocity signal of the smart bracelet 200 relative to the above-mentioned ground coordinate system; according to the measured angular velocity and acceleration of the smart bracelet 200 in three-dimensional space, the posture, moving direction and moving speed of the smart bracelet 200 can be determined.
  • The smart bracelet 200 shown in FIG. 3A is only one example; the smart bracelet 200 may have more or fewer components than those shown in FIG. 3A, may combine two or more components, or may have a different component configuration.
  • the various components shown in the figures may be implemented in hardware, software, or a combination of hardware and software including one or more signal processing and/or application specific integrated circuits.
  • the electronic device 100 may identify the intent and slot corresponding to the voice information input by the user, and then perform corresponding operations in response to the voice information based on the intent and the slot.
  • Intent refers to the operation performed on the specified data or resources.
  • the voice information input by the user each time may correspond to at least one intention of the user.
  • Intentions can be named using verb-object phrases, for example, “turn up the volume”, “turn down the brightness”, “book a plane ticket”, “check the weather”, “play music”, etc. are all expressions of intention.
  • The embodiment of the present application supports any natural language expression that conforms to the user's habits, and different voice information may correspond to the same intention. For example, when the user wants to express the intention of "turning up the volume", the user is supported to use a more standardized and formatted expression such as "turn up the media volume of the TV", and is also supported to use simpler, less informative expressions such as "turn up the volume", as well as keyword-based expressions such as "volume". Of course, the embodiments of the present application also support expressions in other ways, which are not specifically limited here.
  • Intent recognition, as the name implies, is to judge what the user wants to do based on the voice information input by the user. Intent recognition is essentially a text classification (or semantic expression classification) task, that is, multiple intents are preset, and it is determined which of the above-mentioned intents the voice information corresponds to.
  • A slot refers to the key information extracted from the voice information input by the user; through slot recognition, the user's implicit intention can be converted into an explicit instruction that the computer can understand.
  • Slots are used to store attributes of data or resources, and specific information in a slot can be referred to as slot information (or slot value) for short.
  • An intent corresponds to at least one slot, and a slot corresponds to the slot information of one type of attribute. For example, when the user says "double the volume of the video on the smart screen", the intent corresponding to the above voice information may be "adjust the volume", and this intent may correspond to slots such as "target device", "volume type", "adjustment range", "adjustment direction", and so on.
  • the attribute of the slot information corresponding to the "target device” may be a device name (such as a smart screen), and the attribute of the slot information corresponding to the "adjustment range” may be a numerical value (such as a percentage, a multiple, etc.).
  • Slot filling refers to extracting the structured fields (i.e., semantic components) in the language text corresponding to the voice information input by the user. Slot filling is essentially a sequence labeling task: sequence labeling marks each character in a given text, which is essentially a problem of classifying each element in a linear sequence according to the context, given a specific set of labels. In the embodiment of the present application, the sequence labeling technology is used to mark each structured field with an appropriate label according to the context of the language text, that is, to determine its slot.
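  • To make the sequence-labeling idea concrete, the following sketch labels each token of an example utterance with an illustrative B-/I-/O slot tag and then collects the tagged spans into slots; the tag names and the example sentence are assumptions, not labels defined by this application.

```python
# Illustrative slot filling via sequence labels: collect B-/I- tagged spans into slots.
tokens = ["double", "the", "volume", "of", "the", "video", "on", "the", "smart", "screen"]
labels = ["B-adjust_range", "O", "B-volume_type", "O", "O", "O", "O", "O",
          "B-target_device", "I-target_device"]

def extract_slots(tokens, labels):
    slots, current_name, current_tokens = {}, None, []
    for tok, lab in zip(tokens, labels):
        if lab.startswith("B-"):                       # start of a new slot span
            if current_name:
                slots[current_name] = " ".join(current_tokens)
            current_name, current_tokens = lab[2:], [tok]
        elif lab.startswith("I-") and current_name == lab[2:]:
            current_tokens.append(tok)                 # continuation of the current span
        else:                                          # outside any slot: close the span
            if current_name:
                slots[current_name] = " ".join(current_tokens)
            current_name, current_tokens = None, []
    if current_name:
        slots[current_name] = " ".join(current_tokens)
    return slots

print(extract_slots(tokens, labels))
# {'adjust_range': 'double', 'volume_type': 'volume', 'target_device': 'smart screen'}
```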
  • The user can indicate the intention of amplitude adjustment and the target element to be adjusted by voice, and indicate the amplitude adjustment speed or amplitude adjustment value of the target element by hand movement; thus, the electronic device 100 can follow the movement of the user's hand and adjust the amplitude of the target element on the electronic device 100 or other electronic devices in real time.
  • the above-mentioned target elements may be volume, brightness, display brightness, curtain opening degree, fan speed, light brightness, air conditioner temperature and other elements that can be adjusted in magnitude.
  • The electronic device 100 can obtain the preset first mapping relationship between the hand movement direction and the amplitude adjustment direction, the second mapping relationship between the hand movement speed and the amplitude adjustment speed, and/or the third mapping relationship between the hand movement distance and the amplitude adjustment value.
  • The amplitude adjustment direction includes increasing the amplitude and decreasing the amplitude. Wherein, increasing the amplitude may correspond to one or more preset hand movement directions, and decreasing the amplitude may also correspond to one or more preset hand movement directions.
  • moving the hand to the left corresponds to decreasing the amplitude
  • moving the hand to the right corresponds to increasing the amplitude
  • moving the hand counterclockwise corresponds to decreasing the amplitude
  • moving the hand clockwise corresponds to increasing the amplitude
  • For another example, moving the hand to the left or moving the hand counterclockwise corresponds to decreasing the amplitude, and moving the hand to the right or moving the hand clockwise corresponds to increasing the amplitude.
  • When moving to the right, the projection of the hand movement direction on the horizontal axis points to the preset direction of the horizontal axis; when moving to the left, the projection of the hand movement direction on the horizontal axis points to the opposite of the preset direction of the horizontal axis.
  • The first mapping relationship is not limited to the aforementioned hand movement directions; in this embodiment of the present application, other hand movement directions and first mapping relationships may also be preset, which are not specifically limited here.
  • The first mapping relationship, the second mapping relationship and the third mapping relationship may be set by default when the electronic device 100 leaves the factory, or may be preset by the user, which is not specifically limited here.
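  • The three mapping relationships can be sketched as simple lookup-and-scale functions. The gains below (K_SPEED, K_DIST) and the linear form are illustrative assumptions; the application leaves the concrete mappings to the factory defaults or the user's presets.

```python
# Illustrative sketch of the three mappings: direction -> adjustment direction,
# speed -> adjustment speed, distance -> adjustment value (assumed linear gains).
FIRST_MAPPING = {"right": +1, "clockwise": +1, "left": -1, "counterclockwise": -1}
K_SPEED = 0.05   # assumed gain: volume percent per (pixel per second)
K_DIST = 0.1     # assumed gain: volume percent per pixel

def adjustment_speed(direction: str, hand_speed: float) -> float:
    """Second mapping relationship: hand movement speed -> amplitude adjustment speed."""
    return FIRST_MAPPING[direction] * K_SPEED * hand_speed

def adjustment_value(direction: str, hand_distance: float) -> float:
    """Third mapping relationship: hand movement distance -> amplitude adjustment value."""
    return FIRST_MAPPING[direction] * K_DIST * hand_distance
```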
  • an application scenario of the device control method provided by the embodiment of the present application is introduced with reference to FIG. 4A to FIG. 7B .
  • moving the hand to the left corresponds to a smaller range
  • moving the hand to the right corresponds to a larger range.
  • the electronic device 100 may collect an image sequence of the user's moving hand through the camera; perform hand recognition on the collected image sequence, and acquire hand moving direction, hand moving speed and/or hand moving distance.
  • the electronic device 100 shown in FIG. 4A is configured with a voice assistant, and the voice assistant of the large-screen device has a voice wake-up function enabled.
  • When the user watches the video played by the large-screen device and wants to adjust the volume, he speaks out the first voice information, that is, "Xiaoyi Xiaoyi, adjust the volume".
  • The first voice information includes a wake-up word and a voice for instructing to adjust the volume.
  • The electronic device 100 wakes up the voice assistant; after the voice assistant determines that the intent corresponding to the first voice information is "amplitude adjustment" and that the target element to be adjusted is the volume, the electronic device 100 displays the volume indicator bar 301 shown in Figure 4B and starts the camera to collect images. Wherein, the length of the volume indicator bar 301 is used to indicate the maximum volume, and the length of the shaded part in the volume indicator bar 301 is used to indicate the current volume level.
  • The electronic device 100 collects an image sequence of the user's moving hand through the camera, performs hand recognition on the collected images, and then determines the hand movement direction, hand movement speed and/or hand movement distance.
  • The electronic device 100 determines, based on the above-mentioned first mapping relationship, that the hand movement direction corresponds to increasing the amplitude, and calculates the amplitude adjustment speed corresponding to the hand movement speed based on the above-mentioned second mapping relationship or the amplitude adjustment value corresponding to the hand movement distance based on the above-mentioned third mapping relationship; the electronic device 100 can then increase the volume according to the above-mentioned amplitude adjustment speed or amplitude adjustment value, and increase the length of the shaded part in the volume indicator bar 301 according to the increased volume.
  • The electronic device 100 continuously performs hand recognition on the collected images to obtain the hand movement direction, hand movement speed and/or hand movement distance. After the user turns to move the hand to the left, the electronic device 100 determines, based on the first mapping relationship, that the hand movement direction corresponds to decreasing the amplitude, and continues to calculate the amplitude adjustment speed corresponding to the hand movement speed based on the above-mentioned second mapping relationship or the amplitude adjustment value corresponding to the hand movement distance based on the above-mentioned third mapping relationship; the electronic device 100 can then turn down the volume according to the above-mentioned amplitude adjustment speed or amplitude adjustment value, and reduce the length of the shaded part in the volume indicator bar 301 according to the reduced volume.
  • The electronic device 100 performs gesture recognition on the collected images, and only when the second preset gesture is recognized does it determine the hand movement direction, hand movement speed and/or hand movement distance. It can be understood that if the user does not move the hand with the specific gesture, the electronic device 100 will not adjust the volume following the hand movement.
  • the second preset gesture is a hand with five fingers spread out as shown in FIG. 4B .
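  • Putting the gesture gate and the mappings together, a hypothetical control loop could look like the sketch below; the per-frame recognition results and the "open_palm" label are assumptions, and adjustment_speed is the helper from the earlier mapping sketch.

```python
# Hypothetical control loop: only frames showing the second preset gesture
# (open palm) drive the volume, which tracks the hand movement and is clamped
# to the range of the volume indicator bar.
def follow_hand(frames, volume: float) -> float:
    """frames: iterable of (gesture, direction, hand_speed, dt) recognition results."""
    for gesture, direction, hand_speed, dt in frames:
        if gesture != "open_palm":                  # no second preset gesture: ignore frame
            continue
        volume += adjustment_speed(direction, hand_speed) * dt
        volume = max(0.0, min(100.0, volume))       # clamp to 0-100 percent
    return volume
```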
  • When the user intends to adjust the volume, he may first wake up the voice assistant of the large-screen device through a voice message, and then speak the first voice information to instruct the large-screen device to adjust the volume.
  • The user speaks the voice information "Xiaoyi Xiaoyi"; the electronic device 100 detects the user's voice information, recognizes that the above voice information includes the preset wake-up word "Xiaoyi Xiaoyi", and wakes up the voice assistant of the electronic device 100.
  • The voice assistant can send out the voice message "I'm here" in response to the user, to indicate to the user that the electronic device 100 has been woken up; then, the user sends out the first voice message of "adjust the volume".
  • The first voice information is used to indicate the intention of amplitude adjustment and the target element (such as volume) to be adjusted, and is not limited to the language expression "adjust the volume"; the voice content indicating volume adjustment in the first voice information may also be an expression such as "adjust the video playback volume", "the volume is too loud", "increase the volume", etc., which is not specifically limited here.
  • the volume of the electronic device 100 is classified into multiple types, such as ringtone volume, media volume, alarm clock volume, and the like.
  • The user can specify the specific volume type of the target element in the first voice information. When the user does not specify the volume type of the target element, the electronic device 100 may determine the volume type used by the application running in the foreground (for example, a video application adopts the media volume) or the default volume type as the volume type of the target element.
  • the user can wear the smart bracelet 200 , and the electronic device 100 can obtain the hand movement direction and hand movement speed through the IMU on the smart bracelet 200 .
  • The electronic device 100 can obtain the user's hand movement direction, hand movement speed and/or hand movement distance without collecting images or performing hand recognition; thus, for some electronic devices that do not have a camera or have insufficient performance, it is also possible to continuously adjust the volume through the first voice information and hand movement.
  • The electronic device 100 wakes up the voice assistant; after determining that the intent corresponding to the first voice information is "amplitude adjustment" and that the target element to be adjusted is the volume, it displays the volume indicator bar 301 shown in FIG. 5B and sends an acquisition request to the smart bracelet 200, where the acquisition request is used to acquire the user's hand movement speed and hand movement direction.
  • the smart bracelet 200 acquires the user's hand movement direction, hand movement speed and/or hand movement distance in real time through the IMU, and sends them to the electronic device 100 .
  • The smart bracelet 200 obtains the user's hand movement direction, hand movement speed and/or hand movement distance in real time through the IMU and sends them to the electronic device 100; the electronic device 100 determines, based on the above-mentioned first mapping relationship, that the hand movement direction corresponds to decreasing the amplitude, and calculates the amplitude adjustment speed corresponding to the hand movement speed based on the above-mentioned second mapping relationship or the amplitude adjustment value corresponding to the hand movement distance based on the above-mentioned third mapping relationship; the electronic device 100 can then turn down the volume according to the amplitude adjustment speed or amplitude adjustment value, and reduce the length of the shaded part in the volume indicator bar 301 according to the lowered volume.
  • The smart bracelet 200 continues to obtain the user's hand movement direction, hand movement speed and/or hand movement distance in real time through the IMU and sends them to the electronic device 100; the electronic device 100 then determines, based on the above-mentioned first mapping relationship, that the hand movement direction corresponds to increasing the amplitude, and continues to calculate the amplitude adjustment speed corresponding to the hand movement speed based on the above-mentioned second mapping relationship or the amplitude adjustment value corresponding to the hand movement distance based on the above-mentioned third mapping relationship; the electronic device 100 can then increase the volume according to the amplitude adjustment speed or amplitude adjustment value, and increase the length of the shaded part in the volume indicator bar 301 according to the increased volume.
  • the user can adjust the sensitivity through the second voice information.
  • Regarding the sensitivity of the amplitude adjustment: when the hand movement speed is constant, the lower the sensitivity of the amplitude adjustment, the smaller the amplitude adjustment speed corresponding to the hand movement speed; or, when the hand movement speed is constant, the greater the sensitivity of the amplitude adjustment, the smaller the amplitude adjustment speed corresponding to the hand movement speed, which is not specifically limited here.
  • Similarly, when the hand movement distance is constant, the lower the sensitivity of the amplitude adjustment, the smaller the amplitude adjustment value corresponding to the hand movement distance; or, when the hand movement distance is constant, the greater the sensitivity of the amplitude adjustment, the smaller the amplitude adjustment value corresponding to the hand movement distance, which is not specifically limited here.
  • the sensitivity of the amplitude adjustment may also be referred to as the first sensitivity.
  • the electronic device 100 determines the amplitude adjustment speed corresponding to the user's hand movement speed according to the reduced sensitivity and the second mapping relationship, and then adjusts the volume based on the amplitude adjustment speed.
  • the initial value of the sensitivity in each volume adjustment process is the same, that is, the sensitivity adjustment in this volume adjustment process will not be carried over to the next volume adjustment process.
  • The sensitivity of the first volume adjustment of the electronic device 100 is a preset initial value; the user can adjust the sensitivity in each subsequent volume adjustment process, and the sensitivity adjusted this time will be carried over to the next volume adjustment process. In this way, in implementation manner 2, the adjusted sensitivity conforms to the usage habit of the user, and the user does not need to adjust the sensitivity again in the next volume adjustment process.
  • sensitivity can also be used as a target element that can be adjusted in amplitude, and adjusted through a separate sensitivity adjustment process.
  • the first voice information uttered by the user may be "adjust the sensitivity", and then the user indicates the magnitude adjustment direction and the magnitude adjustment speed of the sensitivity by moving the hand.
  • the electronic device 100 may save the latest adjusted sensitivity, and use the adjusted sensitivity in a subsequent amplitude adjustment process.
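  • A hedged sketch of how such a sensitivity factor could scale the speed mapping; the initial value, the lower bound and the multiplicative form are illustrative assumptions, and the convention used here (higher sensitivity means faster adjustment) is only one of the two options left open above.

```python
# Illustrative sketch: the second voice information rescales a sensitivity factor
# that multiplies the speed mapping (higher sensitivity -> faster adjustment here).
class AmplitudeAdjuster:
    def __init__(self, k_speed: float = 0.05, sensitivity: float = 1.0):
        self.k_speed = k_speed            # assumed base gain
        self.sensitivity = sensitivity    # assumed initial value of the first sensitivity

    def set_sensitivity(self, factor: float) -> None:
        self.sensitivity = max(0.1, factor)   # keep an assumed lower bound

    def adjustment_speed(self, direction_sign: int, hand_speed: float) -> float:
        return direction_sign * self.sensitivity * self.k_speed * hand_speed
```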
  • the electronic device 100 when the electronic device 100 detects the first preset condition, the electronic device 100 ends the current volume adjustment and stops displaying the volume indicator bar 301 .
  • The first preset condition may include one or more of the following: the duration after receiving the first voice information exceeds a first preset duration (for example, the first preset duration is 15s); no effective movement of the user's hand is detected within a second preset duration (for example, the second preset duration is 5s); a first preset gesture for stopping the amplitude adjustment is received; third voice information for stopping the amplitude adjustment is received.
  • The effective movement of the user's hand means that the moving distance of the user's hand along a preset hand movement direction is greater than a distance threshold, or the user's hand movement speed is greater than a speed threshold. The first preset gesture is different from the second preset gesture; for example, the first preset gesture is a hand making a fist.
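  • The first preset condition can be read as a disjunction of four checks, sketched below; the threshold values mirror the 15 s and 5 s examples above, and the gesture label "fist" is only an example of the first preset gesture.

```python
# Illustrative sketch of the "first preset condition" that ends one adjustment session.
import time

def should_stop(start_time: float, last_valid_move_time: float,
                gesture: str, heard_stop_voice: bool,
                t1: float = 15.0, t2: float = 5.0) -> bool:
    now = time.monotonic()
    return (now - start_time > t1                  # first preset duration exceeded
            or now - last_valid_move_time > t2     # no effective hand movement within t2
            or gesture == "fist"                   # first preset gesture (example)
            or heard_stop_voice)                   # third voice information received
```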
  • the user stops hand movement and shows the first preset gesture shown in FIG. 7A .
  • the electronic device 100 detects the first preset gesture, the electronic device 100 ends the current volume adjustment, and stops displaying the volume indicator bar 301.
  • In the above-mentioned volume adjustment process, the electronic device 100 can continuously increase or decrease the volume following the user's hand movement direction, hand movement speed and/or hand movement distance, so that the user can obtain a video volume that meets the expected effect.
  • the user can also control the sensitivity of the amplitude adjustment through the second voice information.
  • When the electronic device 100 detects the first preset condition (such as the first preset gesture), it can automatically end the current volume adjustment, and the user can start the aforementioned volume adjustment process again through the first voice information.
  • FIG. 8 shows a flow chart of a device control method provided by an embodiment of the present application.
  • the device control method includes but not limited to the following steps S101 to S109.
  • the user speaks out first voice information, and the electronic device 100 receives the first voice information.
  • the first voice information may include a wake-up word of a voice assistant of the electronic device 100 .
  • the wake-up word of the voice assistant of the electronic device 100 is "Xiaoyi Xiaoyi”
  • the first voice information may be "Xiaoyi Xiaoyi, adjust the volume”. It can be understood that after the user wakes up the voice assistant of the electronic device 100 through the wake-up word, the voice assistant of the electronic device 100 will respond to the voice information spoken by the user and perform corresponding operations.
  • the electronic device 100 recognizes language text 1 corresponding to the first voice information.
  • Speech recognition refers to the conversion of human speech into corresponding text by computers.
  • the electronic device 100 acquires the language text 1 corresponding to the first voice information by using an automatic speech recognition (automatic speech recognition, ASR) technology.
  • The electronic device 100 recognizes the language text 1 corresponding to the first voice information, including: extracting audio features of the first voice information; inputting the audio features of the first voice information into an acoustic model (acoustic model, AM), which outputs the phonemes and characters corresponding to the above audio features; and inputting the phonemes and characters corresponding to the above audio features into a language model (language model, LM), which outputs a set of token sequences corresponding to the first voice information, that is, language text 1.
  • the token can be a letter (Grapheme, the basic unit of writing), a word (word), a morpheme (Morpheme, the smallest unit that can convey meaning, smaller than a word, larger than a letter) or bytes (bytes), which are not specifically limited.
  • the acoustic model can obtain the probability that an audio feature belongs to a certain acoustic unit (such as a phoneme or a word), and then can decode the audio feature of a voice input (such as the first voice information) into a unit such as a phoneme or a word.
  • the language model can obtain the probability that a set of token sequences is the language text corresponding to this speech information, and then can decode the acoustic unit corresponding to the first speech information into a set of token sequences.
  • the acoustic model and the language model may use a neural network model.
  • the neural network model 1 that combines the acoustic model and the language model can be jointly trained; the language text 1 corresponding to the first speech information can be directly recognized by using the jointly trained neural network model 1 .
  • the above-mentioned electronic device 100 recognizes the language text 1 corresponding to the first voice information, including: extracting the audio features of the first voice information; inputting the audio features of the first voice information into the neural network model 1, and the neural network model 1 outputs the first The language text 1 corresponding to the voice information.
  • The above-mentioned extraction of audio features of the first voice information includes: separating and denoising the input voice stream of the first voice information, and then using an audio feature processing algorithm to obtain the audio features corresponding to the first voice information.
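  • Schematically, the recognition pipeline above (feature extraction, acoustic model, language model) can be expressed as the sketch below; the callables are stand-ins for whatever models the device actually runs, not a real ASR API.

```python
# Schematic sketch of the ASR pipeline: voice stream -> audio features ->
# acoustic units (phonemes/characters) -> token sequence (language text 1).
from typing import Callable, Iterable

def recognize(voice_stream: bytes,
              extract_features: Callable[[bytes], list],
              acoustic_model: Callable[[list], list],
              language_model: Callable[[list], Iterable[str]]) -> str:
    features = extract_features(voice_stream)   # separate, denoise, extract audio features
    units = acoustic_model(features)            # most probable phonemes/characters
    tokens = language_model(units)              # most probable token sequence
    return "".join(tokens)                      # language text 1
```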
  • the electronic device 100 converts the language text 1 into intents and slots that can be understood and executed by the electronic device 100, and the user's demands are represented by the intents and slots.
  • the electronic device 100 inputs the language text 1 into the intent classifier, and the intent classifier may output the intent corresponding to the above language text 1 .
  • the above-mentioned intent classifier can be a support vector machine (SVM), a decision tree or a deep neural network (Deep Neural Networks, DNN).
  • the deep neural network may be a convolutional neural network (convolutional neural network, CNN) or a recurrent neural network (recurrent neural network, RNN), etc., which are not specifically limited here.
  • the electronic device 100 may use a slot classifier to mark at least one slot of the language text 1 .
  • the above-mentioned slot classifiers can be maximum entropy Markov model (Maximum Entropy Markov Model, MEMM), conditional random field (conditional random field, CRF), and recurrent neural network (RNN).
  • intent recognition and slot filling can be processed as two separate tasks, or jointly processed.
  • The input of the above joint training model is language text 1 or the text features corresponding to language text 1, and the output of the above joint training model is the intent corresponding to language text 1 and at least one slot corresponding to language text 1.
  • Since the intent and the slots corresponding to the same voice information are usually associated, the accuracy of intent recognition and slot filling can be improved through joint processing.
  • FIG. 9 is a joint identification model of an intent and a slot provided by an embodiment of the present application.
  • The language text 1 is input into the recognition model, and the recognition model uses a word embedding algorithm (for example, the word2vec algorithm) to generate the word vectors (word embeddings) corresponding to language text 1; the word vectors are input into the BERT (Bidirectional Encoder Representations from Transformers) model, and the BERT model outputs the hidden features corresponding to the above word vectors; the above hidden features are respectively input into the intent classifier (for example, an LSTM) and the slot classifier; the intent classifier outputs the intent corresponding to language text 1, and the slot classifier outputs at least one slot corresponding to language text 1.
  • language text 1 is "adjust the brightness of the TV”
  • the corresponding intent of language text 1 is "adjust brightness (ADJUST_LUMINANCE)”
  • [CLS] in the text input of the recognition model shown in Figure 9 is used to indicate the text classification task: the BERT model inserts a [CLS] symbol before the text and uses the output vector corresponding to this symbol as the semantic representation of the entire text input.
  • [SEP] is used to indicate the sentence-pair classification task: the BERT model separates the two input sentences with a [SEP] symbol and attaches two different word vectors to the two sentences respectively to distinguish them.
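  • As a toy numerical illustration of the structure in FIG. 9 (not the trained model itself): one shared hidden feature per token feeds a sentence-level intent classifier and a per-token slot classifier; the dimensions and random weights below are placeholders.

```python
# Minimal numpy sketch of a joint intent + slot head over shared hidden features:
# the [CLS] vector drives the intent distribution, the other tokens drive slot labels.
import numpy as np

rng = np.random.default_rng(0)
hidden = rng.normal(size=(6, 16))        # 6 tokens ([CLS] + 5 words), 16-dim hidden features
W_intent = rng.normal(size=(16, 3))      # 3 example intents
W_slot = rng.normal(size=(16, 5))        # 5 example slot labels

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

intent_probs = softmax(hidden[0] @ W_intent)   # [CLS] vector -> intent distribution
slot_probs = softmax(hidden[1:] @ W_slot)      # remaining tokens -> slot distributions
print(intent_probs.shape, slot_probs.shape)    # (3,) (5, 5)
```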
  • the embodiment of the present application recognizes the intention to adjust the magnitude of at least one preset element as “adjustment magnitude”.
  • the electronic device 100 can recognize that the intentions corresponding to the two language texts of "adjust the volume of the smart screen” and “adjust the brightness of the smart screen” are both “adjustment range”; in step S104, the electronic device 100 Whether the intent corresponding to language text 1 is "adjustment range" can be determined directly according to the recognition result of step S103.
  • At least one element preset above may be volume, brightness, display brightness, curtain opening degree, fan speed, light brightness, air conditioner temperature and other elements that can be adjusted in magnitude, which is not specifically limited here.
  • The electronic device 100 performs intent recognition on the first voice information according to the existing intent classification, and classifies the intents corresponding to the amplitude adjustment of at least one preset element as "adjustment range"; the electronic device 100 may store a file 1 indicating which intents are classified as "adjustment range".
  • For example, file 1 indicates that the intents classified as "adjustment range" include "adjust volume" and "adjust brightness". In step S103, the electronic device 100 identifies the intent corresponding to the language text "adjust the volume of the smart screen" as "adjust volume"; in step S104, if the electronic device 100 determines according to file 1 that "adjust volume" belongs to "adjustment range", it determines that the intent corresponding to language text 1 is "adjustment range".
  • the electronic device 100 determines the target element to be adjusted based on the slot corresponding to the language text 1.
  • The intent corresponding to the above voice information may be "adjustment range", and the slot corresponding to the intent includes "volume"; the electronic device 100 may determine, based on the slot corresponding to language text 1, that the target element to be adjusted is the media volume.
  • Alternatively, the intent corresponding to the above voice information may be "adjust the volume", which is classified into "adjustment range", and the slot corresponding to the intent includes "volume type (that is, media volume)"; the electronic device 100 may determine, based on the slot corresponding to language text 1, that the target element to be adjusted is the media volume.
  • the slot corresponding to language text 1 may also be referred to as the slot corresponding to the first voice information.
  • the electronic device 100 acquires the user's hand moving direction and hand moving speed.
  • After determining that the user's intention is "adjustment range", the electronic device 100 starts to acquire the user's hand movement direction and hand movement speed.
  • the electronic device 100 may acquire a preset first mapping relationship between the hand movement direction and the amplitude adjustment direction, and a second mapping relationship between the hand movement speed and the amplitude adjustment speed.
  • The amplitude adjustment direction includes increasing the amplitude and decreasing the amplitude. In the first mapping relationship, increasing the amplitude may correspond to one or more preset movement directions of the hand, and decreasing the amplitude may also correspond to one or more preset movement directions of the hand; the preset movement direction corresponding to increasing the amplitude is different from the preset movement direction corresponding to decreasing the amplitude.
  • the preset moving direction includes, but is not limited to, moving up, moving down, moving right, moving left, moving clockwise, moving counterclockwise, and the like.
  • For example, the preset movement direction 1 of the hand corresponds to increasing the amplitude, and the preset movement direction 2 of the hand corresponds to decreasing the amplitude.
  • the electronic device 100 acquires the user's hand moving direction and hand moving speed through images collected by the camera.
  • FIG. 10 shows a flow chart of acquiring the user's hand movement direction and hand movement speed provided by the embodiment of the present application.
  • the electronic device 100 acquires an image sequence captured by the camera.
  • The electronic device 100 continuously collects images through a front-facing camera (such as a low-power camera), and after determining that the user's intention is "adjustment range", the electronic device 100 obtains the image sequence captured by the camera in real time. In one implementation manner, the electronic device 100 starts the front-facing camera to collect the image sequence in real time only after determining that the user's intention is "adjustment range".
  • the electronic device 100 recognizes the hand in each frame of image in the above image sequence through hand recognition.
  • The electronic device 100 obtains the position of the first feature point of the hand in each frame of image collected in real time through hand recognition, and obtains the user's hand movement direction and hand movement speed according to the position change of the first feature point in the image sequence.
  • For example, the above image sequence includes image 1; the electronic device 100 inputs image 1 into the hand recognition model, and the hand recognition model outputs a hand detection frame of a preset shape corresponding to the hand in image 1. The hand detection frame is used to indicate the area where the hand is located in image 1, and the electronic device 100 determines the position of the first feature point based on the hand detection frame.
  • the above-mentioned hand recognition model can adopt a trained neural network model, and the above-mentioned preset shape can be a preset rectangle, ellipse or circle, etc., which are not specifically limited here.
  • the first feature point may be a specific position of the hand detection frame. For example, when the aforementioned preset shape is a preset rectangle, the first feature point may be the center position, upper left corner or upper right corner of the hand detection frame.
  • For example, the above-mentioned preset shape is a rectangle; the electronic device 100 obtains a rectangular hand detection frame corresponding to image 1 by performing hand recognition on image 1, and takes the upper left corner of the hand detection frame as the first feature point.
  • In another example, the above image sequence includes image 1; the electronic device 100 inputs image 1 into the hand recognition model, and after the hand recognition model recognizes the hand in image 1, a skeletal point recognition algorithm is used to output the positions of the skeletal points of the hand in image 1.
  • Fig. 11B shows 21 skeletal points of the hand.
  • the first feature point may be an average position of at least one preset skeleton point identified by the electronic device 100 .
  • For example, the electronic device 100 recognizes that the positions of two preset skeletal points of the hand in image 1 are (x1, y1) and (x2, y2) respectively, and the position of the first feature point may be (0.5*(x1+x2), 0.5*(y1+y2)).
  • the camera of the electronic device 100 collects image 1 , image 2 and image 3 respectively. Every time the electronic device 100 collects a frame of image, the position of the first feature point can be obtained through hand recognition.
  • the positions of the first feature points acquired by the electronic device 100 are A(X(T1), Y(T1)), B(X(T2), Y(T2) ) and C(X(T3), Y(T3)).
  • The XY coordinate system in FIG. 11C is an exemplary two-dimensional image coordinate system given in this application. In FIG. 11C, the X-axis is parallel to the rows of the image, and its positive direction points from the first column of pixels of the image to the last column of pixels; the Y-axis is parallel to the two sides of the image, and its positive direction points from the last row of pixels of the image to the first row of pixels; the X-axis is perpendicular to the Y-axis.
  • FIG. 11D shows the movement trajectory of the first feature point of the user's hand from time T1 to time T3 .
  • the moving direction of the user's hand in the above-mentioned two-dimensional image coordinate system can be indicated by the two-dimensional moving vector AB, that is, (X(T2)-X(T1), Y(T2)-Y(T1) ).
  • the moving direction of the user's hand in the above two-dimensional coordinate system can be indicated by the two-dimensional moving vector BC, namely (X(T3)-X(T2), Y(T3)-Y(T2)) .
  • The electronic device 100 determines that the user's hand movement direction is the preset movement direction corresponding to the two-dimensional movement vector of the hand in the above-mentioned two-dimensional image coordinate system (i.e., preset movement direction 1 or preset movement direction 2).
  • For example, the preset movement direction 1 is moving to the right, and the preset movement direction 2 is moving to the left. If the projection of the two-dimensional movement vector of the hand in image 1 on the X-axis points to the preset direction of the X-axis (for example, the opposite direction of the X-axis), the electronic device 100 can determine that the hand movement direction is the preset movement direction 1 (that is, moving to the right); if the projection of the hand movement vector on the X-axis points to the opposite of the preset direction of the X-axis (for example, the positive direction of the X-axis), the electronic device 100 can determine that the hand movement direction is the preset movement direction 2 (that is, moving to the left).
  • the moving speed of the user's hand can be the speed along the two-dimensional moving vector (such as vector AB or vector BC), or it can be the speed along the X axis (that is, the preset moving direction 1 or the preset moving direction 2), where Not specifically limited.
  • After the electronic device 100 successively collects image 1 and image 2, it determines that the projection of the two-dimensional movement vector AB of the first feature point from time T1 to time T2 on the X-axis points to the preset direction of the horizontal axis, so it can be determined that the user's hand movement direction belongs to the preset movement direction 1, that is, moving to the right.
  • If the moving speed of the user's hand is the speed along the X-axis, since the first feature point moves (X(T2)-X(T1)) along the X-axis from time T1 to time T2, the electronic device 100 determines that the user's hand movement speed is (X(T2)-X(T1))/(T2-T1).
  • if the moving speed of the user's hand is the speed along the two-dimensional movement vector AB, the first feature point moves |AB| from time T1 to time T2, and the speed is |AB|/(T2-T1), where |*| represents the modulus value of the vector *.
  • based on image 2 and image 3, the electronic device 100 can determine that the user's hand movement direction belongs to the preset movement direction 1, and determine that the user's hand movement speed is (X(T3)-X(T2))/(T3-T2) or |BC|/(T3-T2).
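  • as an illustration only, the following Python sketch (function and variable names are assumptions, not identifiers from this application) shows how the first feature point, the two-dimensional movement vector and the corresponding movement direction and speeds described above could be computed:

```python
import math

def first_feature_point(skel1, skel2):
    """Midpoint of two preset skeletal points, e.g. (x1, y1) and (x2, y2)."""
    return (0.5 * (skel1[0] + skel2[0]), 0.5 * (skel1[1] + skel2[1]))

def hand_motion_2d(p_a, p_b, t1, t2, right_is_negative_x=True):
    """Movement direction and speed from feature-point positions A and B at times T1, T2 (seconds)."""
    dx, dy = p_b[0] - p_a[0], p_b[1] - p_a[1]          # two-dimensional movement vector AB
    # The projection of AB on the X-axis decides left/right; which sign maps to
    # "move right" depends on the image coordinate convention (an assumption here).
    moves_right = dx < 0 if right_is_negative_x else dx > 0
    direction = "preset movement direction 1 (right)" if moves_right else "preset movement direction 2 (left)"
    speed_x = abs(dx) / (t2 - t1)                       # speed along the X-axis
    speed_ab = math.hypot(dx, dy) / (t2 - t1)           # speed along AB, i.e. |AB|/(T2-T1)
    return direction, speed_x, speed_ab

a = first_feature_point((120, 80), (140, 96))           # first feature point in image 1
b = first_feature_point((100, 82), (120, 98))           # first feature point in image 2
print(hand_motion_2d(a, b, t1=0.0, t2=0.5))
```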
  • the preset moving direction 1 is clockwise, and the preset moving direction 2 is counterclockwise. If the vector cross product of the motion vector BC and the motion vector AB is greater than zero, the motion vector BC is in the clockwise direction of the motion vector AB; that is, the user's hand moves clockwise in the image, but the user's hand moves counterclockwise in the actual environment.
  • the electronic device 100 determines the moving speed of the user's hand as a speed along a two-dimensional moving vector (eg, vector BC or vector AC).
  • the preset moving direction 1 includes clockwise and other preset directions (such as moving to the right and moving up), and the preset moving direction 2 includes counterclockwise and other preset directions (such as moving to the left and moving down); when the moving vector BC and the moving vector AB are collinear, it can be judged whether the user's hand moving direction is moving to the right or moving up; if so, it is determined that the user's hand moving direction belongs to the preset movement direction 1, otherwise it is determined that the user's hand moving direction belongs to the preset movement direction 2.
  • after the electronic device 100 successively collects image 1, image 2 and image 3, it determines the two-dimensional movement vector AB of the first feature point from time T1 to time T2 and the two-dimensional movement vector BC from time T2 to time T3; the vector cross product of the vector BC and the vector AB is less than zero, and the electronic device 100 determines that the user's hand movement direction belongs to the preset movement direction 1, that is, moves clockwise.
  • the electronic device 100 determines that the user's hand movement speed is |BC|/(T3-T2) or |AC|/(T3-T1) (that is, the speed along the two-dimensional movement vector BC or AC).
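  • as a hedged illustration of the cross-product test above (the mapping between the sign and "clockwise in the actual environment" depends on the image coordinate convention), a minimal sketch:

```python
def rotation_of(a, b, c):
    """Decide how BC lies relative to AB from three successive feature-point positions A, B, C."""
    ab = (b[0] - a[0], b[1] - a[1])
    bc = (c[0] - b[0], c[1] - b[1])
    cross = bc[0] * ab[1] - bc[1] * ab[0]        # z-component of the cross product BC x AB
    if cross > 0:
        return "BC is in the clockwise direction of AB"
    if cross < 0:
        return "BC is in the counterclockwise direction of AB"
    return "AB and BC are collinear; fall back to the right/up test"

print(rotation_of((0, 0), (4, 0), (6, 3)))       # example feature-point positions A, B, C
```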
  • the electronic device 100 determines, through gesture recognition, whether the images in the image sequence captured by the camera contain the second preset gesture; if so, the electronic device 100 acquires the position of the first feature point of the hand, and obtains the user's hand movement direction and hand movement speed according to the position change of the first feature point in the image sequence; if not, the electronic device 100 determines that no effective movement of the user's hand has been detected, that is, the hand movement direction and the hand movement speed are not detected.
  • the movement of the user with the second preset gesture can indicate the amplitude adjustment direction and amplitude adjustment speed of the target element.
  • the above target elements can be the volume, the brightness, the display brightness, the curtain opening degree, the fan speed, the light brightness, the air conditioner temperature, and so on; any element whose amplitude can be adjusted is applicable.
  • in the solution proposed in the embodiment of the present application, the electronic device 100 only needs to recognize one gesture (that is, the second preset gesture), which imposes lower requirements on the system performance of the electronic device 100; the user can adjust the amplitude of various elements through this one gesture, without needing to memorize complicated gestures.
  • the electronic device 100 acquires the user's hand movement direction and hand movement speed through the smart bracelet 200 .
  • Fig. 12 shows another flow chart for obtaining the user's hand movement speed and hand movement direction provided by the embodiment of the present application.
  • the electronic device 100 sends an acquisition request to the smart bracelet 200 .
  • the electronic device 100 after determining that the user's intention is "adjustment range", the electronic device 100 sends an acquisition request to the smart bracelet 200 to request acquisition of the user's hand moving speed and hand moving direction.
  • the smart bracelet 200 acquires the moving speed and moving direction of the smart bracelet 200 .
  • in response to the above acquisition request, the smart bracelet 200 can detect the attitude angle of the smart bracelet 200 through the gyroscope sensor, and can determine, according to the above attitude angle, the conversion matrix from the device coordinate system of the smart bracelet 200 to the ground coordinate system;
  • the smart bracelet 200 is equipped with an IMU, and the IMU includes the above-mentioned acceleration sensor and/or gyroscope sensor; the smart bracelet 200 can detect the moving speed and the moving direction of the smart bracelet 200 in the above-mentioned ground coordinate system through the IMU.
  • the electronic device 100 receives the moving direction and moving speed of the smart bracelet 200 sent by the smart bracelet 200, and based on the moving direction and moving speed of the smart bracelet 200, determines the user's hand moving direction and hand moving speed.
  • the electronic device 100 determines that the user's hand movement direction is the preset movement direction corresponding to the three-dimensional movement vector (ie, the preset movement direction 1 or the preset movement direction 2).
  • the preset moving direction 1 is moving to the right, and the preset moving direction 2 is moving to the left.
  • the electronic device 100 can confirm that the preset moving direction 1 corresponds to the three-dimensional moving direction 3 in the ground coordinate system, and the preset moving direction 2 corresponds to the opposite direction of the moving direction 3 in the ground coordinate system.
  • if the projection of the moving direction of the smart bracelet 200 on the above-mentioned moving direction 3 points to the moving direction 3, the electronic device 100 can determine that the hand moving direction is the preset moving direction 1 (i.e., moving to the right); if the projection of the moving direction of the smart bracelet 200 on the above-mentioned moving direction 3 points to the opposite direction of the moving direction 3, the electronic device 100 can determine that the hand moving direction is the preset moving direction 2 (i.e., moving to the left).
  • the moving speed of the user's hand may be the speed along the above-mentioned three-dimensional moving direction, or may be the speed along the moving direction 3 , which is not specifically limited here.
  • the preset moving direction 1 is clockwise, and the preset moving direction 2 is counterclockwise.
  • the electronic device 100 acquires the three-dimensional moving directions (that is, the moving direction 4 and the moving direction 5) of the smart bracelet 200 at two consecutive moments (such as T4 and T5); if the vector cross product of the moving direction 5 and the moving direction 4 of the smart bracelet 200 is greater than zero, the moving direction 5 is in the clockwise direction of the moving direction 4, and the user's hand moving direction belongs to the preset moving direction 1, that is, moving clockwise; if the vector cross product of the moving direction 5 and the moving direction 4 is less than zero, the moving direction 5 is in the counterclockwise direction of the moving direction 4, and the user's hand moving direction belongs to the preset moving direction 2, that is, moving counterclockwise; if the vector cross product of the moving direction 5 and the moving direction 4 is equal to zero, the moving direction 5 and the moving direction 4 are collinear.
  • in this case, the electronic device 100 may determine the moving speed of the user's hand as the speed along the three-dimensional moving vector of the smart bracelet 200.
  • the preset moving direction 1 includes a clockwise direction (such as moving to the right and moving up), and the preset moving direction 2 includes a counterclockwise direction (such as moving to the left and moving down).
  • when the moving direction 5 is collinear with the moving direction 4, it can be judged whether the user's hand moving direction is moving to the right or moving up; if so, it is determined that the user's hand moving direction belongs to the preset moving direction 1, otherwise it is determined that the user's hand moving direction belongs to the preset moving direction 2.
  • S108 Determine the first amplitude adjustment direction of the target element based on the user's hand movement direction, and determine the first amplitude adjustment speed of the target element based on the user's hand movement speed and sensitivity.
  • when the user's hand movement direction belongs to the preset movement direction 1, the electronic device 100 determines, based on the aforementioned first mapping relationship, that the amplitude adjustment direction corresponding to the preset movement direction 1 is amplitude increase; when the user's hand movement direction belongs to the preset movement direction 2, the electronic device 100 determines, based on the aforementioned first mapping relationship, that the amplitude adjustment direction corresponding to the preset movement direction 2 is amplitude reduction.
  • the electronic device 100 determines the amplitude adjustment speed v e corresponding to the hand movement speed v h based on the second mapping relationship.
  • for example, v_e = sen * w_0 * v_h, where w_0 is the preset mapping coefficient from hand movement speed to amplitude adjustment speed, and sen is the sensitivity of the amplitude adjustment. For example, if the volume ranges from 0 to 100, and during the volume adjustment process the hand movement speed is 10 cm/s, the value of sen is 1 and the value of w_0 is 2, then the first amplitude adjustment speed of the volume is 20/s.
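  • a minimal sketch of the assumed second mapping relationship, reproducing the worked example above:

```python
# v_e = sen * w_0 * v_h: hand speed 10 cm/s, sen = 1, w_0 = 2 -> amplitude adjustment speed 20/s
def amplitude_adjust_speed(v_hand_cm_s, sen=1.0, w0=2.0):
    return sen * w0 * v_hand_cm_s

print(amplitude_adjust_speed(10))   # 20.0 units of the target element per second
```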
  • the value of the sensitivity is an initial value.
  • the initial value of sensitivity is 1.
  • the value of the sensitivity is the first value.
  • there may be no sensitivity in the second mapping relationship (that is, the value of sen is 1), or the sensitivity may be a non-adjustable fixed value, which is not specifically limited here.
  • the electronic device 100 determines the amplitude adjustment speed v_e corresponding to the hand movement speed v_h based on the second mapping relationship; for example, v_e = w_0 * v_h, where w_0 is the preset mapping coefficient from hand movement speed to amplitude adjustment speed, and w_0 can also be regarded as the sensitivity of the amplitude adjustment.
  • the electronic device 100 adjusts the magnitude of the target element along the first magnitude adjustment direction at a first magnitude adjustment speed.
  • the first amplitude adjustment direction of the volume is to increase the amplitude
  • the first amplitude adjustment speed of the volume is 20/s, so the electronic device 100 increases the volume at 20/s.
  • when the electronic device 100 adjusts the magnitude of the target element, it may announce to the user that the target element is being adjusted and/or the adjusted output effect of the target element through voice, video, picture, text and other means.
  • when the electronic device 100 adjusts the volume following the movement of the user's hand, the volume indication bar 301 is displayed, and the length of the shaded part in the volume indication bar 301 is refreshed in real time according to the adjusted volume; it can be understood that the volume indication bar 301 is used to announce the effect of the volume adjustment in real time.
  • when the electronic device 100 determines that the intent corresponding to the first voice information is the adjustment range, it also determines the target device (such as the third electronic device) for range adjustment based on the slot corresponding to the first voice information; in step S109, the electronic device 100 sends an adjustment request to the target device to control the target device to adjust the amplitude of the target element along the first amplitude adjustment direction at the first amplitude adjustment speed, where the adjustment request carries the target element, the first amplitude adjustment speed and the first amplitude adjustment direction.
  • the first voice information may also indicate an amplitude adjustment direction.
  • the user speaks the first voice information, and in response to the detected first voice information, the electronic device 100 determines an intention of amplitude adjustment indicated by the first voice information, a target element to be adjusted, and an amplitude adjustment direction.
  • the electronic device 100 obtains the hand movement speed through image recognition or through the smart bracelet 200, obtains the amplitude adjustment speed based on the sensitivity, the aforementioned second mapping relationship and the hand movement speed, and then adjusts the amplitude of the target element according to the amplitude adjustment speed.
  • specifically, reference may be made to the related descriptions of FIG. 8, and details are not repeated here.
  • since the amplitude adjustment speed is only indicated by the hand movement speed, there is no need to limit the user's hand movement direction in this embodiment. In this way, after the user speaks the first voice information, the user can continuously control the amplitude adjustment of the target element by moving the hand in any direction until the output effect of the target element reaches the expected effect.
  • when adjusting the magnitude of the target element, the user can also adjust the sensitivity of the magnitude adjustment.
  • when the hand movement speed is constant, the smaller the sensitivity of the amplitude adjustment, the smaller the amplitude adjustment speed corresponding to the hand movement speed.
  • S110 to S114 may also be included after step S109.
  • the user speaks the second voice information, and the electronic device 100 receives the second voice information.
  • the user can adjust the sensitivity (that is, sen or w_0) through the second voice information.
  • the user says "decrease the sensitivity”.
  • the electronic device 100 recognizes language text 2 corresponding to the second voice information.
  • the electronic device 100 performs intent recognition and slot filling on the language text 2 above.
  • the electronic device 100 determines whether the intention corresponding to the language text 2 is to adjust the sensitivity; if not, execute step S105; if yes, execute step S114.
  • the electronic device 100 adjusts the sensitivity based on the intent and the slot corresponding to the voice text 2, and executes S108 based on the adjusted sensitivity.
  • for the specific implementation of steps S111 to S113, reference may be made to the relevant descriptions of steps S102 to S104, which will not be repeated here.
  • the electronic device 100 presets the intention of "adjusting sensitivity", and the slot corresponding to “adjusting sensitivity” may include “sensitivity adjustment value” and "sensitivity adjustment direction".
  • the "sensitivity adjustment value” can be a percentage, a multiple, etc., and the "sensitivity adjustment direction” of the sensitivity includes increasing and decreasing.
  • the slot value of "sensitivity adjustment value” corresponding to language text 2 is the first sensitivity adjustment value
  • the slot value of "sensitivity adjustment direction” is the first sensitivity adjustment direction.
  • the slot corresponding to the language text 2 may also be referred to as the slot corresponding to the second voice information.
  • when it is determined through slot filling that the user specifies a "sensitivity adjustment value" (for example, the first sensitivity adjustment value) in the language text 2, the electronic device 100 adjusts the sensitivity based on the above-mentioned first sensitivity adjustment value; when it is determined through slot filling that the user has not indicated a "sensitivity adjustment value" in the language text 2, the electronic device 100 may adjust the sensitivity based on a preset ratio.
  • the preset ratio is, for example, 10%. It can be understood that if the "sensitivity adjustment direction" is to increase and the current sensitivity exceeds 90% of the maximum sensitivity, the electronic device 100 increases the sensitivity to the maximum sensitivity; if the "sensitivity adjustment direction" is to decrease and the current sensitivity is less than 10% of the maximum sensitivity, the electronic device 100 adjusts the sensitivity down to the minimum sensitivity.
  • the electronic device 100 recognizes that the intent corresponding to the above voice information is "Adjust sensitivity", the sensitivity adjustment direction is to decrease; the electronic device 100 reduces the sensitivity in response to the above voice information; after the sensitivity decreases, when the user's hand moves at a certain speed, the volume adjustment speed decreases.
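  • as an illustration only, a sketch of the sensitivity adjustment of step S114, in which the sensitivity limits and the default 10% step are assumptions:

```python
def adjust_sensitivity(current, direction, value=None,
                       preset_ratio=0.10, min_sen=0.1, max_sen=2.0):
    """value: slot value parsed from the second voice information, or None when not specified."""
    step = value if value is not None else preset_ratio * max_sen   # assumed default step
    new = current + step if direction == "increase" else current - step
    return min(max(new, min_sen), max_sen)        # clamp to the assumed [min_sen, max_sen] range

sen = adjust_sensitivity(1.0, "decrease")         # e.g. after the user says "decrease the sensitivity"
```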
  • the electronic device 100 may also adjust the magnitude of the target element based on the moving distance of the hand. Specifically, in step S107, the electronic device 100 may acquire the user's hand movement distance; in step S108, the electronic device 100 determines the first amplitude adjustment value of the target element based on the user's hand movement distance and the sensitivity; in step S109, the electronic device 100 adjusts the magnitude of the target element along the first amplitude adjustment direction by the first amplitude adjustment value.
  • in step S107, the electronic device 100 periodically determines the user's hand movement distance with a preset period 1, and then determines, according to the preset third mapping relationship, the first amplitude adjustment value of the target element corresponding to the hand movement distance, so as to periodically update the amplitude of the target element with the preset period 1 as the user's hand moves.
  • the preset period is 1s.
  • the electronic device 100 may determine the moving distance of the user's hand according to the position of the first feature point in the image captured by the camera. Specifically, reference may be made to the relevant description of the aforementioned FIG. 10 .
  • the electronic device 100 may collect images of the user's hand at times T1, T2 and T3 respectively. Similar to the hand movement speed, when the preset movement direction 1 is moving to the right and the preset movement direction 2 is moving to the left, the user's hand movement distance can be the moving distance along the two-dimensional movement vector (such as vector AB or vector BC), or the moving distance along the X-axis (i.e., along the preset movement direction 1 or the preset movement direction 2), which is not specifically limited here. For example, at time T2, the electronic device 100 determines that from time T1 to time T2, the moving distance of the user's hand is |AB| or |X(T2)-X(T1)|.
  • the electronic device 100 may also request the smart bracelet 200 to obtain the moving distance of the smart bracelet 200, and then determine the user's hand moving distance based on the moving distance of the smart bracelet 200; the smart bracelet 200 may feed back the moving distance along the three axes of the ground coordinate system or the moving distance along the three-dimensional moving vector in the ground coordinate system to the electronic device 100 .
  • the preset moving direction 1 is moving to the right, and the preset moving direction 2 is moving to the left.
  • the electronic device 100 may determine that the preset moving direction 1 corresponds to the three-dimensional moving direction 3 in the ground coordinate system, and the preset moving direction 2 corresponds to the opposite direction of the moving direction 3 in the ground coordinate system.
  • the electronic device 100 determines that the user's hand movement distance may be a distance along the three-dimensional movement vector of the smart bracelet 200 or a distance along the movement direction 3 , which is not specifically limited here.
  • the electronic device 100 determines the amplitude adjustment value D_e corresponding to the hand movement distance D_h based on the third mapping relationship; for example, D_e = sen * w_1 * D_h, where w_1 is the preset mapping coefficient from hand movement distance to amplitude adjustment value, and sen is the sensitivity of the amplitude adjustment. For example, if the volume ranges from 0 to 100, and during the volume adjustment process the moving distance of the hand is 5 cm, the value of sen is 1 and the value of w_1 is 2, then the first amplitude adjustment value of the volume is 10.
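  • a hedged sketch of the periodic update based on the assumed third mapping relationship (the period, element limits and units are assumptions):

```python
# D_e = sen * w_1 * D_h: once per preset period the element is updated by the adjustment value
# corresponding to the hand distance moved in that period, clamped to its valid range.
def update_element(value, d_hand_cm, direction, sen=1.0, w1=2.0, lo=0, hi=100):
    d_e = sen * w1 * d_hand_cm                    # e.g. 5 cm -> 10 volume units
    value = value + d_e if direction == "increase" else value - d_e
    return min(max(value, lo), hi)

volume = 40
volume = update_element(volume, 5, "increase")    # 50 after one preset period
```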
  • in another implementation, in step S108, the electronic device 100 determines the amplitude adjustment value D_e corresponding to the hand movement distance D_h based on the third mapping relationship as D_e = w_1 * D_h, where w_1 is the preset mapping coefficient from hand movement distance to amplitude adjustment value, and w_1 can also be regarded as the sensitivity of the amplitude adjustment.
  • when the electronic device 100 detects the second preset condition, it determines the user's hand movement direction and hand movement parameters (i.e., hand movement speed or hand movement distance), and then adjusts the magnitude of the target element accordingly.
  • the second preset condition is that the time period for which the user stops moving the hand reaches a third preset time period (for example, the third preset time period is 2s).
  • the electronic device 100 detects the second preset condition after time T2, and the electronic device 100 determines the user's hand movement direction and hand movement parameters (i.e., hand movement speed or hand movement distance) from time T1 to time T2, and then adjusts the range of the target element based on the user's hand movement direction and hand movement parameters (specifically, refer to the previous examples).
  • the user judges whether the adjusted range of the target element meets the expected effect; if so, the user stops adjusting the range of the target element, and when the electronic device 100 detects the first preset condition, it stops the current range adjustment process; if not, the user continues to move the hand by a distance 2 from time T3 to time T4 and then stops moving again; the electronic device 100 detects the second preset condition after time T4, determines the user's hand movement direction and hand movement parameters (i.e., hand movement speed or hand movement distance) from time T3 to time T4, and then adjusts the magnitude of the target element based on the user's hand movement direction and hand movement parameters.
  • the hand of the user may shake slightly when the user stops moving, and the above stop movement includes complete stillness and slight movement within a preset threshold.
  • the electronic device 100 receives the first voice information; when the electronic device 100 determines that the intent corresponding to the first voice information is the adjustment range, it determines the target element for range adjustment based on the first voice information; the electronic device 100 obtains the user's hand movement parameter; the electronic device 100 determines the first amplitude adjustment parameter corresponding to the hand movement parameter; the electronic device 100 adjusts the amplitude of the target element with the first amplitude adjustment parameter.
  • when the hand movement parameter is the hand movement speed, the first amplitude adjustment parameter is the aforementioned first amplitude adjustment speed; when the hand movement parameter is the hand movement distance, the first amplitude adjustment parameter is the aforementioned first amplitude adjustment value.
  • the electronic device 100 obtains the user's hand movement parameters, which specifically includes: the electronic device 100 obtains the first image and the second image collected by the camera; the electronic device 100 obtains the positions of the first feature point of the hand in the first image and the second image through hand recognition; the hand movement direction and the hand movement parameter are determined based on the positions of the first feature point of the hand in the first image and the second image.
  • for example, the first image is the aforementioned image 1 and the second image is the aforementioned image 2; or, the first image is the aforementioned image 2 and the second image is the aforementioned image 3.
  • when it is detected that the intention corresponding to the voice spoken by the user is the adjustment range, the electronic device 100 can obtain the user's hand movement parameters and hand movement direction in real time; along with the user's hand movement, the electronic device 100 may continuously adjust the magnitude of the target element along the magnitude adjustment direction indicated by the hand movement direction and with the magnitude adjustment parameter indicated by the hand movement parameter.
  • in this way, the range of the target element can be adjusted to the desired effect by simply moving the hand, without the need for the user to make complicated gestures; the above solution is applicable to any element whose range can be adjusted, and the user does not need to set a specific gesture for each element.
  • the above solution can also appropriately reduce performance requirements for the electronic device 100 .
  • the sensitivity of the amplitude adjustment can be adjusted by voice to meet the needs of different users and effectively improve user experience.
  • FIG. 13 shows a system architecture of a dialogue system provided by an embodiment of the present application.
  • the electronic device 100 can determine the user's intention to adjust the range of the target element through voice recognition; and then can follow the movement of the user's hand to adjust the range of the target element.
  • the electronic device 100 includes a front-end processing module for dialog input, a speech recognition (Automatic Speech Recognition, ASR) module, a semantic understanding (Natural Language Understanding, NLU) module, a dialog management (Dialogue Management, DM) module, a natural language Generation (Natural Language Generation, NLG) module, broadcast module and image processing module
  • smart bracelet 200 includes an inertial measurement unit (Inertial measurement unit, IMU).
  • Front-end processing module This module is used to process the input voice stream into the data format required by the network model of the ASR module.
  • the front-end processing module performs audio decoding on the received voice information (such as the first voice information and the second voice information), and performs separation and noise reduction on the decoded voice stream by using voiceprint or other features.
  • audio feature extraction is then performed on the speech stream after separation and noise reduction; for example, audio features (that is, the data format required by the network model of the ASR module) are obtained by audio processing algorithms such as framing, windowing, and short-time Fourier transform.
  • ASR module This module can obtain the audio features output by the front-end processing module, and convert the above audio features into text through the acoustic model and the language model for the NLU module to understand.
  • the ASR module includes an acoustic model and a language model; the acoustic model and the language model can be trained separately; the acoustic model is used to convert the audio features into phonemes and/or characters, and the language model is used to convert the above-mentioned phonemes and/or characters into the text corresponding to the user's voice.
  • the ASR module includes a joint model, which is a model generated by joint training of an acoustic model and a language model, and the joint model is used to convert the above-mentioned audio features into text corresponding to the user's voice.
  • NLU module This module is used to convert the user's natural language into structured information that the machine can understand, that is, to convert the above-mentioned voice text into executable intentions and slots.
  • the above intent is used to indicate user demands, and the above slots can be understood as relevant parameters for fulfilling the above user demands.
  • the NLU module includes a classification model and a sequence labeling model. The NLU module classifies the above-mentioned language texts into intentions supported by the system through a classification model (such as the aforementioned intent classifier), and then uses the sequence labeling model (such as the aforementioned slot location classifier) to label the slots in the above-mentioned language texts.
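  • purely for illustration, the structured output of such an NLU module might look like the following sketch (the intent and slot names follow the wording of this application but are not a real API):

```python
# Assumed structured result for the first voice information after intent classification and slot filling.
nlu_result = {
    "intent": "adjust amplitude",
    "slots": {
        "target element": "volume",
        "amplitude adjustment direction": "increase",   # optional: the voice may not state it
        "target device": "third electronic device",     # optional: for cross-device control
    },
}

if nlu_result["intent"] == "adjust amplitude":
    target = nlu_result["slots"]["target element"]      # hand movement then drives the adjustment
```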
  • Image processing module This module is used to perform hand recognition on the image sequence collected by the camera, and determine the user's hand moving direction and hand moving speed based on the position change of the hand in the image sequence.
  • DM module This module is used to judge the execution strategy of the system based on the state of the dialog (that is, the next execution action of the electronic device 100), such as continuing to ask the user, executing the user instruction, or recommending other user instructions.
  • this module may include a dialog state tracking (Dialogue State Tracking, DST) module, which records all dialog history and state information and assists the system in understanding user requests in combination with the context and giving appropriate feedback.
  • the DM module judges whether the intention identified by the NLU module is "adjustment range"; if so, it obtains the user's hand movement direction and hand movement speed from the image processing module, and then obtains the amplitude adjustment direction corresponding to the hand movement direction and the amplitude adjustment speed corresponding to the hand movement speed.
  • the DM module can also determine whether the intention identified by the NLU module is to "adjust sensitivity"; if so, adjust the sensitivity based on the slot identified by the NLU module.
  • NLG module This module is used to obtain the next execution action selected by the DM module and the current dialogue state maintained by the DST module (the language text, intent and slot entered by the user), and to convert the next action and/or intention into a voice broadcast through the text-to-speech (TextToSpeech, TTS) module; the corresponding broadcast card can also be generated by the voice assistant and displayed to the user.
  • This module can be implemented by configuring templates with specific intentions and specific scenarios, filling in the current dialogue state and outputting text after executing actions, or it can be implemented in a model-based black box, which is not specifically limited here.
  • Broadcasting module This module is used for broadcasting according to the broadcast content generated by the NLG module.
  • the broadcast can be, for example, a voice broadcast or a text display of a broadcast card, which is not specifically limited here.
  • IMU module This module is used to detect the three-axis acceleration signal of the smart bracelet 200 in its electronic device coordinate system and the attitude angle of the smart bracelet 200 in the above-mentioned ground coordinate system. According to the above-mentioned acceleration signal and attitude angle, the IMU module can obtain the speed of the smart bracelet 200 along the three axes of the ground coordinate system and the moving direction of the smart bracelet 200 in the ground coordinate system; then, based on the moving direction of the smart bracelet 200 in the ground coordinate system and the preset moving direction (for example, the aforementioned moving direction 3), the moving direction and the moving speed of the user's hand can be determined.
  • when playing a video, the electronic device 100 (such as a large-screen device) can not only adjust target elements (such as volume, brightness, video playback speed, video playback progress, etc.) based on the user's voice information and hand movement, but can also control the playing state of the video.
  • during the playback of the video, the electronic device judges whether there is a face in front of the screen where the video is played; if there is a face, the video continues to play; if there is no face, the video is paused at the current playback moment of the video.
  • the electronic device can automatically pause the video at the current playback moment of the video, so that when the user watches the video again, he only needs to click the play button to continue playing from the above-mentioned current playback moment.
  • since the playback and pause of the video are controlled only according to whether a face exists, when the user does not leave the viewing range of the screen but does not watch the video played on the screen, the user may miss some video clips because the video keeps playing.
  • the electronic device 100 can detect whether the user's line of sight is fixed on the screen; if it is detected that the user's line of sight is fixed on the screen, the video continues to play; otherwise, the video is paused at the current playback moment of the video.
  • the electronic device can automatically pause the video at the current playing moment of the video.
  • the user can avoid missing the video clip after the line of sight leaves the screen.
  • however, when users watch videos they usually look at other targets (such as mobile phones or other users) from time to time, and controlling video pauses based only on the line of sight will lead to frequent false pauses of the video.
  • this application also provides another solution (referred to as solution 3), in which the electronic device 100 can collect the user's sight status and character status through the camera, and intelligently control the playback status of the video in combination with the above-mentioned sight status, character status and the splendor of the current video clip, effectively improving the user's video viewing experience.
  • when the electronic device 100 is playing a video, it continuously detects whether the user's gaze is fixed on the screen.
  • when the electronic device 100 plays the video to the moment t1 of the video (the first moment involved in the embodiment of the present application may be the moment t1), it detects that the user's line of sight leaves the screen, and the electronic device 100 records the moment t1 (for example, 00:11:08 shown in FIG. 14B).
  • after detecting that the user's line of sight leaves the screen, the electronic device 100 continuously detects whether the user leaves the viewing area of the screen. Referring to FIG. 14C, when the electronic device 100 plays the video to the time t2 (for example, 00:11:10) of the video, it detects that the user leaves the viewing area of the screen, and the electronic device 100 rolls back the video to the time t1 (that is, 00:10:23 shown in FIG. 14D) and pauses.
  • the electronic device 100 pauses the video until time t1 , if it detects that the user's gaze is fixed on the screen again, the electronic device 100 continues to play the video from time t1 .
  • after detecting that the user's line of sight leaves the screen at time t1, when the video is played to time t2 (for example, 00:11:10), the electronic device 100 detects that the user has not left the viewing area of the screen; referring to FIG. 14G, after determining that the user has not left the viewing area of the screen, the electronic device 100 judges at time t3 (for example, 00:11:11) whether the current video segment is exciting, and if it is exciting, pauses the video playback at the current time (that is, time t3). Referring to FIG. 14H, after the electronic device 100 pauses the video at time t3, if it detects that the user's gaze is fixed on the screen again, the electronic device 100 continues to play the video from time t3.
  • if the electronic device 100 judges at time t3 that the current video segment is not exciting, it determines the interaction parameters between the user's line of sight and the screen, such as the interaction frequency and the gaze duration.
  • the interaction frequency refers to the frequency with which the user's gaze leaves the screen per unit time
  • the gaze duration refers to the duration of the user's gaze on the screen this time.
  • the above interaction parameters may be set by the electronic device 100 by default, or may be set by the user according to his own requirements.
  • the electronic device 100 controls the playback speed of the video based on the interaction parameter.
  • for example, the greater the interaction frequency, the greater the video playback speed; when the gaze duration is greater than or equal to the preset threshold, the playback speed is the normal speed; when the gaze duration is less than the preset threshold, the shorter the gaze duration, the greater the video playback speed.
  • when the electronic device 100 determines that the interaction frequency is greater than the second threshold, the electronic device 100 plays the video at 2 times the normal playback speed and displays a 2-times-speed indicator 302.
  • FIG. 15 shows a flowchart of a device control method provided by an embodiment of the present application.
  • the device control method includes but not limited to the following steps S201 to S209.
  • when the electronic device 100 plays a video, it continuously detects whether the user's line of sight is fixed on the screen through the images collected by the camera; when the electronic device 100 plays the video to the moment t1 of the video, the electronic device 100 detects that the line of sight leaves the screen (that is, the line of sight is not fixed on the screen), and records time t1.
  • the user's line of sight may leave the screen and look in other directions.
  • when the electronic device 100 is playing a video, the camera continuously captures images in front of the screen; the electronic device 100 can use eye tracking (eye tracking/gaze tracking) technology to obtain the user's gaze point and/or gaze direction based on the images collected by the above camera; based on the gaze direction and/or gaze point, the electronic device 100 may determine whether the user's gaze leaves the screen.
  • the point of gaze may be the focal point of the user's line of sight on the plane of the screen; the line of sight direction may be represented by the viewing angle and/or line of sight vector of the line of sight in the preset coordinate system.
  • the above-mentioned preset coordinate system is the electronic device coordinate system of the electronic device 100, in which the y-axis is the direction from the left side to the right side along the bottom of the screen of the electronic device 100, and x The axis is the direction from the bottom to the top along the left side of the electronic device 100, the z-axis is perpendicular to the plane composed of the x-axis and the y-axis, and the z-axis is perpendicular to the screen of the electronic device 100 from the back of the electronic device 100.
  • the above viewing angle may include: the angle α between the line of sight direction and the horizontal plane (the plane formed by the y-axis and the z-axis), the angle β between the line of sight direction and the side plane (the plane formed by the x-axis and the z-axis), and the angle γ between the line of sight direction and the vertical plane (the plane formed by the x-axis and the y-axis).
  • the line-of-sight vector is a direction vector starting from the position of the eye and ending at the position of the gaze point.
  • the direction vector may include the three-dimensional coordinates of the eye in the preset coordinate system and the three-dimensional coordinates of the gaze point in the preset coordinate system. Exemplarily, as shown in FIG.
  • the gaze point of the user's line of sight 1 is E(a1, b1, 0), and the gaze point is located on the screen of the electronic device 100;
  • the gaze point of the user's line of sight 2 is F(a2, b2, 0), the gaze point is located outside the screen of the electronic device 100;
  • the three viewing angles corresponding to the line of sight direction of line of sight 1 are α, β and γ respectively, and the line of sight vector of line of sight 1 is the vector GE;
  • the position of the user's eyes is G (a3, b3, c1).
  • the electronic device 100 determines whether the gaze point coordinates of the user's eyes are located on the screen of the electronic device 100;
  • the electronic device 100 can determine the intersection point of the line of sight 1 and the plane where the screen is located (i.e., the gaze point) based on the position of the user's eyes (such as G(a3, b3, c1)) and the direction of the line of sight (such as α, β and γ); the electronic device 100 judges whether the gaze point coordinates of the user's eyes are located on the screen of the electronic device 100; if so, it determines that the user's line of sight is fixed on the screen; otherwise, it determines that the user's line of sight leaves the screen.
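  • as an illustration only, the following sketch (the screen size and all numbers are assumptions) intersects the line of sight with the screen plane z = 0 of the preset coordinate system and checks whether the gaze point lies on the screen:

```python
def gaze_point_on_screen(eye, sight_dir, screen_x_max=0.7, screen_y_max=1.2):
    """eye: eye position G; sight_dir: line-of-sight direction vector in the preset coordinate system."""
    gx, gy, gz = eye
    dx, dy, dz = sight_dir
    if dz == 0:                       # line of sight parallel to the screen plane
        return None, False
    t = -gz / dz                      # ray parameter at which z reaches 0
    if t <= 0:                        # pointing away from the screen plane
        return None, False
    x, y = gx + t * dx, gy + t * dy   # gaze point E = (x, y, 0)
    on_screen = 0 <= x <= screen_x_max and 0 <= y <= screen_y_max
    return (x, y, 0.0), on_screen

point, fixed = gaze_point_on_screen(eye=(0.3, 0.6, 1.8), sight_dir=(0.0, -0.1, -1.0))
```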
  • the electronic device 100 can obtain the gaze tracking model in advance;
  • the input of the gaze tracking model is an image containing eyes, or eye feature parameters in the image (such as the degree of eye opening and closing, the positions of the inner and outer corners of the left eye, the positions of the inner and outer corners of the right eye, pupil positions, etc.)
  • the output of the above-mentioned line-of-sight tracking model is the line-of-sight direction and/or fixation point of the eyes in the preset coordinate system in the above-mentioned image.
  • the training device (such as a cloud server or electronic device 100) collects a large number of training images containing eyes, and obtains eye feature parameters and line-of-sight directions (or gaze points) corresponding to each training image, and uses the above-mentioned eye feature Parameters and gaze direction (or gaze point) to train the above gaze tracking model, so as to obtain a trained gaze tracking model.
  • the aforementioned preset coordinate system, eye feature parameters and coordinates of the gaze point may be two-dimensional or three-dimensional, and are not specifically limited here.
  • the camera collects images in real time; the electronic device 100 extracts the eye feature parameters in the above images and inputs the above eye feature parameters into the gaze tracking model, and the gaze tracking model outputs the user's gaze point or line-of-sight direction.
  • the electronic device 100 may determine the three-dimensional position of the user's eyes in the preset coordinate system in various ways, which is not specifically limited here.
  • the electronic device 100 obtains the three-dimensional position of the eyes in the preset coordinate system, which specifically includes: the electronic device performs face detection on the image collected by the camera, and obtains the two-dimensional positions of at least one feature point of the face in the above-mentioned image; the two-dimensional feature points of the face are solved with the perspective-n-point projection algorithm (Perspective-n-Point, PnP) in combination with a pre-built 3D face model, to obtain the three-dimensional feature points of the face in the preset coordinate system; based on the three-dimensional feature points of the face, the three-dimensional position of the eyes in the preset coordinate system can be estimated.
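  • as a hedged illustration of the PnP step (OpenCV is used only as an example library choice; all model points and camera parameters below are placeholders, not values from this application):

```python
import numpy as np
import cv2

# Solve the head pose from matched 2D facial feature points and a pre-built 3D face model,
# then map a model eye point into the camera (preset) coordinate system to approximate the
# 3D eye position.
model_points_3d = np.array([            # 3D face model points (metres, model frame), placeholders
    [0.0, 0.0, 0.0],                    # nose tip
    [0.0, -0.065, -0.01],               # chin
    [-0.03, 0.035, -0.03],              # left eye inner corner
    [0.03, 0.035, -0.03],               # right eye inner corner
    [-0.025, -0.03, -0.02],             # left mouth corner
    [0.025, -0.03, -0.02],              # right mouth corner
], dtype=np.float64)
image_points_2d = np.array([            # matching 2D feature points from face detection (pixels)
    [320.0, 240.0], [320.0, 330.0], [290.0, 205.0],
    [350.0, 205.0], [300.0, 280.0], [340.0, 280.0],
], dtype=np.float64)
camera_matrix = np.array([[800.0, 0.0, 320.0],
                          [0.0, 800.0, 240.0],
                          [0.0, 0.0, 1.0]])
dist_coeffs = np.zeros(4)

ok, rvec, tvec = cv2.solvePnP(model_points_3d, image_points_2d, camera_matrix, dist_coeffs)
rotation, _ = cv2.Rodrigues(rvec)                     # face-model orientation in the camera frame
eye_in_model = np.array([[-0.03], [0.035], [-0.03]])  # left eye corner in the model frame
eye_in_camera = rotation @ eye_in_model + tvec        # approximate 3D eye position
```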
  • the electronic device 100 detects whether the user leaves the viewing range of the screen; if it detects that the user leaves the viewing range of the screen, S203 is executed; otherwise, S206 is executed.
  • the electronic device 100 performs human body detection or human face detection on images collected by the camera; if no human body or human face is detected, the electronic device 100 determines that the user leaves the viewing range of the screen.
  • Human detection or face detection can use a single-frame multi-scale detector (Single Shot MultiBox Detector, SSD) neural network model.
  • the user wears a wearable device (such as the smart bracelet 200), and the electronic device 100 can obtain the position (such as the distance and orientation) of the smart bracelet 200 in real time; the electronic device 100 can determine the viewing range of the screen according to the orientation of the screen, that is, the distance range and azimuth range within which the screen can be seen; when the electronic device 100 determines that the distance of the smart bracelet 200 exceeds the above-mentioned distance range, and/or the azimuth of the smart bracelet 200 exceeds the above-mentioned azimuth range, the electronic device 100 determines that the user leaves the viewing range of the screen.
  • the electronic device 100 continuously detects whether the user leaves the viewing range of the screen; if it is detected that the user leaves, the video is rolled back to time t1 when the user's sight leaves the screen and paused.
  • the electronic device 100 pauses the video until time t1, it continues to continuously detect whether the user's sight is fixed on the screen through the image collected by the camera. When it is detected that the user's gaze is fixed on the screen again, the electronic device 100 controls the video to continue playing from time t1.
  • the electronic device 100 judges whether the currently played video segment is exciting; if the video segment is judged to be exciting when the video is played to time t3 of the video, execute S206; otherwise, execute S207.
  • the electronic device 100 determines whether the currently played video segment is exciting or not.
  • the electronic device 100 may acquire the wonderfulness evaluation model in advance.
  • step S205 the electronic device 100 extracts video features of the currently playing video segment; inputs the video features into the wonderfulness assessment model, and the wonderfulness assessment model outputs the wonderfulness assessment result of the video clip.
  • the splendor evaluation model can use the inflated 3D convolution (Inflated 3D conv, I3D) model; the I3D model migrates a classic model successfully trained on ImageNet (that is, a large-scale visualization database) to the video data set, uses 3D convolution to extract the temporal features of the RGB stream corresponding to the images, and uses the optical flow to improve the network performance, so that a good splendor evaluation model can be obtained.
  • the wonderfulness evaluation result is an indicator 1 indicating wonderful or an indicator 2 indicating not exciting.
  • the wonderfulness evaluation result is a numerical value, and if the numerical value is greater than a preset value, it is judged that the above video clip is wonderful, otherwise it is judged that the above video clip is not wonderful.
  • the splendor of the video segment may also be preset by the producer, supplier or copyright owner of the video.
  • the producer of the video intends to focus on showing exciting video clips to users, and attaches a mark indicating the wonderful video to the exciting video clips in the video.
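  • purely as an illustration, the wonderfulness decision could combine a producer-attached mark with a model score as in the following sketch ("score_model" and the threshold are placeholders, not APIs named by this application):

```python
def is_wonderful(clip, score_model=None, threshold=0.6):
    """clip: dict with an optional producer mark and the frames of the currently played segment."""
    if clip.get("wonderful_mark") is not None:     # mark preset by the producer/supplier/copyright owner
        return bool(clip["wonderful_mark"])
    score = score_model(clip["frames"])            # numeric wonderfulness evaluation result
    return score > threshold

print(is_wonderful({"wonderful_mark": True, "frames": []}))  # True
```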
  • when the electronic device 100 plays a wonderful video segment, it can pause the video, so as to prevent the user from missing the above-mentioned video segment.
  • the video played by the electronic device 100 may include at least one video clip, and different video clips may or may not overlap, which is not specifically limited here.
  • the video clips are pre-divided, continuous and non-overlapping, and in step S205 the electronic device 100 judges whether the currently playing video clip is exciting.
  • the electronic device 100 intercepts a video segment of the video based on the current playing time of the video to identify its wonderfulness; the video segment includes the current playing time of the video, and the interception can also be set by users according to their own needs.
  • the electronic device 100 pauses the video until time t3 currently played.
  • after that, S205 may be executed again; it can be understood that when it is detected that the user's gaze is fixed on the screen again, the electronic device 100 controls the video to continue playing from time t3.
  • the electronic device 100 acquires interaction parameters of the user's line of sight and the screen of the electronic device 100 .
  • the electronic device 100 controls the video playback speed based on the interaction parameter.
  • the interaction parameters between the user's line of sight and the screen of the electronic device 100 include interaction frequency and/or gaze duration.
  • the interaction frequency refers to the frequency at which the user's sight leaves (or returns) the screen within a preset time.
  • Gaze duration refers to the duration of a user's single gaze on the screen.
  • when the interaction frequency is greater than the second threshold, the video playback speed is the first speed, and the first speed is greater than the normal playback speed, for example, the first speed is twice the normal playback speed; when the interaction frequency is less than or equal to the second threshold, the video plays at the normal playback speed.
  • when the interaction frequency is greater than the second threshold, the greater the interaction frequency, the greater the video playback speed.
  • when the gaze duration is less than the third threshold, the video playback speed is the second speed, and the second speed is greater than the normal playback speed, for example, the second speed is 1.5 times the normal playback speed; when the gaze duration is greater than or equal to the third threshold, the video plays at the normal playback speed.
  • when the gaze duration is less than the third threshold, the shorter the gaze duration, the greater the video playback speed.
  • in this way, when the interaction frequency is large and/or the gaze duration is short, the playback speed of the video can be appropriately increased; when the interaction frequency is small and/or the gaze duration is long, the video is played at the normal speed.
  • for example, when the user's line of sight frequently leaves the screen, the video playback speed is increased to 2 times speed; when the eyes are fixed on the screen for more than 1 second, the playback speed is the normal speed.
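  • a hedged sketch of this speed control (the thresholds and the 2x/1.5x factors are assumptions for illustration):

```python
def playback_speed(interaction_frequency, gaze_duration_s,
                   freq_threshold=3, gaze_threshold_s=1.0):
    """interaction_frequency: how often the line of sight leaves the screen per unit time;
    gaze_duration_s: duration of the current fixation on the screen."""
    speed = 1.0                                   # normal playback speed
    if interaction_frequency > freq_threshold:
        speed = max(speed, 2.0)                   # e.g. the first speed, 2x
    if gaze_duration_s < gaze_threshold_s:
        speed = max(speed, 1.5)                   # e.g. the second speed, 1.5x
    return speed

print(playback_speed(interaction_frequency=4, gaze_duration_s=0.5))  # 2.0
```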
  • the second threshold and the third threshold may be set by default by the electronic device 100 , or may be set by the user according to their own requirements.
  • in step S205, if the electronic device 100 determines that the currently played video segment is not exciting, the electronic device 100 returns to perform S202 while performing S207, that is, it continues to detect whether the user leaves. It can be understood that, during the execution of S207 and S208, if the electronic device 100 detects that the user leaves, the video is paused at the current playback moment; the electronic device 100 also continues to judge whether the currently played video segment is wonderful or not, and so on, repeating the cycle.
  • step S202 shown in FIG. 15 may be replaced with S210.
  • the electronic device 100 judges whether the duration of the user's gaze away from the screen is greater than a first threshold; if yes, execute S203; otherwise, execute S205.
  • before step S201, the electronic device 100 also determines whether the currently playing video is an advertisement; if so, S210 is executed after step S201; otherwise, S202 is executed after S201.
  • alternatively, before step S201, the electronic device 100 also determines whether the current environment is an elevator; if so, S210 is executed after step S201; otherwise, S202 is executed after S201.
  • the electronic device 100 determines whether the current environment is an elevator by performing elevator recognition on the elevator image collected by the camera.
  • the weightlessness and overweight states in the elevator can also be identified by detecting acceleration and deceleration with the acceleration sensor and detecting changes in air pressure with the air pressure sensor, so as to determine whether the current environment is an elevator.
  • the electronic device 100 continuously performs face recognition on images collected by the camera, and uses gaze tracking technology to perform gaze tracking on each identified user or a specific preset user (such as user 1), and detects Whether the user's line of sight is fixed on the screen; in step 201, when it is detected that user 1's line of sight leaves the screen, the electronic device 100 records the t1 moment when the video is currently played; in step 202, determine whether user 1 leaves the viewing range of the screen by face detection; In step 204, after determining that user 1 has left the viewing range of the screen, continue to perform face recognition on the images collected by the camera.
  • step S207 the electronic device 100 continues to perform face recognition on the image captured by the camera, and obtains interaction parameters between user 1's line of sight and the screen of the electronic device 100 by using gaze tracking technology. It can be understood that if face recognition is performed for a specific preset user, the electronic device 100 stores the user's face features or face images.
  • when it is detected that the user's line of sight leaves the screen and the user then leaves, the video can be rolled back to the time t1 at which the user's line of sight left the screen, and then continues to play when the user's line of sight returns to the screen.
  • the playback and pause of the video can also be controlled according to the splendor of the video segment.
  • when the video clip is exciting, the electronic device 100 controls the video to pause, which can prevent the user from missing the exciting video clip; when the video clip is not exciting and the user seldom looks at the screen, the video playback speed can be accelerated to improve the user's viewing efficiency. It can be understood that when the video clip is not exciting, the user can control the playback speed of the video through the interaction frequency and gaze duration between the line of sight and the screen, reflecting the user's autonomous control of the video.
  • in a specific environment, the device control method shown in FIG. 17 may be adopted.
  • the above-mentioned specific environment is an elevator.
  • the electronic device 100 on the elevator wall plays a video, since people always come and go in and out of the elevator, it can always be detected that the user is within the viewing range of the screen through human body detection or face detection.
  • the device control method shown in FIG. 17 can be adopted, that is, human body detection (or human face detection) is replaced by line-of-sight time detection.
  • in the above embodiments, all or part of them may be implemented by software, hardware, firmware or any combination thereof.
  • when implemented using software, it may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on the computer, the processes or functions according to the present application will be produced in whole or in part.
  • the computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from a website, computer, server or data center to another website, computer, server or data center in a wired (e.g., coaxial cable, optical fiber, DSL) or wireless (e.g., infrared, wireless, microwave, etc.) manner.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device including a server, a data center, and the like integrated with one or more available media.
  • the available medium may be a magnetic medium (such as a floppy disk, a hard disk, or a magnetic tape), an optical medium (such as a DVD), or a semiconductor medium (such as a solid state disk (solid state disk, SSD)), etc.
  • all or part of the processes in the foregoing method embodiments may be completed by a computer program instructing related hardware.
  • the program may be stored in a computer-readable storage medium.
  • when the program is executed, the processes of the foregoing method embodiments may be included.
  • the aforementioned storage medium includes: a ROM, a random access memory (RAM), a magnetic disk, an optical disk, and various other media that can store program code.

Abstract

本申请公开了设备控制方法及相关装置,该方法中,第一电子设备接收第一语音信息;当第一电子设备确定第一语音信息对应的意图为调整幅度时,基于第一语音信息确定幅度调整的目标元素;第一电子设备获取用户的手部移动参数;第一电子设备确定手部移动参数对应的第一幅度调整参数;第一电子设备以第一幅度调整参数调整目标元素的幅度。这样,能够提高用户隔空控制目标元素的幅度的操作效率,有效提高用户体验。

Description

设备控制方法及相关装置
本申请要求于2021年12月28日提交中国专利局、申请号为202111633811.1、申请名称为“设备控制方法及相关装置”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及电子技术领域,尤其涉及设备控制方法及相关装置。
背景技术
随着技术的发展,用户可以通过语音与电子设备的语音助手进行交互,以控制电子设备调节目标元素的幅度,例如目标元素可以为音量、亮度等等;电子设备通常按照预设比例或者按照用户的语音中指定的比例,调节上述目标元素的幅度。例如,预设比例为5%,当检测到用户说“调大音量”时,电子设备按照预设比例将音量调大5%;当检测到用户说“调大音量到50%”时,电子设备将音量调大到最大音量的50%。然而,用户不能预知一次幅度调整后目标元素的输出效果怎么样,是否符合预期效果,因此,用户通常难以确定自己预期效果对应的调整比例是多少,用户可能需要与电子设备的语音助手进行多次语音交互,才能将目标元素的幅度调整至预期效果。这样,用户操作繁琐,用户的使用体验较差。
发明内容
本申请提供了设备控制方法及相关装置,能够提高用户隔空控制目标元素的幅度的操作效率,有效提高用户体验。
第一方面,本申请提供了一种设备控制方法,包括:第一电子设备接收第一语音信息;当第一电子设备确定第一语音信息对应的意图为调整幅度时,基于第一语音信息确定幅度调整的目标元素;第一电子设备获取用户的手部移动参数;第一电子设备确定手部移动参数对应的第一幅度调整参数;第一电子设备以第一幅度调整参数调整目标元素的幅度。
实施本申请实施例,当检测到用户说出的语音对应的意图为调整幅度时,第一电子设备可以获取用户的手部移动速度和手部移动参数;伴随着用户的手部移动,第一电子设备能以手部移动参数指示的幅度调整参数持续性地调整目标元素的幅度。这样,不需要用户做出复杂的手势,仅通过简单的手部移动即可以调节目标元素的幅度至预期效果,有效提高了用户隔空控制目标元素的幅度的操作效率,进而有效提高了用户体验;上述方案对任意可进行幅度调整的元素均适用,无需用户针对每种元素设置特定的手势;由于无需第一电子设备识别繁复的手势,上述方案也可以适当降低对第一电子设备的性能要求。
在一种实现方式中,所述第一电子设备以第一幅度调整参数调整目标元素的幅度之前,还包括:第一电子设备获取用户的手部移动方向;第一电子设备确定手部移动方向对应的幅度调整方向为第一幅度调整方向,幅度调整方向包括幅度调大和幅度调小;所述第一电子设备以第一幅度调整参数调整目标元素的幅度,包括:第一电子设备沿第一幅度调整方向以第一幅度调整参数调整目标元素的幅度。实施本申请实施例,当检测到用户说出的语音对应的意图为调整幅度时,第一电子设备还可以获取用户的手部移动方向;伴随着用户的手部移动,第一电子设备能沿上述手部移动方向指示的幅度调整方向持续性地调整目标元素的幅度。这 样,不需要用户做出复杂的手势,仅通过简单的手部移动即可以确定目标元素的幅度调整方向,有效提高了用户隔空控制目标元素的幅度的操作效率。
在一种实现方式中,所述方法还包括:当第一电子设备确定第一语音信息对应的意图为调整幅度时,第一电子设备还基于第一语音信息对应的槽位确定幅度调整方向为第一幅度调整方向,幅度调整方向包括幅度调大和幅度调小;所述第一电子设备以第一幅度调整参数调整目标元素的幅度,包括:第一电子设备沿第一幅度调整方向以第一幅度调整参数调整目标元素的幅度。实施本申请实施例,当检测到用户说出的语音对应的意图为调整幅度时,第一电子设备还可以获取上述语音指示的幅度调整方向。这样,用户说出语音后,即可通过任意方向的手部移动持续性地控制目标元素的幅度调整,直到目标元素的输出效果达到预期效果,有效提高了用户体验。
在一种实现方式中,手部移动参数为手部移动速度,第一幅度调整参数为幅度调整速度;或者,手部移动参数为手部移动距离,第一幅度调整参数为第一幅度调整值。实施本申请实施例,第一电子设备可以基于手部移动速度指示的幅度调整速度调整目标元素的幅度,也可以基于手部移动距离指示的第一幅度调整值调整目标元素的幅度,此处不做具体限定。
在一种实现方式中,所述第一电子设备确定手部移动参数对应的第一幅度调整参数,具体包括:第一电子设备基于手部移动参数和第一灵敏度确定第一幅度调整参数。其中,手部移动参数一定时,第一灵敏度越大,第一幅度调整参数越大;第一灵敏度一定时,手部移动参数越大,第一幅度调整参数越大。
在一种实现方式中,所述第一电子设备确定手部移动参数对应的第一幅度调整参数之后,所述方法还包括:第一电子设备接收第二语音信息;当第一电子设备确定第二语音信息对应的意图为调整灵敏度时,基于第一语音信息对应的槽位确定第一灵敏度调整方向和第一灵敏度调整值;沿第一灵敏度调整方向以第一灵敏度调整值调整第一灵敏度。实施本申请实施例,在幅度调整过程中,用户能够通过语音调整灵敏度。这样,可以满足不同用户的需求,有效提升了用户体验。
在一种实现方式中,所述第一电子设备确定手部移动参数对应的第一幅度调整参数之后,所述方法还包括:第一电子设备检测到第一预设条件时,结束目标元素的本次幅度调整;第一预设条件包括以下一项或多项:接收第一语音信息后的时长超过第一预设时长;第二预设时长内,未检测到用户的手部的有效移动;接收到用于停止幅度调整的第一预设手势;接收到用于停止幅度调整的第三语音信息;其中,手部的有效移动指:手部沿预设移动方向的移动距离大于距离阈值,或者,手部的移动速度大于速度阈值。实施本申请实施例,在检测到第一预设条件时,第一电子设备即可结束目标元素的本次幅度调整,不再获取用户的手部移动方向和手部移动参数;用户通过特定语音才能再次触发目标元素的幅度调整。
在一种实现方式中,所述第一电子设备获取用户的手部移动参数,具体包括:第一电子设备获取摄像头采集的第一图像和第二图像;第一电子设备通过手部识别,获取第一图像和第二图像中的手部的第一特征点的位置;基于第一图像和第二图像中的手部的第一特征点的位置,确定用户的手部移动参数。
在一种实现方式中,手部移动方向是基于第一图像和第二图像中的手部的第一特征点的位置确定的。
在一种实现方式中,第二电子设备佩戴于用户的手部,所述第一电子设备获取用户的手部移动参数,包括:第一电子设备向第二电子设备发送获取请求;第一电子设备接收第二电 子设备发送的第二电子设备在第一坐标系的移动速度和/或移动距离;基于第二电子设备在第一坐标系的移动速度和/或移动距离确定用户的手部移动参数。实施本申请实施例,第一电子设备可以通过第二电子设备获取用户的手部移动方向和手部移动参数。这样,降低了对第一电子设备的性能要求和功耗。
在一种实现方式中,手部移动方向是基于第二电子设备在第一坐标系的移动方向确定的。
在一种实现方式中,幅度调整方向包括幅度调大和幅度调小,第一电子设备中预置有第一映射关系,第一映射关系中幅度调大对应至少一个预设移动方向,第一映射关系中幅度调小对应至少一个预设移动方向,幅度调大对应的预设移动方向不同于幅度调小对应的预设移动方向;当基于第一映射关系确定手部移动方向属于幅度调大对应的预设移动方向时,第一幅度调整方向为幅度调大;当基于第一映射关系确定手部移动方向属于幅度调小对应的预设移动方向时,第一幅度调整方向为幅度调小。实施本申请实施例,幅度调大和幅度调小均可对应一个或多个预设移动方向。这样,可以给用户更多的选择,有效提高了用户体验。
在一种实现方式中,所述方法还包括:当第一电子设备确定第一语音信息对应的意图为调整幅度时,还基于第一语音信息对应的槽位确定幅度调整的目标设备为第三电子设备;所述第一电子设备以第一幅度调整参数调整目标元素的幅度,包括:第一电子设备向第三电子设备发送调整请求,以控制第三电子设备沿第一幅度调整方向以第一幅度调整参数调整目标元素的幅度,调整请求携带目标元素、第一幅度调整参数和第一幅度调整方向。这样,不限于调整本设备上的目标元素的幅度,还可以调整已连接的其他电子设备的目标元素的幅度。
在一种实现方式中,所述第一电子设备获取用户的手部移动参数,包括:第一电子设备获取用户的第二预设手势的手部移动参数。实施本申请实施例,用户通过第二预设手势的移动指示目标元素的幅度调整方向和幅度调整速度,上述目标元素可以为任意可进行幅度调整的元素。这样,第一电子设备仅需识别一种手势(即第二预设手势),对第一电子设备的系统性能要求较低;用户也无需记忆繁复的手势,仅通过一种手势即可实现多种元素的幅度调整,有效提高了用户体验。
在一种实现方式中,所述第一电子设备获取用户的手部移动参数,包括:第一电子设备获取摄像头采集的第一图像和第二图像;第一电子设备通过手势识别,识别第一图像和第二图像是否包含第二预设手势;所述第一电子设备获取用户的第一手势的手部移动参数,包括:当第一图像和第二图像包含第二预设手势时,基于第一图像和第二图像中的手部的第一特征点的位置,确定用户的手部移动参数。
在一种实现方式中,所述方法还包括:第一电子设备播放视频;第一电子设备播放视频至视频的第一时刻时,检测到用户的视线离开第一电子设备的屏幕,第一电子设备记录第一时刻;第一电子设备检测用户是否离开屏幕的观看范围或用户的视线离开屏幕的时长是否超过第一阈值;当检测到用户离开屏幕的观看范围或用户的视线离开屏幕的时长超过第一阈值时,第一电子设备将视频回退至第一时刻并暂停。
实施本申请实施例,在检测到用户视线离开屏幕,以及用户视线离开超时或用户离开屏幕的观看范围时,可以将视频回退至用户视线离开屏幕的时刻。这样,可以避免用户错过视线离开屏幕后播放的视频片段,同时也能避免用户视线意外离开屏幕导致的视频的频繁误暂停,实现了视频播放的智能控制,有效提升了用户的视频观看体验。
在一种实现方式中,所述方法还包括:第一电子设备暂停视频后,当第一电子设备检测到用户的视线再次注视屏幕时,控制视频继续播放。实施本申请实施例,基于用户视线和人 物状态(即是否离开屏幕的观看范围)控制视频暂停后,在用户视线再次回归屏幕时,无需用户操作第一电子设备即可自动控制视频继续播放。这样,减少了用户操作,进一步实现了视频播放的智能控制,有效提升了用户的视频观看体验。
在一种实现方式中,所述方法还包括:当检测到用户未离开屏幕的观看范围或用户的视线离开屏幕的时长未超过第一阈值时,第一电子设备检测当前播放的视频片段是否精彩;当检测到当前播放的视频片段精彩时,第一电子设备将视频暂停。实施本申请实施例,还可以通过视频片段的精彩度控制视频的播放和暂停。在视频片段精彩时,第一电子设备控制视频暂停,可以避免用户错过精彩的视频片段。
在一种实现方式中,所述方法还包括:当检测到当前播放的视频片段不精彩时,获取用户的视线与屏幕的交互参数,并基于交互参数控制视频的播放速度。这样,在视频片段不精彩时,用户能够通过视线与屏幕的交互参数控制视频的播放速度,体现了用户自主化的视频控制,提高了用户的视频观看效率。
在一种实现方式中,交互参数包括交互频率和/或注视时长,交互频率指预设时间内用户的视线离开屏幕的频率,注视时长指用户单次注视屏幕的时长,所述基于交互参数控制视频的播放速度,包括:当交互频率大于第二阈值时,控制视频播放速度为第一速度,第一速度大于正常播放速度;当交互频率小于等于第二阈值时,控制视频播放速度为正常播放速度;当注视时长小于第三阈值时,控制视频播放速度为第二速度,第二速度大于正常播放速度;当注视时长大于等于第三阈值时,控制视频播放速度为正常播放速度。这样,在视频片段不精彩,且交互频率较高和/或注视时长较短时,可以适当提高视频的播放速度,进而提高用户的视频观看效率。
在一种实现方式中,所述方法还包括:当检测到当前播放的视频片段不精彩时,第一电子设备还检测用户是否离开屏幕的观看范围或用户的视线离开屏幕的时长是否超过第一阈值。
在一种实现方式中,所述检测到用户的视线离开第一电子设备的屏幕之前,所述方法还包括:第一电子设备判断当前所处环境是否为电梯;若当前所处环境为电梯,检测到用户的视线离开第一电子设备的屏幕之后,第一电子设备检测用户的视线离开屏幕的时长是否超过第一阈值。
在一种实现方式中,视频包括至少一个预先划分好的视频片段,视频片段的精彩度可以是视频的制作商、供应商或版权方预先设置的。
第二方面,本申请提供一种设备控制方法,包括:第一电子设备播放视频;第一电子设备播放视频至视频的第一时刻时,检测到用户的视线离开第一电子设备的屏幕,第一电子设备记录第一时刻;第一电子设备检测用户是否离开屏幕的观看范围或用户的视线离开屏幕的时长是否超过第一阈值;当检测到用户离开屏幕的观看范围或用户的视线离开屏幕的时长超过第一阈值时,第一电子设备将视频回退至第一时刻并暂停。
实施本申请实施例,在检测到用户视线离开屏幕,以及用户视线离开超时或用户离开屏幕的观看范围时,可以将视频回退至用户视线离开屏幕的时刻。这样,可以避免用户错过视线离开屏幕后播放的视频片段,同时也能避免用户视线意外离开屏幕导致的视频的频繁误暂停,实现了视频播放的智能控制,有效提升了用户的视频观看体验。
在一种实现方式中,所述方法还包括:第一电子设备暂停视频后,当第一电子设备检测到用户的视线再次注视屏幕时,控制视频继续播放。实施本申请实施例,基于用户视线和人物状态(即是否离开屏幕的观看范围)控制视频暂停后,在用户视线再次回归屏幕时,无需 用户操作第一电子设备即可自动控制视频继续播放。这样,减少了用户操作,进一步实现了视频播放的智能控制,有效提升了用户的视频观看体验。
在一种实现方式中,所述方法还包括:当检测到用户未离开屏幕的观看范围或用户的视线离开屏幕的时长未超过第一阈值时,第一电子设备检测当前播放的视频片段是否精彩;当检测到当前播放的视频片段精彩时,第一电子设备将视频暂停。实施本申请实施例,还可以通过视频片段的精彩度控制视频的播放和暂停。在视频片段精彩时,第一电子设备控制视频暂停,可以避免用户错过精彩的视频片段。
在一种实现方式中,所述方法还包括:当检测到当前播放的视频片段不精彩时,获取用户的视线与屏幕的交互参数,并基于交互参数控制视频的播放速度。这样,在视频片段不精彩时,用户能够通过视线与屏幕的交互参数控制视频的播放速度,体现了用户自主化的视频控制,提高了用户的视频观看效率。
在一种实现方式中,交互参数包括交互频率和/或注视时长,交互频率指预设时间内用户的视线离开屏幕的频率,注视时长指用户单次注视屏幕的时长,所述基于交互参数控制视频的播放速度,包括:当交互频率大于第二阈值时,控制视频播放速度为第一速度,第一速度大于正常播放速度;当交互频率小于等于第二阈值时,控制视频播放速度为正常播放速度;当注视时长小于第三阈值时,控制视频播放速度为第二速度,第二速度大于正常播放速度;当注视时长大于等于第三阈值时,控制视频播放速度为正常播放速度。这样,在视频片段不精彩,且交互频率较高和/或注视时长较短时,可以适当提高视频的播放速度,进而提高用户的视频观看效率。
在一种实现方式中,所述方法还包括:当检测到当前播放的视频片段不精彩时,第一电子设备还检测用户是否离开屏幕的观看范围或用户的视线离开屏幕的时长是否超过第一阈值。
在一种实现方式中,所述检测到用户的视线离开第一电子设备的屏幕之前,所述方法还包括:第一电子设备判断当前所处环境是否为电梯;若当前所处环境为电梯,检测到用户的视线离开第一电子设备的屏幕之后,第一电子设备检测用户的视线离开屏幕的时长是否超过第一阈值。
在一种实现方式中,视频包括至少一个预先划分好的视频片段,视频片段的精彩度可以是视频的制作商、供应商或版权方预先设置的。
第三方面,本申请提供了一种电子设备,包括一个或多个处理器和一个或多个存储器。该一个或多个存储器与一个或多个处理器耦合,一个或多个存储器用于存储计算机程序代码,计算机程序代码包括计算机指令,当一个或多个处理器执行计算机指令时,使得电子设备执行上述任一方面任一项可能的实现方式中的设备控制方法。
第四方面,本申请实施例提供了一种计算机存储介质,包括计算机指令,当计算机指令在电子设备上运行时,使得电子设备执行上述任一方面任一项可能的实现方式中的设备控制方法。
第五方面,本申请实施例提供了一种计算机程序产品,当计算机程序产品在计算机上运行时,使得计算机执行上述任一方面任一项可能的实现方式中的设备控制方法。
附图说明
图1为本申请实施例提供的一种通信系统的系统架构图;
图2为本申请实施例提供的一种电子设备的结构示意图;
图3A为本申请实施例提供的一种智能手环的结构示意图;
图3B为本申请实施例提供的一种地面坐标系的示意图;
图3C为本申请实施例提供的一种电子设备坐标系的示意图;
图4A至图4C为本申请实施例提供的音量调整的应用场景示意图;
图5A至图5C为本申请实施例提供的音量调整的应用场景示意图;
图6A和图6B为本申请实施例提供的灵敏度调整的应用场景示意图;
图7A和图7B为本申请实施例提供的停止音量调整的应用场景示意图;
图8为本申请实施例提供的一种设备控制方法的流程示意图;
图9为本申请实施例提供的一种联合识别模型的示意图;
图10为本申请实施例提供的一种获取手部移动方向和手部移动速度的流程示意图;
图11A和图11B为本申请实施例提供的获取手部的特征点的示意图;
图11C和图11D为本申请实施例提供的图像中的手部移动示意图;
图11E和图11F为本申请实施例提供的手部移动方向和幅度调整方向的示意图;
图12为本申请实施例提供的另一种获取手部移动方向和手部移动速度的流程示意图;
图13为本申请实施例提供的一种对话系统的系统架构图;
图14A至图14J为本申请实施例提供的控制视频播放的应用场景示意图;
图15为本申请实施例提供的另一种设备控制方法的流程示意图;
图16为本申请实施例提供的用户视线示意图;
图17为本申请实施例提供的另一种设备控制方法的流程示意图。
具体实施方式
下面将结合附图对本申请实施例中的技术方案进行清楚、详尽地描述。其中,在本申请实施例的描述中,除非另有说明,“/”表示或的意思,例如,A/B可以表示A或B;文本中的“和/或”仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况,另外,在本申请实施例的描述中,“多个”是指两个或多于两个。
以下,术语“第一”、“第二”仅用于描述目的,而不能理解为暗示或暗示相对重要性或者隐含指明所指示的技术特征的数量。由此,限定有“第一”、“第二”的特征可以明示或者隐含地包括一个或者更多个该特征,在本申请实施例的描述中,除非另有说明,“多个”的含义是两个或两个以上。
首先介绍本申请实施例提供涉及的通信系统10。
图1示例性地示出了本申请实施例提供的一种通信系统10的系统架构图。如图1所示,该通信系统10可以包括电子设备100(本申请实施例涉及的第一电子设备可以为电子设备100),还可以包括与电子设备100连接的至少一个电子设备200(本申请实施例涉及的第二电子设备可以为电子设备200)。其中,
本申请实施例中,用户可以结合语音和手势两种操作方式,与电子设备100(例如智能手机、智能家居设备等)进行交互,以实现电子设备100对电子设备100或其他电子设备的控制,例如,调整电子设备100或其他电子设备的目标元素的幅度。示例性的,目标元素可以为音量、亮度、显示屏亮度、窗帘拉开程度、风扇风速、灯光亮度、空调温度等可以进行幅度调整的元素,本申请实施例对目标元素的类型不作具体限定。例如,电子设备100为智能手机,用户可以结合语音和手势两种操作方式,与智能手机进行交互,以控制智能手机调 整智能窗帘的窗帘拉开程度。
在一种实现方式中,电子设备100可以具备麦克风和语音识别能力,以实现对采集到的环境声音进行语音识别,进而基于识别到的语音指令确定待调整的目标元素;电子设备100还可以具备摄像头和手势识别能力,以实现对采集到的图像进行手势识别,进而确定用户手势的移动方向和移动速度。结合电子设备100识别到的语音指令和用户手势,电子设备100可以调整目标元素的幅度。其中,上述摄像头可以是低功耗摄像头。
在一些实施例中,也可以由电子设备200接收并识别用户的语音指令和/或手势,再发送给电子设备100。电子设备200是可以手持或佩戴于手上的电子设备,例如电子设备200可以是可穿戴设备(例如智能手环)、遥控设备(例如电视机的遥控器)等,此处不做具体限定。后续实施例以电子设备200为智能手环200为例进行说明。本申请实施例涉及的第二电子设备可以为智能手环200。
在一种实现方式中,电子设备100可以具备麦克风和语音识别能力,可以实现对采集到的环境声音进行语音识别,进而基于识别到的语音指令确定待调整的目标元素;智能手环200可以具备加速度传感器和陀螺仪传感器,以获取用户手部的移动方向和移动速度,并发送给电子设备100。结合电子设备100识别到的语音指令,以及智能手环200获取的用户手势的移动方向和移动速度,电子设备100可以调整目标元素的幅度。不限于加速度传感器和陀螺仪传感器,本申请过实施例中,电子设备200也可以通过其他传感器获取用户手部的移动方向和移动速度,此处不做具体限定。
在一些实施例中,电子设备100可以通过近距离无线通信连接或本地有线连接与智能手环200进行直接连接。示例性的,电子设备100和智能手环200可以具有无线保真(wireless fidelity,WiFi)通信模块、超宽带(ultra wide band,UWB)通信模块、蓝牙(bluetooth)通信模块、近场通信(near field communication,NFC)通信模块、ZigBee通信模块中的一项或多项近距离通信模块。电子设备100可以通过近距离通信模块(例如蓝牙通信模块)发射信号来探测、扫描电子设备100附近的电子设备(例如智能手环200),使得电子设备100可以通过近距离无线通信协议发现附近的智能手环200,并与附近的智能手环200建立无线通信连接,以及传输数据至附近的智能手环200。
在一些实施例中,电子设备100也可以通过通信网络400与智能手环200进行间接连接。
在一些实施例中,该通信系统10还可以包括一个或多个应用服务器,例如服务器300。该服务器300可以通过通信网络400与电子设备100进行通信,为电子设备100提供语音识别、手势识别等服务。电子设备100可以将采集到的环境声音和图像发送给服务器300,由服务器进行语音识别和手势识别,以确定待调整的目标元素,以及用户手势的移动方向和移动速度;并将上述目标元素、移动方向和移动速度发送给电子设备100。
通信网络400可以是局域网(local area networks,LAN),也可以是广域网(wide area networks,WAN),例如互联网。该通信网络400可使用任何已知的网络通信协议来实现,上述网络通信协议可以是各种有线或无线通信协议,诸如以太网、通用串行总线(universal serial bus,USB)、火线(FIREWIRE)、全球移动通讯系统(global system for mobile communications,GSM)、通用分组无线服务(general packet radio service,GPRS)、码分多址接入(code division multiple access,CDMA)、宽带码分多址(wideband code division multiple access,WCDMA),时分码分多址(time-division code division multiple access,TD-SCDMA),长期演进(long term evolution,LTE)、蓝牙、无线保真(wireless fidelity,Wi-Fi)、NFC、基于互联网协议的语音通 话(voice over Internet protocol,VoIP)、支持网络切片架构的通信协议或任何其他合适的通信协议。
可以理解的,本实施例示出的结构并不构成对通信系统10的具体限定。在本申请另一些实施例中,通信系统10可以包括比图示更多或更少的设备。
下面对本申请实施例涉及的电子设备100的结构进行介绍。
图2示出了电子设备100的结构示意图。电子设备100可以是手机、平板电脑、桌面型计算机、膝上型计算机、手持计算机、笔记本电脑、超级移动个人计算机(ultra-mobile personal computer,UMPC)、上网本,以及蜂窝电话、个人数字助理(personal digital assistant,PDA)、增强现实(augmented reality,AR)设备、虚拟现实(virtual reality,VR)设备、人工智能(artificial intelligence,AI)设备、可穿戴式设备、车载设备、智能家居设备和/或智慧城市设备,本申请实施例对该电子设备的具体类型不作特殊限制。电子设备可以搭载iOS、Android、Microsoft或者其它操作系统。
电子设备100可以包括处理器110,外部存储器接口120,内部存储器121,通用串行总线(universal serial bus,USB)接口130,充电管理模块140,电源管理模块141,电池142,天线1,天线2,移动通信模块150,无线通信模块160,音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,传感器模块180,按键190,马达191,指示器192,摄像头193,显示屏194,以及用户标识模块(subscriber identification module,SIM)卡接口195等。其中传感器模块180可以包括压力传感器180A,陀螺仪传感器180B,气压传感器180C,磁传感器180D,加速度传感器180E,距离传感器180F,接近光传感器180G,指纹传感器180H,温度传感器180J,触摸传感器180K,环境光传感器180L,骨传导传感器180M等。
可以理解的是,本申请实施例示意的结构并不构成对电子设备100的具体限定。在本申请另一些实施例中,电子设备100可以包括比图示更多或更少的部件,或者组合某些部件,或者拆分某些部件,或者不同的部件布置。图示的部件可以以硬件,软件或软件和硬件的组合实现。
处理器110可以包括一个或多个处理单元,例如:处理器110可以包括应用处理器(application processor,AP),调制解调处理器,图形处理器(graphics processing unit,GPU),图像信号处理器(image signal processor,ISP),控制器,视频编解码器,数字信号处理器(digital signal processor,DSP),基带处理器,和/或神经网络处理器(neural-network processing unit,NPU)等。其中,不同的处理单元可以是独立的器件,也可以集成在一个或多个处理器中。
控制器可以根据指令操作码和时序信号,产生操作控制信号,完成取指令和执行指令的控制。
处理器110中还可以设置存储器,用于存储指令和数据。在一些实施例中,处理器110中的存储器为高速缓冲存储器。该存储器可以保存处理器110刚用过或循环使用的指令或数据。如果处理器110需要再次使用该指令或数据,可从所述存储器中直接调用。避免了重复存取,减少了处理器110的等待时间,因而提高了系统的效率。
在一些实施例中,处理器110可以包括一个或多个接口。接口可以包括集成电路(inter-integrated circuit,I2C)接口,集成电路内置音频(inter-integrated circuit sound,I2S)接口,脉冲编码调制(pulse code modulation,PCM)接口,通用异步收发传输器(universal asynchronous receiver/transmitter,UART)接口,移动产业处理器接口(mobile industry processor interface, MIPI),通用输入输出(general-purpose input/output,GPIO)接口,用户标识模块(subscriber identity module,SIM)接口,和/或通用串行总线(universal serial bus,USB)接口等。
充电管理模块140用于从充电器接收充电输入。其中,充电器可以是无线充电器,也可以是有线充电器。在一些有线充电的实施例中,充电管理模块140可以通过USB接口130接收有线充电器的充电输入。在一些无线充电的实施例中,充电管理模块140可以通过电子设备100的无线充电线圈接收无线充电输入。充电管理模块140为电池142充电的同时,还可以通过电源管理模块141为电子设备供电。
电源管理模块141用于连接电池142,充电管理模块140与处理器110。电源管理模块141接收电池142和/或充电管理模块140的输入,为处理器110,内部存储器121,显示屏194,摄像头193,和无线通信模块160等供电。电源管理模块141还可以用于监测电池容量,电池循环次数,电池健康状态(漏电,阻抗)等参数。在其他一些实施例中,电源管理模块141也可以设置于处理器110中。在另一些实施例中,电源管理模块141和充电管理模块140也可以设置于同一个器件中。
电子设备100的无线通信功能可以通过天线1,天线2,移动通信模块150,无线通信模块160,调制解调处理器以及基带处理器等实现。
天线1和天线2用于发射和接收电磁波信号。电子设备100中的每个天线可用于覆盖单个或多个通信频带。不同的天线还可以复用,以提高天线的利用率。例如:可以将天线1复用为无线局域网的分集天线。在另外一些实施例中,天线可以和调谐开关结合使用。
移动通信模块150可以提供应用在电子设备100上的包括2G/3G/4G/5G等无线通信的解决方案。移动通信模块150可以包括至少一个滤波器,开关,功率放大器,低噪声放大器(low noise amplifier,LNA)等。移动通信模块150可以由天线1接收电磁波,并对接收的电磁波进行滤波,放大等处理,传送至调制解调处理器进行解调。移动通信模块150还可以对经调制解调处理器调制后的信号放大,经天线1转为电磁波辐射出去。在一些实施例中,移动通信模块150的至少部分功能模块可以被设置于处理器110中。在一些实施例中,移动通信模块150的至少部分功能模块可以与处理器110的至少部分模块被设置在同一个器件中。
调制解调处理器可以包括调制器和解调器。其中,调制器用于将待发送的低频基带信号调制成中高频信号。解调器用于将接收的电磁波信号解调为低频基带信号。随后解调器将解调得到的低频基带信号传送至基带处理器处理。低频基带信号经基带处理器处理后,被传递给应用处理器。应用处理器通过音频设备(不限于扬声器170A,受话器170B等)输出声音信号,或通过显示屏194显示图像或视频。在一些实施例中,调制解调处理器可以是独立的器件。在另一些实施例中,调制解调处理器可以独立于处理器110,与移动通信模块150或其他功能模块设置在同一个器件中。
无线通信模块160可以提供应用在电子设备100上的包括无线局域网(wireless local area networks,WLAN)(如无线保真(wireless fidelity,Wi-Fi)网络),蓝牙(bluetooth,BT),全球导航卫星系统(global navigation satellite system,GNSS),调频(frequency modulation,FM),近距离无线通信技术(near field communication,NFC),红外技术(infrared,IR)等无线通信的解决方案。无线通信模块160可以是集成至少一个通信处理模块的一个或多个器件。无线通信模块160经由天线2接收电磁波,将电磁波信号解调以及滤波处理,将处理后的信号发送到处理器110。无线通信模块160还可以从处理器110接收待发送的信号,对其进行调频,放大,经天线2转为电磁波辐射出去。
在一些实施例中,电子设备100的天线1和手部移动通信模块150耦合,天线2和无线通信模块160耦合,使得电子设备100可以通过无线通信技术与网络以及其他设备通信。所述无线通信技术可以包括全球移动通讯系统(global system for mobile communications,GSM),通用分组无线服务(general packet radio service,GPRS),码分多址接入(code division multiple access,CDMA),宽带码分多址(wideband code division multiple access,WCDMA),时分码分多址(time-division code division multiple access,TD-SCDMA),长期演进(long term evolution,LTE),BT,GNSS,WLAN,NFC,FM,和/或IR技术等。所述GNSS可以包括全球卫星定位系统(global positioning system,GPS),全球导航卫星系统(global navigation satellite system,GLONASS),北斗卫星导航系统(beidou navigation satellite system,BDS),准天顶卫星系统(quasi-zenith satellite system,QZSS)和/或星基增强系统(satellite based augmentation systems,SBAS)。
电子设备100通过GPU,显示屏194,以及应用处理器等实现显示功能。GPU为图像处理的微处理器,连接显示屏194和应用处理器。GPU用于执行数学和几何计算,用于图形渲染。处理器110可包括一个或多个GPU,其执行程序指令以生成或改变显示信息。
显示屏194用于显示图像,视频等。显示屏194包括显示面板。显示面板可以采用液晶显示屏(liquid crystal display,LCD),有机发光二极管(organic light-emitting diode,OLED),有源矩阵有机发光二极体或主动矩阵有机发光二极体(active-matrix organic light emitting diode的,AMOLED),柔性发光二极管(flex light-emitting diode,FLED),Miniled,MicroLed,Micro-oLed,量子点发光二极管(quantum dot light emitting diodes,QLED)等。在一些实施例中,电子设备100可以包括1个或N个显示屏194,N为大于1的正整数。
电子设备100可以通过ISP,摄像头193,视频编解码器,GPU,显示屏194以及应用处理器等实现拍摄功能。
ISP用于处理摄像头193反馈的数据。例如,拍照时,打开快门,光线通过镜头被传递到摄像头感光元件上,光信号转换为电信号,摄像头感光元件将所述电信号传递给ISP处理,转化为肉眼可见的图像。ISP还可以对图像的噪点,亮度进行算法优化。ISP还可以对拍摄场景的曝光,色温等参数优化。在一些实施例中,ISP可以设置在摄像头193中。
摄像头193用于捕获静态图像或视频。物体通过镜头生成光学图像投射到感光元件。感光元件可以是电荷耦合器件(charge coupled device,CCD)或互补金属氧化物半导体(complementary metal-oxide-semiconductor,CMOS)光电晶体管。感光元件把光信号转换成电信号,之后将电信号传递给ISP转换成数字图像信号。ISP将数字图像信号输出到DSP加工处理。DSP将数字图像信号转换成标准的RGB,YUV等格式的图像信号。在一些实施例中,电子设备100可以包括1个或N个摄像头193,N为大于1的正整数。
本申请实施例中,电子设备100可以对摄像头193采集的图像进行手势识别,以获取用户手部的移动速度和移动方向。
数字信号处理器用于处理数字信号,除了可以处理数字图像信号,还可以处理其他数字信号。例如,当电子设备100在频点选择时,数字信号处理器用于对频点能量进行傅里叶变换等。
视频编解码器用于对数字视频压缩或解压缩。电子设备100可以支持一种或多种视频编解码器。这样,电子设备100可以播放或录制多种编码格式的视频,例如:动态图像专家组(moving picture experts group,MPEG)1,MPEG2,MPEG3,MPEG4等。
NPU为神经网络(neural-network,NN)计算处理器,通过借鉴生物神经网络结构,例如借鉴人脑神经元之间传递模式,对输入信息快速处理,还可以不断的自学习。通过NPU可以实现电子设备100的智能认知等应用,例如:图像识别,人脸识别,语音识别,文本理解等。
内部存储器121可以包括一个或多个随机存取存储器(random access memory,RAM)和一个或多个非易失性存储器(non-volatile memory,NVM)。
随机存取存储器可以包括静态随机存储器(static random-access memory,SRAM)、动态随机存储器(dynamic random access memory,DRAM)、同步动态随机存储器(synchronous dynamic random access memory,SDRAM)、双倍资料率同步动态随机存取存储器(double data rate synchronous dynamic random access memory,DDR SDRAM,例如第五代DDR SDRAM一般称为DDR5SDRAM)等;非易失性存储器可以包括磁盘存储器件、快闪存储器(flash memory)。
快闪存储器按照运作原理划分可以包括NOR FLASH、NAND FLASH、3D NAND FLASH等,按照存储单元电位阶数划分可以包括单阶存储单元(single-level cell,SLC)、多阶存储单元(multi-level cell,MLC)、三阶储存单元(triple-level cell,TLC)、四阶储存单元(quad-level cell,QLC)等,按照存储规范划分可以包括通用闪存存储(英文:universal flash storage,UFS)、嵌入式多媒体存储卡(embedded multi media Card,eMMC)等。
随机存取存储器可以由处理器110直接进行读写,可以用于存储操作系统或其他正在运行中的程序的可执行程序(例如机器指令),还可以用于存储用户及应用程序的数据等。
非易失性存储器也可以存储可执行程序和存储用户及应用程序的数据等,可以提前加载到随机存取存储器中,用于处理器110直接进行读写。
外部存储器接口120可以用于连接外部的非易失性存储器,实现扩展电子设备100的存储能力。外部的非易失性存储器通过外部存储器接口120与处理器110通信,实现数据存储功能。例如将音乐,视频等文件保存在外部的非易失性存储器中。
电子设备100可以通过音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,以及应用处理器等实现音频功能。例如音乐播放,录音等。
音频模块170用于将数字音频信息转换成模拟音频信号输出,也用于将模拟音频输入转换为数字音频信号。音频模块170还可以用于对音频信号编码和解码。在一些实施例中,音频模块170可以设置于处理器110中,或将音频模块170的部分功能模块设置于处理器110中。
扬声器170A,也称“喇叭”,用于将音频电信号转换为声音信号。
受话器170B,也称“听筒”,用于将音频电信号转换成声音信号。
麦克风170C,也称“话筒”,“传声器”,用于采集声音(例如周围环境声音,包括人发出的声音、设备发出的声音等),并将声音信号转换为电信号。当拨打电话或发送语音信息时,用户可以通过人嘴靠近麦克风170C发声,将声音信号输入到麦克风170C。电子设备100可以设置至少一个麦克风170C。在另一些实施例中,电子设备100可以设置两个麦克风170C,除了采集声音信号,还可以实现降噪功能。在另一些实施例中,电子设备100还可以设置三个,四个或更多麦克风170C,实现采集声音信号,降噪,还可以识别声音来源,实现定向录音功能等。
在一种实现方式中,用户想要语音控制电子设备100时,需要先通过预设唤醒词唤醒电子设备100。检测到预设唤醒词后,电子设备100才能响应用户的语音指令,执行相应的操 作。
可以理解,当电子设备100开启语音唤醒功能时,麦克风170C可以实时采集周围环境声音,获取音频数据。其中,麦克风170C采集声音的情况与所处的环境相关。例如,当周围环境较为嘈杂时,用户说出唤醒词,则麦克风170C采集的声音包括周围环境噪声和用户发出唤醒词的声音。
在一些实施例中,电子设备100的应用处理器保持上电,麦克风170C将采集到的语音信息发送给应用处理器。应用处理器识别上述语音信息,并可以执行上述语音信息对应的操作。例如,应用处理器识别上述语音信息包括预设唤醒词时,可以生成相应的响应信息(例如,语音信息“我在”),以及响应后续的语音指令。
在一些实施例中,电子设备100的麦克风170C连接微处理器,微处理器保持上电,电子设备100的应用处理器未上电。麦克风170C将采集到的语音信息发送给微处理器,微处理器识别上述语音信息,并根据上述语音信息确定是否唤醒应用处理器,即给应用处理器上电。例如,微处理器识别上述语音信息包括预设唤醒词时,唤醒应用处理器。其中,预设唤醒词可以是出厂前电子设备100默认设置的,也可以是用户根据自身需要在电子设备100中预先设置的,此处不做具体限定。
本申请实施例中,用户可以结合语音和手势两种操作方式,与电子设备100进行交互,以实现对指定元素的幅度的调整。
耳机接口170D用于连接有线耳机。
压力传感器180A用于感受压力信号,可以将压力信号转换成电信号。
陀螺仪传感器180B可以用于确定电子设备100的移动姿态。在一些实施例中,可以通过陀螺仪传感器180B确定电子设备100围绕三个轴(即,x,y和z轴)的角速度。
气压传感器180C用于测量气压。
磁传感器180D包括霍尔传感器。电子设备100可以利用磁传感器180D检测翻盖皮套的开合。
加速度传感器180E可检测电子设备100在各个方向上(一般为三轴)加速度的大小。当电子设备100静止时可检测出重力的大小及方向。还可以用于识别电子设备姿态,应用于横竖屏切换,计步器等应用。
距离传感器180F,用于测量距离。
接近光传感器180G可以包括例如发光二极管(LED)和光检测器,例如光电二极管。
环境光传感器180L用于感知环境光亮度。
指纹传感器180H用于采集指纹。
温度传感器180J用于检测温度。
触摸传感器180K,也称“触控器件”。触摸传感器180K可以设置于显示屏194,由触摸传感器180K与显示屏194组成触摸屏,也称“触控屏”。触摸传感器180K用于检测作用于其上或附近的触摸操作。触摸传感器可以将检测到的触摸操作传递给应用处理器,以确定触摸事件类型。
骨传导传感器180M可以获取振动信号。
按键190包括开机键,音量键等。按键190可以是机械按键。也可以是触摸式按键。
马达191可以产生振动提示。
指示器192可以是指示灯,可以用于指示充电状态,电量变化,也可以用于指示消息, 未接来电,通知等。
SIM卡接口195用于连接SIM卡。
图3A示例性的示出了本申请实施例提供的智能手环200的结构示意图。
如图3A所示,智能手环200可以包括:处理器201,存储器202,无线通信模块203,天线204,电源开关205,有线LAN通信处理模块206,USB通信处理模块207,音频模块208、加速度传感器209、陀螺仪传感器210。其中:
处理器201可用于读取和执行计算机可读指令。具体实现中,处理器201可主要包括控制器、运算器和寄存器。其中,控制器主要负责指令译码,并为指令对应的操作发出控制信号。运算器主要负责保存指令执行过程中临时存放的寄存器操作数和中间操作结果等。具体实现中,处理器201的硬件架构可以是专用集成电路(ASIC)架构、MIPS架构、ARM架构或者NP架构等等。
在一些实施例中,处理器201可以用于解析无线通信模块203和/或有线LAN通信处理模块206接收到的信号,如智能手环200广播的探测请求,等等。处理器201可以用于根据解析结果进行相应的处理操作,如生成探测响应,等等。
在一些实施例中,处理器201还可用于生成无线通信模块203和/或有线LAN通信处理模块206向外发送的信号,如蓝牙广播信号。
存储器202与处理器201耦合,用于存储各种软件程序和/或多组指令。具体实现中,存储器202可包括高速随机存取的存储器,并且也可包括非易失性存储器,例如一个或多个磁盘存储设备、闪存设备或其他非易失性固态存储设备。存储器202可以存储操作系统,例如uCOS,VxWorks、RTLinux等嵌入式操作系统。存储器202还可以存储通信程序,该通信程序可用于智能手环200,一个或多个服务器,或附件设备进行通信。
无线通信模块203可以包括UWB通信模块203A、蓝牙通信模块203B、WLAN通信模块203C、红外线通信模块203D中的一项或多项。
在一些实施例中,UWB通信模块203A、蓝牙通信模块203B、WLAN通信模块203C、红外线通信模块203D中的一项或多项可以监听到其他设备(如电子设备100)发射的信号,如测量信号、扫描信号等等,并可以发送响应信号,如测量响应、扫描响应等,使得其他设备(如电子设备100)可以发现智能手环200,并通过UWB、蓝牙、WLAN或红外线中的一种或多种近距离无线通信技术与其他设备(如电子设备100)建立无线通信连接,来进行数据传输。
在另一些实施例中,UWB通信模块203A、蓝牙通信模块203B、WLAN通信模块203C、红外线通信模块203D中的一项或多项也可以发射信号,如广播UWB测量信号、信标信号,使得其他设备(如电子设备100)可以发现智能手环200,并通过UWB、蓝牙、WLAN或红外线中的一种或多种近距离无线通信技术与其他设备(如电子设备100)建立无线通信连接,来进行数据传输。
无线通信模块203还可以包括蜂窝移动通信模块(未示出)。蜂窝移动通信处理模块可以通过蜂窝移动通信技术与其他设备(如服务器)进行通信。
天线204可用于发射和接收电磁波信号。不同通信模块的天线可以复用,也可以相互独立,以提高天线的利用率。
电源开关205可用于控制电源向智能手环200的供电。
有线LAN通信处理模块206可用于通过有线LAN和同一个LAN中的其他设备进行通 信,还可用于通过有线LAN连接到WAN,可与WAN中的设备通信。
USB通信处理模块207可用于通过USB接口(未示出)与其他设备进行通信。
音频模块208可用于通过音频输出接口输出音频信号,这样可使得智能手环200支持音频播放。音频模块还可用于通过音频输入接口接收音频数据。
陀螺仪传感器210可以用于确定智能手环200的姿态。在一些实施例中,可以通过陀螺仪传感器210确定智能手环200围绕三个轴(即,x,y和z轴)的角速度,进而确定智能手环200的姿态。
需要说明的是,陀螺仪传感器210的参考坐标系通常是地面坐标系。示例性的,图3B所示的三轴(Xg轴、Yg轴和Zg轴)坐标系是本申请实施例示出的一种地面坐标系,其中,Xg轴沿当地纬线指向东(east),Yg轴沿当地子午线线指向北(north),Zg轴沿地理垂线指向上,并与Xg轴和Yg轴构成右手直角坐标系。其中,Xg轴与Yg轴构成的平面即为当地水平面,Y轴与Zg轴构成的平面即为当地子午面。本申请实施例涉及的第一坐标系可以为地面坐标系。
图3C所示的三轴(X轴、Y轴和Z轴)坐标系是本申请实施例示出的一种智能手环200的电子设备坐标系,其中,电子设备坐标系的原点可以取电子设备的质心处(例如智能手环200的主体的质心),X轴从上述主体的质心指向主体的右侧;Y轴从上述主体的质心指向主体的顶端,Y轴且垂直于X轴;而Z轴从上述主体的质心指向主体的正面,且垂直于X轴和Y轴。如图3C所示,在一种实现方式中,智能手环200总体上可以包括主体和表带,主体可以配置有屏幕,上述X轴和Y轴构成的XY平面可以平行于智能手环200的主体配置的屏幕。
智能手环200的姿态可以由俯仰角(Pitch)、航向角(Yaw)和翻滚角(Roll)这三个姿态角确定,俯仰角(Pitch)、航向角(Yaw)和翻滚角(Roll)通常指智能手环200绕地面坐标系的三轴旋转的角度。在一种实现方式,俯仰角可以为智能手环200的电子设备坐标系的Y轴与当地水平面的夹角;偏航角可以为上述电子设备坐标系的Y轴在当地水平面的投影与地面坐标系的Yg轴的夹角,翻滚角可以为上述电子设备坐标系的XY平面与地面坐标系的Zg轴的夹角。本申请实施例中,基于陀螺仪传感器采集的围绕地面坐标系的三个轴的角速度,智能手环200可以确定智能手环200的三个姿态角,进而确定智能手环200的当前姿态。
智能手环200根据陀螺仪传感器210检测到的姿态角,可以获取将电子设备坐标系和地面坐标系间的转换矩阵。
加速度传感器209可检测智能手环200在各个方向上(一般为三轴)加速度的大小。当智能手环200静止时可检测出重力的大小及方向。还可以用于识别设备姿态,应用于横竖屏切换,计步器等应用。加速度传感器209可以是压阻式加速度传感器或电容式加速度传感器,本申请实施例对加速度传感器的类型不作具体限定。
加速度传感器209的测量原理可以用一个简单的质量块+弹簧表示,a_m为加速度测量值,a_m = f/m,力f通过弹簧形变x以及弹簧的形变系数k可以求得,即f = kx。在一种实现方式中,电子设备100获取沿X轴的加速度a_x和沿Y轴的加速度a_y后,通过公式V_x = V_x + a_x·t和V_y = V_y + a_y·t,可以获取手部沿X轴的速度V_x和沿Y轴的速度V_y。
加速度传感器209的参考坐标系为智能手环200的电子设备坐标系,加速度传感器209可以检测到智能手环200在电子坐标系的三轴方向上的加速度。智能手环200基于加速度传感器209采集的加速度数据以及加速度数据的时间戳,还可以确定智能手环200在电子设备坐标系的三轴方向上的速度。
本申请实施例中,智能手环200基于前述转换矩阵,可以将智能手环200的电子坐标系的三轴方向上的加速度转换为地面坐标系的三轴方向上的加速度。
在一些实施例中,智能手环200包括惯性测量单元(Inertial Measurement Unit,IMU),IMU包含三轴的加速度传感器(例如加速度传感器209)和三轴的陀螺仪传感器(例如陀螺仪传感器210),上述加速度传感器可以检测在智能手环200的电子设备坐标系的三轴的加速度信号,上述陀螺仪传感器可以检测智能手环200相对于上述地面坐标系的角速度信号,根据测得的智能手环200在三维空间中的角速度和加速度可以确定出智能手环200的姿态、移动方向和移动速度。
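As a rough illustrative sketch of the frame conversion described above, the snippet below builds a rotation matrix from yaw/pitch/roll attitude angles, rotates a device-frame acceleration into the ground frame, and accumulates velocity with the V = V + a·t relation. The ZYX rotation order and the axis conventions are assumptions for illustration and are not fixed by this application.

```python
import numpy as np

def rotation_matrix(yaw, pitch, roll):
    # Assumed ZYX (yaw-pitch-roll) convention; angles in radians.
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    return Rz @ Ry @ Rx

def device_to_ground(accel_device, yaw, pitch, roll):
    # Map an acceleration vector from the wearable's device frame to the ground frame.
    return rotation_matrix(yaw, pitch, roll) @ np.asarray(accel_device, dtype=float)

def integrate_velocity(accels_ground, dt, v0=None):
    # v = v + a * t accumulated sample by sample (the V = V + a·t relation above).
    v = np.zeros(3) if v0 is None else np.asarray(v0, dtype=float)
    for a in accels_ground:
        v = v + np.asarray(a, dtype=float) * dt
    return v
```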
应该理解的是,图3A所示的智能手环200仅是一个范例,并且智能手环200可以具有比图3A中所示的更多或更少的部件,可以组合两个或多个的部件,或者可以具有不同的部件配置。图中所示出的各种部件可以在包括一个或多个信号处理和/或专用集成电路在内的硬件、软件、或硬件和软件的组合中实现。
下面对本申请实施例涉及的意图和槽位的相关概念进行介绍。
本申请实施例中,电子设备100可以识别用户输入的语音信息对应的意图和槽位,进而基于上述意图和槽位,响应上述语音信息,执行相应的操作。
意图(intent):是指针对指定数据或资源执行的操作。用户每次输入的语音信息可以对应用户的至少一个意图。意图可采用动宾短语来命名,例如,“调大音量”、“调小亮度”、“预定机票”、“查询天气”、“播放音乐”等均是意图的表达。
本申请实施例支持符合用户习惯的任何自然语言表达,不同的语音信息可以对应同一个意图。例如,当用户想要表达“调大音量”的意图时,支持用户采用诸如“将电视的媒体音量调大”等较为规范、格式化的表达方式,也支持用户采用诸如“声音大点”等较为简易、信息量较少的表达方式,还支持用户采用诸如“音量”等关键词式的表达方式。当然,本申请实施例还支持其他方式的表达,此处不做具体限定。
意图识别(Intent Detection):顾名思义就是根据用户输入的语音信息判断用户想要做什么。意图识别本质上是一种文本分类(或语义表达分类)任务,即预设多种意图,确定语音信息对应的意图为上述多种意图中的哪一种。
槽位(slot):简而言之,槽位是指从用户输入的语音信息中抽取出的关键信息;通过槽位识别,可以将用户隐式的意图转化为显式的指令从而让计算机理解。槽位用于存放数据或资源的属性,一个槽位中的具体信息可以简称为槽位信息(或槽位值)。通常一个意图对应至少一个槽位,一个槽位对应着一类属性的槽位信息。示例性的,用户说“将智慧屏的视频音量调大一倍”,上述语音信息对应的意图可以为“调整音量”,这个意图可以对应“目标设备”、“音量类型”、“调整幅度”、“调整方向”等中的至少一个槽位。例如,“目标设备”对应的槽位信息的属性可以为设备名称(例如智慧屏),“调整幅度”对应的槽位信息的属性可以为数值(例如百分比、倍数等)。
槽位填充(slot filling)指提取用户输入的语音信息对应的语言文本中的结构化字段(即语义成分),槽位填充本质上是一种序列标注任务。序列标注就是将给定文本中的每一个字符打上标签,其本质上是对线性序列中每个元素根据上下文内容进行分类的问题;给定特定的标签集合,就可以进行序列标注。本申请实施例中,利用序列标注技术,根据语言文本中每个结构化字段的上下文给该结构化字段打上一个合适的标签,即确定其槽位。
本申请实施例中,用户可以通过语音指示电子设备100调整幅度的意图和待调整的目标元素;然后,通过手部移动方向指示目标元素的幅度调整方向,以及通过手部移动速度指示目标元素的幅度调整速度或者通过手部移动举例指示目标元素的幅度调整值;从而可以实现电子设备100跟随用户手部的移动,实时调整电子设备100或其他电子设备上的目标元素的幅度。上述目标元素可以为音量、亮度、显示屏亮度、窗帘拉开程度、风扇风速、灯光亮度、空调温度等可以进行幅度调整的元素。
本申请实施例中,电子设备100可以获取预设的手部移动方向和幅度调整方向的第一映射关系,以及手部移动速度和幅度调整速度的第二映射关系或者手部移动举例和幅度调整值的第三映射关系。幅度调整方向包括幅度调大和幅度调小。其中,幅度调大可以对应一个或多个预设的手部移动方向,幅度调小也可以对应一个或多个预设的手部移动方向。
示例性的,手部向左移动对应幅度调小,手部向右移动对应幅度调大;或者,手部逆时针移动对应幅度调小,手部顺时针移动对应幅度调大;或者,手部向左移动以及手部逆时针移动对应幅度调小,手部向右移动以及手部顺时针移动对应幅度调大。在一种实现方式中,手部向右指手部移动方向在水平轴上的投影指向水平轴的预设方向,手部向左指手部移动方向在水平轴上的投影指向水平轴的预设方向的反方向。不限于前述手部移动方向和第一映射关系,本申请实施例还可以预设其他手部移动方向以及第一映射关系,此处不做具体限定。
需要说明的是,上述第一映射关系、第二映射关系和第三映射关系可以是电子设备100出厂时默认设置的,也可以是用户预设的,此处不作具体限定。
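As a purely illustrative sketch of the first mapping relationship described above, the snippet below stores one possible preset direction mapping in a small table; the direction names and the particular pairings are assumptions for illustration (as noted, the mapping may be a factory default or user-configured).

```python
# Illustrative only: one possible first mapping between preset hand-movement
# directions and amplitude adjustment directions (increase / decrease).
FIRST_MAPPING = {
    "right": "increase",
    "clockwise": "increase",
    "left": "decrease",
    "counterclockwise": "decrease",
}

def amplitude_direction(hand_direction: str) -> str:
    # Look up which adjustment direction a detected hand-movement direction maps to.
    return FIRST_MAPPING[hand_direction]
```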
下面以用户调整电子设备100(例如大屏设备)播放视频的音量为例,结合图4A至图7B对本申请实施例提供的设备控制方法的一种应用场景进行介绍。示例性的,手部向左移动对应幅度调小,手部向右移动对应幅度调大。
在一些实施例中,电子设备100可以通过摄像头采集到用户移动手部的图像序列;对采集的图像序列进行手部识别,获取手部移动方向、手部移动速度和/或手部移动距离。
示例性的,图4A所示的电子设备100配置有语音助手,且大屏设备的语音助手开启了语音唤醒功能。用户观看大屏设备播放的视频时,说出第一语音信息,即“小艺小艺,调整音量”,第一语音信息包括唤醒词和用于指示调整音量的语音。响应于检测到的第一语音信息,电子设备100唤醒语音助手,语音助手确定第一语音信息对应的意图为“幅度调整”,以及待调整的目标元素为音量后,电子设备100显示如图4B所示的音量指示条301,并启动摄像头采集图像。其中,音量指示条301的长度用于指示最大音量,音量指示条301中的阴影部分的长度用于指示当前的音量大小。
如图4B所示,用户意图调大音量时,将手向右移动;电子设备100通过摄像头采集到用户移动手部的图像序列,电子设备100对采集到的图像进行手部识别,进而确定手部移动方向、手部移动速度和/或手部移动距离;电子设备100基于上述第一映射关系确定手部移动方向对应幅度调大,并基于上述第二映射关系计算手部移动速度对应的幅度调整速度或基于上述第三映射关系计算手部移动距离对应的幅度调整值;电子设备100可以根据上述幅度调整速度或幅度调整值,调大音量,并根据调大后的音量,增加音量指示条301中阴影部分的长度。
如图4C所示,音量调整过程中,若音量调整过大,用户可以将手向左移动,以将音量再调小些;电子设备100持续性地对采集到的图像进行手部识别,获取手部移动方向、手部 移动速度和/或手部移动距离;用户转而将手部向左移动后,电子设备100基于上述第一映射关系确定手部移动方向对应幅度调小,并继续基于上述第二映射关系计算手部移动速度对应的幅度调整速度或基于上述第三映射关系计算手部移动距离对应的幅度调整值;电子设备100可以根据上述幅度调整速度或幅度调整值,调小音量,并根据调小后的音量,减小音量指示条301中阴影部分的长度。
在一种实现方式中,用户通过手部移动调整幅度时,必须以特定手势移动手部。即电子设备100对采集到的图像进行手势识别,当识别到第二预设手势时,确定手部移动方向、手部移动速度和/或手部移动距离。可以理解,若用户没有以特定手势移动手部,电子设备100不能随手部移动调整音量。例如,第二预设手势为图4B所示的五指展开的手。
在一种实现方式中,用户意图调整音量时,也可以先通过一条语音信息唤醒大屏设备的语音助手;确定唤醒大屏设备的语音助手后,再通过一条语音信息(例如第一语音信息),指示大屏设备调整音量。示例性的,用户说出语音信息“小艺小艺”;电子设备100检测到用户的语音信息,识别上述语音信息包括预设唤醒词“小艺小艺”,电子设备100唤醒电子设备100的语音助手,语音助手可以发出用于响应用户的语音信息“我在”,以指示用户电子设备100已被唤醒;然后,用户再发出“调整音量”的第一语音信息。
需要说明的是,第一语音信息用于指示调整幅度的意图以及待调整的目标元素(例如音量),不限于“调整音量”这种语言表达,第一语音信息中指示调整音量的语音内容还可以为“调整视频播放音量”、“音量太大了”、“调大音量”等表达,此处不做具体限定。
在一些实施例中,电子设备100的音量分为多种类型,例如铃声音量、媒体音量、闹钟音量等。用户可以在第一语音信息中指明目标元素具体的音量类型;当用户没有指明目标元素的音量类型,则电子设备100可以将前台运行的应用程序使用的音量类型(例如视频应用采用媒体音量)或者默认的音量类型,确定为目标元素的音量类型。
在一些实施例中,用户可以佩戴智能手环200,电子设备100可以通过智能手环200上的IMU获取手部移动方向和手部移动速度。这样,电子设备100无需采集图像,也无需进行手部识别,即可获取用户的手部移动方向、手部移动速度和/或手部移动距离;针对部分不具备摄像头或者性能不足的电子设备,也可以实现通过第一语音信息和手部移动持续性地调整音量幅度。
示例性的,如图5A所示,用户观看大屏设备播放的视频时,说出第一语音信息,即“小艺小艺,调整音量”。响应于检测到的第一语音信息,电子设备100唤醒语音助手,确定第一语音信息对应的意图为“幅度调整”,以及待调整的目标元素为音量后,显示如图5B所示的音量指示条301,并向智能手环200发送获取请求,获取请求用于请求获取用户的手部移动速度和手部移动方向。响应于上述获取请求,智能手环200通过IMU实时获取用户的手部移动方向、手部移动速度和/或手部移动距离,并发送给电子设备100。
如图5B所示,用户意图调小音量时,将手向左移动;智能手环200通过IMU实时获取用户的手部移动方向、手部移动速度和/或手部移动距离,并发送给电子设备100;电子设备100基于上述第一映射关系确定手部移动方向对应幅度调小,并基于上述第二映射关系计算手部移动速度对应的幅度调整速度或基于上述第三映射关系计算手部移动距离对应的幅度调整值;电子设备100可以根据该幅度调整速度或幅度调整值,调小音量,并根据调小后的音量,减少音量指示条301中阴影部分的长度。
如图5C所示,音量调整过程中,若音量调整过小,用户也可以将手向右移动,以将音量再调大些;智能手环200通过IMU实时获取用户的手部移动方向、手部移动速度和/或手部移动距离,并发送给电子设备100;用户转而将手部向右移动后,电子设备100基于上述第一映射关系确定手部移动方向对应幅度调大,并继续基于上述第二映射关系计算手部移动速度对应的幅度调整速度或基于上述第三映射关系计算手部移动距离对应的幅度调整值;电子设备100可以根据该幅度调整速度或幅度调整值,调大音量,并根据调大后的音量,增大音量指示条301中阴影部分的长度。
在一些实施例中,音量调整过程中,若幅度调整的灵敏度过大或过小,用户可以通过第二语音信息,来调整灵敏度。需要说明的是,在一种实现方式中,手部移动速度一定时,幅度调整的灵敏度越小,则手部移动速度对应的幅度调整速度越小;或者,手部移动速度一定时,幅度调整的灵敏度越大,手部移动速度对应的幅度调整速度越小,此处不做具体限定。在一种实现方式中,手部移动距离一定时,幅度调整的灵敏度越小,手部移动距离对应的幅度调整值越小;或者,手部移动距离一定时,幅度调整的灵敏度越大,手部移动距离对应的幅度调整值越小,此处不做具体限定。本申请实施例中,幅度调整的灵敏度也可以被称为第一灵敏度。
示例性的,如图6A所示,用户通过手部移动调整音量时,说出第二语音信息,即“降低灵敏度”;响应于检测到的第二语音信息,电子设备100按照预设比例降低幅度调整的灵敏度。参考图6B,电子设备100按照降低后的灵敏度以及第二映射关系,确定用户的手部移动速度对应的幅度调整速度,进而基于幅度调整速度调整音量。图6B和图6A中用户的手部以相同速度移动了相同的距离,由于图6B使用的灵敏度低于图6A使用的灵敏度,图6B所示的音量调整幅度D2小于图6A所示的音量调整幅度D1。
在实现方式一中,每次音量调整流程中灵敏度的初始值均相同,即本次音量调整流程中对灵敏度的调整,不会沿用至下一次的音量调整流程。在实现方式二中,电子设备100的首次音量调整的灵敏度为预设初始值,在后续的每次音量调整流程中用户均可以对灵敏度进行调整,且本次的灵敏度调整,会沿用至下一次的音量调整流程。这样,在实现方式二中,调节后的灵敏度符合用户的使用习惯,在下次音量调整流程中该用户不需要再次进行灵敏度调节。
在一些实施例中,也可以将灵敏度作为可进行幅度调整的目标元素,通过单独的灵敏度调整流程进行调整。示例性的,用户说出的第一语音信息可以为“调整灵敏度”,然后,用户通过手部移动指示灵敏度的幅度调整方向和幅度调整速度。具体的,可以参考图4A至图5C的相关描述,此处不再赘述。该实现方式中,电子设备100可以保存最新的调整后的灵敏度,并在后续的幅度调整流程中使用该调整后的灵敏度。
在一种实施例中,音量调整过程中,当电子设备100检测到第一预设条件时,电子设备100结束本次的音量调整,并停止显示音量指示条301。其中第一预设条件可以包括以下一项或多项:接收第一语音信息后的时长超过第一预设时长(例如,第一预设时长为15s);第二预设时长内(例如,第二预设时长为5s),未检测到用户手部的有效移动;接收到用于停止幅度调整的第一预设手势;接收到用于停止幅度调整的第三语音信息。其中,用户手部的有效移动指:用户手部沿预设的手部移动方向的移动距离大于距离阈值,或者,用户的手部移动速度大于速度阈值;第一预设手势不同于第二预设手势,例如,第一预设手势为握拳的手。
示例性的,如图7A所示,音量调整过程中,若音量输出达到预期效果,用户停止手部运动,并示出图7A所示的第一预设手势。如图7B所示,当电子设备100检测到第一预设手 势时,电子设备100结束本次的音量调整,并停止显示音量指示条301。
参见图4A至图4C,以及图5A至图5C,用户说出用于调整音量的第一语音信息后,电子设备100可以跟随用户的手部移动方向、手部移动速度和/或手部移动距离,持续性地调大或调小音量,以便于用户可以获取符合预期效果的视频音量。参见图6A和图6B,在音量调整过程中,用户通过手部移动控制音量的调整时,用户还可以通过第二语音信息控制幅度调整的灵敏度。参见图7A和图7B,电子设备100检测到第一预设条件(例如第一预设手势)时,可以自动结束本次的音量调整,用户可以通过第一语音信息再次启动前述音量调整流程。
结合前述通信系统、设备结构和应用场景,下面介绍本申请实施例提供的一种设备控制方法。
示例性的,图8示出了本申请实施例提供的一种设备控制方法的流程图。该设备控制方法包括但不限于下述步骤S101至S109。
S101、用户说出第一语音信息,电子设备100接收第一语音信息。
参考图4A,第一语音信息可以包括电子设备100的语音助手的唤醒词。示例性的,电子设备100的语音助手的唤醒词为“小艺小艺”,第一语音信息可以为“小艺小艺,调整音量”。可以理解,用户通过唤醒词唤醒电子设备100的语音助手后,电子设备100的语音助手才会响应用户说出的语音信息,执行相应的操作。
S102、电子设备100识别第一语音信息对应的语言文本1。
语音识别指通过计算机将人类的语音转换为相应的文本。在一种实现方式中,电子设备100利用自动语音识别(automatic speech recognition,ASR)技术获取第一语音信息对应的语言文本1。
在一些实施例中,电子设备100识别第一语音信息对应的语言文本1,包括;提取第一语音信息的音频特征;将第一语音信息的音频特征输入声学模型(acoustic model,AM),声学模型输出上述音频特征对应的音素和文字;将上述音频特征对应的音素和文字输入语言模型(language model,LM),语言模型输出第一语音信息对应的一组token序列,即语言文本1。
其中,token可以为字母(Grapheme,书写的基本单位)、词(word)、语素(Morpheme,可以传达意思的最小单位,小于词,大于字母)或字节(bytes),此不做具体限定。声学模型可以获取音频特征属于某个声学单元(如音素或字词)的概率,进而能够把语音输入(例如第一语音信息)的音频特征解码为音素或字词这样的单元。语言模型可以获取一组token序列为这条语音信息对应的语言文本的概率,进而能把第一语音信息对应的声学单元解码成一组token序列。本申请实施例中,声学模型和语言模型可以采用神经网络模型。
在一些实施例中,可以对联合声学模型和语言模型的神经网络模型1进行联合训练;利用联合训练后的神经网络模型1可以直接识别第一语音信息对应的语言文本1。具体的,上述电子设备100识别第一语音信息对应的语言文本1,包括;提取第一语音信息的音频特征;将第一语音信息的音频特征输入神经网络模型1,神经网络模型1输出第一语音信息对应的语言文本1。
在一些实施例中,上述提取第一语音信息的音频特征,包括:对第一语音信息的输入语音流进行分离和降噪;然后,通过分帧、开窗、短时傅里叶变换等音频处理算法,获取第一语音信息对应的音频特征。
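A minimal sketch of the framing, windowing and short-time Fourier transform front end mentioned above; the sample rate, frame length and hop length are assumed values, and the log-magnitude spectrum stands in for whatever audio features the acoustic model actually consumes.

```python
import numpy as np

def stft_features(waveform, sample_rate=16000, frame_ms=25, hop_ms=10):
    """Frame the input speech, apply a Hann window, and take the short-time
    Fourier transform of each frame. Frame and hop lengths are assumed values."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    window = np.hanning(frame_len)
    frames = []
    for start in range(0, len(waveform) - frame_len + 1, hop_len):
        frame = waveform[start:start + frame_len] * window
        spectrum = np.abs(np.fft.rfft(frame))       # magnitude spectrum of one frame
        frames.append(np.log(spectrum + 1e-8))      # log-magnitude as a simple feature
    return np.stack(frames)                          # shape: (num_frames, frame_len // 2 + 1)
```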
S103、对上述语言文本1进行意图识别和槽位填充。
本申请实施例中,电子设备100将语言文本1转换为可被电子设备100理解和执行的意图和槽位,通过意图和槽位表征用户的诉求。
在一些实施例中,电子设备100将语言文本1输入意图分类器,意图分类器可以输出上述语言文本1对应的意图。上述意图分类器可以为支持向量机(SVM)、决策树或深度神经网络(Deep Neural Networks,DNN)。其中,深度神经网络可以是卷积神经网络(convolutional neural network,CNN)或循环神经网络(recurrent neural network,RNN)等,此处不做具体限定。
在一些实施例中,电子设备100可以通过槽位分类器标注语言文本1的至少一个槽位。上述槽位分类器可以为最大熵马尔可夫模型(Maximum Entropy Markov Model,MEMM),条件随机场(conditional random field,CRF)以及循环神经网络(RNN)等。
本申请实施例中,意图识别和槽位填充既可以作为两个单独的任务处理,也可以联合处理。例如,利用联合训练模型对意图识别和槽位填充进行联合处理,上述联合训练模型的输入为语言文本1或语言文本1对应的文本特征,上述联合训练模型的输出为语言文本1对应的意图,以及语言文本1对应的至少一个槽位。由于同一个语音信息对应的意图和槽位通常相关联,通过联合处理,可以提高意图识别和槽位填充的准确率。
示例性的,图9为本申请实施例提供的一种意图和槽位的联合识别模型。如图9所示,将语言文本1输入该识别模型,该识别模型利用词嵌入(word embedding)算法(例如,word2vec算法)生成语言文本1对应的词向量(word embedding);将语言文本1对应的词向量输入到BERT(Bidirectional Encoder Representations from Transformers,基于变换器的双向编码器表示)模型,BERT模型输出上述词向量对应的隐藏特征;将上述隐藏特征分别输入意图分类器(例如,LSTM)和槽位分类器,意图分类器输出语言文本1对应的意图,槽位分类器输出语言文本1对应的至少一个槽位。例如:语言文本1为“调整电视的亮度”,语言文本1对应的意图为“调整亮度(ADJUST_LUMINANCE)”,语言文本1对应的槽位包括“电视”和“亮度”。
需要说明的是,图9所示的识别模型的文本输入中的[cls]用于指示文本分类任务,BERT模型在文本前插入一个[cls]符号,并将该符号对应的输出向量作为整个文本输入的语义表示;[SEP]用于指示语句对分类任务,对于该任务,BERT模型对输入的两句话用一个[SEP]符号作分割,并分别对两句话附加两个不同的词向量以作区分。
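The sketch below mirrors the structure of FIG. 9 under stated assumptions: a shared BERT encoder, an intent classifier over the [CLS] representation, and a slot tagger over every token. The checkpoint name, the use of plain linear heads (the text also mentions an LSTM intent classifier as one option) and the label counts are illustrative and not taken from this application.

```python
import torch.nn as nn
from transformers import BertModel

class JointIntentSlotModel(nn.Module):
    """Sketch of the joint model in FIG. 9: one BERT encoder shared by an
    intent classifier ([CLS] vector) and a slot tagger (per-token labels)."""
    def __init__(self, num_intents, num_slot_labels, pretrained="bert-base-chinese"):
        super().__init__()
        self.bert = BertModel.from_pretrained(pretrained)
        hidden = self.bert.config.hidden_size
        self.intent_head = nn.Linear(hidden, num_intents)
        self.slot_head = nn.Linear(hidden, num_slot_labels)

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        intent_logits = self.intent_head(out.pooler_output)     # one intent per utterance
        slot_logits = self.slot_head(out.last_hidden_state)     # one slot label per token
        return intent_logits, slot_logits
```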
S104、确定语言文本1对应的意图是否为调整幅度;若否,则执行步骤S105;若是,则执行步骤S106。
在一种实施例中,本申请实施例将对预设的至少一个的元素的幅度调整的意图均识别为“调整幅度”。示例性的,步骤S103中,电子设备100可以识别“调整智慧屏的音量”和“调整智慧屏的亮度”这两个语言文本对应的意图均为“调整幅度”;步骤S104中,电子设备100可以直接根据步骤S103的识别结果,确定语言文本1对应的意图是否为“调整幅度”。
其中,上述预设的至少一个元素可以为音量、亮度、显示屏亮度、窗帘拉开程度、风扇风速、灯光亮度、空调温度等可以进行幅度调整的元素,此处不做具体限定。
在一种实施例中,电子设备100按照现有的意图分类对第一语音信息进行意图识别,并将对预设的至少一个元素的幅度调整对应的意图均归类为“调整幅度”;电子设备100可存储文件1,文件1指示了归类为“调整幅度”的意图有哪些。示例性的,文件1指示了归类为“调整幅度”的意图包括“调整音量”和“调整亮度”;步骤S103中,电子设备100识别“调整智慧屏的音量”这个语言文本对应的意图为“调整音量”,步骤S104中,若电子设备100根据文件1确定“调整音量”归类于“调整幅度”,则确定语言文本1对应的意图为“调整幅度”。
S105、基于语言文本1对应的意图和槽位,进行语音指令处理。
可以理解,针对意图不是“调整幅度”的语音信息,按照现在的处理流程,进行语音指令处理,此处不再赘述。
S106、电子设备100基于语言文本1对应的槽位确定待调整的目标元素。
示例性的,用户说“调整音量”,上述语音信息对应的意图可以为“调整幅度”,该意图对应的槽位包括“音量”;电子设备100基于语言文本1对应的槽位可以确定待调整的目标元素为媒体音量。示例性的,用户说“调整视频的音量”,上述语音信息对应的意图可以为“调整音量”,该意图归类于“调整幅度”,该意图对应的槽位包括“音量类型(即媒体音量)”;电子设备100基于语言文本1对应的槽位可以确定待调整的目标元素为媒体音量。
需要说明的是,本申请实施例中,语言文本1对应的槽位也可以被称为第一语音信息对应的槽位。
S107、电子设备100获取用户的手部移动方向和手部移动速度。
在一些实施例中,确定用户的意图为“调整幅度后”,电子设备100即开始获取用户的手部移动方向和手部移动速度。
本申请实施例中,电子设备100可以获取预设的手部移动方向和幅度调整方向的第一映射关系,以及手部移动速度和幅度调整速度的第二映射关系。幅度调整方向包括幅度调大和幅度调小;第一映射关系中,幅度调大可以对应手部的一个或多个预设移动方向,幅度调小也可以对应手部的一个或多个预设移动方向,幅度调大对应的预设移动方向不同于幅度调小对应的预设移动方向。其中,预设移动方向包括但不限于向上移动、向下移动、向右移动、向左移动、顺时针移动、逆时针移动等。
后续实施例中,以“手部的预设移动方向1对应幅度调大,手部的预设移动方向2对应幅度调小”为例,进行示例性说明。
下面对如何获取用户的手部移动方向和手部移动速度进行具体介绍。
在一些实施例中,电子设备100通过摄像头采集的图像获取用户的手部移动方向和手部移动速度。示例性的,图10示出了本申请实施例提供的一种获取用户的手部移动方向和手部移动速度的流程图。
S107A、电子设备100获取摄像头采集的图像序列。
在一种实现方式中,电子设备100持续性地通过前置的摄像头(例如低功耗摄像头)采集图像,确定用户的意图为“调整幅度”后,电子设备100获取摄像头实时采集的图像序列。在一种实现方式中,电子设备100确定用户的意图为“调整幅度后”,才启动前置的摄像头实时采集图像序列。
S107B、电子设备100通过手部识别,识别上述图像序列中每帧图像中的手部。
S107C、基于上述每帧图像中的手部的位置,确定手部移动方向和手部移动速度。
在一些实施例中,电子设备100通过手部识别,获取实时采集的每帧图像中的手部的第一特征点的位置,电子设备100根据上述图像序列中第一特征点的位置变化,获取用户的手部移动方向和手部移动速度。
在一些实施例中,上述图像序列包括图像1,电子设备100将图像1输入手部识别模型,手部识别模型输出图像1中手部对应的预设形状的手部检测框,该手部检测框用于指示手部在图像1中的所在区域;电子设备100基于手部检测框确定第一特征点的位置。上述手部识别模型可以采用训练好的神经网络模型,上述预设形状可以为预设的长方形、椭圆形或圆形 等,此处不做具体限定。第一特征点可以为手部检测框的特定位置。例如,上述预设形状为预设的长方形时,第一特征点可以为手部检测框的中心位置、左上角或右上角等位置。
示例性的,参考图11A,上述预设形状为长方形,电子设备100通过对图像1进行手部识别,获取图像1对应的长方形的手部检测框,并取手部检测框的左上角作为第一特征点。
在一些实施例中,上述图像序列包括图像1,电子设备100将图像1输入手部识别模型,手部识别模型识别图像1中的手部后,利用骨骼点识别算法输出图像1中的手部的至少一个骨骼点的位置。示例性的,图11B示出了手部的21个骨骼点。第一特征点可以为电子设备100识别到的至少一个预设骨骼点的平均位置。例如,电子设备100识别到图像1中手部的两个预设骨骼点的位置分别为(x1、y1)和(x2、y2),第一特征点的位置可以为(0.5*(x1+x2)、0.5*(y1+y2))。
示例性的,参见图11C,在T1时刻、T2时刻和T3时刻,电子设备100的摄像头分别采集了图像1、图像2和图像3。电子设备100每采集一帧图像,即可通过手部识别获取第一特征点的位置。在上述图像1、图像2和图像3中,电子设备100获取到的第一特征点的位置分别为A(X(T1),Y(T1))、B(X(T2),Y(T2))和C(X(T3),Y(T3))。需要说明的是,图11C中的XY坐标系是本申请给出的一种示例性的二维图像坐标系,图11C中X轴平行于图像(例如图像1)的上边和下边,且X轴的正方向从图像的第一列像素垂直指向图像的最后一列像素,图11C中的Y轴平行于图像的两个侧边,且Y轴的正方向从图像的最后一行像素指向图像的第一行像素,X轴垂直于Y轴。
基于图11C中A、B、C三点的坐标,图11D示出了T1时刻至T3时刻,用户手部的第一特征点的移动轨迹。可以理解,T2时刻,用户手部在上述二维图像坐标系中的移动方向可以通过二维移动向量AB指示,即(X(T2)-X(T1),Y(T2)-Y(T1))。类似的,T3时刻,用户手部在上述二维坐标系中的移动方向可以通过二维移动向量BC指示,即(X(T3)-X(T2),Y(T3)-Y(T2))。
本申请实施例中,电子设备100基于第一特征点的位置变化,确定用户的手部移动方向为:手部在上述二维图像坐标系中的二维移动向量对应的预设移动方向(即预设移动方向1或预设移动方向2)。
在一种实现方式中,预设移动方向1为向右移动,预设移动方向2为向左移动。若图像1中手部的二维移动向量在X轴上的投影指向X轴的预设方向(例如X轴的反方向),则电子设备100可以确定手部移动方向为预设移动方向1(即向右移动);若手部移动方向在X轴上的投影指向X轴的预设方向的反方向(例如X轴的正方向),则电子设备100可以确定手部移动方向为预设移动方向2(即向左移动)。用户的手部移动速度可以为沿二维的移动向量(例如向量AB或向量BC)的速度,也可以为沿X轴(即预设移动方向1或预设移动方向2)的速度,此处不做具体限定。
需要说明的是,如图11E所示,鉴于摄像头的镜像拍摄,若用户手部在实际环境中向右移动(参考图4B),则在摄像头采集的图像(参考图11D)中手部向左移动;反之,若用户手部在实际环境中向左移动,则在摄像头采集的图像中手部向右移动。
在一种示例中,参见图11D和图11E,电子设备100先后采集图像1和图像2后,确定第一特征点在T1时刻至T2时刻的二维向量AB在X轴上的投影指向水平轴的反方向,进而可以确定用户的手部移动方向属于预设移动方向1,即向右移动。可选的,用户的手部移动速度为沿X轴的速度,第一特征点在T1时刻至T2时刻沿X轴移动了(X(T2)-X(T1)),电 子设备100确定用户的手部移动速度为(X(T2)-X(T1))/(T2-T1)。可选的,用户的手部移动速度为沿二维移动向量AB的速度,第一特征点在T1时刻至T2时刻沿向量AB的方向移动了|AB|,电子设备100确定用户的手部移动速度为|AB|/(T2-T1)。其中,|*|表示向量*的模值。类似的,电子设备100采集图像3后,电子设备100可以确定用户的手部移动方向属于预设移动方向1;确定用户的手部移动速度为(X(T3)-X(T2))/(T3-T2)或|BC|/(T3-T2)。
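A small sketch of the left/right decision and speed computation described above, based on two positions of the first feature point; the mirrored-X sign convention matches FIG. 11E but is an assumption that may need flipping for a different camera setup.

```python
import numpy as np

def hand_motion(p1, p2, t1, t2, mirror=True):
    """p1, p2: (x, y) image positions of the first feature point at times t1, t2 (s).
    Returns (direction, speed). With a front camera the image is mirrored, so a
    movement toward smaller image x is treated as the user moving right."""
    v = np.asarray(p2, dtype=float) - np.asarray(p1, dtype=float)   # 2D movement vector
    dx = -v[0] if mirror else v[0]
    direction = "right" if dx > 0 else "left"
    speed = np.linalg.norm(v) / (t2 - t1)      # speed along the 2D movement vector
    return direction, speed
```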
在一种实现方式中,预设移动方向1为顺时针方向,预设移动方向2为逆时针方向。若移动向量BC和移动向量AB的矢量叉积大于零,则移动向量BC在移动向量AB的顺时针方向;即图像中用户的手部顺时针移动,实际环境中用户的手部逆时针移动,属于预设移动方向2;若移动向量BC和移动向量AB的矢量叉积小于零,则移动向量BC在移动向量AB的逆时针方向;即图像中用户的手部逆时针移动,实际环境中用户的手部顺时针移动,属于预设移动方向1;若移动向量BC和移动向量AB的矢量叉积等于零,则移动向量BC和移动向量AB共线。电子设备100确定用户的手部移动速度为沿二维移动向量(例如向量BC或向量AC)的速度。
在一种实现方式中,当移动向量BC和移动向量AB共线时,确定用户的手部移动是无效移动。在一种实现方式中,预设移动方向1包括顺时针方向和其他预设方向(例如向右移动和向上移动),预设移动方向2也包括逆时针方向和其他预设方向(例如向左移动和向下移动),当移动向量BC和移动向量AB共线时,可以判断用户的手部移动方向是否为向右移动或向上移动;若是,则确定用户的手部移动方向属于预设移动方向1,否则确定用户的手部移动方向属于预设移动方向2。
需要说明的是,如图11F所示,鉴于摄像头的镜像拍摄,若用户手部在实际环境中顺时针移动(参考图4B),则在摄像头采集的图像中(参考图11D)手部逆时针移动;反之,若用户手部在实际环境中逆时针移动,则在摄像头采集的图像中手部顺时针移动;
在一种示例中,参见图11D和图11F,电子设备100先后采集图像1、图像2和图像3后,确定第一特征点T1时刻至T2时刻的二维移动向量AB和T2时刻至T3时刻的二维移动向量BC,向量BC和向量AB的矢量叉积小于零,电子设备100确定用户的手部移动方向属于预设移动方向1,即顺时针移动。电子设备100确定用户的手部移动速度为|BC|/(T3-T2)或者|AB|+|BC|/(T3-T1)。
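A sketch of the clockwise/counterclockwise decision above, using the sign of the 2D cross product of the successive movement vectors AB and BC; the sign convention and the mirror flip are assumptions for illustration and depend on the image axis orientation.

```python
def rotation_direction(a, b, c, mirror=True):
    """a, b, c: successive (x, y) feature-point positions.
    Uses the z-component of AB x BC to decide the turning direction in the image,
    then flips it when the camera mirrors the scene, as described above."""
    ab = (b[0] - a[0], b[1] - a[1])
    bc = (c[0] - b[0], c[1] - b[1])
    cross = ab[0] * bc[1] - ab[1] * bc[0]
    if cross == 0:
        return "collinear"                     # may be treated as no valid rotation
    image_dir = "counterclockwise" if cross > 0 else "clockwise"
    if mirror:                                 # mirrored image: real-world rotation is reversed
        return "clockwise" if image_dir == "counterclockwise" else "counterclockwise"
    return image_dir
```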
在一些实施例中,电子设备100通过手势识别,确定摄像头采集的图像序列中的图像是否包含第二预设手势;若包含,则确定上述图像中的第二预设手势的第一特征点的位置;电子设备100根据上述图像序列中第一特征点的位置变化,获取用户的手部移动方向和手部移动速度;若不包含,电子设备100判断未检测到用户手部的有效移动,即未检测到手部移动方向和手部移动速度。
可以理解,用户以第二预设手势的移动可以指示目标元素的幅度调整方向和幅度调整速度,上述目标元素可以为音量、亮度、显示屏亮度、窗帘拉开程度、风扇风速、灯光亮度、空调温度等任意可以进行幅度调整的元素。相比于现有技术(即为每一种元素预设一种手势,通过不同的手势调整不同的元素的幅度),本申请实施例所提方案中电子设备100仅需识别一种手势(即第二预设手势),对电子设备100的系统性能要求更低;用户通过一种手势就可实现多种元素的幅度调整,无需记忆繁复的手势。
在一些实施例中,电子设备100通过智能手环200获取用户的手部移动方向和手部移动速度。示例性的,图12示出了本申请实施例提供的另一种获取用户的手部移动速度和手部移 动方向的流程图。
S107D、电子设备100向智能手环200发送获取请求。
本申请实施例中,确定用户的意图为“调整幅度”后,电子设备100向智能手环200发送获取请求,以请求获取用户的手部移动速度和手部移动方向。
S107E、响应于上述获取请求,智能手环200获取智能手环200的移动速度和移动方向。
在一些实施例中,响应于上述获取请求,智能手环200通过陀螺仪传感器可以检测智能手环200的姿态角,并根据上述姿态角可以确定转换矩阵;智能手环200通过加速度传感器可以检测智能手环200沿电子设备坐标系的三轴的加速度;根据上述转换矩阵可以将沿电子设备坐标系的三轴的加速度转换为沿地面坐标系的三轴的加速度;进而可以确定智能手环200在的地面坐标系中移动方向,沿上述移动方向的移动速度或者沿地面坐标系的三轴的移动速度,上述移动方向可以通过三维移动向量指示。
在一种实现方式中,智能手环200配置有IMU,IMU包括上述加速度传感器和/或陀螺仪传感器,智能手环200通过IMU可以检测智能手环200在上述地面坐标系中的移动速度和移动方向。
S107F、电子设备100接收智能手环200发送的智能手环200的移动方向和移动速度,并基于智能手环200的移动方向和移动速度,确定用户的手部移动方向和手部移动速度。
本申请实施例中,电子设备100确定用户的手部移动方向为三维移动向量对应的预设移动方向(即预设移动方向1或预设移动方向2)。
在一种实现方式中,预设移动方向1为向右移动,预设移动方向2为向左移动。电子设备100可以确定预设移动方向1在地面坐标系中对应三维的移动方向3,以及预设移动方向2在地面坐标系中对应移动方向3的反方向。若智能手环200的移动方向在地面坐标系的移动方向3上的投影指向移动方向3的正方向,则电子设备100可以确定手部移动方向为预设移动方向1(即向右移动);若智能手环200的移动方向在上述移动方向3上的投影指向移动方向3的反方向,则电子设备100可以确定手部移动方向为预设移动方向2(即向左移动)。用户的手部移动速度可以为沿上述三维的移动方向的速度,也可以为沿移动方向3的速度,此处不做具体限定。
在一种实现方式中,预设移动方向1为顺时针方向,预设移动方向2为逆时针方向。电子设备100获取智能手环200在连续两个时刻(例如T4时刻和T5时刻)的三维的移动方向(即移动方向4和移动方向5),若智能手环200的移动方向5和移动方向4的矢量叉积大于零,则移动方向5在移动方向4的顺时针方向,用户的手部移动方向属于预设移动方向1,即顺时针移动;若智能手环200的移动方向5和移动方向4的矢量叉积小于零,则移动方向5在移动方向4的逆时针方向,用户的手部移动方向属于预设移动方向2,即逆时针移动;若移动方向5和移动方向4的矢量叉积等于零,则移动方向5和移动方向4共线。电子设备100确定用户的手部移动速度为沿三维的移动方向(例如移动方向5)的速度,即T4时刻至T5时刻沿移动方向5移动的距离比上(T5-T4)。
在一种实现方式中,当移动方向5和移动方向4共线时,确定用户的手部移动是无效移动。在一种实现方式中,预设移动方向1包括顺时针方向(例如向右移动和向上移动),预设移动方向2包括逆时针方向(例如向左移动和向下移动),当移动方向5和移动方向4共线时,可以判断用户的手部移动方向是否为向右移动或向上移动;若是,则确定用户的手部移动方向属于预设移动方向1,否则确定用户的手部移动方向属于预设移动方向2。
S108、基于用户的手部移动方向确定目标元素的第一幅度调整方向,基于用户的手部移动速度和灵敏度确定目标元素的第一幅度调整速度。
在一些实施例中,当用户的手部移动方向属于预设移动方向1,电子设备100基于前述第一映射关系确定预设移动方向1对应的幅度调整方向为幅度调大;当用户的手部移动方向属于预设移动方向2,电子设备100基于前述第一映射关系确定预设移动方向2对应的幅度调整方向为幅度调小。
在一些实施例中,电子设备100基于第二映射关系确定手部移动速度v_h对应的幅度调整速度v_e,第二映射关系中v_e = sen × w_0 × v_h,其中,w_0为预设的手部移动速度到幅度调整速度的映射系数,sen为幅度调整的灵敏度。例如,音量大小为0至100,音量调节过程中,手部移动速度为10cm/s,sen取值为1,w_0取值为2,则音量的第一幅度调整速度为20/s。
在一种实现方式中,每次幅度调整流程开始时,灵敏度的取值均为初始值。例如,灵敏度的初始值为1。在另一种实现方式中,若本次幅度调整流程结束时,灵敏度的取值已从初始值调整为第一值,则下次幅度调整流程开始时,灵敏度的取值为第一值。在一种实现方式中,上述第二映射关系中也可以没有灵敏度(即sen取值为1),或者灵敏度为不可调的固定值,此处均不作具体限定。
在一些实施例中,电子设备100基于第二映射关系确定手部移动速度v_h对应的幅度调整速度v_e,第二映射关系中v_e = w_0 × v_h,其中,w_0为预设的手部移动速度到幅度调整速度的映射系数,w_0也可以视为幅度调整的灵敏度。
可以理解,上述第二映射关系的示例中,用户的手部移动速度一定时,幅度调整的灵敏度越大,相应地幅度调整速度越大;幅度调整的灵敏度一定时,用户的手部移动速度越大,相应地幅度调整速度越大。
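A one-function sketch of the second mapping relationship v_e = sen · w_0 · v_h, using the example values from the text (w_0 = 2, sen = 1, hand speed 10 cm/s giving an adjustment speed of 20 per second).

```python
W0 = 2.0   # preset coefficient mapping hand speed to adjustment speed (example value above)

def amplitude_adjust_speed(hand_speed, sensitivity=1.0):
    # v_e = sen * w_0 * v_h
    return sensitivity * W0 * hand_speed

# Example from the text: hand speed 10 cm/s, sen = 1, w_0 = 2 -> volume changes at 20 units/s.
assert amplitude_adjust_speed(10.0) == 20.0
```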
S109、电子设备100沿第一幅度调整方向以第一幅度调整速度调整目标元素的幅度。
示例性的,音量的第一幅度调整方向为幅度调大,音量的第一幅度调整速度为20/s,则电子设备100以20/s调大音量。
在一些实施例中,电子设备100调整目标元素的幅度时,可以通过语音、视频、图片、文本等方式,来播报用户正在调整目标元素和/或目标元素调整后的输出效果。示例性的,参考图4A至图4C,电子设备100跟随用户的手部移动来调整音量时,显示音量指示条301,并根据调整后的音量实时刷新音量指示条301中阴影部分的长度。可以理解,音量指示条301用于实时播报音量的幅度调整效果。
在一些实施例中,当电子设备100确定第一语音信息对应的意图为调整幅度时,还基于第一语音信息对应的槽位确定幅度调整的目标设备(例如第三电子设备);步骤S109中电子设备100向目标设备发送调整请求,以控制目标设备沿第一幅度调整方向以第一幅度调整速度调整目标元素的幅度,调整请求携带目标元素、第一幅度调整速度和第一幅度调整方向。
在一些实施例中,第一语音信息还可以指示幅度调整方向。用户说出第一语音信息,响应于检测到的第一语音信息,电子设备100确定第一语音信息指示的幅度调整的意图、待调整的目标元素以及幅度调整方向。用户说出第一语音信息后,通过手部移动速度指示幅度调整速度;电子设备100通过图像识别或智能手环200获取手部移动速度,并基于灵敏度、前述第二映射关系和手部移动速度获取幅度调整速度,进而按照该幅度调整速度调整目标元素的幅度。具体的,可以参考图8的相关描述,此处不再赘述。在一种实现方式中,由于仅通过手部移动速度指示幅度调整速度,该实施例中无需限定用户的手部移动方向。这样,用户 说出第一语音信息后,即可通过任意方向的手部移动持续性地控制目标元素的幅度调整,直到目标元素的输出效果达到预期效果。
在一些实施例中,用户调整目标元素的幅度期间,用户可以调节幅度调整的灵敏度。手部移动速度一定时,若幅度调整的灵敏度越小,则手部移动速度对应的幅度调整速度越小。步骤S109之后还可以包括S110至S114。
S110、用户说出第二语音信息,电子设备100接收第二语音信息。
在一些实施例中,幅度调整期间,用户可以通过第二语音信息调整灵敏度(即sen或w 0)。示例性的,参见图6A,用户通过手部移动调整音量的过程中,意图降低幅度调整速度时,用户说出“降低灵敏度”。
S111、电子设备100识别第二语音信息对应的语言文本2。
S112、电子设备100对上述语言文本2进行意图识别和槽位填充。
S113、电子设备100确定语言文本2对应的意图是否为调整灵敏度;若否,则执行步骤S105;若是,则执行步骤S114。
S114、电子设备100基于语音文本2对应的意图和槽位调整灵敏度,并基于调整后的灵敏度执行S108。
步骤S111至S113的具体实现,可以参考步骤S102至S104的相关描述,此处不再赘述。
需要说明的是,电子设备100预设了“调整灵敏度”的意图,“调整灵敏度”对应的槽位可以包括“灵敏度调整值”、“灵敏度调整方向”。“灵敏度调整值”可以为百分比、倍数等数值,灵敏度的“灵敏度调整方向”包括调大和调小。
需要说明的是,本申请实施例中,语言文本2对应的“灵敏度调整值”的槽位值为第一灵敏度调整值,“灵敏度调整方向”的槽位值为第一灵敏度调整方向。语言文本2对应的槽位也可以被称为第二语音信息对应的槽位。
在一种实现方式中,当通过槽位填充确定用户在语言文本2中指明了“灵敏度调整值”(例如第一灵敏度调整值)时,电子设备100基于上述第一灵敏度调整值调整灵敏度;当通过槽位填充确定用户在语言文本2中未指明“灵敏度调整值”,电子设备100可以基于预设比例调整灵敏度。
例如,预设比例为10%。可以理解,若“灵敏度调整方向”为调大,且当前的灵敏度超过最大灵敏度的90%,则电子设备100将灵敏度调大至最大灵敏度;若“灵敏度调整方向”为调小,且当前的灵敏度小于最大灵敏度的10%,则电子设备100将灵敏度调小至最小灵敏度。
示例性的,参见图6A和图6B,用户通过手部移动调整音量的过程中,用户说出“降低灵敏度”;通过意图识别和槽位填充,电子设备100识别到上述语音信息对应的意图为“调整灵敏度”,灵敏度调整方向为调小;电子设备100响应于上述语音信息,降低灵敏度;灵敏度降低后,用户的手部移动速度一定时,音量的幅度调整速度降低。
本申请实施例中,步骤S106之后,电子设备100也可以基于手部移动距离调整目标元素的幅度。具体的,步骤S107中,电子设备100可以获取用户的手部移动距离;步骤S108中,电子设备100基于用户的手部移动距离和灵敏度确定目标元素的第一幅度调整值;步骤S109中,电子设备100沿第一幅度调整方向以第一幅度调整值调整目标元素的幅度。
在一些实施例中,步骤S107中电子设备100以预设周期1周期性地确定用户的手部移动距离,然后根据预设的第三映射关系确定手部移动距离对应的目标元素的第一幅度调整值,以实现随着用户手部移动,以预设周期1周期性的更新目标元素的幅度。例如预设周期为1s。
在一些实施例中,步骤S107中电子设备100可以根据摄像头采集的图像中的第一特征点的位置确定用户手部移动距离。具体的,可以参考前述图10的相关描述。
示例性的,参考图11C和图11D,电子设备100可以分别在T1、T2和T3时刻,采集用户的手部图像。类似于手部移动速度,预设移动方向1为向右移动,预设移动方向2为向左移动时,用户的手部移动距离可以为沿二维的移动向量(例如向量AB或向量BC)的移动距离,也可以为沿X轴(即预设移动方向1或预设移动方向2)的移动距离,此处不做具体限定。例如,在T2时刻,电子设备100确定T1时刻至T2时刻,用户的手部移动距离为|AB|。
在一些实施例中,步骤S107中电子设备100也可以向智能手环200请求获取智能手环200的移动距离,进而可以基于智能手环200的移动距离确定用户的手部移动距离;智能手环200可以向电子设备100反馈沿地面坐标系的三轴的移动距离或沿地面坐标系中的三维移动向量的移动距离。具体的,可以参考前述图12的相关描述。
在一种实现方式中,预设移动方向1为向右移动,预设移动方向2为向左移动。电子设备100可以确定预设移动方向1在地面坐标系中对应三维的移动方向3,以及预设移动方向2在地面坐标系中对应移动方向3的反方向。电子设备100确定用户的手部移动距离可以为沿智能手环200的三维移动向量的距离,也可以为沿移动方向3的距离,此处不做具体限定。
在一些实施例中,步骤S108中电子设备100基于第三映射关系确定手部移动距离D_h对应的幅度调整值D_e,第三映射关系中D_e = sen × w_1 × D_h,其中,w_1为预设的手部移动距离到幅度调整值的映射系数,sen为幅度调整的灵敏度。例如,音量大小为0至100,音量调节过程中,手部移动距离为5cm,sen取值为1,w_1取值为2,则音量的第一幅度调整值为10。
在一些实施例中,步骤S108中电子设备100基于第三映射关系确定手部移动距离D_h对应的幅度调整值D_e,第三映射关系中D_e = w_1 × D_h,其中,w_1为预设的手部移动距离到幅度调整值的映射系数,w_1也可以视为幅度调整的灵敏度。
可以理解,用户的手部移动距离一定时,幅度调整的灵敏度越大,相应地幅度调整值越大;幅度调整的灵敏度一定时,用户的手部移动距离越大,相应地幅度调整值越大。
在一些实施例中,用户的手部移动过程中,电子设备100检测到第二预设条件时,才基于用户的手部移动方向和手部移动参数(即手部移动速度或手部移动距离)调整目标元素的幅度。
在一种实现方式中,第二预设条件为用户停止手部移动的时长达到第三预设时长(例如,第三预设时长为2s)。具体的,用户说出第一语音信息后,在T1时刻至T2时刻将手部持续移动距离1后停止移动;电子设备100在T2时刻后检测到第二预设条件,电子设备100确定用户在T1时刻至T2时刻内的手部移动方向和手部移动参数(即手部移动速度或手部移动距离),进而基于用户的手部移动方向和手部移动参数调整目标元素的幅度(具体的,参考前述实施例)。用户判断调整后的目标元素的幅度是否符合预期效果,如果符合则停止调整目标元素的幅度,电子设备100检测到第一预设条件时,停止本次幅度调整流程;如果不符合,则可以接着在T3时刻至T4时刻将手部持续移动距离2后再次停止移动;电子设备100在T4时刻后检测到第二预设条件,电子设备100确定用户在T3时刻至T4时刻内的手部移动方向和手部移动参数(即手部移动速度或手部移动距离),进而基于用户的手部移动方向和手部移 动参数调整目标元素的幅度。以此类推,直至目标元素的幅度的输出效果符合用户的预期效果。
可以理解,用户停止移动时手部可以有轻微的抖动,上述停止移动包括完全静止和在预设阈值内的轻微移动。
在一些实施例中,电子设备100接收第一语音信息;当电子设备100确定第一语音信息对应的意图为调整幅度时,基于第一语音信息确定幅度调整的目标元素;电子设备100获取用户的手部移动参数;电子设备100确定手部移动参数对应的第一幅度调整参数;电子设备100以第一幅度调整参数调整目标元素的幅度。
其中,手部移动参数为手部移动速度,第一幅度调整参数为前述第一幅度调整速度;或者,手部移动参数为手部移动距离,第一幅度调整参数为前述第一幅度调整值。
在一些实施例中,电子设备100获取用户的手部移动参数,具体包括:电子设备100获取摄像头采集的第一图像和第二图像;电子设备100通过手部识别,获取第一图像和第二图像中的手部的第一特征点的位置;基于第一图像和第二图像中的手部的第一特征点的位置,确定用户的手部移动参数(即前述手部移动速度或前述手部移动距离)。手部移动方向是基于第一图像和所述第二图像中的手部的第一特征点的位置确定的。示例性的,第一图像为前述图像1,第二图像为前述图像2;或者,第一图像为前述图像2,第二图像为前述图像3。
本申请实施例中,当检测到用户说出的语音对应的意图为调整幅度时,电子设备100可以实时获取用户的手部移动参数和手部移动方向;伴随着用户的手部移动,电子设备100可以沿手部移动方向指示的幅度调整方向,以手部移动参数指示的幅度调整参数持续性地调整目标元素的幅度。这样,不需要用户做出复杂的手势,仅通过简单的手部移动即可以调节目标元素的幅度至预期效果;上述方案对任意可进行幅度调整的元素均适用,无需用户针对每种元素设置特定的手势;由于无需电子设备100识别繁复的手势,上述方案也可以适当降低对电子设备100的性能要求。此外,在幅度调节过程中,能够通过语音调整幅度调整的灵敏度,以满足不同用户的需求,有效提升了用户体验。
示例性的,图13示出了本申请实施例提供的一种对话系统的系统架构。通过该对话系统,电子设备100可以通过语音识别,确定用户调整目标元素的幅度的意图;进而可以跟随用户手部的移动,调整目标元素的幅度。该对话系统中,电子设备100包括对话输入的前端处理模块、语音识别(Automatic Speech Recognition,ASR)模块、语义理解(Natural Language Understanding,NLU)模块、对话管理(Dialogue Management,DM)模块、自然语言生成(Natural Language Generation,NLG)模块、播报模块和图像处理模块,智能手环200包括惯性测量单元(Inertial measurement unit,IMU)。其中,
前端处理模块:该模块用于将输入的语音流处理成ASR模块的网络模型所需的数据格式。在一种实现方式中,前端处理模块对接收的语音信息(例如第一语音信息、第二语音信息)进行音频解码,利用声纹或其他特征对输入的音频解码后的语音流进行分离和降噪;然后,对分离和降噪后的语音流进行音频特征提取,例如,通过分帧、开窗、短时傅里叶变换等音频处理算法得到音频特征(即ASR模块的网络模型所需的数据格式)。
ASR模块:该模块可以获取前端处理模块输出的音频特征,通过声学模型和语言模型将上述音频特征转换为文本,以供NLU模块进行文本理解。在一种实现方式中,ASR模块包 括声学模型和语言模型,声学模型和语言模型可以是分开训练的,声学模型用于将音频特征中的音频特征转化为音素和/或文字,语言模型用于通过串联的方式将上述音素和/或文字转化为用户语音对应的文本。在一种实现方式中,ASR模块包括联合模型,联合模型是对声学模型和语言模型进行联合训练生成的模型,联合模型用于将上述音频特征转化为用户语音对应的文本。
NLU模块:该模块用于将用户的自然语言转换为机器能理解的结构化信息,即将上述语音文本转化为可执行的意图和槽位。上述意图用于指示用户诉求,上述槽位可以理解为完成上述用户诉求的相关参数。在一种实现方式中,NLU模块包括分类模型和序列标注模型,NLU模块通过分类模型(例如前述意图分类器)将上述语言文本分类为系统支持的意图,再使用序列标注模型(例如前述槽位分类器)标注上述语言文本中的槽位。
图像处理模块:该模块用于对摄像头采集的图像序列进行手部识别,并基于图像序列中手部的位置变化,确定用户的手部移动方向和手部移动速度。
DM模块:该模块用于基于对话的状态判断系统的执行策略(即电子设备100的下一步执行动作),例如继续询问用户、执行用户指令或推荐用户其他指令等。该模块可以包括对话状态追踪(Dialogue State Tracking,DST)模块,对话状态追踪模块记录了所有对话历史和状态信息,辅助系统结合上下文理解用户请求并给出合适的反馈。本申请实施例中,DM模块判断NLU模块识别的意图是否为“调节幅度”;若是,则向图像处理模块获取用户的手部移动方向和手部移动速度,进而获取手部移动方向对应的幅度调整方向和手部移动速度对应的幅度调整速度。DM模块还可以确定NLU模块识别的意图是否为“调节灵敏度”;若是,则基于NLU模块识别的槽位调整灵敏度。
NLG模块:该模块用于获取由DM模块选择的下一步执行动作和DST模块维护的当前对话状态(用户输入的语言文本、意图、槽位),并利用从文本到语音(TextToSpeech,TTS)模块将下一步执行动作和/或意图转化为语音播报,还可以由语音助手生成相应的播报卡片后,展示给用户。该模块可采用配置特定意图、特定场景下的模板,填入当前对话状态和执行动作后输出文本的方式来实现,也可以采用基于模型的黑盒实现方式,此处不做具体限定。
播报模块:播报模块用于根据NLG模块生成的播报内容,进行播报。播报内容有多种类型,例如语音播报、播报卡片的文字显示等,此处不做具体限定。
IMU模块:该模块用于检测在智能手环200的电子设备坐标系的三轴的加速度信号,以及智能手环200在上述地面坐标系中的姿态角。根据上述加速度信号和姿态角,IMU模块可以获取智能手环200沿地面坐标系的三轴的速度,以及智能手环200在地面坐标系中的移动方向;进而基于智能手环200在地面坐标系中的移动方向和预设移动方向(例如前述移动方向3)可以确定用户的手部移动方向,以及用户的手部移动速度。
在另一种应用场景中,在播放视频时,电子设备100(例如大屏设备)除了“基于用户的语音信息和手部移动,调整目标元素(例如音量、亮度、视频播放速度、视频播放进度等)的幅度”,电子设备100还可以控制视频的播放状态。
目前,现有的方案1中,电子设备在视频的播放过程中,判断进行视频播放的屏幕前是否有人脸存在;如果有人脸存在,则继续播放该视频;如果没有人脸存在,则将该视频暂停至该视频的当前播放时刻。通过实施上述方案,当视频的观看者离开时,电子设备可以自动将视频暂停在该视频的当前播放时刻,使得用户再次观看该视频时只需点击播放按钮即可从 上述当前播放时刻继续播放。然而若只根据人脸是否存在来控制视频的播放与暂停,当用户未离开屏幕的观看范围,却未观看屏幕播放的视频时,由于视频依然在播放,会导致用户错过部分视频片段。
在本申请提供的另一种方案(简称为方案2)中,电子设备100可以检测用户的视线是否注视屏幕;如果检测到视线注视屏幕,则继续播放该视频;如果检测到视线未注视屏幕,则将该视频暂停至该视频的当前播放时刻。通过实施上述方案,当用户的视线离开屏幕时,电子设备可以自动将视频暂停在该视频的当前播放时刻。相比于方案1,这样,可以避免用户错过视线离开屏幕后的视频片段。然而,用户观看视频时通常会适时地看向其他目标(例如手机、其他用户),仅根据视线控制视频暂停会导致视频的频繁误暂停。
本申请还提供了另一种方案(简称为方案3),该方案中电子设备100还可以通过摄像头收集用户的视线状态和人物状态,并结合上述视线状态、人物状态和当前视频片段的精彩度,智能控制视频的播放状态,有效提升用户的视频观看体验。
下面结合图14A至图14J对本申请实施例提供的方案3的应用场景进行介绍。
示例性的,参见图14A,电子设备100播放视频时,持续性地检测用户视线是否注视屏幕。参见图14B,电子设备100播放视频至视频的t1时刻(本申请实施例涉及的第一时刻可以为t1时刻)时,检测到用户视线离开屏幕,电子设备100记录t1时刻(例如,图14B所示的00:11:08)。参见图14C,检测到用户视线离开屏幕后,电子设备100持续性地检测用户是否离开屏幕的观看区域。参见图14D,电子设备100播放视频至视频的t2时刻(例如00:11:10)时,检测到用户离开屏幕的观看区域,电子设备100将视频回退至t1时刻(即图14D所示的00:10:23)并暂停。参见图14E,电子设备100将视频暂停至t1时刻后,若检测到用户视线重新注视屏幕,则电子设备100从t1时刻继续播放视频。
示例性的,参见图14F,t1时刻检测到用户视线离开屏幕后,在视频播放至t2时刻(例如00:11:10)时,电子设备100检测到用户未离开屏幕的观看区域;参见图14G,确定用户未离开屏幕的观看区域后,在t3时刻(例如00:11:11)电子设备100判断当前的视频片段是否精彩,若精彩,则在当前时刻(即t3时刻)暂停视频播放。参见图14H,电子设备100将视频暂停至t3时刻后,若检测到用户视线重新注视屏幕,则电子设备100从t3时刻继续播放视频。
示例性的,参见图14I,确定用户未离开屏幕的观看区域后,在t3时刻电子设备100判断当前的视频片段是否精彩,若不精彩,则确定用户视线与屏幕的交互参数,例如交互频率、注视时长等。其中,交互频率指单位时间内用户视线离开屏幕的频率,注视时长为用户视线本次注视屏幕的时长。上述交互参数可以是电子设备100默认设置的,也可以是用户按照自己的需求设置的。本申请实施例中,电子设备100基于交互参数控制视频的播放速度。在一种实现方式中,交互频率越大,视频播放速度越大;注视时长大于等于预设阈值时,播放速度为正常速度;注视时长小于预设阈值时,注视时长越小,视频播放速度越大。示例性的,参见图14J,电子设备100确定交互频率大于第二阈值时,以正常播放速度的2倍速播放视频,并显示2倍速的指示符302。
示例性的,结合图14A至图14J所示的示例性的应用场景,图15示出了本申请实施例提供的一种设备控制方法的流程图。该设备控制方法包括但不限于下述步骤S201至S209。
S201、电子设备100播放视频时,通过摄像头采集的图像持续性检测用户的视线是否注 视屏幕;电子设备100播放视频至视频的t1时刻,检测到视线离开屏幕(即视线未注视屏幕),电子设备100记录t1时刻。
示例性的,参考图14A和图14B,用户观看电子设备100播放的视频时,用户的视线可能离开屏幕,看向其他方向。
本申请实施例中,电子设备100播放视频时,通过摄像头持续性的采集屏幕前的图像;电子设备100可以利用视线追踪(Eye tracking/gaze tracking)技术,基于上述摄像头采集的图像获取用户眼睛的注视点和/或视线方向;电子设备100基于视线方向和/或注视点,可以判断用户的视线是否离开屏幕。其中,注视点可以为用户的视线在屏幕的所在平面上的聚焦点;视线方向可通过视线在预设坐标系中的视角和/或视线向量来表示。
示例性的,参照图16,上述预设坐标系为电子设备100的电子设备坐标系,该坐标系中y轴为沿电子设备100的屏幕的下边从左侧边指向右侧边的方向,x轴为沿电子设备100的左侧边从下边指向上边的方向,z轴垂直于x轴和y轴的组成平面,z轴从电子设备100的背面垂直指向电子设备100的屏幕。
在一种实现方式中,上述视角可以包括:视线方向与水平面(y轴与z轴组成的平面)的夹角α、视线方向与侧平面(x轴和z轴组成的平面)的夹角β、视线方向与竖平面(x轴和y轴组成的平面)的夹角γ。视线向量是以眼睛的位置为起点、以注视点的位置为终点的方向矢量,该方向矢量中可包含眼睛在预设坐标系中的三维坐标和注视点在预设坐标系中的三维坐标。示例性的,如图16所示,用户的视线1的注视点为E(a1,b1,0),该注视点位于电子设备100的屏幕上;用户的视线2的注视点为F(a2,b2,0),该注视点位于电子设备100的屏幕之外;视线1的视线方向对应的3个视角分别为α,β和γ,视线1的视线向量为向量GE;用户眼睛的位置为G(a3,b3,c1)。
在一些实施例中,参见图16,电子设备100判断用户眼睛的注视点坐标是否位于电子设备100的屏幕上;若是,则确定用户的视线注视屏幕;若否,则确定用户的视线离开屏幕。
在一些实施例中,参见图16,电子设备100基于用户眼睛的位置(例如G(a3,b3,c1))和视线方向(例如α,β和γ)可以确定视线1与屏幕所在平面的交点(即注视点);电子设备100判断用户眼睛的注视点坐标是否位于电子设备100的屏幕上;若是,则确定用户的视线注视屏幕;若否,则确定用户的视线离开屏幕。
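A sketch of the gaze-point test described above: intersect a gaze ray (eye position plus gaze direction vector) with the screen plane z = 0 of the coordinate system in FIG. 16 and check whether the intersection lies within the screen. The screen dimensions and the placement of the screen rectangle are placeholder assumptions.

```python
import numpy as np

def gaze_on_screen(eye_pos, gaze_dir, screen_w=1.2, screen_h=0.7):
    """eye_pos: (x, y, z) of the eye in the device coordinate system of FIG. 16;
    gaze_dir: gaze direction vector; the screen is assumed to lie in the z = 0
    plane with its lower-left corner at the origin (sizes are placeholders).
    Returns (on_screen, gaze_point)."""
    eye = np.asarray(eye_pos, dtype=float)
    d = np.asarray(gaze_dir, dtype=float)
    if abs(d[2]) < 1e-9:                  # gaze parallel to the screen plane
        return False, None
    t = -eye[2] / d[2]                    # solve eye_z + t * d_z = 0
    if t <= 0:                            # screen plane is behind the gaze direction
        return False, None
    point = eye + t * d                   # intersection with the screen plane (the gaze point)
    on_screen = 0.0 <= point[0] <= screen_h and 0.0 <= point[1] <= screen_w
    return on_screen, point
```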
在一些实施例中,电子设备100可以提前获取视线追踪模型;上述视线追踪模型的输入为包含眼睛的图像或该图像中的眼睛特征参数(例如眼睛开合程度、左眼内外眼角位置、右眼内外眼角位置、瞳孔位置等),上述视线追踪模型的输出为上述图像中眼睛在预设坐标系中的视线方向和/或注视点。在一种实现方式中,训练设备(例如云端服务器或电子设备100)采集大量包含眼睛的训练图像,并获取各训练图像对应的眼睛特征参数和视线方向(或注视点),并利用上述眼睛特征参数和视线方向(或注视点)对上述视线追踪模型进行训练,从而获取已训练的视线追踪模型。上述预设坐标系、眼睛特征参数和注视点的坐标可以是二维的,也可以是三维的,此处不做具体限定。在一种实现方式,电子设备100播放视频时,通过摄像头实时采集图像;提取上述图像中的眼睛特征参数,并将上述眼睛特征参数输入视线追踪模型,视线追踪模型输出用户视线的注视点或视线方向。
本申请实施例中,电子设备100可以通过各种方式确定用户的眼睛在预设坐标系中的三维位置,此处不做具体限定。在一些实施例中,电子设备100获得眼睛在预设坐标系中的三维位置,具体包括:电子设备对摄像头采集的图像进行人脸检测,获取人脸的至少一个特征 点的上述图像中的二维位置;结合预先构建的3D人脸模型对人脸的二维特征点进行透视n点投影算法(Perspective-n-Point,PnP)的求解,获取人脸的二维特征点在预设坐标系的三维特征点。基于人脸的三维特征点可以估计眼睛在预设坐标系的三维位置。
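A hedged sketch of the PnP step above using OpenCV's solvePnP: the detected 2D face landmarks are matched against a prebuilt 3D face model to recover head pose, from which an eye position in camera space can be estimated. The 3D model points, camera intrinsics and the particular eye point of the model are placeholders the caller must supply.

```python
import numpy as np
import cv2

def head_pose_from_landmarks(image_pts_2d, model_pts_3d, camera_matrix):
    """image_pts_2d: Nx2 detected 2D face landmarks; model_pts_3d: the matching Nx3
    points of a prebuilt 3D face model; camera_matrix: 3x3 camera intrinsics.
    Returns the rotation matrix and translation placing the model in camera space."""
    dist_coeffs = np.zeros((4, 1))        # assume no lens distortion for this sketch
    ok, rvec, tvec = cv2.solvePnP(
        np.asarray(model_pts_3d, dtype=np.float64),
        np.asarray(image_pts_2d, dtype=np.float64),
        camera_matrix, dist_coeffs)
    if not ok:
        return None
    rot_mat, _ = cv2.Rodrigues(rvec)      # rotation vector -> rotation matrix
    return rot_mat, tvec

def eye_position_3d(rot_mat, tvec, eye_model_pt):
    # Transform an eye point of the 3D face model into the camera coordinate system.
    return rot_mat @ np.asarray(eye_model_pt, dtype=np.float64).reshape(3, 1) + tvec
```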
S202、电子设备100检测用户是否离开屏幕的观看范围;若检测到用户离开屏幕的观看范围,则执行S203,否则执行S205。
在一些实施例中,电子设备100对摄像头采集的图像进行人体检测或人脸检测;若未检测到人体或人脸,则电子设备100确定用户离开屏幕的观看范围。人体检测或人脸检测可以采用单帧多尺度检测器(Single Shot MultiBox Detector,SSD)神经网络模型。
在一些实施例中,用户佩戴有可穿戴设备(例如智能手环200),电子设备100可以实时获取智能手环200的位置(例如距离和方位);电子设备100根据屏幕的朝向可以确定屏幕的观看范围,即可以看到屏幕的距离范围和方位范围;当电子设备100确定智能手环200的距离超过上述距离范围,和/或,智能手环200的方位超过上述方位范围时,电子设备100确定用户离开屏幕的观看范围。
S203、电子设备100视频回退至t1时刻并暂停。
可以理解,用户视线未注视屏幕时,电子设备100持续性地检测用户是否离开屏幕的观看范围;若检测到用户离开,则将视频回退至视线离开屏幕的t1时刻并暂停。
S204、电子设备100检测到用户视线注视屏幕时,控制视频继续播放。
可以理解,电子设备100将视频并暂停至t1时刻后,继续通过摄像头采集的图像持续性的检测用户的视线是否注视屏幕。当检测到用户视线再次注视屏幕时,电子设备100控制视频从t1时刻继续播放。
S205、电子设备100判断当前播放的视频片段是否精彩;若视频播放至视频的t3时刻时,判断视频片段精彩,则执行S206,否则执行S207。
可以理解,用户视线未注视屏幕时,若检测到用户未离开屏幕的观看范围,则电子设备100判断当前播放的视频片段是否精彩。
在一些实施例中,电子设备100可以提前获取精彩度评估模型。步骤S205中,电子设备100提取当前播放的视频片段的视频特征;将上述视频特征输入上述精彩度评估模型,上述精彩度评估模型输出上述视频片段的精彩度评估结果。其中,精彩度评估模型可以使用膨胀3D卷积(Inflated 3D conv,I3D)模型,I3D模型将在Imagenet(即大型可视化数据库)上训练成功的经典模型迁移到视频(video)数据集上,利用3D卷积提取摄像头采集的图像对应的RGB流(stream)的时态特征(temporal feature),最后再利用光流(optical-flow)提升网络性能,最终可以得到良好的精彩度评估模型。在一种实现方式中,精彩度评估结果为指示精彩的标识1或指示不精彩的标识2。在一种实现方式中,精彩度评估结果为数值,若该数值大于预设值,则判断上述视频片段精彩,否则判断上述视频片段不精彩。
在一些实施例中,视频片段的精彩度也可以是视频的制作商、供应商或版权方预先设置的。视频的制作商意图着重将精彩的视频片段展现给用户,给视频中精彩的视频片段附加了指示精彩的标识。这样,步骤S206中电子设备100在播放至精彩的视频片段时,可以暂停视频,避免用户错过上述视频片段。
需要说明的是,电子设备100播放的视频可以包括至少一个视频片段,不同视频片段间可以重合,也可以不重合,此处不做具体限定。
在一种实现方式中,视频片段是预先划分好的,连续且不重合的,步骤S205中电子设备 100判断当前播放的视频片段是否精彩。在一种实现方式中,电子设备100基于视频当前播放的时刻,截取该视频的一个视频片段进行精彩度识别,该视频片段包括视频当前播放的时刻,该视频片段的时长可以是电子设备100预先设置的,也可以是用户按照自己的需求设置的。
S206、电子设备100将视频暂停至当前播放的t3时刻。
本申请实施例中,电子设备100将视频暂停后,可以执行S205。可以理解,当检测到用户视线再次注视屏幕时,电子设备100控制视频从t3时刻继续播放。
S207、电子设备100获取用户的视线和电子设备100的屏幕的交互参数。
S208、电子设备100基于交互参数控制视频播放速度。
可以理解,当视频片段不精彩时,用户倾向于看其他目标(例如手机)等,但是又怕错过精彩片段,因此,用户可能时不时看向屏幕。
在一些实施例中,用户的视线和电子设备100的屏幕的交互参数包括交互频率和/或注视时长。其中,交互频率指预设时间内用户视线离开(或回归)屏幕的频率。注视时长指用户单次注视屏幕的时长。
在一种实现方式,交互频率大于第二阈值时,视频播放速度为第一速度,第一速度大于正常播放速度,例如第一速度为正常播放速度的2倍速;交互频率小于等于第二阈值时,视频播放速度为正常播放速度。在一种实现方式,交互频率大于第二阈值时,交互频率越大,视频播放速度越大。在一种实现方式,注视时长小于第三阈值时,视频播放速度为第二速度,第二速度大于正常播放速度,例如第二速度为正常播放速度的1.5倍速;注视时长大于等于第三阈值时,视频播放速度为正常播放速度。在一种实现方式,注视时长小于第三阈值时,注视时长越小,视频播放速度越大。
可以理解,交互频率较大和/或注视时长较小时,可以适当提升视频的播放速度;交互频率较小和/或注视时长较大时,以正常速度播放视频。
示例性的,当视线离开屏幕的频率大于等于每10秒3次,将视频播放速度提升至2倍速;当视线注视屏幕的时长大于1秒时,播放速度为正常播放速度。第二阈值、第三阈值可以是电子设备100默认设置的,也可以用户按照自己的需求设置的。
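A small sketch of the speed rules above; the thresholds follow the example figures in the text (3 gaze departures per 10 s, a 1 s single-gaze threshold), and the 2x/1.5x values follow the first-speed and second-speed examples. How the two rules are combined when both fire is an assumption.

```python
def playback_speed(leave_freq_per_10s, gaze_duration_s,
                   freq_threshold=3, gaze_threshold=1.0):
    """leave_freq_per_10s: times the gaze left the screen in the last 10 s;
    gaze_duration_s: length of the current single gaze at the screen."""
    if leave_freq_per_10s >= freq_threshold:   # frequent glancing away -> speed up to 2x
        return 2.0
    if gaze_duration_s < gaze_threshold:       # short gazes -> speed up to 1.5x
        return 1.5
    return 1.0                                 # otherwise play at normal speed
```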
在一些实施例中,步骤S205中,若电子设备100判断当前播放的视频片段不精彩,则电子设备100在执行S207的同时,还返回执行S202,即继续检测用户是否离开。可以理解,电子设备100在执行S207和S208期间,若电子设备100检测到用户离开,则将视频暂停至当前播放的时刻;若电子设备100未检测到用户离开,则继续执行S205,即判断视频是否精彩,以此类推,循环往复。
在一些实施例,参考图17,可以将图15所示的步骤S202替换为S210。
S210、电子设备100判断用户的视线离开屏幕的时长是否大于第一阈值;若是,则执行S203,否则执行S205。
在一些实施例中,步骤S201之前,电子设备100还判断当前播放的视频是否为广告;若是,则在步骤S201之后执行S210;否则,在S201之后执行S202。
在一些实施例中,步骤S201之前,电子设备100还判断当前所处环境是否为电梯;若是,则在步骤S201之后执行S210;否则,在S201之后执行S202。
在一种实现方式中,电子设备100通过对摄像头采集的电梯图像进行电梯识别,确定当前所处环境是否为电梯。在一种实现方式中,通过加速度传感器检测加速和减速、气压传感器检测气压变化等手段可以识别电梯中的失重和超重状态,从而确定当前所处环境是否为电梯。
在一些实施例中,电子设备100持续性地对摄像头采集的图像进行人脸识别,并利用视线追踪技术对识别到的每个用户或特定的预设用户(例如用户1)进行视线追踪,检测该用户的视线是否注视屏幕;步骤201中当检测到用户1的视线离开屏幕时,电子设备100记录视频当前播放的t1时刻;步骤202中通过人脸检测确定用户1是否离开屏幕的观看范围;步骤204中确定用户1离开屏幕的观看范围后,继续对摄像头采集的图像进行人脸识别,当识别到用户1时,通过视线追踪技术继续检测用户1的视线是否再次注视屏幕;若检测到用户1的视线再次注视屏幕,则控制视频继续播放。步骤S207中电子设备100继续对摄像头采集的图像进行人脸识别,并利用视线追踪技术获取用户1的视线和电子设备100的屏幕的交互参数。可以理解,若针对特定的预设用户进行人脸识别,电子设备100存储有该用户的人脸特征或人脸图像。
本申请实施例中,在检测到用户视线离开屏幕以及用户离开时,可以将视频回退至用户视线离开屏幕的t1时刻,待用户视线回归屏幕时再继续播放。这样,可以避免用户错过视线离开屏幕后播放的视频片段。和上述方案1相比,可以避免用户视线离开屏幕到用户离开屏幕的观看范围的这段时间内的视频片段丢失;和上述方案2相比,也避免了用户视线意外离开屏幕导致的视频的频繁误暂停。
本申请实施例中,在检测到用户视线离开屏幕但用户未离开屏幕的观看范围时,还可以通过视频片段的精彩度控制视频的播放和暂停。在视频片段精彩时,电子设备100控制视频暂停,可以防止精彩的视频片段的丢失;在视频片段不精彩,且用户较少看屏幕时,可以加快视频播放速度,提高用户的观看效率。可以理解,在视频片段不精彩时,用户能够通过视线与屏幕的交互频率和注视时长控制视频的播放速度,体现了用户自主化的视频控制。
此外,本申请实施例中,针对一些特定环境(例如电梯)或特定视频(例如广告),采用图17所示的设备控制方法。
例如,上述特定环境为电梯,电梯墙上的电子设备100播放视频时,由于电梯总是有人进进出出,通过人体检测或人脸检测总是可以检测到用户在屏幕的观看范围内,此时可以采用图17所示的设备控制方法,即利用视线离开时长检测替代人体检测(或人脸检测)。
此外,本申请实施例中,通过持续性地对摄像头采集的图像进行人脸识别,针对识别到的每个用户(例如用户1),可以基于该用户的视线实现视频播放的控制。这样,针对多用户观看视频的情况,有效实现了个性化的视频控制。
本申请的各实施方式可以任意进行组合,以实现不同的技术效果。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时,全部或部分地产生按照本申请所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线)或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介 质集成的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质,(例如,软盘、硬盘、磁带)、光介质(例如,DVD)、或者半导体介质(例如固态硬盘(solid state disk,SSD))等。
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,该流程可以由计算机程序来指令相关的硬件完成,该程序可存储于计算机可读取存储介质中,该程序在执行时,可包括如上述各方法实施例的流程。而前述的存储介质包括:ROM或随机存储记忆体RAM、磁碟或者光盘等各种可存储程序代码的介质。
总之,以上所述仅为本申请技术方案的实施例而已,并非用于限定本申请的保护范围。凡根据本申请的揭露,所作的任何修改、等同替换、改进等,均应包含在本申请的保护范围之内。

Claims (26)

  1. 一种设备控制方法,其特征在于,所述方法包括:
    第一电子设备接收第一语音信息;
    当所述第一电子设备确定所述第一语音信息对应的意图为调整幅度时,基于所述第一语音信息确定幅度调整的目标元素;
    所述第一电子设备获取用户的手部移动参数;
    所述第一电子设备确定所述手部移动参数对应的第一幅度调整参数;
    所述第一电子设备以所述第一幅度调整参数调整所述目标元素的幅度。
  2. 根据权利要求1所述的方法,其特征在于,所述第一电子设备以所述第一幅度调整参数调整所述目标元素的幅度之前,还包括:
    所述第一电子设备获取所述用户的手部移动方向;
    所述第一电子设备确定所述手部移动方向对应的幅度调整方向为第一幅度调整方向,所述幅度调整方向包括幅度调大和幅度调小;
    所述第一电子设备以所述第一幅度调整参数调整所述目标元素的幅度,包括:
    所述第一电子设备沿所述第一幅度调整方向以所述第一幅度调整参数调整所述目标元素的幅度。
  3. 根据权利要求1所述的方法,其特征在于,所述方法还包括:
    当所述第一电子设备确定所述第一语音信息对应的意图为调整幅度时,第一电子设备还基于所述第一语音信息对应的槽位确定幅度调整方向为第一幅度调整方向,幅度调整方向包括幅度调大和幅度调小;
    所述第一电子设备以所述第一幅度调整参数调整所述目标元素的幅度,包括:
    所述第一电子设备沿所述第一幅度调整方向以所述第一幅度调整参数调整所述目标元素的幅度。
  4. 根据权利要求1至3任一项所述的方法,其特征在于,所述手部移动参数为手部移动速度,所述第一幅度调整参数为第一幅度调整速度;
    或者,所述手部移动参数为手部移动距离,所述第一幅度调整参数为第一幅度调整值。
  5. 根据权利要求1至4任一项所述的方法,其特征在于,所述第一电子设备确定所述手部移动参数对应的第一幅度调整参数,具体包括:
    所述第一电子设备基于所述手部移动参数和第一灵敏度确定所述第一幅度调整参数;
    所述手部移动参数一定时,所述第一灵敏度越大,所述第一幅度调整参数越大;所述第一灵敏度一定时,所述手部移动参数越大,所述第一幅度调整参数越大。
  6. 根据权利要求5所述的方法,其特征在于,所述第一电子设备确定所述手部移动参数对应的第一幅度调整参数之后,所述方法还包括:
    所述第一电子设备接收第二语音信息;
    当所述第一电子设备确定所述第二语音信息对应的意图为调整灵敏度时,基于所述第一语音信息对应的槽位确定第一灵敏度调整方向和第一灵敏度调整值;
    沿所述第一灵敏度调整方向以所述第一灵敏度调整值调整所述第一灵敏度。
  7. 根据权利要求1至5任一项所述的方法,其特征在于,所述第一电子设备确定所述手部移动参数对应的第一幅度调整参数之后,所述方法还包括:
    所述第一电子设备检测到第一预设条件时,结束所述目标元素的本次幅度调整;所述第一预设条件包括以下一项或多项:接收所述第一语音信息后的时长超过第一预设时长;第二预设时长内,未检测到所述用户的手部的有效移动;接收到用于停止幅度调整的第一预设手势;接收到用于停止幅度调整的第三语音信息;其中,手部的有效移动指:手部沿预设移动方向的移动距离大于距离阈值,或者,手部的移动速度大于速度阈值。
  8. 根据权利要求2所述的方法,其特征在于,所述第一电子设备获取用户的手部移动参数,具体包括:
    所述第一电子设备获取摄像头采集的第一图像和第二图像;
    所述第一电子设备通过手部识别,获取所述第一图像和所述第二图像中的手部的第一特征点的位置;
    基于所述第一图像和所述第二图像中的手部的第一特征点的位置,确定所述用户的所述手部移动参数。
  9. 根据权利要求8所述的方法,其特征在于,所述手部移动方向是基于所述第一图像和所述第二图像中的手部的所述第一特征点的位置确定的。
  10. 根据权利要求2所述的方法,其特征在于,第二电子设备佩戴于所述用户的手部,所述第一电子设备获取用户的手部移动参数,包括:
    所述第一电子设备向所述第二电子设备发送获取请求;
    所述第一电子设备接收所述第二电子设备发送的所述第二电子设备在第一坐标系的移动速度和/或移动距离;
    基于所述第二电子设备在所述第一坐标系的移动速度和/或移动距离确定所述用户的所述手部移动参数。
  11. 根据权利要求10所述的方法,其特征在于,所述手部移动方向是基于所述第二电子设备在所述第一坐标系的移动方向确定的。
  12. 根据权利要求2以及权利要求8至11中的任一项所述的方法,其特征在于,幅度调整方向包括幅度调大和幅度调小,所述第一电子设备中预置有第一映射关系,第一映射关系中幅度调大对应至少一个预设移动方向,所述第一映射关系中幅度调小对应至少一个预设移动方向,幅度调大对应的预设移动方向不同于幅度调小对应的预设移动方向;
    当基于所述第一映射关系确定所述手部移动方向属于幅度调大对应的预设移动方向时,所述第一幅度调整方向为幅度调大;
    当基于所述第一映射关系确定所述手部移动方向属于幅度调小对应的预设移动方向时,所述第一幅度调整方向为幅度调小。
  13. 根据权利要求2或3所述的方法,其特征在于,所述方法还包括:
    当第一电子设备确定所述第一语音信息对应的意图为调整幅度时,还基于所述第一语音信息对应的槽位确定幅度调整的目标设备为第三电子设备;
    所述第一电子设备以第一幅度调整参数调整所述目标元素的幅度,包括:
    所述第一电子设备向所述第三电子设备发送调整请求,以控制所述第三电子设备沿所述第一幅度调整方向以所述第一幅度调整参数调整目标元素的幅度,所述调整请求携带所述目标元素、所述第一幅度调整参数和所述第一幅度调整方向。
  14. 根据权利要求1至13任一项所述的方法,其特征在于,所述第一电子设备获取用户的手部移动参数,包括:
    所述第一电子设备获取所述用户的第二预设手势的手部移动参数。
  15. 根据权利要求14所述的方法,其特征在于,所述第一电子设备获取用户的手部移动参数,包括:
    所述第一电子设备获取摄像头采集的第一图像和第二图像;
    所述第一电子设备通过手势识别,识别所述第一图像和所述第二图像是否包含第二预设手势;
    所述第一电子设备获取用户的第一手势的手部移动参数,包括:
    当所述第一图像和所述第二图像包含所述第二预设手势时,基于所述第一图像和所述第二图像中的手部的第一特征点的位置,确定所述用户的所述手部移动参数。
  16. 根据权利要求1所述的方法,其特征在于,所述方法还包括:
    所述第一电子设备播放视频;
    所述第一电子设备播放所述视频至所述视频的第一时刻时,检测到用户的视线离开所述第一电子设备的屏幕,所述第一电子设备记录所述第一时刻;
    所述第一电子设备检测所述用户是否离开所述屏幕的观看范围或所述用户的视线离开所述屏幕的时长是否超过第一阈值;
    当检测到所述用户离开所述屏幕的观看范围或所述用户的视线离开所述屏幕的时长超过所述第一阈值时,所述第一电子设备将所述视频回退至所述第一时刻并暂停。
  17. 根据权利要求16所述的方法,其特征在于,所述方法还包括:
    所述第一电子设备暂停所述视频后,当所述第一电子设备检测到所述用户的视线再次注视所述屏幕时,控制所述视频继续播放。
  18. 根据权利要求16或17所述的方法,其特征在于,所述方法还包括:
    当检测到所述用户未离开所述屏幕的观看范围或所述用户的视线离开所述屏幕的时长未超过所述第一阈值时,所述第一电子设备检测当前播放的视频片段是否精彩;
    当检测到当前播放的视频片段精彩时,所述第一电子设备将所述视频暂停。
  19. 根据权利要求18所述的方法,其特征在于,所述方法还包括:
    当检测到当前播放的视频片段不精彩时,获取所述用户的视线与所述屏幕的交互参数,并基于所述交互参数控制所述视频的播放速度。
  20. 根据权利要求19所述的方法,其特征在于,所述交互参数包括交互频率和/或注视时长,交互频率指预设时间内所述用户的视线离开所述屏幕的频率,所述注视时长指所述用户单次注视所述屏幕的时长,所述基于所述交互参数控制所述视频的播放速度,包括:
    当所述交互频率大于第二阈值时,控制视频播放速度为第一速度,所述第一速度大于正常播放速度;当所述交互频率小于等于所述第二阈值时,控制视频播放速度为所述正常播放速度;
    当所述注视时长小于第三阈值时,控制视频播放速度为第二速度,所述第二速度大于所述正常播放速度;当所述注视时长大于等于所述第三阈值时,控制视频播放速度为所述正常播放速度。
  21. 根据权利要求18所述的方法,其特征在于,所述方法还包括:
    当检测到当前播放的视频片段不精彩时,所述第一电子设备还检测所述用户是否离开所述屏幕的观看范围或所述用户的视线离开所述屏幕的时长是否超过所述第一阈值。
  22. 根据权利要求16至21任一项所述的方法,其特征在于,所述检测到用户的视线离开所述第一电子设备的屏幕之前,所述方法还包括:
    所述第一电子设备判断当前所处环境是否为电梯;
    若当前所处环境为电梯,所述检测到用户的视线离开所述第一电子设备的屏幕之后,所述第一电子设备检测所述用户的视线离开所述屏幕的时长是否超过所述第一阈值。
  23. 根据权利要求18所述的方法,其特征在于,所述视频包括至少一个预先划分好的视频片段,视频片段的精彩度可以是视频的制作商、供应商或版权方预先设置的。
  24. 一种电子设备,其特征在于,包括收发器、处理器和存储器,所述存储器用于存储计算机程序,所述处理器调用所述计算机程序,用于执行如权利要求1-23任一项所述的方法。
  25. 一种计算机可读存储介质,其特征在于,包括计算机指令,当所述计算机指令在电子设备上运行时,使得所述电子设备执行如权利要求1-23中任一项所述的设备控制方法。
  26. 一种计算机程序产品,其特征在于,当所述计算机程序产品在计算机上运行时,使得所述计算机执行如权利要求1-23任一项所述的设备控制方法。
PCT/CN2022/142260 2021-12-28 2022-12-27 设备控制方法及相关装置 WO2023125514A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111633811.1A CN116360583A (zh) 2021-12-28 2021-12-28 设备控制方法及相关装置
CN202111633811.1 2021-12-28

Publications (1)

Publication Number Publication Date
WO2023125514A1 true WO2023125514A1 (zh) 2023-07-06

Family

ID=86905688

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/142260 WO2023125514A1 (zh) 2021-12-28 2022-12-27 设备控制方法及相关装置

Country Status (2)

Country Link
CN (1) CN116360583A (zh)
WO (1) WO2023125514A1 (zh)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100079508A1 (en) * 2008-09-30 2010-04-01 Andrew Hodge Electronic devices with gaze detection capabilities
CN106527671A (zh) * 2015-09-09 2017-03-22 广州杰赛科技股份有限公司 一种设备隔空控制方法
CN110415695A (zh) * 2019-07-25 2019-11-05 华为技术有限公司 一种语音唤醒方法及电子设备
WO2021017836A1 (zh) * 2019-07-30 2021-02-04 华为技术有限公司 控制大屏设备显示的方法、移动终端及第一系统
WO2021170062A1 (zh) * 2020-02-28 2021-09-02 华为技术有限公司 隔空手势的调节方法及终端
CN113467735A (zh) * 2021-06-16 2021-10-01 荣耀终端有限公司 图像调整方法、电子设备及存储介质

Also Published As

Publication number Publication date
CN116360583A (zh) 2023-06-30

Similar Documents

Publication Publication Date Title
WO2021063343A1 (zh) 语音交互方法及装置
US11922935B2 (en) Voice interaction method and apparatus, terminal, and storage medium
CN112567457B (zh) 语音检测方法、预测模型的训练方法、装置、设备及介质
WO2020168929A1 (zh) 对特定路线上的特定位置进行识别的方法及电子设备
WO2020151387A1 (zh) 一种基于用户运动状态的推荐方法及电子设备
US11031005B2 (en) Continuous topic detection and adaption in audio environments
WO2020073288A1 (zh) 一种触发电子设备执行功能的方法及电子设备
WO2021209047A1 (zh) 传感器调整方法、装置和电子设备
WO2021052139A1 (zh) 手势输入方法及电子设备
WO2022127787A1 (zh) 一种图像显示的方法及电子设备
WO2022042766A1 (zh) 信息显示方法、终端设备及计算机可读存储介质
WO2022052776A1 (zh) 一种人机交互的方法、电子设备及系统
CN114242037A (zh) 一种虚拟人物生成方法及其装置
WO2022143258A1 (zh) 一种语音交互处理方法及相关装置
WO2022062884A1 (zh) 文字输入方法、电子设备及计算机可读存储介质
WO2021190225A1 (zh) 一种语音交互方法及电子设备
CN111524528B (zh) 防录音检测的语音唤醒方法及装置
CN115145436A (zh) 一种图标处理方法及电子设备
WO2023125514A1 (zh) 设备控制方法及相关装置
WO2022095983A1 (zh) 一种防止手势误识别的方法及电子设备
WO2022267783A1 (zh) 确定推荐场景的方法及电子设备
CN113380240B (zh) 语音交互方法和电子设备
CN114822543A (zh) 唇语识别方法、样本标注方法、模型训练方法及装置、设备、存储介质
CN115083401A (zh) 语音控制方法及装置
CN113742460B (zh) 生成虚拟角色的方法及装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22914767

Country of ref document: EP

Kind code of ref document: A1