WO2023151360A1 - Electronic device control method and apparatus, and electronic device - Google Patents

Electronic device control method and apparatus, and electronic device Download PDF

Info

Publication number
WO2023151360A1
Authority
WO
WIPO (PCT)
Prior art keywords
sound
gesture recognition
target
electronic device
recognition device
Prior art date
Application number
PCT/CN2022/136694
Other languages
English (en)
French (fr)
Inventor
李凌飞
唐晨
邰彦坤
Original Assignee
Huawei Technologies Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2023151360A1 publication Critical patent/WO2023151360A1/zh

Links

Images

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017 - Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 - Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 - Power supply means, e.g. regulation thereof
    • G06F1/32 - Means for saving power
    • G06F1/3203 - Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3206 - Monitoring of events, devices or parameters that trigger a change in power modality
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 - Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 - Power supply means, e.g. regulation thereof
    • G06F1/32 - Means for saving power
    • G06F1/3203 - Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3234 - Power saving characterised by the action undertaken
    • G06F1/324 - Power saving characterised by the action undertaken by lowering clock frequency
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 - Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 - Power supply means, e.g. regulation thereof
    • G06F1/32 - Means for saving power
    • G06F1/3203 - Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3234 - Power saving characterised by the action undertaken
    • G06F1/3287 - Power saving characterised by the action undertaken by switching off individual functional units in the computer system
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 - Movements or behaviour, e.g. gesture recognition
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Definitions

  • the present application relates to the technical field of terminals, and in particular to an electronic device control method and apparatus, and an electronic device.
  • Gesture recognition has a wide range of applications in human-computer interaction, gaming and entertainment, and in-vehicle scenarios.
  • the gesture recognition device senses gestures in the environment by means of vision, infrared, radar, and the like.
  • the entire device remains in the working state, which increases system power consumption and makes it unsuitable for low-power devices such as vehicle-mounted, mobile, or portable devices.
  • multi-modal interaction technologies include visual, auditory, tactile, and other interaction technologies.
  • the present application provides an electronic device control method, apparatus, electronic device, computer-readable storage medium, and computer program product, which can control the gesture recognition device based on the number of users making sounds in the environment, thereby avoiding the increase in system power consumption caused by the gesture recognition device remaining continuously in the working state.
  • the present application provides a method for controlling an electronic device, the method including: acquiring a first sound of a first duration through a sound collection device; judging whether the first sound contains a target sound, the target sound being a sound made by a real person; when the first sound contains the target sound, acquiring a second sound of a second duration through the sound collection device; judging whether the second sound contains the target sound, and whether the number of voiceprints contained in the second sound is greater than a preset number; and when the second sound contains the target sound and the number of voiceprints contained in the second sound is greater than the preset number, controlling a gesture recognition device matched with the electronic device to work.
  • the gesture recognition device matched with the electronic device is controlled to work, so that gesture recognition can be activated when necessary and restricted when unnecessary, thereby reducing the power consumption of the gesture recognition device.
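The two-stage check in the first aspect can be sketched as a small decision function. This is an illustrative sketch only: `contains_real_voice` and `count_voiceprints` stand in for the voiceprint-recognition and voiceprint-counting models the application describes, and are injected here as callables.

```python
def decide_gesture_device_action(first_sound, second_sound,
                                 contains_real_voice, count_voiceprints,
                                 preset_number=1):
    """Two-stage gate: stage 1 requires a real (non-playback) voice in the
    first sound; stage 2 additionally requires more than `preset_number`
    voiceprints in the second sound before the gesture device may work."""
    if not contains_real_voice(first_sound):
        return "restrict"          # no real voice: keep the device in low power
    if (contains_real_voice(second_sound)
            and count_voiceprints(second_sound) > preset_number):
        return "work"              # multi-person dialogue: enable recognition
    return "restrict"
```

For example, with sounds represented as `(is_real, n_voiceprints)` pairs, two real voices in the second sound yield `"work"`, while a single speaker yields `"restrict"`.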
  • controlling the operation of a gesture recognition device matched with the electronic device specifically includes: controlling the gesture recognition device to switch from an off state to an on state.
  • controlling the operation of the gesture recognition device matched with the electronic device specifically includes: controlling the detection frequency of the gesture recognition device to switch from a first frequency to a second frequency, where the first frequency is lower than the second frequency.
  • controlling the operation of the gesture recognition device matched with the electronic device specifically includes: controlling the detection precision of the gesture recognition device to switch from a first precision to a second precision, where the first precision is lower than the second precision.
  • the method further includes: acquiring a third sound of a third duration through the sound collection device; judging whether the third sound contains the target sound, or whether the number of voiceprints contained in the third sound is greater than the preset number; when the third sound does not contain the target sound, or the number of voiceprints contained in the third sound is less than or equal to the preset number, restricting the gesture recognition device from working; and when the third sound contains the target sound and the number of voiceprints contained in the third sound is greater than the preset number, continuing to control the gesture recognition device to work.
  • the method further includes: when the first sound does not contain the target sound, restricting the gesture recognition device from working.
  • the method further includes: when the second sound does not contain the target sound, and/or the number of voiceprints contained in the second sound is less than or equal to the preset number, restricting the gesture recognition device from working.
  • the method further includes: judging whether the first sound or the second sound contains the target sound through voiceprint recognition.
  • the present application provides a method for controlling an electronic device, the method including: acquiring a first sound of a first duration through a sound collection device; determining the number of target users in the environment according to the first sound, where a target user is a user who made a sound; and when the number of target users is greater than a preset number, controlling the gesture recognition device matched with the electronic device to work.
  • the gesture recognition device matched with the electronic device is controlled to work, so that gesture recognition can be activated when necessary and restricted when unnecessary, thereby reducing the power consumption of the gesture recognition device.
  • determining the number of target users in the environment specifically includes: judging whether the first sound contains the target sound, the target sound being a sound made by a real person; and when the first sound contains the target sound, determining the number of target users according to the number of voiceprints contained in the first sound.
  • determining the number of target users in the environment specifically includes: judging whether the first sound contains the target sound, the target sound being a sound made by a real person; when the first sound contains the target sound, acquiring the second sound of the second duration through the sound collection device; judging whether the second sound contains the target sound; and when the second sound contains the target sound, determining the number of target users according to the number of voiceprints contained in the second sound.
  • controlling the operation of the gesture recognition device matched with the electronic device specifically includes: controlling the gesture recognition device to switch from the off state to the on state; or controlling the detection frequency of the gesture recognition device to switch from a first frequency to a second frequency, where the first frequency is lower than the second frequency; or controlling the detection precision of the gesture recognition device to switch from a first precision to a second precision, where the first precision is lower than the second precision.
  • the method further includes: acquiring a third sound of a third duration through the sound collection device; judging whether the third sound contains the target sound, or whether the number of voiceprints contained in the third sound is greater than the preset number; when the third sound does not contain the target sound, or the number of voiceprints contained in the third sound is less than or equal to the preset number, restricting the gesture recognition device from working; and when the third sound contains the target sound and the number of voiceprints contained in the third sound is greater than the preset number, continuing to control the gesture recognition device to work.
  • the method further includes: when the number of target users is less than or equal to a preset number, restricting the gesture recognition device from working.
  • the present application provides an electronic device control apparatus, including: at least one processor and an interface; the at least one processor obtains program instructions or data through the interface; and the at least one processor is used to execute the program instructions to implement the method provided in the first aspect or the second aspect.
  • the present application provides an electronic device, which includes at least one memory for storing programs and at least one processor for executing the programs stored in the memory. Wherein, when the program stored in the memory is executed, the processor is configured to execute the method provided in the first aspect or the second aspect.
  • the present application provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and when the computer program is run on an electronic device, the electronic device executes the method as provided in the first aspect or the second aspect.
  • the present application provides a computer program product.
  • when the computer program product runs on an electronic device, the electronic device executes the method as provided in the first aspect or the second aspect.
  • FIG. 1 is a schematic diagram of a hardware structure of an electronic device provided in an embodiment of the present application
  • FIG. 2 is a schematic diagram of gesture recognition by a gesture recognition device provided in an embodiment of the present application
  • FIG. 3 is a schematic flowchart of a method for controlling an electronic device provided in an embodiment of the present application
  • FIG. 4 is a schematic diagram of a process of an electronic device control method provided in an embodiment of the present application.
  • FIG. 5 is a schematic flowchart of a method for controlling an electronic device provided in an embodiment of the present application
  • FIG. 6 is a schematic structural diagram of an electronic device control apparatus provided by an embodiment of the present application.
  • first and second and the like in the specification and claims herein are used to distinguish different objects, rather than to describe a specific order of objects.
  • first response message and the second response message are used to distinguish different response messages, rather than describing a specific order of the response messages.
  • words such as "exemplary" or "for example" are used to present examples, illustrations, or explanations. Any embodiment or design scheme described as "exemplary" or "for example" in the embodiments of the present application shall not be interpreted as being more preferred or more advantageous than other embodiments or design schemes. Rather, the use of words such as "exemplary" or "for example" is intended to present related concepts in a concrete manner.
  • multiple means two or more; for example, multiple processing units refers to two or more processing units, and multiple components refers to two or more components.
  • the gesture recognition device matched with the electronic device can be controlled to acquire the user's gesture and perform gesture recognition.
  • the gesture recognition device can be in a low power consumption state in scenarios where the multi-person dialogue condition is not satisfied, such as stopping work or monitoring the user's gestures at a lower frequency, thereby reducing the power consumption of the gesture recognition device.
  • the electronic device may be a mobile phone, a tablet computer, a wearable device, a smart TV, a Huawei smart screen, a smart speaker, an in-vehicle head unit, and the like.
  • Exemplary embodiments of electronic devices include, but are not limited to, electronic devices running iOS, Android, Windows, HarmonyOS, or other operating systems. The embodiment of the present application does not specifically limit the type of the electronic device.
  • FIG. 1 shows a schematic diagram of a hardware structure of an electronic device.
  • the electronic device 100 may include: a processor 110 , a memory 120 , a pickup 130 and a gesture recognition device 140 .
  • the processor 110 may be a general purpose processor or a special purpose processor.
  • the processor 110 may include a central processing unit (central processing unit, CPU) and/or a baseband processor.
  • the baseband processor can be used to process communication data
  • the CPU can be used to implement corresponding control and processing functions, execute software programs, and process data of the software programs.
  • the processor 110 may perform echo cancellation processing on the audio signal collected by the pickup 130, extract voiceprint features from the audio signal, determine whether the audio signal is a sound signal from a real person, determine the number of voiceprints contained in the audio signal, and so on.
  • the processor 110 can also perform automatic speech recognition (ASR), natural language understanding (NLU), dialogue management (DM), natural language generation (NLG), speech synthesis (text to speech, TTS), etc., to recognize the voice of the target user.
  • processor 110 may include one or more processing units.
  • the processor 110 may include one or more of an application processor (AP), a modem, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video encoder and decoder, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc.
  • electronic device 100 may include one or more processors 110 . Wherein, different processing units may be independent devices, or may be integrated in one or more processors.
  • the memory 120 may store a program, and the program may be executed by the processor 110, so that the processor 110 executes the method provided in this application.
  • the memory 120 may also store data.
  • the processor 110 can read the data stored in the memory 120 (for example, the target user's voiceprint feature data, etc.).
  • the memory 120 and the processor 110 may be provided separately.
  • the memory 120 may also be integrated in the processor 110 .
  • the pickup 130 may also be called a "microphone" or "mic", and is used to convert sound signals into electrical signals.
  • Electronic device 100 may include one or more pickups 130 .
  • the sound pickup 130 can collect the sound in the environment where the electronic device 100 is located, and transmit the collected sound to the processor 110 for processing by the processor 110 .
  • the sound pickup 130 may be a microphone.
  • the sound pickup 130 may be built in the electronic device 100, or may be externally placed in the electronic device 100, which is not limited here.
  • the gesture recognition device 140 may be used to detect gestures made by a user.
  • the gesture recognition device 140 may detect the gestures made by the user by methods such as, but not limited to, computer-vision-based camera tracking and comparison of transmitted and reflected signals. Exemplarily, taking the method based on comparing transmitted and reflected signals as an example, as shown in FIG. 2 , the gesture recognition device 140 may include a signal transmitter 141 , a signal receiver 142 and a signal processing unit 143 .
  • the signal transmitter 141 can send signals (such as millimeter wave, infrared light, ultrasound, or wireless fidelity (Wi-Fi) signals); the signal receiver 142 can then receive the signal reflected by the hand 22 of user A; finally, the signal processing unit 143 compares the original signal sent by the signal transmitter 141 with the reflected signal received by the signal receiver 142, and uses principles such as the Doppler effect, phase shift, and time difference to track the movement of the hand 22 of user A, thereby determining the gesture made by user A.
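For the transmitted-versus-reflected comparison, the Doppler relation alone already yields the hand's radial speed. A minimal sketch of that relation follows; the carrier frequencies in the usage note are illustrative, not values taken from the application.

```python
def doppler_radial_speed(f_tx_hz, f_rx_hz, wave_speed=3.0e8):
    """Two-way Doppler: a reflector moving at radial speed v shifts the
    carrier by delta_f = 2 * v * f_tx / c, so v = c * delta_f / (2 * f_tx).
    A positive result means the hand is moving toward the sensor."""
    delta_f = f_rx_hz - f_tx_hz
    return wave_speed * delta_f / (2.0 * f_tx_hz)
```

For a 60 GHz millimeter-wave carrier, a 400 Hz shift corresponds to a hand moving toward the sensor at 1 m/s; for an ultrasonic sensor, `wave_speed` would instead be the speed of sound (about 343 m/s).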
  • when the gesture recognition device 140 uses a computer-vision-based camera tracking method to detect the gestures made by the user, the gesture recognition device 140 may also include a camera, which can collect images of the user's hand, and a set of image processing algorithms can detect the gestures made by the user from those images.
  • at least some components in the gesture recognition device 140 can be integrated in the processor 110 .
  • the gesture recognition device 140 may be integrated on the electronic device 100 or arranged separately, or a part may be integrated on the electronic device 100 while another part is arranged separately, which is not limited here.
  • at least part of the functions (such as data processing functions, etc.) of the gesture recognition device 140 may be implemented by the processor 110 .
  • the gesture recognition device 140 may include peripheral components such as an electromyogram (electromyogram, EMG) bracelet or an infrared remote control pen.
  • the signal processing unit in the gesture recognition device 140 can obtain the signals emitted by the peripheral components, and detect the user's gestures and movements according to the signals emitted by the peripheral components.
  • the structure illustrated in the embodiment of the present application does not constitute a specific limitation on the electronic device 100 .
  • the electronic device 100 may include more or fewer components than shown in the figure, or combine certain components, or separate certain components, or arrange different components.
  • the illustrated components can be realized in hardware, software or a combination of software and hardware.
  • FIG. 3 shows a method for controlling an electronic device.
  • the electronic device in FIG. 3 may be the electronic device 100 described above in FIG. 1 .
  • the electronic device control method may include the following steps:
  • the sound collection device matched with the electronic device can continuously or periodically collect sounds in the environment where the electronic device is located, so that the first sound of the first duration can be obtained through the sound collection device.
  • the sound collection device may be the sound pickup 130 described in FIG. 1 .
  • preprocessing may be performed on the first sound to eliminate interference.
  • the first sound can be processed by an acoustic echo cancellation (AEC) algorithm to eliminate interference, such as excluding the sound played by the cockpit media, especially any human voice interference it contains, so as to subsequently determine whether the first sound contains a voiceprint.
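As a concrete illustration of the AEC preprocessing step, a normalized LMS (NLMS) adaptive filter can subtract an estimate of the media-playback echo from the microphone signal. This is a textbook sketch, not the application's implementation; the filter length and step size below are illustrative.

```python
import numpy as np

def nlms_echo_cancel(reference, mic, filter_len=64, mu=0.5, eps=1e-8):
    """Cancel the echo of `reference` (e.g. cockpit media playback) from the
    `mic` signal using a normalized LMS adaptive filter; returns the residual,
    which ideally keeps only the live voices in the cabin."""
    w = np.zeros(filter_len)                     # adaptive echo-path estimate
    out = np.zeros(len(mic))
    padded = np.concatenate([np.zeros(filter_len - 1), reference])
    for n in range(len(mic)):
        x = padded[n:n + filter_len][::-1]       # most recent samples first
        echo_est = w @ x                         # estimated echo at sample n
        e = mic[n] - echo_est                    # residual after cancellation
        w = w + mu * e * x / (x @ x + eps)       # NLMS weight update
        out[n] = e
    return out
```

After convergence the residual carries almost none of the playback energy, so a downstream voiceprint check sees mainly the live speech.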
  • S302. Determine whether the first voice contains a target voice through voiceprint recognition, and the target voice is a voice from a real person.
  • voiceprint recognition can be used to determine whether the first sound is made by a real person (that is, the target sound).
  • S303 may be executed; otherwise, S307 may be executed.
  • the sound made by a real person is a sound uttered live by a real person, rather than a user's voice played back by a sound playback device.
  • the first sound can be input into a pre-trained neural network model or another voiceprint model, so as to determine through voiceprint recognition whether the first sound is made by a real person. When it is, this indicates that a user is speaking and gesture recognition may be required at this time, so S303 can be executed.
  • the preset number may be 1.
  • the second sound can be input into a voiceprint encoder or another voiceprint extraction model to determine the number of voiceprints contained in the second sound, so as to determine whether that number is greater than the preset number.
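One simple way to estimate the number of voiceprints in a sound is to cluster the segment embeddings produced by a voiceprint encoder. The sketch below is hypothetical: it assumes the embeddings have already been extracted, and uses greedy cosine-similarity clustering with an illustrative threshold rather than any model named in the application.

```python
import numpy as np

def count_voiceprints(embeddings, threshold=0.7):
    """Greedy estimate of the number of distinct speakers: each segment
    embedding joins the existing speaker whose centroid it best matches
    (cosine similarity >= threshold), otherwise it starts a new speaker.
    `embeddings` is an iterable of 1-D vectors from a voiceprint encoder."""
    centroids = []          # one running-mean vector per detected speaker
    counts = []
    for e in embeddings:
        e = np.asarray(e, dtype=float)
        e = e / (np.linalg.norm(e) + 1e-12)
        best, best_sim = None, threshold
        for i, c in enumerate(centroids):
            sim = float(e @ (c / (np.linalg.norm(c) + 1e-12)))
            if sim >= best_sim:
                best, best_sim = i, sim
        if best is None:
            centroids.append(e.copy())           # new speaker
            counts.append(1)
        else:
            counts[best] += 1                    # running-mean centroid update
            centroids[best] = centroids[best] + (e - centroids[best]) / counts[best]
    return len(centroids)
```

The returned count can then be compared against the preset number to decide whether the multi-person dialogue condition holds.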
  • S306 can be executed.
  • the gesture recognition device matched with the electronic device may be controlled to perform gesture recognition.
  • for example, when the gesture recognition device is a camera, the camera can be controlled to be turned on.
  • controlling the operation of the gesture recognition device matched with the electronic device may be to control the gesture recognition device to be turned on.
  • controlling the operation of the gesture recognition device matched with the electronic device may be to control the detection frequency of the gesture recognition device to switch from a low frequency to a high frequency.
  • controlling the operation of the gesture recognition device matched with the electronic device may be to control the gesture recognition device to switch from low-precision detection to high-precision detection.
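The three ways of "controlling the device to work" described above (power state, detection frequency, detection precision) map naturally onto a small state holder. The class below is a toy model; the field names and default/target values are illustrative placeholders, not values from the application.

```python
from dataclasses import dataclass

@dataclass
class GestureRecognizer:
    """Toy model of the three control knobs the method describes:
    on/off state, detection frequency (Hz), and detection precision."""
    on: bool = False
    freq_hz: float = 1.0        # first (low) frequency
    precision: str = "low"      # first (low) precision

    def work(self):
        # Multi-person dialogue detected: turn on, speed up, sharpen.
        self.on = True
        self.freq_hz = 30.0     # second (high) frequency
        self.precision = "high" # second (high) precision

    def restrict(self):
        # Condition no longer met: fall back to the low-power settings.
        self.on = False
        self.freq_hz = 1.0
        self.precision = "low"
```

Any one of the three knobs alone would already realize the claimed control; bundling them in one object just keeps the example compact.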
  • the gesture recognition device matched with the electronic device can be restricted from working; for example, the gesture recognition device is controlled to be in the off state, the detection frequency of the gesture recognition device is controlled to be in a low-frequency state, or the gesture recognition device is controlled to be in a low-precision detection state, etc., to save power consumption.
  • after the gesture recognition device matched with the electronic device is controlled to work, when the sound acquired within a certain period of time does not contain the target sound (that is, a sound made by a real person), or contains the target sound but not multiple voiceprints, this indicates that the multi-person dialogue scenario is no longer satisfied. At this time, the gesture recognition device matched with the electronic device can be restricted from working to save power consumption.
  • when a user talks to a voice assistant matched with the electronic device, although the voice assistant is not a real person, it can communicate with the user, so it can also be counted as a "real person"; that is, the voice of the voice assistant is regarded as the voice of a real person, and the scene at this time can also be regarded as a "multi-person dialogue scene".
  • in conversations with voice assistants, gestures are often used in place of speech, so such conversations can also be considered a high-frequency scenario for using gestures.
  • the camera can be used to capture and recognize changes in the driver's facial features (movement of the eyes and head, etc.) so as to determine whether the driver is fatigued and whether a corresponding early warning mechanism is required.
  • however, this judgment mechanism based only on changes in facial features considers a single dimension, and the whole detection and computation process imposes a large power consumption load on the system.
  • the system is in an organic and flexible state: through the coordinated operation of multiple sensors, different sensors are triggered to start operating under different conditions, thereby reducing the load on the entire system.
  • whether the user has not communicated with the voice assistant for a long time, or has not made a sound above a certain decibel level for a long time, can be used as conditions for deciding whether fatigue monitoring needs to be turned on. If the user frequently interacts with the voice assistant when starting or driving the car, controlling various hardware devices in the car through the voice assistant, it means the user's brain is in an active state, and fatigue monitoring need not be turned on at this time. When the ambient sound is loud, or many people are detected in the car, the corresponding fatigue monitoring can also be disabled first, because in a multi-person environment, interaction between people also effectively reduces fatigue.
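The audio-based trigger conditions above can be collected into a single decision function. All thresholds below (the half-hour idle window, 70 dB, single occupant) are illustrative assumptions, not values from the application.

```python
def should_enable_fatigue_monitoring(seconds_since_voice_assistant_use,
                                     seconds_since_loud_sound,
                                     people_in_car,
                                     ambient_db,
                                     idle_threshold_s=1800,
                                     loud_db=70,
                                     max_people=1):
    """Enable camera-based fatigue monitoring only when the audio cues
    suggest the driver may be drowsy: long silence toward the voice
    assistant and no recent loud speech, while alone in a quiet cabin."""
    if people_in_car > max_people:
        return False   # conversation between occupants itself reduces fatigue
    if ambient_db >= loud_db:
        return False   # lively cabin: skip the costly camera pipeline
    return (seconds_since_voice_assistant_use >= idle_threshold_s
            and seconds_since_loud_sound >= idle_threshold_s)
```

The function only gates the camera pipeline on or off; the fatigue detection itself (eye and head movement analysis) would run separately once enabled.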
  • All of the above-listed methods add manually set, fixed conditions that cooperate with the user's multi-modal interaction, helping the system reasonably manage the on/off state of the detection hardware and reduce system power consumption.
  • users can also help the system reduce power consumption by setting conditions themselves. For example, before driving, a user sets exclusive conditions for enabling fatigue monitoring based on their own habits; for instance, if there is high-decibel noise in the car for more than half an hour, fatigue monitoring is turned on again.
  • a sound-receiving device such as the aforementioned sound pickup, etc.
  • interference sounds, such as the sound played by media, especially human voice interference, are eliminated.
  • whether a real person is speaking is judged through voiceprint recognition. When it is not a real person speaking, the scene may not be determined as a multi-person dialogue scene, and at this time the gesture recognition device may be restricted from working.
  • when a real person is speaking, the sound can be monitored for a period of time, the number of voiceprints that appear can be recorded, and whether multiple people are speaking can be judged based on the number of recorded voiceprints.
  • the gesture recognition device may be restricted from working.
  • multi-modal gesture monitoring can be enabled and the frequency of gesture monitoring can be increased.
  • FIG. 5 is a schematic flow chart of another electronic device control method provided by an embodiment of the present application. It can be understood that the method can be executed by any device, device, platform, or device cluster that has computing and processing capabilities. As shown in Figure 5, the electronic device control method may include:
  • S502. Determine the number of target users in the environment according to the first sound, where the target user is the user who made the sound.
  • the number of target users in the environment may be determined according to the first sound, where the target user is the user who made the sound.
  • the number of target users in the environment is determined according to the first sound, which may specifically be: first determine whether the first sound contains the target sound, the target sound being a sound made by a real person; when the first sound contains the target sound, determine the number of target users according to the number of voiceprints contained in the first sound. Exemplarily, whether the target sound is contained may be determined by means of voiceprint recognition.
  • the number of target users in the environment is determined according to the first sound. Specifically, this may be: first determine whether the first sound contains the target sound, the target sound being a sound made by a real person; when the first sound contains the target sound, obtain the second sound of the second duration through the sound collection device; then judge whether the second sound contains the target sound; and when the second sound contains the target sound, determine the number of target users according to the number of voiceprints contained in the second sound. In this way, the resource consumption caused by misjudgment from directly counting the voiceprints in a single sound is reduced. Exemplarily, whether the target sound is contained may be determined by means of voiceprint recognition.
  • When the number of target users is greater than the preset number, the gesture recognition device associated with the electronic device can be controlled to work. For example, the gesture recognition device is controlled to switch from the off state to the on state; or the detection frequency of the gesture recognition device is controlled to switch from a first frequency to a second frequency, where the first frequency is lower than the second frequency; or the detection accuracy of the gesture recognition device is controlled to switch from a first accuracy to a second accuracy, where the first accuracy is lower than the second accuracy.
  • When the number of target users is less than or equal to the preset number, the operation of the gesture recognition device may be restricted to reduce its power consumption. For example, the gesture recognition device is controlled to switch from the on state to the off state, or is kept in the off state; or the detection frequency of the gesture recognition device is controlled to switch from the second frequency to the first frequency, where the first frequency is lower than the second frequency, or the detection frequency of the gesture recognition device is kept at the second frequency; or the detection accuracy of the gesture recognition device is controlled to switch from the second accuracy to the first accuracy, where the first accuracy is lower than the second accuracy, or the detection accuracy of the gesture recognition device is kept at the second accuracy.
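The enable/restrict options above amount to a small state machine. The sketch below is an illustrative assumption rather than the patent's implementation: the concrete frequency and accuracy values are invented placeholders that satisfy only the stated ordering (first frequency lower than second frequency, first accuracy lower than second accuracy).

```python
# Minimal gesture-recognition-device controller; the numeric values are
# illustrative placeholders, chosen only so that FIRST_* < SECOND_*.
FIRST_FREQ_HZ, SECOND_FREQ_HZ = 1, 30
FIRST_ACCURACY, SECOND_ACCURACY = "low", "high"

class GestureDevice:
    def __init__(self):
        # Start in the low-power state: off, low frequency, low accuracy.
        self.on = False
        self.freq_hz = FIRST_FREQ_HZ
        self.accuracy = FIRST_ACCURACY

    def enable(self):
        """Multi-person conversation detected: full detection capability."""
        self.on = True
        self.freq_hz = SECOND_FREQ_HZ
        self.accuracy = SECOND_ACCURACY

    def restrict(self):
        """No multi-person conversation: fall back to the low-power state."""
        self.on = False
        self.freq_hz = FIRST_FREQ_HZ
        self.accuracy = FIRST_ACCURACY
```

A real controller could apply only one of the three adjustments (state, frequency, accuracy) as the passage lists; this sketch applies all three together for brevity.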
  • In this way, the sound collected by the sound collection device is used to determine whether the current situation is a multi-person conversation, and when it is, the gesture recognition device associated with the electronic device is controlled to work. Gesture recognition can thus be activated when necessary and restricted when unnecessary, thereby reducing the power consumption of the gesture recognition device.
  • In some embodiments, after the gesture recognition device associated with the electronic device is controlled to work, a third sound of a third duration may also be acquired through the sound collection device, and it is determined whether the third sound contains the target sound, or whether the number of voiceprints contained in the third sound is greater than the preset number. When the third sound does not contain the target sound, or the number of voiceprints contained in the third sound is less than or equal to the preset number, the gesture recognition device is restricted from working; when the third sound contains the target sound and the number of voiceprints contained in the third sound is greater than the preset number, the gesture recognition device continues to be controlled to work. This prevents the gesture recognition device from working continuously.
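The re-check on the third sound reduces to a single boolean decision. A minimal sketch, again assuming pre-labeled `(speaker, is_live)` segments in place of a real voiceprint pipeline:

```python
def should_keep_working(third_sound, preset_number=1):
    """Re-check after the device has been enabled: keep the gesture
    recognition device working only while the third sound still contains a
    live (real-person) voice AND more voiceprints than the preset number.
    Otherwise the caller should restrict the device to save power."""
    live_speakers = {spk for spk, live in third_sound if live}
    has_target_sound = bool(live_speakers)
    return has_target_sound and len(live_speakers) > preset_number
```

Calling this periodically on each new third-duration window gives the "continue or restrict" behavior described above without leaving the device permanently on.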
  • It can be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution; the execution order of each process should be determined by its functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
  • the steps in the foregoing embodiments may be selectively executed according to actual conditions, may be partially executed, or may be completely executed, which is not limited here.
  • FIG. 6 is a schematic structural diagram of an electronic device control apparatus provided by an embodiment of the present application.
  • As shown in FIG. 6, the electronic device control apparatus 600 includes one or more processors 601 and an interface circuit 602.
  • Optionally, the electronic device control apparatus 600 may also include a bus 603. Specifically:
  • the processor 601 may be an integrated circuit chip and has signal processing capability. In the implementation process, each step of the above method may be completed by an integrated logic circuit of hardware in the processor 601 or instructions in the form of software.
  • the above-mentioned processor 601 can be a general-purpose processor, a neural network processing unit (NPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods and steps disclosed in the embodiments of the present application.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • the interface circuit 602 can be used for sending or receiving data, instructions or information.
  • the processor 601 can process the data, instructions, or other information received by the interface circuit 602, and can send the processed information out through the interface circuit 602.
  • the electronic device control apparatus 600 further includes a memory, which may include a read-only memory and a random access memory, and provides operation instructions and data to the processor.
  • a portion of the memory may also include non-volatile random access memory (NVRAM).
  • the memory may be coupled with the processor 601 .
  • Optionally, the memory stores executable software modules or data structures, and the processor 601 can perform corresponding operations by invoking the operation instructions stored in the memory (the operation instructions may be stored in an operating system).
  • the interface circuit 602 may be used to output an execution result of the processor 601 .
  • the corresponding functions of the processor 601 and the interface circuit 602 can be realized by hardware design, software design, or a combination of software and hardware, which is not limited here.
  • Exemplarily, the electronic device control apparatus 600 may be applied to, but is not limited to, the electronic device 100 shown in FIG. 2.
  • processor in the embodiments of the present application may be a central processing unit (central processing unit, CPU), and may also be other general processors, digital signal processors (digital signal processor, DSP), application specific integrated circuits (application specific integrated circuit, ASIC), field programmable gate array (field programmable gate array, FPGA) or other programmable logic devices, transistor logic devices, hardware components or any combination thereof.
  • a general-purpose processor can be a microprocessor, or any conventional processor.
  • the method steps in the embodiments of the present application may be implemented by means of hardware, or may be implemented by means of a processor executing software instructions.
  • the software instructions can be composed of corresponding software modules, and the software modules can be stored in random access memory (random access memory, RAM), flash memory, read-only memory (read-only memory, ROM), programmable read-only memory (programmable ROM, PROM), erasable programmable read-only memory (erasable PROM, EPROM), electrically erasable programmable read-only memory (electrically EPROM, EEPROM), registers, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium known in the art.
  • An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium.
  • the storage medium may also be a component of the processor.
  • the processor and storage medium can be located in the ASIC.
  • In the above embodiments, all or part of the implementation may be carried out by software, hardware, firmware, or any combination thereof.
  • When implemented using software, it may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on the computer, the processes or functions according to the embodiments of the present application will be generated in whole or in part.
  • the computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in or transmitted via a computer-readable storage medium.
  • the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (such as infrared, radio, or microwave) manner.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center integrated with one or more available media.
  • the available medium may be a magnetic medium (such as a floppy disk, a hard disk, or a magnetic tape), an optical medium (such as a DVD), or a semiconductor medium (such as a solid state disk (solid state disk, SSD)), etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Computing Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Computer Hardware Design (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

An electronic device control method, relating to the field of terminal technologies. The method includes: acquiring a first sound of a first duration by means of a sound collection apparatus; determining the number of target users in the environment according to the first sound, where a target user is a user who makes a sound; and when the number of target users is greater than a preset number, controlling a gesture recognition apparatus associated with an electronic device to work. In this way, whether the gesture recognition apparatus works is controlled on the basis of the number of users making sounds in the environment, thereby preventing the system power consumption from increasing because the gesture recognition apparatus remains continuously in the working state.

Description

Electronic device control method and apparatus, and electronic device
This application claims priority to Chinese Patent Application No. 202210121313.7, entitled "Electronic device control method and apparatus, and electronic device", filed with the China National Intellectual Property Administration on February 9, 2022, which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of terminal technologies, and in particular, to an electronic device control method and apparatus, and an electronic device.
Background
Gesture recognition is widely used in human-computer interaction, gaming and entertainment, in-vehicle applications, and the like. A gesture recognition apparatus senses the gesture environment by means of vision, infrared, radar, and so on. Generally, when a gesture recognition system is working, the entire apparatus is in the working state, which increases system power consumption and makes it unsuitable for low-power devices such as in-vehicle devices, mobile devices, or portable devices. It also makes multimodal interaction technologies (such as visual, auditory, and tactile interaction technologies) difficult to apply at scale. Therefore, how to reduce the power consumption of a gesture recognition apparatus is a technical problem that urgently needs to be solved.
Summary
This application provides an electronic device control method and apparatus, an electronic device, a computer-readable storage medium, and a computer program product, which can control a gesture recognition apparatus associated with an electronic device to work based on the number of users making sounds in the environment, thereby preventing the system power consumption from increasing because the gesture recognition apparatus remains continuously in the working state.
According to a first aspect, this application provides an electronic device control method. The method includes: acquiring a first sound of a first duration by means of a sound collection apparatus; determining whether the first sound contains a target sound, where the target sound is a sound made by a real person; when the first sound contains the target sound, acquiring a second sound of a second duration by means of the sound collection apparatus; determining whether the second sound contains the target sound and whether the number of voiceprints contained in the second sound is greater than a preset number; and when the second sound contains the target sound and the number of voiceprints contained in the second sound is greater than the preset number, controlling a gesture recognition apparatus associated with the electronic device to work.
In this way, whether the current situation is a multi-person conversation is determined from the sound collected by the sound collection apparatus, and when a multi-person conversation is determined, the gesture recognition apparatus associated with the electronic device is controlled to work. Gesture recognition can thus be activated when necessary and restricted when unnecessary, thereby reducing the power consumption of the gesture recognition apparatus.
In a possible implementation, controlling the gesture recognition apparatus associated with the electronic device to work specifically includes: controlling the gesture recognition apparatus to switch from an off state to an on state.
In a possible implementation, controlling the gesture recognition apparatus associated with the electronic device to work specifically includes: controlling the detection frequency of the gesture recognition apparatus to switch from a first frequency to a second frequency, where the first frequency is lower than the second frequency.
In a possible implementation, controlling the gesture recognition apparatus associated with the electronic device to work specifically includes: controlling the detection accuracy of the gesture recognition apparatus to switch from a first accuracy to a second accuracy, where the first accuracy is lower than the second accuracy.
In a possible implementation, after the gesture recognition apparatus associated with the electronic device is controlled to work, the method further includes: acquiring a third sound of a third duration by means of the sound collection apparatus; determining whether the third sound contains the target sound, or whether the number of voiceprints contained in the third sound is greater than the preset number; when the third sound does not contain the target sound, or the number of voiceprints contained in the third sound is less than or equal to the preset number, restricting the gesture recognition apparatus from working; and when the third sound contains the target sound and the number of voiceprints contained in the third sound is greater than the preset number, continuing to control the gesture recognition apparatus to work.
In a possible implementation, the method further includes: when the first sound does not contain the target sound, restricting the gesture recognition apparatus from working.
In a possible implementation, the method further includes: when the second sound does not contain the target sound, and/or the number of voiceprints contained in the second sound is less than or equal to the preset number, restricting the gesture recognition apparatus from working.
In a possible implementation, the method further includes: determining, by means of voiceprint recognition, whether the first sound or the second sound contains the target sound.
According to a second aspect, this application provides an electronic device control method. The method includes: acquiring a first sound of a first duration by means of a sound collection apparatus; determining the number of target users in the environment according to the first sound, where a target user is a user who makes a sound; and when the number of target users is greater than a preset number, controlling a gesture recognition apparatus associated with an electronic device to work.
In this way, whether the current situation is a multi-person conversation is determined from the sound collected by the sound collection apparatus, and when a multi-person conversation is determined, the gesture recognition apparatus associated with the electronic device is controlled to work. Gesture recognition can thus be activated when necessary and restricted when unnecessary, thereby reducing the power consumption of the gesture recognition apparatus.
In a possible implementation, determining the number of target users in the environment according to the first sound specifically includes: determining whether the first sound contains a target sound, where the target sound is a sound made by a real person; and when the first sound contains the target sound, determining the number of target users according to the number of voiceprints contained in the first sound.
In a possible implementation, determining the number of target users in the environment according to the first sound specifically includes: determining whether the first sound contains a target sound, where the target sound is a sound made by a real person; when the first sound contains the target sound, acquiring a second sound of a second duration by means of the sound collection apparatus; determining whether the second sound contains the target sound; and when the second sound contains the target sound, determining the number of target users according to the number of voiceprints contained in the second sound.
In a possible implementation, controlling the gesture recognition apparatus associated with the electronic device to work specifically includes: controlling the gesture recognition apparatus to switch from an off state to an on state; or controlling the detection frequency of the gesture recognition apparatus to switch from a first frequency to a second frequency, where the first frequency is lower than the second frequency; or controlling the detection accuracy of the gesture recognition apparatus to switch from a first accuracy to a second accuracy, where the first accuracy is lower than the second accuracy.
In a possible implementation, after the gesture recognition apparatus associated with the electronic device is controlled to work, the method further includes: acquiring a third sound of a third duration by means of the sound collection apparatus; determining whether the third sound contains the target sound, or whether the number of voiceprints contained in the third sound is greater than the preset number; when the third sound does not contain the target sound, or the number of voiceprints contained in the third sound is less than or equal to the preset number, restricting the gesture recognition apparatus from working; and when the third sound contains the target sound and the number of voiceprints contained in the third sound is greater than the preset number, continuing to control the gesture recognition apparatus to work.
In a possible implementation, the method further includes: when the number of target users is less than or equal to the preset number, restricting the gesture recognition apparatus from working.
According to a third aspect, this application provides an electronic device control apparatus, including at least one processor and an interface. The at least one processor obtains program instructions or data through the interface, and the at least one processor is configured to execute the program instructions to implement the method provided in the first aspect or the second aspect.
According to a fourth aspect, this application provides an electronic device. The electronic device includes at least one memory configured to store a program and at least one processor configured to execute the program stored in the memory. When the program stored in the memory is executed, the processor is configured to perform the method provided in the first aspect or the second aspect.
According to a fifth aspect, this application provides a computer-readable storage medium. The computer-readable storage medium stores a computer program, and when the computer program runs on an electronic device, the electronic device is caused to perform the method provided in the first aspect or the second aspect.
According to a sixth aspect, this application provides a computer program product. When the computer program product runs on an electronic device, the electronic device is caused to perform the method provided in the first aspect or the second aspect.
It can be understood that, for the beneficial effects of the second to sixth aspects, reference may be made to the related description in the first aspect; details are not repeated here.
Brief Description of the Drawings
The following briefly describes the accompanying drawings used in describing the embodiments or the prior art.
FIG. 1 is a schematic diagram of a hardware structure of an electronic device provided by an embodiment of this application;
FIG. 2 is a schematic diagram of a gesture recognition apparatus recognizing a gesture, provided by an embodiment of this application;
FIG. 3 is a schematic flowchart of an electronic device control method provided by an embodiment of this application;
FIG. 4 is a schematic process diagram of an electronic device control method provided by an embodiment of this application;
FIG. 5 is a schematic flowchart of an electronic device control method provided by an embodiment of this application;
FIG. 6 is a schematic structural diagram of an electronic device control apparatus provided by an embodiment of this application.
Detailed Description of Embodiments
The term "and/or" herein describes an association relationship between associated objects and indicates that three relationships may exist. For example, A and/or B may represent three cases: only A exists, both A and B exist, and only B exists. The symbol "/" herein indicates an "or" relationship between associated objects; for example, A/B means A or B.
The terms "first", "second", and the like in the specification and claims herein are used to distinguish different objects rather than to describe a specific order of objects. For example, a first response message and a second response message are used to distinguish different response messages rather than to describe a specific order of response messages.
In the embodiments of this application, words such as "exemplary" or "for example" are used to indicate an example, illustration, or description. Any embodiment or design described as "exemplary" or "for example" in the embodiments of this application should not be construed as being more preferred or advantageous than other embodiments or designs. Rather, the use of words such as "exemplary" or "for example" is intended to present related concepts in a specific manner.
In the description of the embodiments of this application, unless otherwise stated, "multiple" means two or more. For example, multiple processing units mean two or more processing units, and multiple elements mean two or more elements.
Exemplarily, when multiple users hold a multi-person conversation in some scenarios (such as a vehicle cabin or another relatively enclosed space), the users tend to use gestures for auxiliary operations so as not to interrupt the conversation. Therefore, in the embodiments of this application, when it is detected that the current conversation scenario is a multi-person conversation, the gesture recognition apparatus associated with the electronic device can be controlled to work, so as to capture the user's gestures and perform gesture recognition. In this way, the gesture recognition apparatus can be in a low-power state when the multi-person conversation condition is not met, for example in a stopped state, or monitoring the user's gestures at a lower frequency, thereby reducing the power consumption of the gesture recognition apparatus.
It can be understood that, in the embodiments of this application, the electronic device may be a mobile phone, a tablet computer, a wearable device, a smart TV, a Huawei smart screen, a smart speaker, an in-vehicle head unit, or the like. Exemplary embodiments of the electronic device include, but are not limited to, electronic devices running iOS, Android, Windows, HarmonyOS, or other operating systems. The type of the electronic device is not specifically limited in the embodiments of this application.
Exemplarily, FIG. 1 shows a schematic diagram of a hardware structure of an electronic device. As shown in FIG. 1, the electronic device 100 may include a processor 110, a memory 120, a sound pickup 130, and a gesture recognition apparatus 140.
The processor 110 may be a general-purpose processor or a special-purpose processor. For example, the processor 110 may include a central processing unit (CPU) and/or a baseband processor, where the baseband processor may be used to process communication data, and the CPU may be used to implement corresponding control and processing functions, execute software programs, and process data of the software programs. Exemplarily, the processor 110 may perform echo cancellation processing on the audio signal collected by the sound pickup 130, extract voiceprint features from the audio signal, determine whether the audio signal is a sound signal made by a real person, determine the number of voiceprints contained in the audio signal, and so on. Exemplarily, the processor 110 may also perform automatic speech recognition (ASR), natural language understanding (NLU), dialogue management (DM), natural language generation (NLG), text to speech (TTS), and the like on the audio signal collected by the sound pickup 130, so as to recognize the speech made by the target user.
In an example, the processor 110 may include one or more processing units. For example, the processor 110 may include one or more of an application processor (AP), a modem, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU). In some embodiments, the electronic device 100 may include one or more processors 110. The different processing units may be independent components or may be integrated in one or more processors.
The memory 120 may store a program, and the program may be run by the processor 110 so that the processor 110 performs the method provided in this application. The memory 120 may also store data, and the processor 110 may read the data stored in the memory 120 (for example, voiceprint feature data of the target user). The memory 120 and the processor 110 may be provided separately. Optionally, the memory 120 may also be integrated in the processor 110.
The sound pickup 130, which may also be called a "microphone" or "mouthpiece", is used to convert a sound signal into an electrical signal. The electronic device 100 may include one or more sound pickups 130. The sound pickup 130 may collect sound in the environment in which the electronic device 100 is located and transmit the collected sound to the processor 110 for processing. Exemplarily, the sound pickup 130 may be a microphone. Exemplarily, the sound pickup 130 may be built into the electronic device 100 or external to it, which is not limited here.
The gesture recognition apparatus 140 may be used to detect a gesture made by a user. The gesture recognition apparatus 140 may detect the gesture made by the user by, but not limited to, computer-vision-based camera tracking, comparison between a transmitted signal and a reflected signal, and other methods. Exemplarily, taking the method based on comparing transmitted and reflected signals as an example, as shown in FIG. 2, the gesture recognition apparatus 140 may include a signal transmitter 141, a signal receiver 142, and a signal processing unit 143. During operation of the gesture recognition apparatus 140, the signal transmitter 141 may transmit signals (such as millimeter wave, infrared light, ultrasound, or wireless fidelity (Wi-Fi) signals); the signal receiver 142 may then receive the signal reflected by the hand 22 of user A; finally, by comparing the original signal emitted by the signal transmitter 141 with the reflected signal received by the signal receiver 142, the signal processing unit 143 tracks the movement of the hand 22 of user A using principles such as the Doppler effect, phase shift, and time difference, and thereby determines the gesture made by user A. In an example, when the gesture recognition apparatus 140 uses a computer-vision-based camera tracking method to detect the gesture made by the user, the gesture recognition apparatus 140 may further include a camera, which can capture images of the user's hand and detect the gesture made by the user based on a preset image processing algorithm. In an example, at least some components of the gesture recognition apparatus 140 (such as the signal processing unit 143) may be integrated in the processor 110. In an example, the gesture recognition apparatus 140 may be integrated on the electronic device 100, may be arranged separately, or may be partly integrated on the electronic device 100 with the other part arranged separately, which is not limited here. In an example, at least some functions of the gesture recognition apparatus 140 (such as data processing functions) may be implemented by the processor 110.
In an embodiment, the gesture recognition apparatus 140 may include peripheral components such as an electromyogram (EMG) wristband or an infrared remote pen. When the user uses a peripheral component of the gesture recognition apparatus 140, the signal processing unit in the gesture recognition apparatus 140 may acquire the signal emitted by the peripheral component and detect the user's gestures and movements according to that signal.
It can be understood that the structure illustrated in the embodiments of this application does not constitute a specific limitation on the electronic device 100. In other embodiments of this application, the electronic device 100 may include more or fewer components than shown, combine some components, split some components, or have a different component arrangement. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Exemplarily, FIG. 3 shows an electronic device control method. The electronic device in FIG. 3 may be the electronic device 100 described above in FIG. 1. As shown in FIG. 3, the electronic device control method may include the following steps:
S301. Acquire a first sound of a first duration through a sound collection device.
Specifically, the sound collection device associated with the electronic device may continuously or periodically collect sound in the environment in which the electronic device is located, so that the first sound of the first duration can be acquired through the sound collection device. Exemplarily, the sound collection device may be the sound pickup 130 described above in FIG. 1.
In some embodiments, after the first sound is acquired, the first sound may be preprocessed to eliminate interference. Exemplarily, the first sound may be processed by an acoustic echo cancellation (AEC) algorithm to eliminate interference, for example to exclude the sound of cabin media playback, especially the human-voice interference it contains, so as to facilitate the subsequent determination of whether the first sound contains a voiceprint.
S302. Determine, through voiceprint recognition, whether the first sound contains a target sound, where the target sound is a sound made by a real person.
Specifically, after the first sound is acquired, whether the first sound contains a sound made by a real person (that is, the target sound) can be determined through voiceprint recognition. If it does, S303 may be performed; otherwise, S307 is performed. Exemplarily, the sound made by a real person is the sound of an actual person, rather than a user's voice played back by a sound playback device.
As a possible implementation, after the first sound is acquired, the first sound may be input into a pre-trained neural network model or another voiceprint model to determine, through voiceprint recognition, whether the first sound contains a sound made by a real person. If it does, this indicates that a user is speaking and gesture recognition may be needed, so S303 may be performed.
S303. Acquire a second sound of a second duration through the sound collection device.
S304. Determine, through voiceprint recognition, whether the second sound contains the target sound. If it does, perform S305; otherwise, perform S307.
S305. Determine whether the number of voiceprints in the target sound is greater than a preset number.
Specifically, after it is determined that the second sound contains the target sound, whether the number of voiceprints in the target sound is greater than the preset number can be determined. If it is greater than the preset number, S306 may be performed; otherwise, S307 is performed. Exemplarily, the preset number may be 1.
As a possible implementation, after it is determined that the second sound contains the target sound, the second sound may be input into a voiceprint encoder or another voiceprint extraction model to determine the number of voiceprints contained in the second sound, so that whether the number is greater than the preset number can be determined. When the number of voiceprints contained in the second sound is greater than the preset number, this indicates that the current scenario is a multi-person conversation scenario and gesture recognition is needed, so S306 may be performed.
S306. Control the gesture recognition device associated with the electronic device to work.
Specifically, when it is determined that gesture recognition is needed, the gesture recognition device associated with the electronic device can be controlled to work and then perform gesture recognition. For example, when the gesture recognition device is a camera, the camera can be controlled to turn on.
As a possible implementation, if the gesture recognition device is currently off, controlling the gesture recognition device associated with the electronic device to work may be controlling the gesture recognition device to turn on.
As a possible implementation, if the gesture recognition device is currently on and its current detection frequency is in a low-frequency state, controlling the gesture recognition device associated with the electronic device to work may be controlling the detection frequency of the gesture recognition device to switch from the low frequency to a high frequency.
As another possible implementation, if the gesture recognition device is currently on and is currently in a low-accuracy detection state, controlling the gesture recognition device associated with the electronic device to work may be controlling the gesture recognition device to switch from low-accuracy detection to high-accuracy detection.
S307. Restrict the gesture recognition device associated with the electronic device from working.
Specifically, when it is determined that gesture recognition is not needed, the gesture recognition device associated with the electronic device can be restricted from working, for example by controlling the gesture recognition device to be off, controlling its detection frequency to be in a low-frequency state, or controlling it to be in a low-accuracy detection state, so as to save power.
In this way, the sound collected by the sound collection device and voiceprint recognition are used to determine whether the current situation is a multi-person conversation, and when it is, the gesture recognition device associated with the electronic device is controlled to work. Gesture recognition can thus be activated when necessary and restricted when unnecessary, thereby reducing the power consumption of the gesture recognition device.
In some embodiments, after the gesture recognition device associated with the electronic device is controlled to work, when the sound acquired within a certain period does not contain the target sound (that is, a sound made by a real person), or contains the target sound but not multiple voiceprints, this indicates that the multi-person conversation scenario is no longer met, and the gesture recognition device associated with the electronic device can be restricted from working to save power.
In some embodiments, when a user talks with a voice assistant associated with the electronic device, although the voice assistant is not a real person, it can converse and communicate with the user and can therefore also be counted as a "real person"; that is, the sound made by the voice assistant can be regarded as a sound made by a real person, and this scenario can also be regarded as a "multi-person conversation scenario". In person-to-person conversation, gestures are often used in place of speech, so a conversation with a voice assistant can also be considered a high-frequency scenario for using gestures.
In some embodiments, current smart cockpits have a variety of sensors, such as image cameras and infrared sensors. Exemplarily, a camera can be used to capture and recognize changes in the driver's facial features (eye and head movements, etc.) to determine whether the driver is fatigued and whether a corresponding warning mechanism is needed. However, this judgment mechanism based only on changes in facial features is one-dimensional, and full-time detection and computation place a heavy power load on the system.
During the whole driving process, multi-dimensional judgments can in fact be used to decide when to start the corresponding fatigue monitoring, thereby reducing the power consumption of the entire system. Ideally, the system is in an organic, flexible state: through the coordinated operation of multiple sensors, different sensors are triggered to start operating under different conditions, thereby reducing the load on the entire system.
As a possible implementation, whether the user has not communicated with the voice assistant for a long time, or has not made a sound of a certain decibel level for a long time, can serve as a condition for whether fatigue monitoring needs to be started. If the user frequently interacts with the voice assistant when starting to drive or during driving, controlling various hardware devices in the vehicle through the voice assistant, this indicates that the user's brain is in an active state, and fatigue monitoring may be deferred for the time being. When the ambient sound is too loud, or when multiple people are detected in the vehicle, the corresponding fatigue monitoring may likewise be deferred, because in a multi-person environment, interaction between people also effectively reduces fatigue.
The above are all examples of adding manually set fixed conditions to help the system reasonably allocate the on states of detection hardware, cooperate with the user's multimodal interaction, and reduce system power consumption. Of course, the user can also set conditions independently to help the system reduce power consumption. For example, before driving, the user sets exclusive conditions for starting fatigue monitoring based on their own habits, such as starting fatigue monitoring only after there has been high-decibel noise in the vehicle for more than half an hour.
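The trigger conditions sketched in this passage can be expressed as a simple rule check. The function below is a hypothetical illustration: the 30-minute threshold follows the half-hour example in the text, while the parameter names and the exact combination of conditions are assumptions rather than anything the application specifies.

```python
def should_start_fatigue_monitoring(minutes_since_assistant_use,
                                    occupant_count,
                                    silence_threshold_min=30):
    """Gate the camera-based fatigue check behind cheaper signals, as the
    passage suggests: skip it while the driver is actively using the voice
    assistant (an alert brain) or while other people are in the cabin
    (interaction itself reduces fatigue). Thresholds are illustrative."""
    if occupant_count > 1:
        # Multi-person cabin: defer fatigue monitoring.
        return False
    # Long silence toward the assistant suggests monitoring is worthwhile.
    return minutes_since_assistant_use >= silence_threshold_min
```

A user-defined rule like "start monitoring only after half an hour of high-decibel noise" would simply be another boolean input combined with these, which is the condition-setting mechanism the passage describes.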
In some embodiments, through the methods in the above embodiments, devices that support multiple modalities can be provided with condition settings for triggering multimodal interaction, and with instructions that decide, based on multiple judgment conditions, whether to start multimodal interaction, thereby balancing the power consumption of the entire system.
For ease of understanding, an example is given below.
As shown in FIG. 4, a sound receiving device (such as the aforementioned sound pickup) first picks up sound. Next, an echo cancellation algorithm is used to exclude interfering sounds, such as media playback, especially the human-voice interference it contains. Then, voiceprint recognition is used to determine whether a real person is speaking. When it is not a real person speaking, the situation may not be judged as a multi-person conversation scenario, and the gesture recognition device can be restricted from working. When a real person is speaking, monitoring can be performed for a period of time, the number of voiceprints that appear is recorded, and whether multiple people are speaking is judged according to the recorded number of voiceprints. When multiple people are not speaking, the situation may not be judged as a multi-person conversation scenario, and the gesture recognition device can be restricted from working. When multiple people are speaking, the situation can be judged as a multi-person conversation scenario; at this point, multimodal gesture monitoring can be turned on and the frequency of gesture monitoring can be increased.
Next, based on the electronic device control method described above, another electronic device control method provided by an embodiment of this application is introduced. It can be understood that this method is proposed on the basis of the electronic device control method described above, and for some or all of its content, reference may be made to the above description of the electronic device control method.
Refer to FIG. 5. FIG. 5 is a schematic flowchart of another electronic device control method provided by an embodiment of this application. It can be understood that the method can be executed by any apparatus, device, platform, or device cluster having computing and processing capabilities. As shown in FIG. 5, the electronic device control method may include:
S501. Acquire a first sound of a first duration through a sound collection device.
S502. Determine the number of target users in the environment according to the first sound, where a target user is a user who makes a sound.
Specifically, after the first sound is acquired, the number of target users in the environment can be determined according to the first sound, where a target user is a user who makes a sound.
As a possible implementation, determining the number of target users in the environment according to the first sound may specifically be: first determining whether the first sound contains a target sound, where the target sound is a sound made by a real person; and when the first sound contains the target sound, determining the number of target users according to the number of voiceprints contained in the first sound. Exemplarily, whether the target sound is contained may be determined by means of voiceprint recognition.
As another possible implementation, determining the number of target users in the environment according to the first sound may specifically be: first determining whether the first sound contains a target sound, where the target sound is a sound made by a real person; when the first sound contains the target sound, acquiring a second sound of a second duration through the sound collection device; then determining whether the second sound contains the target sound; and when the second sound contains the target sound, determining the number of target users according to the number of voiceprints contained in the second sound. This reduces the resource consumption caused by a misjudgment that may result from directly counting the voiceprints contained in the sound. Exemplarily, whether the target sound is contained may be determined by means of voiceprint recognition.
S503. When the number of target users is greater than a preset number, control the gesture recognition device associated with the electronic device to work.
Specifically, when the number of target users is greater than the preset number, this indicates a multi-person conversation scenario, and the gesture recognition device associated with the electronic device can be controlled to work. For example, the gesture recognition device is controlled to switch from the off state to the on state; or the detection frequency of the gesture recognition device is controlled to switch from a first frequency to a second frequency, where the first frequency is lower than the second frequency; or the detection accuracy of the gesture recognition device is controlled to switch from a first accuracy to a second accuracy, where the first accuracy is lower than the second accuracy.
In some embodiments, when the number of target users is less than or equal to the preset number, the gesture recognition device can be restricted from working to reduce its power consumption. For example, the gesture recognition device is controlled to switch from the on state to the off state, or is kept in the off state; or the detection frequency of the gesture recognition device is controlled to switch from the second frequency to the first frequency, where the first frequency is lower than the second frequency, or the detection frequency of the gesture recognition device is kept at the second frequency; or the detection accuracy of the gesture recognition device is controlled to switch from the second accuracy to the first accuracy, where the first accuracy is lower than the second accuracy, or the detection accuracy of the gesture recognition device is kept at the second accuracy.
In this way, the sound collected by the sound collection device is used to determine whether the current situation is a multi-person conversation, and when it is, the gesture recognition device associated with the electronic device is controlled to work. Gesture recognition can thus be activated when necessary and restricted when unnecessary, thereby reducing the power consumption of the gesture recognition device.
In some embodiments, after the gesture recognition device associated with the electronic device is controlled to work, a third sound of a third duration may also be acquired through the sound collection device, and it is determined whether the third sound contains the target sound, or whether the number of voiceprints contained in the third sound is greater than the preset number. When the third sound does not contain the target sound, or the number of voiceprints contained in the third sound is less than or equal to the preset number, the gesture recognition device is restricted from working; when the third sound contains the target sound and the number of voiceprints contained in the third sound is greater than the preset number, the gesture recognition device continues to be controlled to work. This prevents the gesture recognition device from working continuously.
It can be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic and should not constitute any limitation on the implementation process of the embodiments of this application. In addition, in some possible implementations, the steps in the above embodiments may be selectively executed according to actual conditions, may be partially executed, or may be fully executed, which is not limited here.
Based on the methods described in the above embodiments, an embodiment of this application further provides an electronic device control apparatus. Refer to FIG. 6. FIG. 6 is a schematic structural diagram of an electronic device control apparatus provided by an embodiment of this application. As shown in FIG. 6, the electronic device control apparatus 600 includes one or more processors 601 and an interface circuit 602. Optionally, the electronic device control apparatus 600 may further include a bus 603. Specifically:
The processor 601 may be an integrated circuit chip with signal processing capability. In the implementation process, each step of the above methods may be completed by an integrated logic circuit of hardware in the processor 601 or by instructions in the form of software. The above processor 601 may be a general-purpose processor, a neural network processing unit (NPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods and steps disclosed in the embodiments of this application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
The interface circuit 602 may be used to send or receive data, instructions, or information. The processor 601 may process the data, instructions, or other information received by the interface circuit 602, and may send the processed information out through the interface circuit 602.
Optionally, the electronic device control apparatus 600 further includes a memory, which may include a read-only memory and a random access memory and provides operation instructions and data to the processor. Part of the memory may also include a non-volatile random access memory (NVRAM). The memory may be coupled with the processor 601.
Optionally, the memory stores executable software modules or data structures, and the processor 601 may perform corresponding operations by invoking the operation instructions stored in the memory (the operation instructions may be stored in an operating system).
Optionally, the interface circuit 602 may be used to output an execution result of the processor 601.
It should be noted that the functions corresponding to the processor 601 and the interface circuit 602 may be implemented by hardware design, by software design, or by a combination of software and hardware, which is not limited here. Exemplarily, the electronic device control apparatus 600 may be applied to, but is not limited to, the electronic device 100 shown in FIG. 2.
It should be understood that the steps of the above method embodiments may be completed by a hardware logic circuit in the processor or by instructions in the form of software.
It can be understood that the processor in the embodiments of this application may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. A general-purpose processor may be a microprocessor or any conventional processor.
The method steps in the embodiments of this application may be implemented by hardware, or by a processor executing software instructions. The software instructions may consist of corresponding software modules, and the software modules may be stored in random access memory (RAM), flash memory, read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor so that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be a component of the processor. The processor and the storage medium may be located in an ASIC.
In the above embodiments, all or part of the implementation may be carried out by software, hardware, firmware, or any combination thereof. When software is used, the implementation may be entirely or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of this application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted via a computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (such as infrared, radio, or microwave) manner. The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid state disk (SSD)), or the like.
It can be understood that the various numerical designations involved in the embodiments of this application are merely for ease of description and distinction, and are not intended to limit the scope of the embodiments of this application.

Claims (10)

  1. An electronic device control method, characterized in that the method comprises:
    acquiring a first sound of a first duration by means of a sound collection apparatus;
    determining the number of target users in the environment according to the first sound, wherein a target user is a user who makes a sound; and
    when the number of target users is greater than a preset number, controlling a gesture recognition apparatus associated with an electronic device to work.
  2. The method according to claim 1, characterized in that determining the number of target users in the environment according to the first sound specifically comprises:
    determining whether the first sound contains a target sound, wherein the target sound is a sound made by a real person; and
    when the first sound contains the target sound, determining the number of target users according to the number of voiceprints contained in the first sound.
  3. The method according to claim 1, characterized in that determining the number of target users in the environment according to the first sound specifically comprises:
    determining whether the first sound contains a target sound, wherein the target sound is a sound made by a real person;
    when the first sound contains the target sound, acquiring a second sound of a second duration by means of the sound collection apparatus;
    determining whether the second sound contains the target sound; and
    when the second sound contains the target sound, determining the number of target users according to the number of voiceprints contained in the second sound.
  4. The method according to any one of claims 1 to 3, characterized in that controlling the gesture recognition apparatus associated with the electronic device to work specifically comprises:
    controlling the gesture recognition apparatus to switch from an off state to an on state;
    or controlling the detection frequency of the gesture recognition apparatus to switch from a first frequency to a second frequency, wherein the first frequency is lower than the second frequency;
    or controlling the detection accuracy of the gesture recognition apparatus to switch from a first accuracy to a second accuracy, wherein the first accuracy is lower than the second accuracy.
  5. The method according to any one of claims 1 to 4, characterized in that after the gesture recognition apparatus associated with the electronic device is controlled to work, the method further comprises:
    acquiring a third sound of a third duration by means of the sound collection apparatus;
    determining whether the third sound contains a target sound, wherein the target sound is a sound made by a real person, or whether the number of voiceprints contained in the third sound is greater than a preset number;
    when the third sound does not contain the target sound, or the number of voiceprints contained in the third sound is less than or equal to the preset number, restricting the gesture recognition apparatus from working; and
    when the third sound contains the target sound and the number of voiceprints contained in the third sound is greater than the preset number, continuing to control the gesture recognition apparatus to work.
  6. The method according to any one of claims 1 to 5, characterized in that the method further comprises:
    when the number of target users is less than or equal to the preset number, restricting the gesture recognition apparatus from working.
  7. An electronic device control apparatus, characterized by comprising:
    at least one processor and an interface, wherein
    the at least one processor obtains program instructions or data through the interface; and
    the at least one processor is configured to execute the program instructions to implement the method according to any one of claims 1 to 6.
  8. An electronic device, characterized by comprising:
    at least one memory, configured to store a program; and
    at least one processor, configured to execute the program stored in the memory, wherein when the program stored in the memory is executed, the processor is configured to perform the method according to any one of claims 1 to 6.
  9. A computer-readable storage medium storing a computer program, wherein when the computer program runs on an electronic device, the electronic device is caused to perform the method according to any one of claims 1 to 8.
  10. A computer program product, characterized in that when the computer program product runs on an electronic device, the electronic device is caused to perform the method according to any one of claims 1 to 6.
PCT/CN2022/136694 2022-02-09 2022-12-05 Electronic device control method and apparatus, and electronic device WO2023151360A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210121313.7 2022-02-09
CN202210121313.7A CN116610206A (zh) 2022-02-09 2022-02-09 Electronic device control method and apparatus, and electronic device

Publications (1)

Publication Number Publication Date
WO2023151360A1 true WO2023151360A1 (zh) 2023-08-17

Family

ID=87563601

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/136694 WO2023151360A1 (zh) 2022-02-09 2022-12-05 一种电子设备控制方法、装置及电子设备

Country Status (2)

Country Link
CN (1) CN116610206A (zh)
WO (1) WO2023151360A1 (zh)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150346828A1 (en) * 2014-05-28 2015-12-03 Pegatron Corporation Gesture control method, gesture control module, and wearable device having the same
CN106960161A (zh) * 2017-03-23 2017-07-18 全椒县志宏机电设备设计有限公司 一种应用加密的方法及移动终端
CN109804429A (zh) * 2016-10-13 2019-05-24 宝马股份公司 机动车中的多模式对话

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150346828A1 (en) * 2014-05-28 2015-12-03 Pegatron Corporation Gesture control method, gesture control module, and wearable device having the same
CN109804429A (zh) * 2016-10-13 2019-05-24 宝马股份公司 机动车中的多模式对话
CN106960161A (zh) * 2017-03-23 2017-07-18 全椒县志宏机电设备设计有限公司 一种应用加密的方法及移动终端

Also Published As

Publication number Publication date
CN116610206A (zh) 2023-08-18

Similar Documents

Publication Publication Date Title
US20210327436A1 (en) Voice Interaction Method, Device, and System
US11502859B2 (en) Method and apparatus for waking up via speech
WO2020024885A1 (zh) 一种语音识别的方法、语音断句的方法及装置
US20170243585A1 (en) System and method of analyzing audio data samples associated with speech recognition
US11152001B2 (en) Vision-based presence-aware voice-enabled device
DE102016122719A1 (de) Nutzerfokus aktivierte Spracherkennung
US20160019886A1 (en) Method and apparatus for recognizing whisper
CN109032345B (zh) 设备控制方法、装置、设备、服务端和存储介质
US11430447B2 (en) Voice activation based on user recognition
CN109032554B (zh) 一种音频处理方法和电子设备
CN112634895A (zh) 语音交互免唤醒方法和装置
US11437031B2 (en) Activating speech recognition based on hand patterns detected using plurality of filters
WO2021212388A1 (zh) 一种交互沟通实现方法、设备和存储介质
CN115831155A (zh) 音频信号的处理方法、装置、电子设备及存储介质
CN114360527A (zh) 车载语音交互方法、装置、设备及存储介质
CN114333774B (zh) 语音识别方法、装置、计算机设备及存储介质
US10923123B2 (en) Two-person automatic speech recognition training to interpret unknown voice inputs
WO2023151360A1 (zh) 一种电子设备控制方法、装置及电子设备
WO2023006033A1 (zh) 语音交互方法、电子设备及介质
US20200090663A1 (en) Information processing apparatus and electronic device
US11929081B2 (en) Electronic apparatus and controlling method thereof
CN115083396A (zh) 音频尾端检测的语音处理方法、装置、电子设备及介质
CN114220420A (zh) 多模态语音唤醒方法、装置及计算机可读存储介质
WO2020102943A1 (zh) 手势识别模型的生成方法、装置、存储介质及电子设备
CN115331672B (zh) 设备控制方法、装置、电子设备及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22925711

Country of ref document: EP

Kind code of ref document: A1