CN116610206A - Electronic equipment control method and device and electronic equipment - Google Patents

Electronic equipment control method and device and electronic equipment

Info

Publication number
CN116610206A
CN116610206A (application number CN202210121313.7A)
Authority
CN
China
Prior art keywords
sound
gesture recognition
target
electronic device
processor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210121313.7A
Other languages
Chinese (zh)
Inventor
李凌飞
唐晨
邰彦坤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202210121313.7A priority Critical patent/CN116610206A/en
Priority to PCT/CN2022/136694 priority patent/WO2023151360A1/en
Publication of CN116610206A publication Critical patent/CN116610206A/en

Classifications

    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017 Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G06F1/3206 Monitoring of events, devices or parameters that trigger a change in power modality
    • G06F1/324 Power saving characterised by the action undertaken by lowering clock frequency
    • G06F1/3287 Power saving characterised by the action undertaken by switching off individual functional units in the computer system
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G10L25/51 Speech or voice analysis techniques specially adapted for particular use, for comparison or discrimination

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Computing Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Computer Hardware Design (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application provides an electronic equipment control method and device and electronic equipment, relating to the field of terminal technologies. The method comprises the following steps: acquiring a first sound of a first duration through a sound collection device; determining the number of target users in the environment according to the first sound, where a target user is a user who is making sound; and when the number of target users is greater than a preset number, controlling a gesture recognition device matched with the electronic equipment to work. In this way, whether the gesture recognition device works is controlled based on the number of users making sound in the environment, which avoids the increase in system power consumption caused by the gesture recognition device being continuously in a working state.

Description

Electronic equipment control method and device and electronic equipment
Technical Field
The present application relates to the field of terminal technologies, and in particular, to a method and an apparatus for controlling an electronic device, and an electronic device.
Background
Gesture recognition is widely used in human-computer interaction, game entertainment, vehicle applications, and the like. A gesture recognition device senses gestures through visual, infrared, radar and other means. Generally, when a gesture recognition system works, the whole device is in a working state, which increases system power consumption and makes the system unsuitable for low-power-consumption equipment such as vehicle-mounted, mobile or portable devices. This also makes multi-modal interaction techniques (such as visual, auditory and tactile interaction techniques) difficult to apply on a large scale. Therefore, how to reduce the power consumption of the gesture recognition device is a technical problem that needs to be solved.
Disclosure of Invention
The application provides a control method and device of electronic equipment, electronic equipment, a computer-readable storage medium and a computer program product, which can control the operation of a gesture recognition device based on the number of users making sound in the environment, thereby avoiding the increase in system power consumption caused by the gesture recognition device being continuously in a working state.
In a first aspect, the present application provides a method for controlling an electronic device, including: acquiring a first sound of a first duration through a sound acquisition device; judging whether the first sound contains a target sound or not, wherein the target sound is a sound emitted by a real person; when the first sound contains the target sound, acquiring a second sound with a second duration through the sound acquisition device; judging whether the second sound contains target sound or not, and judging whether the number of voiceprints contained in the second sound is larger than a preset number or not; when the second sound contains target sound and the number of voiceprints contained in the second sound is larger than the preset number, controlling the gesture recognition device matched with the electronic equipment to work.
In this way, the sound collected by the sound collection device is used to judge whether a multi-person conversation is taking place, and when it is, the gesture recognition device matched with the electronic equipment is controlled to work. Gesture recognition is thus started only when necessary and limited when unnecessary, which reduces the power consumption of the gesture recognition device.
In one possible implementation manner, the gesture recognition apparatus matched with the electronic device is controlled to work, and specifically includes: the control gesture recognition device is switched from an off state to an on state.
In one possible implementation manner, the gesture recognition apparatus matched with the electronic device is controlled to work, and specifically includes: the detection frequency of the gesture recognition device is controlled to be switched from a first frequency to a second frequency, and the first frequency is smaller than the second frequency.
In one possible implementation manner, the gesture recognition apparatus matched with the electronic device is controlled to work, and specifically includes: the detection precision of the gesture recognition device is controlled to switch from a first precision to a second precision, and the first precision is lower than the second precision.
In one possible implementation, after controlling the operation of the gesture recognition apparatus associated with the electronic device, the method further includes: acquiring a third sound of a third duration through a sound acquisition device; judging whether the third sound contains target sound or not, or whether the number of voiceprints contained in the third sound is larger than a preset number; when the third sound does not contain the target sound or the number of voiceprints contained in the third sound is smaller than or equal to the preset number, the gesture recognition device is limited to work; and when the third sound contains the target sound and the number of the voiceprints contained in the third sound is larger than the preset number, continuing to control the gesture recognition device to work.
In one possible implementation, the method further includes: when the first sound does not contain the target sound, the gesture recognition device is limited to work.
In one possible implementation, the method further includes: and when the second sound does not contain the target sound and/or the number of voiceprints contained in the second sound is smaller than or equal to the preset number, limiting the gesture recognition device to work.
In one possible implementation, the method further includes: and judging whether the first sound or the second sound contains the target sound or not through voiceprint recognition.
In a second aspect, the present application provides a method for controlling an electronic device, including: acquiring a first sound of a first duration through a sound acquisition device; determining the number of target users in the environment according to the first sound, wherein the target users are users making sound; and when the number of the target users is greater than the preset number, controlling a gesture recognition device matched with the electronic equipment to work.
In this way, the sound collected by the sound collection device is used to judge whether a multi-person conversation is taking place, and when it is, the gesture recognition device matched with the electronic equipment is controlled to work. Gesture recognition is thus started only when necessary and limited when unnecessary, which reduces the power consumption of the gesture recognition device.
In one possible implementation manner, determining the number of target users in the environment according to the first sound specifically includes: judging whether the first sound contains a target sound or not, wherein the target sound is a sound emitted by a real person; when the first sound contains the target sound, the number of target users is determined according to the number of voiceprints contained in the first sound.
In one possible implementation manner, determining the number of target users in the environment according to the first sound specifically includes: judging whether the first sound contains a target sound or not, wherein the target sound is a sound emitted by a real person; when the first sound contains the target sound, acquiring a second sound with a second duration through the sound acquisition device; judging whether the second sound contains the target sound or not; when the target sound is contained in the second sound, the number of target users is determined according to the number of voiceprints contained in the second sound.
In one possible implementation manner, the gesture recognition apparatus matched with the electronic device is controlled to work, and specifically includes: the gesture recognition device is controlled to be switched from the off state to the on state; or, controlling the detection frequency of the gesture recognition device to be switched from a first frequency to a second frequency, wherein the first frequency is smaller than the second frequency; alternatively, the detection accuracy of the control gesture recognition apparatus is switched from the first accuracy to the second accuracy, and the first accuracy is smaller than the second accuracy.
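The three control options above (switching on, raising the detection frequency, raising the detection precision) can be sketched as a small state holder. This is an illustrative sketch only; the class names and the concrete frequency values (2 Hz and 30 Hz standing in for the first and second frequencies) are assumptions, not taken from the application.

```python
from enum import Enum, auto

class GestureSensorMode(Enum):
    OFF = auto()
    LOW_POWER = auto()   # low detection frequency / low precision
    ACTIVE = auto()      # high detection frequency / high precision

class GestureSensorController:
    """Hypothetical controller illustrating the three switching options."""

    def __init__(self):
        self.mode = GestureSensorMode.OFF
        self.detect_hz = 0           # current detection frequency
        self.high_precision = False  # current detection precision

    def enable(self):
        # Option 1: off -> on; options 2/3: raise frequency / precision.
        self.mode = GestureSensorMode.ACTIVE
        self.detect_hz = 30          # assumed "second frequency"
        self.high_precision = True

    def restrict(self):
        # Keep the sensor in a low-power state when gestures are unlikely.
        self.mode = GestureSensorMode.LOW_POWER
        self.detect_hz = 2           # assumed "first frequency"
        self.high_precision = False
```

Whether `restrict()` should fully power the sensor off or merely lower its frequency is a per-device design choice; the application allows either.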
In one possible implementation, after controlling the operation of the gesture recognition apparatus associated with the electronic device, the method further includes: acquiring a third sound of a third duration through a sound acquisition device; judging whether the third sound contains target sound or not, or whether the number of voiceprints contained in the third sound is larger than a preset number; when the third sound does not contain the target sound or the number of voiceprints contained in the third sound is smaller than or equal to the preset number, the gesture recognition device is limited to work; and when the third sound contains the target sound and the number of the voiceprints contained in the third sound is larger than the preset number, continuing to control the gesture recognition device to work.
In one possible implementation, the method further includes: and when the number of the target users is smaller than or equal to the preset number, limiting the gesture recognition device to work.
In a third aspect, the present application provides an electronic device control apparatus, including: at least one processor and an interface; the at least one processor obtains program instructions or data through the interface; the at least one processor is configured to execute the program instructions to implement the method as provided in the first or second aspect.
In a fourth aspect, the present application provides an electronic device comprising at least one memory for storing a program and at least one processor for executing the program stored in the memory. Wherein the processor is adapted to perform the method as provided in the first or second aspect, when the program stored in the memory is executed.
In a fifth aspect, the present application provides a computer readable storage medium storing a computer program which, when run on an electronic device, causes the electronic device to perform the method as provided in the first or second aspect.
In a sixth aspect, the application provides a computer program product for, when run on an electronic device, causing the electronic device to perform the method as provided in the first or second aspect.
It will be appreciated that the advantages of the second to sixth aspects may be found in the relevant description of the first aspect, and are not described here again.
Drawings
Fig. 1 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a gesture recognition apparatus for recognizing a gesture according to an embodiment of the present application;
fig. 3 is a schematic flow chart of a control method of an electronic device according to an embodiment of the present application;
fig. 4 is a schematic process diagram of a control method of an electronic device according to an embodiment of the present application;
fig. 5 is a schematic flow chart of a control method of an electronic device according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an electronic device control apparatus according to an embodiment of the present application.
Detailed Description
The term "and/or" herein describes an association relationship between associated objects and means that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist together, or B exists alone. The symbol "/" herein indicates an "or" relationship between the associated objects, e.g., A/B indicates A or B.
The terms "first" and "second" and the like in the description and in the claims are used for distinguishing between different objects and not for describing a particular sequential order of objects. For example, the first response message and the second response message, etc. are used to distinguish between different response messages, and are not used to describe a particular order of response messages.
In embodiments of the application, words such as "exemplary" or "such as" are used to mean serving as an example, instance, or illustration. Any embodiment or design described as "exemplary" or "such as" in the embodiments should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of such words is intended to present related concepts in a concrete fashion.
In the description of the embodiments of the present application, unless otherwise specified, the meaning of "plurality" means two or more, for example, the meaning of a plurality of processing units means two or more, or the like; the plurality of elements means two or more elements and the like.
For example, when multiple users conduct a multi-person conversation in some scenes (such as a relatively enclosed space like a vehicle cabin), the users prefer to use gestures to assist the conversation so as not to interrupt it. Therefore, in the embodiment of the application, when the current scene is detected to be a multi-person conversation, the gesture recognition device matched with the electronic equipment can be controlled to work so as to acquire the user's gestures and perform gesture recognition. In this way, when the multi-person conversation condition is not satisfied, the gesture recognition device can stay in a low-power-consumption state, such as a stopped state or a state of monitoring the user's gestures at a lower frequency, thereby reducing the power consumption of the gesture recognition device.
It can be appreciated that in the embodiment of the application, the electronic device may be a mobile phone, a tablet computer, a wearable device, a smart television, a smart screen, a smart speaker, an in-vehicle head unit and the like. Exemplary embodiments of the electronic device include, but are not limited to, electronic devices running iOS, Android, Windows, HarmonyOS, or other operating systems. The embodiment of the application does not limit the type of the electronic equipment.
By way of example, fig. 1 shows a schematic diagram of a hardware architecture of an electronic device. As shown in fig. 1, the electronic device 100 may include: processor 110, memory 120, sound pickup 130, and gesture recognition device 140.
The processor 110 may be a general-purpose processor or a special-purpose processor. For example, the processor 110 may include a central processing unit (central processing unit, CPU) and/or a baseband processor. The baseband processor may be used to process communication data, and the CPU may be used to implement corresponding control and processing functions, execute software programs, and process data of the software programs. Illustratively, the processor 110 may perform echo cancellation on the audio signal collected by the sound pickup 130, extract voiceprint features from the audio signal, determine whether the audio signal is a sound signal emitted by a person, determine the number of voiceprints included in the audio signal, and the like. Illustratively, the processor 110 may also perform automatic speech recognition (automatic speech recognition, ASR), natural language understanding (natural language understanding, NLU), dialogue management (dialogue management, DM), natural language generation (natural language generation, NLG), text-to-speech (text to speech, TTS), etc. on the audio signal collected by the sound pickup 130 to recognize the speech uttered by the target user.
In one example, the processor 110 may include one or more processing units. For example, the processor 110 may include one or more of an application processor (application processor, AP), a modem (modem), a graphics processor (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), a controller, a video codec, a digital signal processor (digital signal processor, DSP), a baseband processor, and/or a neural-Network Processor (NPU), etc. In some embodiments, the electronic device 100 may include one or more processors 110. Wherein the different processing units may be separate devices or may be integrated in one or more processors.
The memory 120 may store a program that is executable by the processor 110 to cause the processor 110 to perform the method provided by the present application. Memory 120 may also store data. The processor 110 may read data stored in the memory 120 (e.g., voiceprint feature data of a target user, etc.). The memory 120 and the processor 110 may be separately provided. Optionally, the memory 120 may also be integrated in the processor 110.
The sound pickup 130, which may also be referred to as a "microphone", is used to convert sound signals into electrical signals. The electronic device 100 may include one or more sound pickups 130. The sound pickup 130 may collect sound in the environment of the electronic device 100 and transmit the collected sound to the processor 110 for processing. Illustratively, the sound pickup 130 may be a microphone. The sound pickup 130 may be disposed in the electronic device 100, or may be disposed outside the electronic device 100, which is not limited herein.
Gesture recognition device 140 may be used to detect gestures made by a user. The gesture recognition device 140 may detect gestures by methods such as, but not limited to, camera tracking based on computer vision and comparison of transmitted and reflected signals. By way of example, taking the transmitted/reflected-signal comparison method shown in fig. 2, the gesture recognition apparatus 140 may include a signal transmitter 141, a signal receiver 142, and a signal processing unit 143. During operation of the gesture recognition device 140, the signal transmitter 141 sends a signal (such as millimeter wave, infrared light, ultrasound, or wireless fidelity (wireless fidelity, Wi-Fi)); the signal receiver 142 then receives the signal reflected by the hand 22 of user A; finally, the signal processing unit 143 tracks the motion of the hand 22 of user A by comparing the original signal sent by the signal transmitter 141 with the reflected signal received by the signal receiver 142, and determines the gesture made by user A using principles such as the Doppler effect, phase shift, and time difference. In one example, when the gesture recognition apparatus 140 detects gestures using camera tracking based on computer vision, the gesture recognition apparatus 140 may further include a camera that collects images of the user's hand and detects gestures based on a preset image processing algorithm. In one example, at least some components of gesture recognition device 140 (such as signal processing unit 143) may be integrated into processor 110. In one example, gesture recognition apparatus 140 may be integrated on electronic device 100, may be disposed separately, or may be partially integrated on electronic device 100 and partially disposed separately, without limitation.
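As an illustration of the phase-shift principle mentioned above, the sketch below estimates a hand's radial velocity from the unwrapped phase difference of the reflected signal between two frames (the round-trip path contributes the factor of 4π). The function name and the millimeter-wave parameters in the usage note are assumptions, not taken from the application.

```python
import math

def radial_velocity(phase_prev, phase_curr, wavelength_m, frame_period_s):
    """Estimate a hand's radial velocity (m/s) from the phase shift of the
    reflected signal between two successive frames. The round trip doubles
    the path length, hence the 4*pi in the denominator. Simplified sketch:
    assumes the phase difference has already been unwrapped."""
    dphi = phase_curr - phase_prev
    return wavelength_m * dphi / (4 * math.pi * frame_period_s)
```

For instance, for an assumed 60 GHz radar (wavelength about 0.005 m) with a 1 ms frame period, a frame-to-frame phase shift of π/2 corresponds to a radial velocity of 0.625 m/s.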
In one example, at least some of the functionality (e.g., data processing functionality, etc.) in gesture recognition device 140 may be implemented by processor 110.
In one embodiment, gesture recognition device 140 may include peripheral components such as an electromyography (EMG) wristband or an infrared remote-control pen. When the user uses such a peripheral component, the signal processing unit in the gesture recognition apparatus 140 may acquire the signals transmitted by the peripheral component and detect the user's gestures and movements from those signals.
It should be understood that the illustrated structure of the embodiment of the present application does not constitute a specific limitation on the electronic device 100. In other embodiments of the application, electronic device 100 may include more or fewer components than shown, or certain components may be combined, or certain components may be split, or different arrangements of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
By way of example, fig. 3 illustrates a method of controlling an electronic device. The electronic device in fig. 3 may be the electronic device 100 described in fig. 1 above. As shown in fig. 3, the electronic device control method may include the steps of:
s301, acquiring first sound of a first duration through a sound acquisition device.
Specifically, the sound collection device matched with the electronic equipment can continuously or periodically collect the sound in the environment where the electronic equipment is located, so that the first sound with the first duration can be obtained through the sound collection device. Illustratively, the sound collection device may be the pickup 130 described in fig. 1 above.
In some embodiments, after the first sound is obtained, the first sound may be preprocessed to eliminate interference. For example, the first sound may be processed by an acoustic echo cancellation (acoustic echo cancellation, AEC) algorithm to eliminate disturbances such as sounds played by in-cabin media, particularly human-voice interference, to aid the subsequent determination of whether the first sound contains voiceprints.
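The application does not specify the AEC algorithm. As an illustrative sketch only, the function below cancels a known far-end reference (such as in-cabin media playback) with a normalized-LMS adaptive filter, a textbook core of many echo cancellers; a production AEC would additionally need delay estimation, double-talk detection, and residual echo suppression.

```python
def nlms_echo_cancel(mic, ref, taps=8, mu=0.5, eps=1e-8):
    """Cancel the far-end reference signal from the microphone signal with
    a normalized-LMS (NLMS) adaptive filter. Returns the residual, which
    approximates the near-end (cabin) speech. Minimal sketch; parameters
    are illustrative assumptions."""
    w = [0.0] * taps                     # adaptive filter weights
    out = []
    for n in range(len(mic)):
        # Tap-delay line of the reference signal (zero-padded at start).
        x = [ref[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        echo_est = sum(wk * xk for wk, xk in zip(w, x))
        e = mic[n] - echo_est            # residual = near-end estimate
        norm = sum(xk * xk for xk in x) + eps
        # NLMS update, normalized by the input power in the delay line.
        w = [wk + mu * e * xk / norm for wk, xk in zip(w, x)]
        out.append(e)
    return out
```

With a microphone signal that is purely a scaled copy of the reference, the residual decays toward zero as the filter converges.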
S302, judging whether the first sound contains a target sound or not through voiceprint recognition, wherein the target sound is a sound emitted by a real person.
Specifically, after the first sound is obtained, voiceprint recognition is used to judge whether the first sound contains a sound emitted by a real person (i.e., the target sound). When it does, S303 may be performed; otherwise, S307 is performed. For example, the sound emitted by a real person is the live voice of a person actually present, as opposed to a user's voice played back through a sound playback device.
As a possible implementation manner, after the first sound is acquired, the first sound may be input into a pre-trained neural network model or another voiceprint model to determine, through voiceprint recognition, whether the first sound contains a sound uttered by a real person. When it does, this indicates that a user is speaking and gesture recognition may be required, so S303 can be performed.
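The trained voiceprint model itself is outside the scope of a short sketch. As a crude, explicitly non-equivalent stand-in for the "does this sound contain speech" decision, the function below flags a sound as speech-like when a frame's short-time energy and zero-crossing rate fall in speech-typical ranges; all thresholds are assumptions.

```python
def contains_speech(frames, energy_thresh=0.01, zcr_range=(0.02, 0.35)):
    """Crude stand-in for a voiceprint-based check: return True if any
    frame (a list of samples in [-1, 1]) has speech-like short-time
    energy and zero-crossing rate. A real implementation would use a
    trained voiceprint / liveness model, as the application describes."""
    for frame in frames:
        energy = sum(s * s for s in frame) / len(frame)
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0)
        )
        zcr = crossings / (len(frame) - 1)
        if energy > energy_thresh and zcr_range[0] <= zcr <= zcr_range[1]:
            return True
    return False
```

Note this heuristic cannot distinguish a live speaker from a loudspeaker; the real-person (anti-playback) discrimination is precisely what the voiceprint model contributes.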
S303, acquiring second sound with a second duration through the sound acquisition device.
S304, judging whether the second sound contains the target sound or not through voiceprint recognition. When included, S305 is performed, otherwise S307 is performed.
S305, judging whether the number of voiceprints in the target sound is larger than a preset number.
Specifically, after the second sound is determined to include the target sound, it may be determined whether the number of voiceprints in the target sound is greater than a preset number. When it is greater than the preset number, S306 may be performed; otherwise, S307 is performed. The preset number may be 1, for example.
As a possible implementation manner, after the target sound is determined to be included in the second sound, the second sound may be input into a voiceprint encoder or other voiceprint extraction model to determine the number of voiceprints included in the second sound, so that it may be determined whether the number is greater than a preset number. When the number of voiceprints included in the second sound is greater than the preset number, it indicates that the current scene is a multi-person dialogue scene, and gesture recognition is required at this time, so S306 may be executed.
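Assuming the voiceprint encoder yields one embedding vector per utterance segment, the number of voiceprints could be estimated by greedy clustering under cosine similarity, as sketched below. The similarity threshold and the greedy strategy are assumptions; the application only requires that the resulting count be compared with the preset number.

```python
import math

def count_voiceprints(embeddings, sim_thresh=0.75):
    """Count distinct speakers by greedily assigning each voiceprint
    embedding to the first existing cluster it resembles (cosine
    similarity >= sim_thresh), opening a new cluster otherwise."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb)

    centroids = []
    for emb in embeddings:
        for c in centroids:
            if cosine(emb, c) >= sim_thresh:
                break                 # same speaker as an existing cluster
        else:
            centroids.append(emb)     # previously unseen speaker
    return len(centroids)
```

With a preset number of 1, a returned count of 2 or more would indicate the multi-person dialogue scene of S305.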
S306, controlling a gesture recognition device matched with the electronic equipment to work.
Specifically, when it is determined that gesture recognition is required, the gesture recognition device matched with the electronic device can be controlled to work, so that gesture recognition is performed. For example, when the gesture recognition apparatus is a camera, the camera may be controlled to be turned on.
As a possible implementation manner, if the gesture recognition apparatus is currently in a closed state, the operation of controlling the gesture recognition apparatus matched with the electronic device may be to control the gesture recognition apparatus to be turned on.
As a possible implementation manner, if the gesture recognition device is currently in an on state and the current detection frequency of the gesture recognition device is in a low frequency state, the operation of controlling the gesture recognition device matched with the electronic device may be to control the detection frequency of the gesture recognition device to be switched from a low frequency to a high frequency.
As another possible implementation manner, if the gesture recognition device is currently in the on state and the gesture recognition device is currently in the low-precision detection state, the operation of controlling the gesture recognition device matched with the electronic device may be controlled to switch from the low-precision detection to the high-precision detection.
S307, limiting the operation of a gesture recognition device matched with the electronic equipment.
Specifically, when it is determined that gesture recognition is not required, the gesture recognition device matched with the electronic device may be limited to operate, for example, the gesture recognition device is controlled to be in a closed state, the detection frequency of the gesture recognition device is controlled to be in a low frequency state, the gesture recognition device is controlled to be in a low-precision detection state, and the like, so as to save power consumption.
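Steps S306 and S307 can be summarized as two state transitions on the gesture recognition apparatus. The sketch below is a minimal illustration under assumed names (`GestureRecognizer`, `enable_gesture_recognition`, `limit_gesture_recognition` are all hypothetical); it simply collapses the on/off, frequency, and precision variants described above into boolean fields.

```python
from dataclasses import dataclass

@dataclass
class GestureRecognizer:
    powered: bool = False         # closed vs. open state
    high_frequency: bool = False  # low- vs. high-frequency detection
    high_precision: bool = False  # low- vs. high-precision detection

def enable_gesture_recognition(dev):
    # S306: put the apparatus to work -- turn it on and, if it was in a
    # low-frequency or low-precision state, raise it to the high state.
    dev.powered = True
    dev.high_frequency = True
    dev.high_precision = True
    return dev

def limit_gesture_recognition(dev):
    # S307: limit operation to save power -- drop back to the
    # lowest-power configuration.
    dev.high_frequency = False
    dev.high_precision = False
    dev.powered = False
    return dev
```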
In this way, the sound collected by the sound collection apparatus, together with voiceprint recognition, is used to judge whether a multi-person conversation scene exists, and the gesture recognition apparatus matched with the electronic device is controlled to work only when such a scene is detected. Gesture recognition is thus started when necessary and limited when unnecessary, reducing the power consumption of the gesture recognition apparatus.
In some embodiments, after the gesture recognition apparatus associated with the electronic device has been controlled to operate, if the sound acquired within a certain period of time does not include the target sound (i.e., a sound emitted by a real person), or includes the target sound but does not include a plurality of voiceprints, the multi-person dialogue condition is no longer satisfied; the gesture recognition apparatus may then be limited from operating so as to save power consumption.
In some embodiments, when a user converses with a voice assistant associated with the electronic device, the voice assistant, although not a real person, can communicate with the user and may therefore also be counted as a "real person": the sound made by the voice assistant may be regarded as a sound made by a real person, and the scene may likewise be regarded as a "multi-person dialogue scene". In person-to-person conversation, gestures are often used in place of speech, so a conversation with a voice assistant can also be considered a scene in which gestures are used frequently.
In some embodiments, a variety of sensors are present in the intelligent cockpit, such as image cameras and infrared sensors. For example, the camera may capture and identify changes in the driver's facial features (eye and head movements, etc.) to determine whether the driver is fatigued and whether a corresponding early-warning mechanism is needed. However, judging only from changes in facial features is a single-dimensional mechanism, and keeping the detection and computation running throughout the drive places a heavy power-consumption load on the system.
Throughout the driving process, fatigue monitoring can instead be governed by multi-dimensional judgments, reducing the power consumption of the system as a whole. Ideally, the system is in an organic, flexible state: through the coordinated operation of the various sensors, different sensors are triggered to start under different conditions, reducing the load on the overall system.
As a possible implementation, conditions such as whether the user has not communicated with the voice assistant for a long time, or has not made a sound above a certain decibel level for a long time, may be used to decide whether fatigue monitoring needs to be turned on. If, while driving, the user frequently interacts with the voice assistant and uses it to control various hardware devices on the vehicle, this indicates that the user's mind is in an alert state, and fatigue monitoring may be temporarily disabled. Likewise, when the ambient sound level is high, or multiple people are detected in the vehicle, fatigue monitoring need not be started, because interaction among multiple occupants effectively reduces fatigue.
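The gating conditions just described can be combined into a single predicate. The sketch below is an illustration under assumed names and thresholds (`silence_threshold_s` and `noise_threshold_db` are arbitrary placeholders; the patent does not fix concrete values):

```python
def fatigue_monitoring_needed(seconds_since_voice_interaction,
                              seconds_since_loud_speech,
                              cabin_noise_db,
                              occupant_count,
                              silence_threshold_s=600,
                              noise_threshold_db=70):
    """Return True when the camera-based fatigue monitor should run."""
    if occupant_count > 1:
        # interaction among multiple occupants reduces fatigue
        return False
    if cabin_noise_db > noise_threshold_db:
        # a loud cabin tends to keep the driver alert
        return False
    # prolonged silence on both channels suggests the driver may be drowsy
    return (seconds_since_voice_interaction > silence_threshold_s
            and seconds_since_loud_speech > silence_threshold_s)
```

A user-defined condition (such as "start fatigue monitoring only after half an hour of high-decibel noise", mentioned below) would simply replace or extend this predicate.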
All of the above are methods in which manually set, fixed conditions help the system allocate the on/off states of detection hardware sensibly, matching the user's multi-modal interaction while reducing system power consumption. The user may also help the system reduce power consumption by setting conditions autonomously. For example, before driving, the user may set a dedicated condition for starting fatigue monitoring based on personal habit, such as starting fatigue monitoring only after high-decibel noise in the vehicle has persisted for more than half an hour.
In some embodiments, by the method in the above embodiments, a condition setting for triggering multi-mode interaction may be provided for a device supporting multi-mode, and an instruction for determining whether to start multi-mode interaction is determined based on a plurality of judging conditions, so as to balance power consumption of the whole system.
For ease of understanding, the following examples are presented.
As shown in fig. 4, sound is first received by a sound receiving device (such as a pickup). An echo cancellation algorithm then removes interfering sounds, such as media playback and, in particular, any human voice it contains. Voiceprint recognition then judges whether a person is speaking. If no one is speaking, there is no need to check for a multi-person conversation scene, and the gesture recognition device may be limited from operating. If someone is speaking, the system may monitor for a period of time, record the number of voiceprints, and judge from that number whether multiple people are speaking. If not, the scene is not judged to be a multi-person conversation scene, and the gesture recognition device may be restricted. If multiple people are speaking, a multi-person conversation scene is established; multi-modal gesture monitoring may then be started and the frequency of gesture monitoring increased.
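The Fig. 4 pipeline can be sketched as a single decision function. This is a structural illustration only: `cancel_echo`, `contains_human_voice`, and `count_voiceprints` are injected stand-ins for the echo-cancellation algorithm, voiceprint-based voice detection, and voiceprint counting that the patent leaves unspecified.

```python
def should_enable_gesture_recognition(raw_audio, media_reference,
                                      cancel_echo, contains_human_voice,
                                      count_voiceprints, min_speakers=2):
    """Fig. 4 pipeline: echo cancellation -> real-voice check ->
    speaker counting -> multi-person dialogue decision."""
    # remove the media playback signal (including any voices it contains)
    clean = cancel_echo(raw_audio, media_reference)
    # if no real person is speaking, gesture recognition stays limited
    if not contains_human_voice(clean):
        return False
    # enable gesture monitoring only for a multi-person conversation
    return count_voiceprints(clean) >= min_speakers
```

With toy string-based stand-ins for the three stages, the function enables gesture recognition only when at least `min_speakers` distinct voices survive echo cancellation.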
Next, another electronic device control method provided by the embodiment of the present application is described based on the above-described electronic device control method. It will be appreciated that the method is based on the electronic device control method described above, and that some or all of this method may be found in the description of the electronic device control method above.
Referring to fig. 5, fig. 5 is a flowchart illustrating another electronic device control method according to an embodiment of the application. It is understood that the method may be performed by any apparatus, device, platform, cluster of devices having computing, processing capabilities. As shown in fig. 5, the electronic device control method may include:
S501, acquiring a first sound of a first duration through a sound acquisition device.
S502, determining the number of target users in the environment according to the first sound, wherein the target users are the users making the sound.
Specifically, after the first sound is obtained, the number of target users in the environment can be determined according to the first sound, where the target users are users who make the sound.
As a possible implementation manner, the number of target users in the environment may be determined according to the first sound, which may specifically be: judging whether the first sound contains a target sound or not, wherein the target sound is a sound emitted by a real person; when the first sound contains the target sound, the number of target users is determined according to the number of voiceprints contained in the first sound. By way of example, whether the target sound is contained may be determined by means of voiceprint recognition.
As another possible implementation manner, the number of target users in the environment may be determined according to the first sound, which may specifically be: judging whether the first sound contains a target sound or not, wherein the target sound is a sound emitted by a real person; when the first sound contains the target sound, the second sound with the second duration can be acquired through the sound acquisition device; then judging whether the second sound contains the target sound or not; when the target sound is contained in the second sound, the number of target users is determined according to the number of voiceprints contained in the second sound. Therefore, the resource consumption caused by misjudgment caused by the number of voiceprints contained in the direct judgment sound is reduced. By way of example, whether the target sound is contained may be determined by means of voiceprint recognition.
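The second implementation above (a short first capture gating a longer second capture) can be sketched as follows. The sketch is illustrative: `capture`, `has_real_voice`, and `count_voiceprints` are hypothetical stand-ins for the sound collection apparatus, the real-voice check, and the voiceprint counter, and the duration values are placeholders.

```python
def count_target_users(capture, has_real_voice, count_voiceprints,
                       first_duration=2.0, second_duration=5.0):
    """Two-stage variant of S502: a cheap first capture gates a longer
    second capture, whose voiceprints are actually counted. Returns 0
    when no real person is heard."""
    first = capture(first_duration)
    if not has_real_voice(first):
        return 0          # no real voice -> no target users, stop early
    second = capture(second_duration)
    if not has_real_voice(second):
        return 0          # the first hit was transient; avoid miscounting
    return count_voiceprints(second)
```

The early return after the first capture is what reduces the resource consumption caused by directly counting voiceprints on every captured sound.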
And S503, controlling a gesture recognition device matched with the electronic equipment to work when the number of the target users is greater than the preset number.
Specifically, when the number of target users is greater than the preset number, a multi-user dialogue scene is indicated, and at the moment, a gesture recognition device matched with the electronic equipment can be controlled to work. For example, the control gesture recognition apparatus is switched from an off state to an on state; or, controlling the detection frequency of the gesture recognition device to be switched from a first frequency to a second frequency, wherein the first frequency is smaller than the second frequency; alternatively, the detection accuracy of the control gesture recognition apparatus is switched from the first accuracy to the second accuracy, and the first accuracy is smaller than the second accuracy.
In some embodiments, when the number of target users is less than or equal to the preset number, the gesture recognition apparatus may be restricted from operating to reduce its power consumption. For example, the gesture recognition device is controlled to be switched from an on state to an off state, or controlled to remain in the off state; or, the detection frequency of the gesture recognition device is controlled to be switched from the second frequency to the first frequency, wherein the first frequency is smaller than the second frequency, or controlled to remain at the first frequency; or, the detection precision of the gesture recognition device is controlled to be switched from the second precision to the first precision, wherein the first precision is smaller than the second precision, or controlled to remain at the first precision.
In this way, the sound collected by the sound collection apparatus is used to judge whether a multi-person conversation scene exists, and the gesture recognition apparatus matched with the electronic device is controlled to work only when such a scene is detected. Gesture recognition is thus started when necessary and limited when unnecessary, reducing the power consumption of the gesture recognition apparatus.
In some embodiments, after controlling the gesture recognition apparatus matched with the electronic device to work, a third sound of a third duration may also be obtained through the sound collecting apparatus; judging whether the third sound contains target sound or not, or whether the number of voiceprints contained in the third sound is larger than a preset number or not; when the third sound does not contain the target sound or the number of voiceprints contained in the third sound is smaller than or equal to the preset number, the gesture recognition device is limited to work; and when the third sound contains the target sound and the number of the voiceprints contained in the third sound is larger than the preset number, continuing to control the gesture recognition device to work. Thus, the gesture recognition device is prevented from continuously working.
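The third-sound re-check can be expressed as a small predicate deciding whether the recognizer stays active. Again an illustrative sketch: `has_real_voice` and `count_voiceprints` are hypothetical stand-ins, and `preset_number` is the threshold from the embodiment.

```python
def keep_gesture_recognition_running(third_sound, has_real_voice,
                                     count_voiceprints, preset_number=1):
    """After gesture recognition has been enabled, re-check a later
    capture: keep the recognizer active only while the multi-person
    dialogue condition still holds."""
    if not has_real_voice(third_sound):
        return False  # no real person speaking any more -> limit operation
    # still a multi-person dialogue only if enough distinct voiceprints
    return count_voiceprints(third_sound) > preset_number
```

Calling this periodically on fresh captures prevents the gesture recognition apparatus from working continuously after the conversation has ended.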
It should be understood that, the sequence number of each step in the foregoing embodiment does not mean the execution sequence, and the execution sequence of each process should be determined by the function and the internal logic, and should not limit the implementation process of the embodiment of the present application. In addition, in some possible implementations, each step in the foregoing embodiments may be selectively performed according to practical situations, and may be partially performed or may be performed entirely, which is not limited herein.
Based on the method described in the above embodiment, the embodiment of the present application further provides an electronic device control apparatus. Referring to fig. 6, fig. 6 is a schematic structural diagram of an electronic device control apparatus according to an embodiment of the application. As shown in fig. 6, the electronic device control apparatus 600 includes one or more processors 601 and interface circuitry 602. Optionally, the electronic device control apparatus 600 may further comprise a bus 603. Wherein:
the processor 601 may be an integrated circuit chip with signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor 601 or by instructions in the form of software. The processor 601 may be a general purpose processor, a neural network processor (Neural Network Processing Unit, NPU), a digital signal processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods and steps disclosed in the embodiments of the present application. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The interface circuit 602 may be used for transmitting or receiving data, instructions, or information, and the processor 601 may process using the data, instructions, or other information received by the interface circuit 602, and may transmit processing completion information through the interface circuit 602.
Optionally, the electronic device control apparatus 600 further includes a memory, which may include a read only memory and a random access memory, and provides operating instructions and data to the processor. A portion of the memory may also include non-volatile random access memory (NVRAM). Wherein the memory may be coupled to the processor 601.
Alternatively, the memory stores executable software modules or data structures and the processor 601 may perform corresponding operations by invoking operational instructions stored in the memory (which may be stored in an operating system).
Alternatively, the interface circuit 602 may be configured to output the execution result of the processor 601.
It should be noted that, the functions corresponding to the processor 601 and the interface circuit 602 may be implemented by a hardware design, a software design, or a combination of hardware and software, which is not limited herein. By way of example, the electronic device control apparatus 600 may be applied, but is not limited to, in the electronic device 100 shown in fig. 2.
It will be appreciated that the steps of the method embodiments described above may be performed by logic circuitry in the form of hardware in a processor or instructions in the form of software.
It is to be appreciated that the processor in embodiments of the application may be a central processing unit (central processing unit, CPU), other general purpose processor, digital signal processor (digital signal processor, DSP), application specific integrated circuit (application specific integrated circuit, ASIC), field programmable gate array (field programmable gate array, FPGA) or other programmable logic device, transistor logic device, hardware components, or any combination thereof. The general purpose processor may be a microprocessor, but in the alternative, it may be any conventional processor.
The method steps in the embodiments of the present application may be implemented by hardware, or may be implemented by a processor executing software instructions. The software instructions may be composed of corresponding software modules, which may be stored in random access memory (random access memory, RAM), flash memory, read-only memory (ROM), programmable ROM (PROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless means (e.g., infrared, radio, microwave, etc.). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a Solid State Disk (SSD)), or the like.
It will be appreciated that the various numerical numbers referred to in the embodiments of the present application are merely for ease of description and are not intended to limit the scope of the embodiments of the present application.

Claims (10)

1. A method of controlling an electronic device, the method comprising:
acquiring a first sound of a first duration through a sound acquisition device;
determining the number of target users in the environment according to the first sound, wherein the target users are users who make sound;
and when the number of the target users is greater than the preset number, controlling a gesture recognition device matched with the electronic equipment to work.
2. The method according to claim 1, wherein said determining the number of target users in the environment from said first sound, in particular comprises:
judging whether the first sound contains a target sound or not, wherein the target sound is a sound emitted by a real person;
and when the first sound contains the target sound, determining the number of target users according to the number of voiceprints contained in the first sound.
3. The method according to claim 1, wherein said determining the number of target users in the environment from said first sound, in particular comprises:
judging whether the first sound contains a target sound or not, wherein the target sound is a sound emitted by a real person;
when the first sound contains target sound, acquiring second sound with second duration through the sound acquisition device;
judging whether the target sound is contained in the second sound or not;
and when the target sound is contained in the second sound, determining the number of target users according to the number of voiceprints contained in the second sound.
4. A method according to any one of claims 1-3, wherein said controlling operation of the gesture recognition device associated with the electronic device comprises:
controlling the gesture recognition device to be switched from a closed state to an open state;
or, controlling the detection frequency of the gesture recognition device to be switched from a first frequency to a second frequency, wherein the first frequency is smaller than the second frequency;
or, the detection precision of the gesture recognition device is controlled to be switched from a first precision to a second precision, wherein the first precision is smaller than the second precision.
5. The method of any of claims 1-4, wherein after controlling operation of a gesture recognition apparatus associated with the electronic device, the method further comprises:
acquiring a third sound with a third duration through the sound acquisition device;
judging whether the third sound contains a target sound, or whether the number of voiceprints contained in the third sound is greater than a preset number, wherein the target sound is a sound emitted by a real person;
when the target sound is not contained in the third sound, or the number of voiceprints contained in the third sound is smaller than or equal to the preset number, limiting the gesture recognition device to work;
and when the third sound contains the target sound and the number of voiceprints contained in the third sound is larger than the preset number, continuing to control the gesture recognition device to work.
6. The method according to any one of claims 1-5, further comprising:
and limiting the gesture recognition device to work when the number of the target users is smaller than or equal to the preset number.
7. An electronic device control apparatus, comprising:
at least one processor and an interface;
the at least one processor obtains program instructions or data through the interface;
the at least one processor is configured to execute the program instructions to implement the method of any of claims 1-6.
8. An electronic device, comprising:
at least one memory for storing a program;
at least one processor for executing a memory-stored program, which processor is adapted to perform the method according to any of claims 1-6 when the memory-stored program is executed.
9. A computer readable storage medium storing a computer program which, when run on an electronic device, causes the electronic device to perform the method of any one of claims 1-6.
10. A computer program product, characterized in that the computer program product, when run on an electronic device, causes the electronic device to perform the method according to any of claims 1-6.
CN202210121313.7A 2022-02-09 2022-02-09 Electronic equipment control method and device and electronic equipment Pending CN116610206A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210121313.7A CN116610206A (en) 2022-02-09 2022-02-09 Electronic equipment control method and device and electronic equipment
PCT/CN2022/136694 WO2023151360A1 (en) 2022-02-09 2022-12-05 Electronic device control method and apparatus, and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210121313.7A CN116610206A (en) 2022-02-09 2022-02-09 Electronic equipment control method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN116610206A true CN116610206A (en) 2023-08-18

Family

ID=87563601

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210121313.7A Pending CN116610206A (en) 2022-02-09 2022-02-09 Electronic equipment control method and device and electronic equipment

Country Status (2)

Country Link
CN (1) CN116610206A (en)
WO (1) WO2023151360A1 (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW201544993A (en) * 2014-05-28 2015-12-01 Pegatron Corp Gesture control method, gesture control module, and wearable device having the same
DE102016221564A1 (en) * 2016-10-13 2018-04-19 Bayerische Motoren Werke Aktiengesellschaft Multimodal dialogue in a motor vehicle
CN106960161A (en) * 2017-03-23 2017-07-18 全椒县志宏机电设备设计有限公司 The method and mobile terminal of a kind of application encryption

Also Published As

Publication number Publication date
WO2023151360A1 (en) 2023-08-17

Similar Documents

Publication Publication Date Title
CN107481718B (en) Audio recognition method, device, storage medium and electronic equipment
CN108766438B (en) Man-machine interaction method and device, storage medium and intelligent terminal
CN111325386B (en) Method, device, terminal and storage medium for predicting running state of vehicle
EP4191579A1 (en) Electronic device and speech recognition method therefor, and medium
US11495223B2 (en) Electronic device for executing application by using phoneme information included in audio data and operation method therefor
CN114360527B (en) Vehicle-mounted voice interaction method, device, equipment and storage medium
CN111833872B (en) Voice control method, device, equipment, system and medium for elevator
CN111326152A (en) Voice control method and device
CN114333774B (en) Speech recognition method, device, computer equipment and storage medium
CN115831155A (en) Audio signal processing method and device, electronic equipment and storage medium
CN112634895A (en) Voice interaction wake-up-free method and device
US11437031B2 (en) Activating speech recognition based on hand patterns detected using plurality of filters
CN115206306A (en) Voice interaction method, device, equipment and system
CN113330513A (en) Voice information processing method and device
CN111966321A (en) Volume adjusting method, AR device and storage medium
CN108922523B (en) Position prompting method and device, storage medium and electronic equipment
CN116610206A (en) Electronic equipment control method and device and electronic equipment
JPWO2020021861A1 (en) Information processing equipment, information processing system, information processing method and information processing program
CN115171692A (en) Voice interaction method and device
CN111816180B (en) Method, device, equipment, system and medium for controlling elevator based on voice
CN114220420A (en) Multi-modal voice wake-up method, device and computer-readable storage medium
CN113225624A (en) Time-consuming determination method and device for voice recognition
CN112882394A (en) Device control method, control apparatus, and readable storage medium
CN115331672B (en) Device control method, device, electronic device and storage medium
CN112740219A (en) Method and device for generating gesture recognition model, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination