CN115171703B - Distributed voice awakening method and device, storage medium and electronic device - Google Patents


Publication number
CN115171703B
Authority
CN
China
Prior art keywords
group
devices
noise
signals
equipment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210603410.XA
Other languages
Chinese (zh)
Other versions
CN115171703A (en)
Inventor
邓邱伟
郝斌
王迪
张丽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Haier Technology Co Ltd
Qingdao Haier Intelligent Home Appliance Technology Co Ltd
Haier Smart Home Co Ltd
Original Assignee
Qingdao Haier Technology Co Ltd
Qingdao Haier Intelligent Home Appliance Technology Co Ltd
Haier Smart Home Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao Haier Technology Co Ltd, Qingdao Haier Intelligent Home Appliance Technology Co Ltd, Haier Smart Home Co Ltd
Priority to CN202210603410.XA (CN115171703B)
Publication of CN115171703A
Priority to PCT/CN2023/085259 (WO2023231552A1)
Application granted
Publication of CN115171703B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/22 Interactive procedures; Man-machine interfaces
    • G10L17/24 Interactive procedures; Man-machine interfaces the user being prompted to utter a password or a predefined phrase
    • G10L17/20 Pattern transformations or operations aimed at increasing system robustness, e.g. against channel noise or different working conditions
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L2021/02082 Noise filtering the noise being echo, reverberation of the speech
    • G10L21/0272 Voice signal separating
    • G10L21/028 Voice signal separating using properties of sound source

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)

Abstract

The application discloses a distributed voice awakening method and device, a storage medium and an electronic device, and relates to the technical field of smart homes. The method comprises the following steps: when a first group of devices receives first wake-up audio, acquiring the original signal that each device in the first group of devices generates from the first wake-up audio to obtain a first group of original signals, and acquiring the feedback information of each device in the first group of devices on the first wake-up audio to obtain a first group of feedback information; determining, according to the first group of feedback information, the devices in the first group of devices whose interactive function has been woken up to obtain a second group of devices, and determining the original signals generated by the second group of devices from the first group of original signals to obtain a second group of original signals; determining a target noise elimination mode from a preset group of noise elimination modes according to the number of devices in the second group of devices; and performing noise elimination processing on the second group of original signals using the target noise elimination mode to obtain a group of noise reduction signals, and determining a target device in the second group of devices according to the group of noise reduction signals.

Description

Distributed voice awakening method and device, storage medium and electronic device
Technical Field
The application relates to the technical field of smart homes, in particular to a distributed voice awakening method and device, a storage medium and an electronic device.
Background
In the related art, with the development of artificial intelligence technology, more and more intelligent voice devices are entering ordinary households. As the number of devices with a wake-up module in a scene grows, intelligent devices such as televisions, air conditioners and refrigerators may all answer "I'm here" at the same time after the user speaks the wake-up word. To avoid this, the devices are networked through Internet of Things technology, and an intelligent perception algorithm makes a judgment based on dimensions such as the distance and direction between the user and the speaker device, so that only one device responds and interacts after the user speaks the wake-up word while the other devices remain quiet. In a quiet scene, the amplitude/energy of the signal can be used as the discrimination criterion: because sound waves attenuate with distance, the signal amplitude at a nearby device is greater than at a distant device. In a more complex scene, however, the user may speak the wake-up word while one of the devices is playing audio itself. For the playing device, the echo is self-noise; for the other devices, the echo is external noise. In this case, a scheme that simply uses the amplitude/energy of the signal as the discrimination criterion cannot accurately determine which device should respond.
For the problem in the related art that, in a complex scene, the responding device cannot be determined accurately and rapidly from a plurality of devices, no effective solution has yet been proposed.
Disclosure of Invention
The embodiments of the application provide a distributed voice awakening method and device, a storage medium and an electronic device, which at least solve the problem in the related art that a responding device cannot be accurately and rapidly determined from a plurality of devices in a complex scene.
According to an embodiment of the present application, there is provided a distributed voice wakeup method, including: when a first group of devices receives first wake-up audio, acquiring the original signal generated by each device in the first group of devices according to the first wake-up audio to obtain a first group of original signals, and acquiring the feedback information of each device in the first group of devices for the first wake-up audio to obtain a first group of feedback information, wherein the first group of devices are devices in the same network, the feedback information indicates whether the corresponding device in the first group of devices has woken up its interactive function in response to the first wake-up audio, and the original signal is the audio signal into which a device converts the first wake-up audio it receives; determining, according to the first group of feedback information, the devices in the first group of devices that have woken up the interactive function to obtain a second group of devices, and determining the original signals generated by the second group of devices from the first group of original signals to obtain a second group of original signals; determining a target noise elimination mode from a preset group of noise elimination modes according to the number of devices in the second group of devices; performing noise elimination processing on the second group of original signals using the target noise elimination mode to obtain a group of noise reduction signals; and determining a target device in the second group of devices according to the group of noise reduction signals, controlling the target device to play second audio corresponding to the first wake-up audio, and controlling the devices in the second group of devices other than the target device to mute.
In one exemplary embodiment, determining a target noise elimination mode from a preset group of noise elimination modes according to the number of devices in the second group of devices comprises: when the number of devices in the second group of devices is greater than or equal to a first preset threshold, determining a first noise elimination mode from the group of noise elimination modes, wherein the first noise elimination mode filters self-noise signals out of the second group of original signals through a preset first adaptive filter and filters external noise signals out of the second group of original signals through a second adaptive filter, the second adaptive filter being a filter generated according to beamforming between the devices in the second group of devices; and when the number of devices in the second group of devices is smaller than the first preset threshold, determining a second noise elimination mode from the group of noise elimination modes, wherein the second noise elimination mode filters self-noise signals out of the second group of original signals through the first adaptive filter and filters out of the second group of original signals the external noise signals determined through sound source separation among the devices in the second group of devices.
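The threshold rule in this embodiment can be sketched as follows. This is only a minimal illustration: the enum names, the function name `select_mode` and the default threshold value of 3 are assumptions made for the example, since the source does not give a concrete value for the first preset threshold.

```python
from enum import Enum


class NoiseEliminationMode(Enum):
    # First mode: echo filter plus a filter derived from beamforming.
    BEAMFORMING = "first"
    # Second mode: echo filter plus sound source separation.
    SOURCE_SEPARATION = "second"


def select_mode(num_awake_devices: int, first_threshold: int = 3) -> NoiseEliminationMode:
    """Pick the noise elimination mode from the number of woken devices.

    first_threshold is a hypothetical value; the source only calls it
    the "first preset threshold".
    """
    if num_awake_devices >= first_threshold:
        return NoiseEliminationMode.BEAMFORMING
    return NoiseEliminationMode.SOURCE_SEPARATION
```

With the assumed threshold, three or more woken devices select the beamforming-based mode and fewer select the source-separation mode.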
In one exemplary embodiment, after determining the first noise elimination mode from the group of noise elimination modes, the method further comprises: calculating a first energy value of the target noise reduction signal corresponding to the second group of devices after processing in the first noise elimination mode; determining a second energy value corresponding to the external noise signal filtered out of the second group of original signals by the first noise elimination mode; determining that the target object is at the same angle as the second group of devices when the difference between the first energy value and the second energy value is lower than a second preset threshold; and adding an estimated signal to the second group of original signals processed in the first noise elimination mode, wherein the estimated signal is a preset signal for compensating signal cancellation.
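A minimal sketch of this energy check is given below. It assumes energy is the sum of squared samples and that the "difference" is taken as an absolute difference; neither detail is stated in the source, and the function names and threshold values are hypothetical.

```python
def energy(signal):
    """Sum of squared samples (an assumed definition of 'energy value')."""
    return sum(s * s for s in signal)


def compensate_if_cancelled(denoised, removed_noise, estimated, second_threshold):
    """Add the preset estimated signal when speech appears to be cancelled.

    denoised      -- signal left after the first noise elimination mode
    removed_noise -- external noise signal that the mode filtered out
    estimated     -- preset signal used to balance signal cancellation
    Returns (signal, compensated_flag).
    """
    first_energy = energy(denoised)
    second_energy = energy(removed_noise)
    # A small gap is read as "target object at the same angle as the devices".
    if abs(first_energy - second_energy) < second_threshold:
        return [d + e for d, e in zip(denoised, estimated)], True
    return list(denoised), False
```

The flag in the return value lets a caller log when compensation was applied.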
In an exemplary embodiment, before determining the first noise cancellation mode from the set of noise cancellation modes, the method further comprises, in the case that the number of devices in the second set of devices is greater than or equal to the first preset threshold: determining the position information of each device in the second group of devices in the target area to obtain a group of position information; determining a relative position between each two devices in the second set of devices through a set of position information; a second adaptive filter is determined for each device in the second set of devices based on the relative locations.
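As an illustration of deriving relative positions from a group of position information, the sketch below assumes each device position is a 2-D coordinate in the target area and expresses the relative position as a distance and a bearing. The coordinate representation and all names are assumptions; the source does not specify how positions are encoded or how the second adaptive filter is built from them.

```python
import math
from itertools import permutations


def relative_positions(positions):
    """Map each ordered device pair (a, b) to (distance, bearing_degrees).

    positions -- dict of device id -> (x, y) coordinates in the target
                 area (a hypothetical representation).
    The bearing is the direction of b as seen from a, measured
    counter-clockwise from the positive x axis.
    """
    rel = {}
    for a, b in permutations(positions, 2):
        dx = positions[b][0] - positions[a][0]
        dy = positions[b][1] - positions[a][1]
        rel[(a, b)] = (math.hypot(dx, dy), math.degrees(math.atan2(dy, dx)))
    return rel
```

A per-device beamformer could then take the bearings of the other devices as the directions from which external noise is expected.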
In one exemplary embodiment, determining the relative position between each two devices in the second set of devices from a set of position information includes each device in the second set of devices successively entering a calibration mode, determining the relative direction between each device and the other devices from the set of position information; beam forming is carried out based on the relative direction, so that first estimated external noise and second estimated external noise between every two devices in the second group of devices are obtained; and under the condition that the first estimated external noise is the same as the second estimated external noise, determining the relative position between every two devices in the second group of devices.
In an exemplary embodiment, before determining the second noise elimination mode from the group of noise elimination modes, the method further comprises: decomposing the second group of original signals into a first sub-signal and a second sub-signal through a target algorithm; calculating a third energy value corresponding to the first sub-signal and a fourth energy value corresponding to the second sub-signal; and determining the sub-signal whose energy value, among the third energy value and the fourth energy value, is closer to a target energy value as the echo signal corresponding to the second group of original signals, and determining the external noise signal to be filtered out of the second group of original signals based on the echo signal.
In one exemplary embodiment, determining a target device in the second group of devices according to the group of noise reduction signals comprises: when each device in the second group of devices has a noise reduction signal, determining the target amplitude of the noise reduction signal corresponding to each device in the second group of devices to obtain a plurality of target amplitudes corresponding to the second group of devices; and sorting the plurality of target amplitudes in descending order, selecting the device with the largest target amplitude as the responding device, and determining the responding device as the target device in the second group of devices so that it interacts with the target object that emitted the first wake-up audio.
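The energy comparison in this embodiment can be sketched as below. The decomposition itself (the "target algorithm") is not specified in the source and is left out; the sketch starts from two already separated sub-signals. The function names, and the idea that the target energy value would come from something like the known playback level, are assumptions.

```python
def energy(signal):
    """Sum of squared samples (an assumed definition of 'energy value')."""
    return sum(s * s for s in signal)


def identify_echo(first_sub, second_sub, target_energy):
    """Return (echo_index, other_index) for two separated sub-signals.

    The sub-signal whose energy is closer to target_energy is treated
    as the echo; index 0 is first_sub and index 1 is second_sub.
    """
    third_energy = energy(first_sub)
    fourth_energy = energy(second_sub)
    if abs(third_energy - target_energy) <= abs(fourth_energy - target_energy):
        return 0, 1
    return 1, 0
```

The sub-signal labelled as echo is then the external noise to remove for every device other than the one that is playing.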
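The selection step can be sketched as follows, assuming the "target amplitude" of a noise reduction signal is its peak absolute sample value; the source does not define the amplitude measure, and the names here are hypothetical.

```python
def select_target(denoised_by_device):
    """Rank devices by the peak amplitude of their noise reduction signal.

    denoised_by_device -- dict of device id -> list of samples.
    Returns (target_device, ranking), ranking sorted from largest to
    smallest amplitude.
    """
    if not denoised_by_device:
        raise ValueError("no noise reduction signals to compare")
    amplitudes = {
        device: max((abs(s) for s in signal), default=0.0)
        for device, signal in denoised_by_device.items()
    }
    ranking = sorted(amplitudes, key=amplitudes.get, reverse=True)
    return ranking[0], ranking
```

Every device in the ranking after the first one would be muted.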
According to another embodiment of the present application, there is also provided a distributed voice wakeup apparatus, including: an acquisition module, configured to, when a first group of devices receives first wake-up audio, acquire the original signal generated by each device in the first group of devices according to the first wake-up audio to obtain a first group of original signals, and acquire the feedback information of each device in the first group of devices on the first wake-up audio to obtain a first group of feedback information, wherein the first group of devices are devices in the same network, the feedback information indicates whether the corresponding device in the first group of devices has woken up its interactive function in response to the first wake-up audio, and the original signal is the audio signal into which a device converts the first wake-up audio it receives; a first determining module, configured to determine, according to the first group of feedback information, the devices in the first group of devices that have woken up the interactive function to obtain a second group of devices, and determine the original signals generated by the second group of devices from the first group of original signals to obtain a second group of original signals; a second determining module, configured to determine a target noise elimination mode from a preset group of noise elimination modes according to the number of devices in the second group of devices; a processing module, configured to perform noise elimination processing on the second group of original signals using the target noise elimination mode to obtain a group of noise reduction signals; and a control module, configured to determine a target device in the second group of devices according to the group of noise reduction signals, control the target device to play second audio corresponding to the first wake-up audio, and control the devices in the second group of devices other than the target device to mute.
In an exemplary embodiment, the second determining module is further configured to determine, in a case where the number of devices in the second set of devices is greater than or equal to a first preset threshold, a first noise cancellation mode from a set of noise cancellation modes, where the first noise cancellation mode is used to filter, by a preset first adaptive filter, a self-noise signal from the second set of original signals, and filter, by a second adaptive filter, an external noise signal from the second set of original signals, where the second adaptive filter is a filter generated according to beamforming between the second set of devices; and under the condition that the number of the devices in the second group of devices is smaller than a first preset threshold value, determining a second noise elimination mode from one group of noise elimination modes, wherein the second noise elimination mode is used for filtering self-noise signals from the second group of original signals through a first adaptive filter and filtering external noise signals which are determined through sound source separation among the second group of devices from the second group of original signals.
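The source does not say which algorithm the preset first adaptive filter uses to remove self-noise. As one common possibility, the sketch below uses a normalized least-mean-squares (NLMS) filter that subtracts an estimate of a device's own playback echo from its microphone signal. NLMS is a swapped-in technique chosen for illustration, and the function name, tap count and step size are all assumptions.

```python
def nlms_echo_cancel(mic, playback, taps=4, mu=0.5, eps=1e-8):
    """Remove the playback echo from mic with an NLMS adaptive filter.

    mic      -- microphone samples (speech plus echo of playback)
    playback -- samples the device itself is playing (reference)
    Returns (residual, weights): the echo-reduced signal and the
    final filter coefficients.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent playback samples, newest first
    residual = []
    for m, x in zip(mic, playback):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = m - echo_estimate
        norm = sum(h * h for h in history) + eps
        # Normalized gradient step toward the true echo path.
        weights = [w + mu * error * h / norm for w, h in zip(weights, history)]
        residual.append(error)
    return residual, weights
```

In this sketch the residual is what would be passed on to the external-noise stage of either noise elimination mode.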
In an exemplary embodiment, the second determining module further includes: the adding unit is used for counting first energy values of target noise reduction signals corresponding to the second group of equipment processed in the first noise signal elimination mode; determining a second energy value corresponding to an external noise signal in a second group of original signals filtered by the first noise signal elimination mode; determining that the target object is at the same angle with the second group of equipment under the condition that the difference value between the first energy value and the second energy value is lower than a second preset threshold value; adding estimated signals into a target second group of original signals processed in the first noise signal elimination mode, wherein the estimated signals are preset signals for balancing signal cancellation.
In an exemplary embodiment, the second determining module is further configured to determine location information of each device in the second set of devices in the target area, and obtain a set of location information altogether; determining a relative position between each two devices in the second set of devices through a set of position information; a second adaptive filter is determined for each device in the second set of devices based on the relative locations.
In an exemplary embodiment, the second determining module is further configured to sequentially enter the calibration mode by each device in the second set of devices, and determine a relative direction between each device and the other devices according to a set of location information; beam forming is carried out based on the relative direction, so that first estimated external noise and second estimated external noise between every two devices in the second group of devices are obtained; and under the condition that the first estimated external noise is the same as the second estimated external noise, determining the relative position between every two devices in the second group of devices.
In an exemplary embodiment, the second determining module further includes: the comparison unit is used for decomposing the second group of original signals into a first sub-signal and a second sub-signal through a target algorithm; calculating a third energy value corresponding to the first sub-signal and a fourth energy value corresponding to the second sub-signal; and determining the sub-signals approaching to the target energy value in the third energy value and the fourth energy value as echo signals corresponding to the second group of original signals, and determining the external noise signals to be filtered in the second group of original signals based on the echo signals.
In an exemplary embodiment, the control module is further configured to determine, when each device in the second set of devices has a noise reduction signal, a target amplitude of the noise reduction signal corresponding to each device in the second set of devices, to obtain multiple target amplitudes corresponding to the second set of devices; and sequentially arranging a plurality of target amplitudes from large to small, selecting the device with the largest target amplitude as a response device, and determining the response device as a target device from the second group of devices so as to interact with a target object which emits first wake-up audio.
According to another aspect of the embodiments of the present application, there is also provided a computer readable storage medium having a computer program stored therein, wherein the computer program is configured to perform the above-described distributed voice wakeup method when run.
According to still another aspect of the embodiments of the present application, there is further provided an electronic device including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor executes the above-mentioned distributed voice wakeup method through the computer program.
In the embodiment of the application, when a first group of devices receives first wake-up audio, the original signal generated by each device in the first group of devices according to the first wake-up audio is acquired to obtain a first group of original signals, and the feedback information of each device in the first group of devices for the first wake-up audio is acquired to obtain a first group of feedback information, wherein the first group of devices are devices in the same network, the feedback information indicates whether the corresponding device in the first group of devices has woken up its interactive function in response to the first wake-up audio, and the original signal is the audio signal into which a device converts the first wake-up audio it receives; the devices in the first group of devices that have woken up the interactive function are determined according to the first group of feedback information to obtain a second group of devices, and the original signals generated by the second group of devices are determined from the first group of original signals to obtain a second group of original signals; a target noise elimination mode is determined from a preset group of noise elimination modes according to the number of devices in the second group of devices; noise elimination processing is performed on the second group of original signals using the target noise elimination mode to obtain a group of noise reduction signals; and a target device in the second group of devices is determined according to the group of noise reduction signals, the target device is controlled to play second audio corresponding to the first wake-up audio, and the devices in the second group of devices other than the target device are controlled to mute. That is, in a distributed processing scene, among the plurality of devices in the target area, the original signals of the devices that give feedback on the first wake-up audio emitted by the target object are determined, noise elimination processing is performed on these original signals to obtain the corresponding noise reduction signals, and the final target device is determined from the devices that gave feedback by means of the noise reduction signals. This technical solution solves the problem in the related art that the responding device cannot be accurately and rapidly determined from a plurality of devices in a complex scene: the target device that finally interacts with the target object can be determined from the plurality of responding devices in a complex scene, which improves the effectiveness of schemes that use the amplitude/energy of the signal as the discrimination criterion.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
In order to more clearly illustrate the embodiments of the application or the technical solutions of the prior art, the drawings which are used in the description of the embodiments or the prior art will be briefly described, and it will be obvious to a person skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a schematic diagram of a hardware environment of a distributed voice wakeup method according to an embodiment of the present application;
FIG. 2 is a flow chart of a distributed voice wakeup method according to an embodiment of the present application;
FIG. 3 is a computational flow diagram of an alternative embodiment of the present application for selecting beamforming;
FIG. 4 is a computational flow diagram of an alternative sound source separation in accordance with an alternative embodiment of the present application;
FIG. 5 is a block diagram of an alternative distributed voice wakeup device according to an embodiment of the present application.
Detailed Description
In order that those skilled in the art may better understand the present application, the technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without inventive effort shall fall within the scope of protection of the present application.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
According to one aspect of an embodiment of the present application, a distributed voice wakeup method is provided. The distributed voice wakeup method can be widely applied to whole-house intelligent digital control scenarios such as Smart Home, smart home device ecosystems and Intelligence House ecosystems. Alternatively, in the present embodiment, the above-described distributed voice wakeup method may be applied to a hardware environment constituted by the terminal device 102, the server 104, and the image pickup device 106 as shown in FIG. 1. As shown in FIG. 1, the server 104 is connected to the terminal device 102 through a network and may be used to provide services (such as application services) for the terminal or for a client installed on the terminal. A database may be set up on the server or independently of the server to provide data storage services for the server 104, and cloud computing and/or edge computing services may be configured on the server or independently of the server to provide data computing services for the server 104.
The network may include, but is not limited to, at least one of: a wired network and a wireless network. The wired network may include, but is not limited to, at least one of: a wide area network, a metropolitan area network and a local area network; the wireless network may include, but is not limited to, at least one of: WIFI (Wireless Fidelity) and Bluetooth. The terminal device 102 may be, but is not limited to, a PC, a mobile phone, a tablet computer, an intelligent air conditioner, an intelligent range hood, an intelligent refrigerator, an intelligent oven, an intelligent cooking range, an intelligent washing machine, an intelligent water heater, an intelligent washing device, an intelligent dish washer, an intelligent projection device, an intelligent television, an intelligent clothes hanger, an intelligent curtain, an intelligent video device, an intelligent socket, an intelligent sound box, an intelligent fresh air device, an intelligent kitchen and toilet device, an intelligent bathroom device, an intelligent sweeping robot, an intelligent window cleaning robot, an intelligent mopping robot, an intelligent air purifying device, an intelligent steam box, an intelligent microwave oven, an intelligent kitchen appliance, an intelligent purifier, an intelligent water dispenser, an intelligent door lock, and the like.
In this embodiment, a distributed voice wake-up method is provided and applied to the above image capturing apparatus, and fig. 2 is a flowchart of an alternative distributed voice wake-up method according to an embodiment of the present application, where the flowchart includes the following steps:
Step S202, when a first group of devices receives first wake-up audio, acquiring the original signal generated by each device in the first group of devices according to the first wake-up audio to obtain a first group of original signals, and acquiring the feedback information of each device in the first group of devices for the first wake-up audio to obtain a first group of feedback information, wherein the first group of devices are devices in the same network, the feedback information indicates whether the corresponding device in the first group of devices has woken up its interactive function in response to the first wake-up audio, and the original signal is the audio signal into which a device converts the first wake-up audio it receives;
Step S204, determining the equipment which wakes up the interactive function from the first group of equipment according to the first group of feedback information to obtain a second group of equipment altogether, and determining the original signals generated by the second group of equipment from the first group of original signals to obtain a second group of original signals altogether;
Step S206, determining a target noise elimination mode from a preset group of noise elimination modes according to the number of the devices in the second group of devices;
step S208, performing noise elimination processing on the second group of original signals by using the target noise elimination mode to obtain a group of noise reduction signals;
Step S210, determining a target device in the second set of devices according to the set of noise reduction signals, controlling the target device to play second audio corresponding to the first wake-up audio, and controlling devices in the second set of devices except the second target device to mute.
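The steps above can be sketched as a small arbitration routine. This is a minimal sketch under stated assumptions, not the patented implementation: the names `DeviceReport`, `choose_mode`, and `arbitrate`, the threshold value of 3, and the use of summed squared samples as the comparison metric are all hypothetical choices made for illustration, and the actual noise elimination is passed in as a callback.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

import numpy as np


@dataclass
class DeviceReport:
    device_id: str
    raw_signal: np.ndarray  # original signal converted from the wake-up audio
    woke_up: bool           # feedback information for the wake-up audio


def choose_mode(num_devices: int, threshold: int) -> str:
    """S206: pick the noise elimination mode from the device count alone."""
    return "beamforming" if num_devices >= threshold else "source_separation"


def arbitrate(
    reports: List[DeviceReport],
    denoisers: Dict[str, Callable[[np.ndarray], np.ndarray]],
    threshold: int = 3,  # hypothetical value for the first preset threshold
) -> Optional[str]:
    # S204: keep only the devices that woke up (the second group)
    second_group = [r for r in reports if r.woke_up]
    if not second_group:
        return None
    # S206: choose the target noise elimination mode
    denoise = denoisers[choose_mode(len(second_group), threshold)]
    # S208: denoise each original signal and measure its energy
    energies = {
        r.device_id: float(np.sum(denoise(r.raw_signal) ** 2))
        for r in second_group
    }
    # S210: the device with the strongest denoised signal responds;
    # every other device in the second group would be muted
    return max(energies, key=energies.get)
```

A caller would supply real denoising functions for the two modes; with identity functions the routine reduces to a plain energy comparison among the devices that woke up.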
Through the above steps, when the first group of devices receives the first wake-up audio, the original signal generated by each device in the first group of devices according to the first wake-up audio is acquired, to obtain a first group of original signals, and the feedback information of each device in the first group of devices for the first wake-up audio is acquired, to obtain a first group of feedback information, wherein the first group of devices are devices in the same network, the feedback information indicates whether the corresponding device wakes up its interactive function in response to the first wake-up audio, and the original signal is the audio signal into which a device converts the first wake-up audio it receives; the devices that have woken up the interactive function are determined from the first group of devices according to the first group of feedback information, to obtain a second group of devices, and the original signals generated by the second group of devices are determined from the first group of original signals, to obtain a second group of original signals; a target noise elimination mode is determined from a preset group of noise elimination modes according to the number of devices in the second group of devices; noise elimination processing is performed on the second group of original signals using the target noise elimination mode, to obtain a group of noise reduction signals; and a target device in the second group of devices is determined according to the group of noise reduction signals, the target device is controlled to play second audio corresponding to the first wake-up audio, and the devices in the second group of devices other than the target device are controlled to mute. That is, in a distributed processing scenario, the original signals of the devices in the target area that gave feedback on the first wake-up audio emitted by the target object are determined, noise elimination processing is performed on those original signals to obtain the corresponding noise reduction signals, and the final target device is determined, by means of the noise reduction signals, from the devices that gave feedback. This technical scheme solves the problem in the related art that the responding device cannot be determined accurately and quickly from a plurality of devices in a complex scenario: the target device that finally interacts with the target object can be determined from the plurality of responding devices, which improves the effectiveness of schemes that use signal amplitude/energy as the discrimination criterion.
In one exemplary embodiment, determining a target noise cancellation scheme from a preset set of noise cancellation schemes based on the number of devices in the second set of devices comprises: determining a first noise cancellation scheme from a set of noise cancellation schemes when the number of devices in the second set of devices is greater than or equal to a first preset threshold, wherein the first noise cancellation scheme is used to filter out self-noise signals from the second set of original signals through a preset first adaptive filter, and to filter out external noise signals from the second set of original signals through a second adaptive filter, the second adaptive filter being a filter generated according to beamforming between the second set of devices; and under the condition that the number of the devices in the second group of devices is smaller than a first preset threshold value, determining a second noise elimination mode from one group of noise elimination modes, wherein the second noise elimination mode is used for filtering self-noise signals from the second group of original signals through a first adaptive filter and filtering external noise signals which are determined through sound source separation among the second group of devices from the second group of original signals.
Optionally, the external noise signal is the signal corresponding to the noise that a device receives while another device in the same network group is playing audio. The self-noise signal is the signal corresponding to the noise that the device itself generates during operation.
In one exemplary embodiment, after determining the first noise cancellation scheme from a set of noise cancellation schemes, the method further comprises: counting a first energy value of a target noise reduction signal corresponding to a second group of equipment processed by a first noise signal elimination mode; determining a second energy value corresponding to an external noise signal in a second group of original signals filtered by the first noise signal elimination mode; determining that the target object is at the same angle with the second group of equipment under the condition that the difference value between the first energy value and the second energy value is lower than a second preset threshold value; adding estimated signals into a target second group of original signals processed in the first noise signal elimination mode, wherein the estimated signals are preset signals for balancing signal cancellation.
It should be noted that beamforming is suitable for multi-microphone scenarios in which the sound source localization is relatively accurate or the beam pattern is relatively good, for example with a narrow main lobe.
Alternatively, if the user (equivalent to the target object in the embodiment of the present invention) and device A are at the same angle, the estimated external noise will contain the wake-up word, resulting in signal cancellation. In this case, the microphone signal may be used to calculate the statistical energy. To detect this case, optionally, the microphone signal and the estimated external noise signal are each fed to a wake-up module; when the score of the latter is close to or even greater than that of the former, the user and device A can be considered to be at the same angle.
In an exemplary embodiment, before determining the first noise cancellation mode from the set of noise cancellation modes, the method further comprises, in the case that the number of devices in the second set of devices is greater than or equal to the first preset threshold: determining the position information of each device in the second group of devices in the target area to obtain a group of position information; determining a relative position between each two devices in the second set of devices through a set of position information; a second adaptive filter is determined for each device in the second set of devices based on the relative locations.
In one exemplary embodiment, determining the relative position between each two devices in the second set of devices from a set of position information includes each device in the second set of devices successively entering a calibration mode, determining the relative direction between each device and the other devices from the set of position information; beam forming is carried out based on the relative direction, so that first estimated external noise and second estimated external noise between every two devices in the second group of devices are obtained; and under the condition that the first estimated external noise is the same as the second estimated external noise, determining the relative position between every two devices in the second group of devices.
Beamforming: beamforming combines multi-channel signals so that interference signals from non-target directions are suppressed and sound signals from the target direction are enhanced. Alternative embodiments of the present invention suggest using beamforming for external noise cancellation when a device has 4 or more microphones. The flow is as follows. Step one, user-end calibration: when the relative positions of the devices change, the propagation path from device A to device B also changes. Once the position changes, the custom calibration module can be started: after it is started, device A automatically plays a piece of music or other audio, and device B calculates the relative position of device A (note: sound source localization may use algorithms such as MUSIC, GCC-PHAT, TDOA, and AML). Step two, signal noise reduction: first, device B beamforms in the direction of device A (e.g., with an MVDR or GSC structure), which yields an estimated external noise; the external noise is then filtered from the microphone signal using an adaptive filtering technique such as NLMS.
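The calibration step names GCC-PHAT as one possible localization algorithm. The following is a minimal sketch of a GCC-PHAT time-delay estimate between two microphone signals, which is the quantity a direction estimate would be derived from. The function name `gcc_phat_delay` and its parameters are hypothetical, and the patent does not specify how device B actually computes the relative position of device A.

```python
import numpy as np


def gcc_phat_delay(sig, ref, fs, max_tau=None):
    """Estimate how many seconds `sig` lags behind `ref` using GCC-PHAT."""
    n = len(sig) + len(ref)  # zero-pad so the correlation is linear, not circular
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    cross = SIG * np.conj(REF)
    cross /= np.abs(cross) + 1e-12  # PHAT weighting: keep phase, drop magnitude
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    # Reorder so that index 0 corresponds to a lag of -max_shift samples
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs
```

With a known microphone spacing, the estimated delay can be converted into an arrival angle; that conversion depends on the array geometry and is left out here.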
In an exemplary embodiment, before determining the second noise cancellation mode from the set of noise cancellation modes, the method further comprises: decomposing the second group of original signals into a first sub-signal and a second sub-signal through a target algorithm; calculating a third energy value corresponding to the first sub-signal and a fourth energy value corresponding to the second sub-signal; and determining the sub-signals approaching to the target energy value in the third energy value and the fourth energy value as echo signals corresponding to the second group of original signals, and determining the external noise signals to be filtered in the second group of original signals based on the echo signals.
In one exemplary embodiment, determining a target device in a second set of devices from a set of noise reduction signals comprises: under the condition that each device in the second group of devices has a noise reduction signal, determining a target amplitude value of the noise reduction signal corresponding to each device in the second group of devices to obtain a plurality of target amplitude values corresponding to the second group of devices; and sequentially arranging a plurality of target amplitudes from large to small, selecting the device with the largest target amplitude as a response device, and determining the response device as a target device from the second group of devices so as to interact with a target object which emits first wake-up audio.
In order to better understand the process of the distributed voice wake-up method, its implementation flow is described below in conjunction with an optional embodiment, which is not intended to limit the technical scheme of the embodiments of the present application.
In the related art, the amplitude/energy of the signal may be used as the criterion in a quiet scene: because sound waves attenuate with distance, the signal amplitude at a nearby device is greater than that at a far device. In complex scenarios, however, the original signal amplitude is no longer an accurate criterion. The received signal may be expressed as y(t) = w(t) + n(t), i.e., the noisy signal is the linear sum of the wake-up word w and the noise n, but the amplitude of the noisy signal is not equal to the linear sum of the two amplitudes, |y(t)| ≠ |w(t)| + |n(t)|, so simple energy calibration cannot be achieved.
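A short numerical illustration of this point, using synthetic sinusoids as stand-ins for the wake-up word and the noise (the signals, sample rate, and helper names are assumptions chosen only to make the effect visible): when the noise is partly out of phase with the wake-up word, the noisy signal is weaker than the wake-up word alone, so neither amplitudes nor energies add.

```python
import numpy as np

fs = 16000
t = np.arange(1600) / fs
w = np.sin(2 * np.pi * 200 * t)                # stand-in for the wake-up word
n = 0.5 * np.sin(2 * np.pi * 200 * t + np.pi)  # noise in opposite phase
y = w + n                                      # received noisy signal


def peak(x):
    """Peak absolute amplitude of a signal."""
    return float(np.max(np.abs(x)))


def energy(x):
    """Sum of squared samples."""
    return float(np.sum(x ** 2))


print(peak(y), peak(w) + peak(n))        # peak of the sum vs. sum of the peaks
print(energy(y), energy(w) + energy(n))  # energy of the sum vs. sum of energies
```

Subtracting a calibrated noise energy from the measured energy would therefore not recover the wake-up word energy in general, which is why the noise is removed from the signal itself before amplitudes are compared.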
Optionally, an alternative embodiment of the present invention proposes a front-end signal processing apparatus for distributed decision-making in self-noise scenes. Using the information of a plurality of devices cooperatively, it performs operations such as echo cancellation on the signals of self-noise devices and denoises external noise signals to obtain the respective clean signals, and then uses their amplitudes as the criterion. Optionally, the front-end signal processing apparatus may include an echo cancellation module, a sound source localization module, a beamforming module, a sound source separation module, a dereverberation module, and the like. The different modules are applied as follows.
As an optional implementation, for distributed decision-making in a self-noise scene with a plurality of voice devices (equivalent to the target devices in the above embodiment) in the same network, before the responding device is determined using signal amplitude/energy as the criterion, the self-noise signals in the signals generated by the different voice devices during interaction are eliminated, and the external noise signals that other devices introduce into the original signal corresponding to the audio are eliminated as well. This yields clean signals for the different voice devices, and the final target device for responding is then determined using amplitude as the criterion.
Optionally, the external noise signal is the signal corresponding to the noise that a device receives while another device in the same network group is playing audio. The self-noise signal is the signal corresponding to the noise that the device itself generates during operation.
For example, let device A denote a device that is playing audio itself, and let device B denote a device that receives external noise. The corresponding noise signal processing is divided into the following cases:
Self-noise processing: the self-noise signal carried in the original signal of a device is removed by self-noise elimination, specifically using echo cancellation. Optionally, the original signal is processed by a multi-delay block frequency-domain adaptive filter (MDF) to filter out the self-noise signal it carries. It should be noted that if the self-noise signal in the original signal contains a nonlinear echo component, it may be removed using a model-based/nonlinear method; alternative embodiments of the present invention do not limit this.
External noise processing: for the external noise signal carried in the original signal of a device, either beamforming or sound source separation may be selected. The computational flow differs depending on the method, as follows:
Mode one: beamforming. Beamforming combines multi-channel signals so that interference signals from non-target directions are suppressed and sound signals from the target direction are enhanced. Alternative embodiments of the present invention suggest using beamforming for external noise cancellation when a device has 4 or more microphones. The flow is as follows:
Step one, user-end calibration: when the relative positions of the devices change, the propagation path from device A to device B also changes. Once the position changes, the custom calibration module can be started: after it is started, device A automatically plays a piece of music or other audio, and device B calculates the relative position of device A (note: sound source localization may use algorithms such as MUSIC, GCC-PHAT, TDOA, and AML).
Step two, signal noise reduction: first, device B beamforms in the direction of device A (e.g., with an MVDR or GSC structure), which yields an estimated external noise; the external noise is then filtered from the microphone signal using an adaptive filtering technique such as NLMS.
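The adaptive filtering in step two can be sketched with a time-domain NLMS canceller that takes the estimated external noise as its reference input and subtracts whatever part of the microphone signal is linearly predictable from it. This is a minimal sketch; the function name `nlms_cancel`, the tap count, and the step size are illustrative assumptions rather than values from the patent.

```python
import numpy as np


def nlms_cancel(mic, noise_ref, taps=32, mu=0.5, eps=1e-8):
    """Remove the component of `mic` that is predictable from `noise_ref`."""
    w = np.zeros(taps)         # adaptive filter coefficients
    buf = np.zeros(taps)       # most recent reference samples, newest first
    out = np.zeros(len(mic))
    for i in range(len(mic)):
        buf = np.roll(buf, 1)
        buf[0] = noise_ref[i]
        e = mic[i] - w @ buf   # error = microphone minus predicted noise
        w += mu * e * buf / (buf @ buf + eps)  # normalized LMS update
        out[i] = e             # the error is the noise-reduced output
    return out
```

The normalization by the reference power is what makes the step size insensitive to the playback level. If the reference also contains the wake-up word, the filter will cancel part of it as well, which is the same-angle problem the text discusses.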
FIG. 3 is a computational flow diagram for the case where beamforming is selected, according to an alternative embodiment of the present application; the flow includes the following steps:
Step S402: performing sound source estimation between device A and device B;
Step S404: determining the original signals corresponding to the audio emitted by the target object as received by device A and device B, respectively;
Step S406: performing acoustic echo cancellation (AEC) on the original signals to obtain the de-echoed signals corresponding to device A and device B;
Step S408: determining whether device A has an echo; if so, estimating the effect on the signal in device A of the estimated external noise generated by device B, and filtering the estimated external noise from the de-echoed signal of device A using an adaptive filtering technique such as NLMS;
Step S410: determining the clean, noise-free signals corresponding to device A and device B, respectively, comparing the signal amplitude/energy of device A with that of device B, and determining the device with the larger signal amplitude/energy as the device that finally responds to the user.
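The estimated external noise used in this flow comes from a beam steered toward the playing device, with MVDR named as one option. The following is a minimal sketch of MVDR weights for a single frequency bin, assuming a far-field source, a linear array along one axis, and a known spatial covariance matrix R. The function names, the array geometry, and the diagonal loading value are illustrative assumptions.

```python
import numpy as np


def steering_vector(mic_positions, angle_rad, freq_hz, c=343.0):
    """Far-field steering vector for a linear array (positions in meters)."""
    delays = np.asarray(mic_positions) * np.cos(angle_rad) / c
    return np.exp(-2j * np.pi * freq_hz * delays)


def mvdr_weights(R, d, diag_load=1e-6):
    """w = R^{-1} d / (d^H R^{-1} d), with light diagonal loading."""
    m = R.shape[0]
    R_loaded = R + diag_load * np.trace(R).real / m * np.eye(m)
    rinv_d = np.linalg.solve(R_loaded, d)
    return rinv_d / (d.conj() @ rinv_d)


# Usage: the beam output for one bin is w^H x, i.e. np.vdot(w, x)
positions = np.array([0.0, 0.05, 0.10, 0.15])  # 4 microphones, 5 cm apart
d_look = steering_vector(positions, np.deg2rad(60), 1000.0)
d_other = steering_vector(positions, np.deg2rad(120), 1000.0)
R = 100.0 * np.outer(d_other, d_other.conj()) + np.eye(4)
w = mvdr_weights(R, d_look)
```

The weights pass the look direction with unit gain while minimizing output power from everywhere else, so a strong source in another direction is suppressed. With only two microphones the beam is much wider, which is the motivation the text gives for switching to sound source separation.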
It should be noted that beamforming is suitable for multi-microphone scenarios in which the sound source localization is relatively accurate or the beam pattern is relatively good, for example with a narrow main lobe.
Alternatively, if the user (equivalent to the target object in the embodiment of the present invention) and device A are at the same angle, the estimated external noise will contain the wake-up word, resulting in signal cancellation. In this case, the microphone signal may be used to calculate the statistical energy. To detect this case, optionally, the microphone signal and the estimated external noise signal are each fed to a wake-up module; when the score of the latter is close to or even greater than that of the former, the user and device A can be considered to be at the same angle.
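The same-angle check described above reduces to comparing two wake-up scores. A minimal sketch, assuming both scores are on the same scale and that "close to" is expressed by a tolerance; the names `same_angle` and `energy_source` and the `margin` value are hypothetical.

```python
def same_angle(score_mic: float, score_noise_est: float, margin: float = 0.1) -> bool:
    """True when the noise-estimate score is close to or above the microphone score."""
    return score_noise_est >= score_mic - margin


def energy_source(score_mic: float, score_noise_est: float, margin: float = 0.1) -> str:
    """Choose which signal the statistical energy should be computed from."""
    if same_angle(score_mic, score_noise_est, margin):
        return "microphone"  # avoid cancelling the wake-up word
    return "denoised"
```

A suitable tolerance would depend on the wake-up model's score distribution and would need to be tuned on real recordings.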
Mode two: sound source separation. When a device has few microphones, beamforming performs poorly: for example, the main lobe is wide and the pickup range is large, so the estimated external noise contains part of the wake-up word, which causes signal cancellation. An alternative embodiment of the present invention proposes using the AUX-IVA method for sound source separation when the number of microphones is 2. The method involves complex matrix inversion; the inverse of a 2×2 matrix has an analytical solution, so the amount of calculation is small. When the number of microphones is larger, for example with 4×4 or 6×6 matrices, the amount of calculation is large and real-time computation is not possible. In addition, reverberation has a great influence on sound source separation, so the WPE algorithm can be used to perform dereverberation first.
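The analytical 2×2 inverse mentioned above is the adjugate divided by the determinant. The sketch below shows that closed form for a complex matrix; it illustrates only this one building block, not the AUX-IVA update itself, and the function name `inv2x2` and the singularity tolerance are assumptions.

```python
import numpy as np


def inv2x2(m):
    """Closed-form inverse of a 2x2 (possibly complex) matrix."""
    a, b = m[0, 0], m[0, 1]
    c, d = m[1, 0], m[1, 1]
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("matrix is singular or nearly singular")
    # Adjugate divided by determinant: a handful of multiplications per bin
    return np.array([[d, -b], [-c, a]], dtype=complex) / det
```

Because the separation is run per frequency bin and per iteration, replacing a general solver with a few multiplications is what keeps the two-microphone case cheap.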
It can be appreciated that the AUX-IVA method produces two output channels, one of which is noise and the other a clean signal; because of the permutation problem, it cannot be known in advance which channel is which. The signals of the different channels therefore need to be processed further and one of them selected.
Optionally, for channel selection, each channel can be fed to a wake-up module and the channel with the higher wake-up score output as the clean signal, but this method requires a large amount of calculation. Alternatively, the energies E1 and E2 of the two channels can be calculated; device A calculates the energies E_mic, E_aec, and E_spk of the microphone signal mic, the de-echoed signal aec, and the estimated echo signal spk, respectively. The method requires self-calibration: for example, device A automatically plays a piece of music or other audio, so that the original signal energy E_B of device B is affected only by the echo of A, in order to find the coefficient α_A→B. In use, the steps are as follows: (1) when E_mic and E_spk are close and E_aec is small, the played sound is judged to be small at the moment, and the larger of E1 and E2 is selected; (2) α_A→B*E_spk is calculated, and whichever of E1 and E2 is close to α_A→B*E_spk may be considered the separated echo.
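Step (2) of the energy-based channel selection can be sketched as follows. The calibration formula is an assumption: the sketch takes α_A→B to be the ratio of device B's energy to device A's playback energy during self-calibration, which is consistent with how α_A→B*E_spk is used in the text but is not stated there. The function names are hypothetical.

```python
def calibrate_alpha(e_b_calib: float, e_spk_calib: float) -> float:
    """Assumed calibration: energy transfer ratio from A's playback to B."""
    return e_b_calib / e_spk_calib


def pick_echo_channel(e1: float, e2: float, e_spk: float, alpha_a_to_b: float) -> int:
    """Return 0 or 1: the separated channel whose energy is closer to alpha * E_spk."""
    expected_echo = alpha_a_to_b * e_spk
    return 0 if abs(e1 - expected_echo) <= abs(e2 - expected_echo) else 1


# Usage: the channel not picked as echo is treated as the clean signal
alpha = calibrate_alpha(e_b_calib=2.0, e_spk_calib=10.0)
echo_idx = pick_echo_channel(e1=1.9, e2=0.3, e_spk=10.0, alpha_a_to_b=alpha)
clean_idx = 1 - echo_idx
```

Compared with running a wake-up model on both channels, this needs only a few energy values per decision, at the cost of requiring recalibration whenever the devices move.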
When a wake-up is triggered, the signal energy E_clean over a certain period is accumulated,
where the suffix clean denotes the signal after noise reduction, T denotes the statistical period, X denotes the frequency-domain signal after the STFT, fh is the highest frequency band counted, and fl is the lowest. The E_clean values of devices A and B determine which device responds with priority.
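One form of this statistic that is consistent with the symbols listed above is the sum of squared STFT magnitudes over T frames and over the bins from fl to fh. The sketch below computes that quantity; the exact formula, the Hann window, the frame length, and the hop size are assumptions, and `band_energy` is a hypothetical name.

```python
import numpy as np


def band_energy(x, frame_len=512, hop=256, fl=0, fh=None):
    """Sum of |X(t, f)|^2 over all full frames t and bins fl..fh (inclusive)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    total = 0.0
    for t in range(n_frames):
        frame = x[t * hop : t * hop + frame_len] * window
        spec = np.fft.rfft(frame)            # one STFT frame
        hi = len(spec) - 1 if fh is None else fh
        total += float(np.sum(np.abs(spec[fl : hi + 1]) ** 2))
    return total
```

Restricting the sum to a band lets the comparison ignore frequency ranges where little speech energy is expected. The device whose noise-reduced signal gives the larger value would be the one to respond.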
When E_spk is less than a certain threshold, the device may be considered not to be playing.
FIG. 4 is a computational flow diagram of an alternative sound source separation in accordance with an alternative embodiment of the present application; the method comprises the following steps:
Step S502: performing echo path calibration between the device A and the device B;
Step S504: determining original signals corresponding to audio sent by a target object and received by equipment A and equipment B respectively;
Step S506: echo cancellation is carried out on the original signal, and an echo removing signal corresponding to the equipment A and the equipment B is obtained;
Step S508: determining whether the equipment A has echo or not, performing sound source separation under the condition of determining the echo, and performing channel selection based on the energy determination mode to determine a corresponding echo signal;
Step S510: and determining clean signals which do not contain noise and correspond to the equipment A and the equipment B respectively, comparing the signal amplitude/energy corresponding to the equipment A and the equipment B, and determining the party with larger signal amplitude/energy as equipment of a final response user.
In summary, an alternative embodiment of the invention uses the information of a plurality of devices cooperatively to formulate different external noise removal methods for different devices; the wake-up module is used cooperatively to select the estimated clean signal; and when external noise is removed, prior information can be obtained through the self-calibration module to assist subsequent signal processing, which improves the accuracy of energy-based judgment. Operations such as echo cancellation on device signals and denoising of external noise signals yield the respective clean signals, whose amplitudes are then used as the discrimination criterion.
From the description of the above embodiments, it will be clear to a person skilled in the art that the method according to the above embodiments may be implemented by means of software plus the necessary general hardware platform, but of course also by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the method of the various embodiments of the present application.
FIG. 5 is a block diagram of an alternative distributed voice wake-up device according to an embodiment of the present application; as shown in FIG. 5, the device includes:
An obtaining module 62, configured to, when it is determined that a first group of devices receives first wake-up audio, acquire the original signal generated by each device in the first group of devices according to the first wake-up audio, to obtain a first group of original signals, and acquire the feedback information of each device in the first group of devices for the first wake-up audio, to obtain a first group of feedback information, wherein the first group of devices are devices in the same network, the feedback information indicates whether the corresponding device in the first group of devices wakes up its interactive function in response to the first wake-up audio, and the original signal is the audio signal into which a device converts the first wake-up audio it receives;
A first determining module 64, configured to determine, according to the first group of feedback information, the devices in the first group of devices that have woken up the interactive function, to obtain a second group of devices, and determine, from the first group of original signals, the original signals generated by the second group of devices, to obtain a second group of original signals;
A second determining module 66, configured to determine a target noise elimination mode from a preset group of noise elimination modes according to the number of devices in the second group of devices;
A processing module 68, configured to perform noise elimination processing on the second group of original signals using the target noise elimination mode, to obtain a group of noise reduction signals;
A control module 70, configured to determine a target device in the second group of devices according to the group of noise reduction signals, control the target device to play second audio corresponding to the first wake-up audio, and control the devices in the second group of devices other than the target device to mute.
Through the above device, when the first group of devices receives the first wake-up audio, the original signal generated by each device in the first group of devices according to the first wake-up audio is acquired, to obtain a first group of original signals, and the feedback information of each device in the first group of devices for the first wake-up audio is acquired, to obtain a first group of feedback information, wherein the first group of devices are devices in the same network, the feedback information indicates whether the corresponding device wakes up its interactive function in response to the first wake-up audio, and the original signal is the audio signal into which a device converts the first wake-up audio it receives; the devices that have woken up the interactive function are determined from the first group of devices according to the first group of feedback information, to obtain a second group of devices, and the original signals generated by the second group of devices are determined from the first group of original signals, to obtain a second group of original signals; a target noise elimination mode is determined from a preset group of noise elimination modes according to the number of devices in the second group of devices; noise elimination processing is performed on the second group of original signals using the target noise elimination mode, to obtain a group of noise reduction signals; and a target device in the second group of devices is determined according to the group of noise reduction signals, the target device is controlled to play second audio corresponding to the first wake-up audio, and the devices in the second group of devices other than the target device are controlled to mute. That is, in a distributed processing scenario, the original signals of the devices in the target area that gave feedback on the first wake-up audio emitted by the target object are determined, noise elimination processing is performed on those original signals to obtain the corresponding noise reduction signals, and the final target device is determined, by means of the noise reduction signals, from the devices that gave feedback. This technical scheme solves the problem in the related art that the responding device cannot be determined accurately and quickly from a plurality of devices in a complex scenario: the target device that finally interacts with the target object can be determined from the plurality of responding devices, which improves the effectiveness of schemes that use signal amplitude/energy as the discrimination criterion.
In an exemplary embodiment, the second determining module is further configured to determine, in a case where the number of devices in the second set of devices is greater than or equal to a first preset threshold, a first noise cancellation mode from a set of noise cancellation modes, where the first noise cancellation mode is used to filter, by a preset first adaptive filter, a self-noise signal from the second set of original signals, and filter, by a second adaptive filter, an external noise signal from the second set of original signals, where the second adaptive filter is a filter generated according to beamforming between the second set of devices; and under the condition that the number of the devices in the second group of devices is smaller than a first preset threshold value, determining a second noise elimination mode from one group of noise elimination modes, wherein the second noise elimination mode is used for filtering self-noise signals from the second group of original signals through a first adaptive filter and filtering external noise signals which are determined through sound source separation among the second group of devices from the second group of original signals.
In an exemplary embodiment, the second determining module further includes: the adding unit is used for counting first energy values of target noise reduction signals corresponding to the second group of equipment processed in the first noise signal elimination mode; determining a second energy value corresponding to an external noise signal in a second group of original signals filtered by the first noise signal elimination mode; determining that the target object is at the same angle with the second group of equipment under the condition that the difference value between the first energy value and the second energy value is lower than a second preset threshold value; adding estimated signals into a target second group of original signals processed in the first noise signal elimination mode, wherein the estimated signals are preset signals for balancing signal cancellation.
In an exemplary embodiment, the second determining module is further configured to determine location information of each device in the second set of devices in the target area, and obtain a set of location information altogether; determining a relative position between each two devices in the second set of devices through a set of position information; a second adaptive filter is determined for each device in the second set of devices based on the relative locations.
In an exemplary embodiment, the second determining module is further configured to sequentially enter the calibration mode by each device in the second set of devices, and determine a relative direction between each device and the other devices according to a set of location information; beam forming is carried out based on the relative direction, so that first estimated external noise and second estimated external noise between every two devices in the second group of devices are obtained; and under the condition that the first estimated external noise is the same as the second estimated external noise, determining the relative position between every two devices in the second group of devices.
In an exemplary embodiment, the second determining module further includes a comparison unit, configured to: decompose the second group of original signals into a first sub-signal and a second sub-signal through a target algorithm; calculate a third energy value corresponding to the first sub-signal and a fourth energy value corresponding to the second sub-signal; and determine the sub-signal whose energy value, among the third energy value and the fourth energy value, is closer to a target energy value as the echo signal corresponding to the second group of original signals, and determine the external noise signal to be filtered from the second group of original signals based on the echo signal.
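The selection made by the comparison unit can be sketched as follows. The decomposition itself (the "target algorithm") is not specified in this passage, so the sketch takes the two sub-signals as given. Energy is assumed to be the sum of squared samples, and `pick_echo_subsignal` is a hypothetical name.

```python
import numpy as np

def pick_echo_subsignal(first_sub, second_sub, target_energy):
    """Choose the sub-signal whose energy is closer to the target energy.

    Returns (index, third_energy, fourth_energy), where index is 0 for the
    first sub-signal and 1 for the second.
    """
    # Assumption: energy is the sum of squared samples.
    third_energy = float(np.sum(np.square(np.asarray(first_sub, dtype=float))))
    fourth_energy = float(np.sum(np.square(np.asarray(second_sub, dtype=float))))
    # Ties are resolved in favor of the first sub-signal (an arbitrary choice).
    if abs(third_energy - target_energy) <= abs(fourth_energy - target_energy):
        return 0, third_energy, fourth_energy
    return 1, third_energy, fourth_energy
```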
In an exemplary embodiment, the control module is further configured to: when each device in the second group of devices has a noise reduction signal, determine the target amplitude of the noise reduction signal corresponding to each device in the second group of devices to obtain a plurality of target amplitudes corresponding to the second group of devices; and sort the plurality of target amplitudes in descending order, select the device with the largest target amplitude as the response device, and determine the response device as the target device in the second group of devices, so as to interact with the target object that emitted the first wake-up audio.
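A minimal sketch of this ranking step follows. It assumes the "target amplitude" is the peak absolute sample value of each noise reduction signal, which the text does not specify; `select_response_device` is a hypothetical name.

```python
import numpy as np

def select_response_device(noise_reduced):
    """Rank devices by target amplitude and pick the largest.

    noise_reduced -- dict mapping a device id to its noise reduction signal
    Returns (response_device_id, ranking), where ranking is a list of
    (device_id, amplitude) pairs sorted from largest to smallest.
    """
    if not noise_reduced:
        raise ValueError("no noise reduction signals available")
    # Assumption: target amplitude is the peak absolute sample value.
    amplitudes = {
        dev: float(np.max(np.abs(np.asarray(sig, dtype=float))))
        for dev, sig in noise_reduced.items()
    }
    ranking = sorted(amplitudes.items(), key=lambda kv: kv[1], reverse=True)
    return ranking[0][0], ranking
```

For example, with signals from a speaker, a television, and a refrigerator, the device whose noise reduction signal has the largest peak is returned as the response device.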
An embodiment of the present application also provides a storage medium including a stored program, wherein the program, when run, performs any one of the methods described above.
Optionally, in the present embodiment, the above-described storage medium may be configured to store program code for performing the following steps:
S1, in the case where a first group of devices receives a first wake-up audio, acquiring the original signal generated by each device in the first group of devices according to the first wake-up audio to obtain a first group of original signals, and acquiring feedback information of each device in the first group of devices for the first wake-up audio to obtain a first group of feedback information, wherein the first group of devices are devices in the same network, the feedback information indicates whether the corresponding device in the first group of devices has woken up its interactive function in response to the first wake-up audio, and the original signal is the audio signal into which a device converts the first wake-up audio after receiving it;
S2, determining, according to the first group of feedback information, the devices in the first group of devices that have woken up the interactive function to obtain a second group of devices, and determining, from the first group of original signals, the original signals generated by the second group of devices to obtain a second group of original signals;
S3, determining a target noise elimination mode from a preset group of noise elimination modes according to the number of devices in the second group of devices;
S4, performing noise elimination processing on the second group of original signals using the target noise elimination mode to obtain a group of noise reduction signals;
S5, determining a target device in the second group of devices according to the group of noise reduction signals, controlling the target device to play second audio corresponding to the first wake-up audio, and controlling the devices in the second group of devices other than the target device to mute.
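The control flow of steps S1 to S5 can be sketched as below, under stated assumptions. The mode labels `"beamforming"` and `"source_separation"` are hypothetical names for the first and second noise elimination modes. The denoising functions are passed in as placeholders rather than implemented. The target device is chosen by peak absolute amplitude, which is one possible reading of the selection criterion. `DeviceReport`, `choose_mode`, and `run_wakeup_round` are illustrative names only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence

@dataclass
class DeviceReport:
    device_id: str
    woke_up: bool             # feedback information for the wake-up audio
    raw_signal: List[float]   # original signal produced from the wake-up audio

def choose_mode(num_devices: int, threshold: int) -> str:
    """S3: pick the noise elimination mode from the device count."""
    return "beamforming" if num_devices >= threshold else "source_separation"

def run_wakeup_round(
    reports: Sequence[DeviceReport],                 # S1: collected signals and feedback
    threshold: int,                                  # the first preset threshold
    denoisers: Dict[str, Callable[[List[float]], List[float]]],
) -> Optional[dict]:
    # S2: keep only the devices whose interactive function was woken up.
    awake = [r for r in reports if r.woke_up]
    if not awake:
        return None
    # S3: select the target noise elimination mode.
    mode = choose_mode(len(awake), threshold)
    # S4: apply the selected (placeholder) denoiser to each original signal.
    reduced = {r.device_id: denoisers[mode](r.raw_signal) for r in awake}
    # S5: choose the target device (assumed: largest peak amplitude), mute the rest.
    target = max(reduced, key=lambda d: max((abs(s) for s in reduced[d]), default=0.0))
    muted = [d for d in reduced if d != target]
    return {"mode": mode, "target": target, "muted": muted}
```

Passing the denoisers in as a dictionary keeps the sketch independent of any particular filtering implementation; a real system would supply adaptive-filter and source-separation routines here.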
An embodiment of the application also provides an electronic device comprising a memory having stored therein a computer program and a processor arranged to run the computer program to perform the steps of any of the method embodiments described above.
Optionally, the electronic apparatus may further include a transmission device and an input/output device, where the transmission device is connected to the processor, and the input/output device is connected to the processor.
Optionally, in the present embodiment, the above-described processor may be configured to execute the following steps by means of a computer program:
S1, in the case where a first group of devices receives a first wake-up audio, acquiring the original signal generated by each device in the first group of devices according to the first wake-up audio to obtain a first group of original signals, and acquiring feedback information of each device in the first group of devices for the first wake-up audio to obtain a first group of feedback information, wherein the first group of devices are devices in the same network, the feedback information indicates whether the corresponding device in the first group of devices has woken up its interactive function in response to the first wake-up audio, and the original signal is the audio signal into which a device converts the first wake-up audio after receiving it;
S2, determining, according to the first group of feedback information, the devices in the first group of devices that have woken up the interactive function to obtain a second group of devices, and determining, from the first group of original signals, the original signals generated by the second group of devices to obtain a second group of original signals;
S3, determining a target noise elimination mode from a preset group of noise elimination modes according to the number of devices in the second group of devices;
S4, performing noise elimination processing on the second group of original signals using the target noise elimination mode to obtain a group of noise reduction signals;
S5, determining a target device in the second group of devices according to the group of noise reduction signals, controlling the target device to play second audio corresponding to the first wake-up audio, and controlling the devices in the second group of devices other than the target device to mute.
Optionally, in the present embodiment, the storage medium may include, but is not limited to, a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or any other medium that can store program code.
Optionally, for specific examples in this embodiment, refer to the examples described in the foregoing embodiments and optional implementations; they are not repeated here.
It will be appreciated by those skilled in the art that the modules or steps of the application described above may be implemented with a general-purpose computing device. They may be centralized on a single computing device or distributed across a network of multiple computing devices. Optionally, they may be implemented in program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device, and in some cases the steps may be executed in an order different from that shown or described. Alternatively, they may each be implemented as an individual integrated circuit module, or multiple modules or steps among them may be implemented as a single integrated circuit module. Thus, the present application is not limited to any specific combination of hardware and software.
The foregoing is merely a preferred embodiment of the present application. It should be noted that those skilled in the art may make modifications and adaptations without departing from the principles of the present application, and such modifications and adaptations are also intended to fall within the scope of protection of the present application.

Claims (9)

1. A distributed voice wakeup method, comprising:
in the case where a first group of devices receives a first wake-up audio, acquiring the original signal generated by each device in the first group of devices according to the first wake-up audio to obtain a first group of original signals, and acquiring feedback information of each device in the first group of devices for the first wake-up audio to obtain a first group of feedback information, wherein the first group of devices are devices in the same network, the feedback information indicates whether the corresponding device in the first group of devices has woken up its interactive function in response to the first wake-up audio, and the original signal is the audio signal into which a device converts the first wake-up audio after receiving it;
determining, according to the first group of feedback information, the devices in the first group of devices that have woken up the interactive function to obtain a second group of devices, and determining, from the first group of original signals, the original signals generated by the second group of devices to obtain a second group of original signals;
determining a target noise elimination mode from a preset group of noise elimination modes according to the number of devices in the second group of devices;
performing noise elimination processing on the second group of original signals using the target noise elimination mode to obtain a group of noise reduction signals;
determining a target device in the second group of devices according to the group of noise reduction signals, controlling the target device to play second audio corresponding to the first wake-up audio, and controlling the devices in the second group of devices other than the target device to mute;
wherein the determining a target noise elimination mode from a preset group of noise elimination modes according to the number of devices in the second group of devices comprises:
when the number of devices in the second group of devices is greater than or equal to a first preset threshold, determining a first noise elimination mode from the group of noise elimination modes, wherein the first noise elimination mode is used for filtering self-noise signals from the second group of original signals through a preset first adaptive filter, and filtering external noise signals from the second group of original signals through a second adaptive filter, wherein the second adaptive filter is a filter generated according to beamforming among the second group of devices;
when the number of devices in the second group of devices is smaller than the first preset threshold, determining a second noise elimination mode from the group of noise elimination modes, wherein the second noise elimination mode is used for filtering self-noise signals from the second group of original signals through the first adaptive filter, and filtering, from the second group of original signals, external noise signals determined through sound source separation among the second group of devices.
2. The method of claim 1, wherein after the determining a first noise elimination mode from the group of noise elimination modes, the method further comprises:
calculating a first energy value of the target noise reduction signal corresponding to the second group of devices after processing in the first noise elimination mode; determining a second energy value corresponding to the external noise signal filtered out of the second group of original signals in the first noise elimination mode;
determining that the target object is at the same angle as the second group of devices when the difference between the first energy value and the second energy value is lower than a second preset threshold;
adding an estimated signal to the target second group of original signals processed in the first noise elimination mode, wherein the estimated signal is a preset signal used to compensate for signal cancellation.
3. The method of claim 1, wherein, when the number of devices in the second group of devices is greater than or equal to the first preset threshold, before determining a first noise elimination mode from the group of noise elimination modes, the method further comprises:
determining the location information of each device in the second group of devices within the target area to obtain a group of location information;
determining the relative position between every two devices in the second group of devices from the group of location information;
determining a second adaptive filter for each device in the second group of devices based on the relative positions.
4. The method of claim 3, wherein determining the relative position between every two devices in the second group of devices from the group of location information comprises:
having each device in the second group of devices enter a calibration mode in turn, and determining the relative direction between each device and the other devices according to the group of location information;
performing beamforming based on the relative direction to obtain a first estimated external noise and a second estimated external noise between every two devices in the second group of devices;
determining the relative position between every two devices in the second group of devices when the first estimated external noise is the same as the second estimated external noise.
5. The method of claim 1, wherein, when the number of devices in the second group of devices is smaller than the first preset threshold, before determining a second noise elimination mode from the group of noise elimination modes, the method further comprises:
decomposing the second group of original signals into a first sub-signal and a second sub-signal through a target algorithm;
calculating a third energy value corresponding to the first sub-signal and a fourth energy value corresponding to the second sub-signal;
determining the sub-signal whose energy value, among the third energy value and the fourth energy value, is closer to a target energy value as the echo signal corresponding to the second group of original signals, and determining the external noise signal to be filtered from the second group of original signals based on the echo signal.
6. The method of claim 1, wherein determining a target device in the second group of devices according to the group of noise reduction signals comprises:
when each device in the second group of devices has a noise reduction signal, determining the target amplitude of the noise reduction signal corresponding to each device in the second group of devices to obtain a plurality of target amplitudes corresponding to the second group of devices;
sorting the plurality of target amplitudes in descending order, selecting the device with the largest target amplitude as the response device, and determining the response device as the target device in the second group of devices, so as to interact with the target object that emitted the first wake-up audio.
7. A distributed voice wakeup apparatus, comprising:
an acquisition module, configured to: in the case where a first group of devices receives a first wake-up audio, acquire the original signal generated by each device in the first group of devices according to the first wake-up audio to obtain a first group of original signals, and acquire feedback information of each device in the first group of devices for the first wake-up audio to obtain a first group of feedback information, wherein the first group of devices are devices in the same network, the feedback information indicates whether the corresponding device in the first group of devices has woken up its interactive function in response to the first wake-up audio, and the original signal is the audio signal into which a device converts the first wake-up audio after receiving it;
a first determining module, configured to determine, according to the first group of feedback information, the devices in the first group of devices that have woken up the interactive function to obtain a second group of devices, and determine, from the first group of original signals, the original signals generated by the second group of devices to obtain a second group of original signals;
a second determining module, configured to determine a target noise elimination mode from a preset group of noise elimination modes according to the number of devices in the second group of devices;
a processing module, configured to perform noise elimination processing on the second group of original signals using the target noise elimination mode to obtain a group of noise reduction signals;
a control module, configured to determine a target device in the second group of devices according to the group of noise reduction signals, control the target device to play second audio corresponding to the first wake-up audio, and control the devices in the second group of devices other than the target device to mute;
wherein the processing module is further configured to: when the number of devices in the second group of devices is greater than or equal to a first preset threshold, determine a first noise elimination mode from the group of noise elimination modes, wherein the first noise elimination mode is used for filtering self-noise signals from the second group of original signals through a preset first adaptive filter, and filtering external noise signals from the second group of original signals through a second adaptive filter, wherein the second adaptive filter is a filter generated according to beamforming among the second group of devices;
and, when the number of devices in the second group of devices is smaller than the first preset threshold, determine a second noise elimination mode from the group of noise elimination modes, wherein the second noise elimination mode is used for filtering self-noise signals from the second group of original signals through the first adaptive filter, and filtering, from the second group of original signals, external noise signals determined through sound source separation among the second group of devices.
8. A computer-readable storage medium, characterized in that the computer-readable storage medium comprises a stored program, wherein the program when run performs the method of any one of claims 1 to 6.
9. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to execute the method according to any of claims 1 to 6 by means of the computer program.
CN202210603410.XA 2022-05-30 2022-05-30 Distributed voice awakening method and device, storage medium and electronic device Active CN115171703B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210603410.XA CN115171703B (en) 2022-05-30 2022-05-30 Distributed voice awakening method and device, storage medium and electronic device
PCT/CN2023/085259 WO2023231552A1 (en) 2022-05-30 2023-03-30 Distributed voice wake-up method and apparatus, storage medium, and electronic apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210603410.XA CN115171703B (en) 2022-05-30 2022-05-30 Distributed voice awakening method and device, storage medium and electronic device

Publications (2)

Publication Number Publication Date
CN115171703A CN115171703A (en) 2022-10-11
CN115171703B true CN115171703B (en) 2024-05-24

Family

ID=83483084

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210603410.XA Active CN115171703B (en) 2022-05-30 2022-05-30 Distributed voice awakening method and device, storage medium and electronic device

Country Status (2)

Country Link
CN (1) CN115171703B (en)
WO (1) WO2023231552A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115171703B (en) * 2022-05-30 2024-05-24 青岛海尔科技有限公司 Distributed voice awakening method and device, storage medium and electronic device

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101325604A (en) * 2008-07-21 2008-12-17 重庆邮电大学 Energy-saving method for distributed self-adaption industry wireless network
WO2018032954A1 (en) * 2016-08-16 2018-02-22 华为技术有限公司 Method and device for waking up wireless device
CN108877827A (en) * 2017-05-15 2018-11-23 福州瑞芯微电子股份有限公司 Voice-enhanced interaction method and system, storage medium and electronic equipment
CN109669775A (en) * 2018-12-10 2019-04-23 平安科技(深圳)有限公司 Distributed task dispatching method, system and storage medium
CN110211599A (en) * 2019-06-03 2019-09-06 Oppo广东移动通信有限公司 Using awakening method, device, storage medium and electronic equipment
CN111696562A (en) * 2020-04-29 2020-09-22 华为技术有限公司 Voice wake-up method, device and storage medium
CN111880855A (en) * 2020-07-31 2020-11-03 宁波奥克斯电气股份有限公司 Equipment control method and distributed voice system
CN112185388A (en) * 2020-09-14 2021-01-05 北京小米松果电子有限公司 Speech recognition method, device, equipment and computer readable storage medium

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140006825A1 (en) * 2012-06-30 2014-01-02 David Shenhav Systems and methods to wake up a device from a power conservation state
WO2020218645A1 (en) * 2019-04-25 2020-10-29 엘지전자 주식회사 Method and device for searching for smart voice enabled device
CN110288997B (en) * 2019-07-22 2021-04-16 苏州思必驰信息科技有限公司 Device wake-up method and system for acoustic networking
US11211072B2 (en) * 2020-01-23 2021-12-28 International Business Machines Corporation Placing a voice response system into a forced sleep state
CN111640431B (en) * 2020-04-30 2023-10-27 海尔优家智能科技(北京)有限公司 Equipment response processing method and device
CN112634922A (en) * 2020-11-30 2021-04-09 星络智能科技有限公司 Voice signal processing method, apparatus and computer readable storage medium
CN113593548B (en) * 2021-06-29 2023-12-19 青岛海尔科技有限公司 Method and device for waking up intelligent equipment, storage medium and electronic device
CN114420094A (en) * 2021-12-13 2022-04-29 北京声智科技有限公司 Cross-device wake-up method, device, equipment and storage medium
CN115171703B (en) * 2022-05-30 2024-05-24 青岛海尔科技有限公司 Distributed voice awakening method and device, storage medium and electronic device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Noise reduction method for distributed optical fiber signals based on wavelet packets; Wu Yaming; Laser Journal; 2018-10-25 (No. 10); full text *

Also Published As

Publication number Publication date
WO2023231552A1 (en) 2023-12-07
CN115171703A (en) 2022-10-11

Similar Documents

Publication Publication Date Title
CN111161751A (en) Distributed microphone pickup system and method under complex scene
JP6196320B2 (en) Filter and method for infomed spatial filtering using multiple instantaneous arrival direction estimates
US8842851B2 (en) Audio source localization system and method
US11297178B2 (en) Method, apparatus, and computer-readable media utilizing residual echo estimate information to derive secondary echo reduction parameters
US8238569B2 (en) Method, medium, and apparatus for extracting target sound from mixed sound
US9336767B1 (en) Detecting device proximities
Rui et al. Time delay estimation in the presence of correlated noise and reverberation
CN107271963A (en) The method and apparatus and air conditioner of auditory localization
GB2495278A (en) Processing received signals from a range of receiving angles to reduce interference
US9378754B1 (en) Adaptive spatial classifier for multi-microphone systems
CN109087665A (en) A kind of nonlinear echo suppressing method
JP2001309483A (en) Sound pickup method and sound pickup device
CN115171703B (en) Distributed voice awakening method and device, storage medium and electronic device
US10863296B1 (en) Microphone failure detection and re-optimization
Schwartz et al. Nested generalized sidelobe canceller for joint dereverberation and noise reduction
TWI459381B (en) Speech enhancement method
CN107360497B (en) Calculation method and device for estimating reverberation component
JP2020504966A (en) Capture of distant sound
CN112997249B (en) Voice processing method, device, storage medium and electronic equipment
CN110913312B (en) Echo cancellation method and device
CN111445916A (en) Audio dereverberation method, device and storage medium in conference system
CN115410593A (en) Audio channel selection method, device, equipment and storage medium
CN110246516A (en) The processing method of small space echo signal in a kind of voice communication
US20210174820A1 (en) Signal processing apparatus, voice speech communication terminal, signal processing method, and signal processing program
WO2023246223A1 (en) Speech enhancement method and apparatus for distributed wake-up, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant