CN116486797A - Method, device, electronic equipment and medium for reducing false wake-up - Google Patents


Info

Publication number
CN116486797A
Authority
CN
China
Prior art keywords
audio
wake
slice
audio slice
module
Legal status
Pending
Application number
CN202310532234.XA
Other languages
Chinese (zh)
Inventor
周毅
Current Assignee
Apollo Zhilian Beijing Technology Co Ltd
Original Assignee
Apollo Zhilian Beijing Technology Co Ltd
Application filed by Apollo Zhilian Beijing Technology Co Ltd
Priority to CN202310532234.XA
Publication of CN116486797A


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/20 Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166 Microphone arrays; Beamforming
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals
    • G10L2025/783 Detection of presence or absence of voice signals based on threshold decision
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Abstract

The disclosure provides a method, a device, electronic equipment and a medium for reducing false wake-up, relates to the technical field of audio processing, and in particular to the technical fields of voice interaction and audio noise reduction. The specific implementation scheme is as follows: environmental audio is collected, and noise reduction is performed on the collected environmental audio. Then, for each audio slice in the noise-reduced environmental audio, voice activity detection is performed on the audio slice to obtain its energy value, and it is determined whether the energy value of the audio slice is below a specified threshold. If so, the audio slice is replaced with a mute segment, and the replaced audio slice is sent to the wake-up engine of the voice assistant. This reduces false wake-ups of the voice assistant caused by noise residues.

Description

Method, device, electronic equipment and medium for reducing false wake-up
Technical Field
The present disclosure relates to the field of audio processing technologies, and in particular, to the field of voice interaction and audio noise reduction technologies.
Background
With the development of artificial intelligence and the Internet of Things, the services available through voice assistants are becoming increasingly diverse, such as checking the weather, making a call, turning a lamp on and off, and adjusting the air-conditioner temperature. Before using a voice assistant, the user first needs to wake it up by speaking a wake-up voice instruction to it, for example "Small X, small X" or "Heixx".
Disclosure of Invention
The disclosure provides a method, a device, electronic equipment and a medium for reducing false wake-up.
In a first aspect of an embodiment of the present disclosure, a method for reducing false wake-up is provided, including:
collecting environmental audio, and reducing noise of the collected environmental audio;
for each audio slice in the noise-reduced environmental audio, performing voice activity detection on the audio slice to obtain an energy value of the audio slice;
determining whether an energy value of the audio slice is below a specified threshold;
if so, replacing the audio slice with a mute segment, and sending the replaced audio slice to a wake-up engine of the voice assistant.
In a second aspect of the embodiments of the present disclosure, there is provided an apparatus for reducing false wake-up, including:
the noise reduction module is used for collecting the environmental audio and reducing noise of the collected environmental audio;
the detection module is used for performing voice activity detection on each audio slice in the environmental audio noise-reduced by the noise reduction module, to obtain the energy value of the audio slice;
a determining module for determining whether the energy value of the audio slice detected by the detecting module is lower than a specified threshold;
and the sending module is used for replacing the audio slice with a mute segment and sending the replaced audio slice to a wake-up engine of the voice assistant if the determination result of the determining module is yes.
In a third aspect of the disclosed embodiments, there is provided an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of the first aspects.
A fourth aspect of embodiments of the present disclosure provides a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method according to any one of the first aspects.
A fifth aspect of embodiments of the present disclosure provides a computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of the first aspects.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. In the drawings:
FIG. 1 is a flow chart of a method for reducing false wake-up provided by an embodiment of the present disclosure;
FIG. 2 is a graph of the spectra of a noise-reduced audio slice that has not been replaced and of mute audio, provided by an embodiment of the present disclosure;
FIG. 3 is a flow chart of another method for reducing false wake-up provided by an embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of a device for reducing false wake-up according to an embodiment of the present disclosure;
FIG. 5 is a block diagram of an electronic device for implementing a method of reducing false wake-up in accordance with an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
At present, to reduce false wake-ups of a voice assistant, noise reduction is performed on the collected environmental sound, the noise-reduced environmental sound is sent to the wake-up engine of the voice assistant, and the wake-up engine determines, based on the received noise-reduced sound, whether the voice assistant needs to be woken up.
Although this approach reduces false wake-ups caused by noise, the noise reduction process may suffer from insufficient echo cancellation, insufficient noise suppression, and insufficient isolation between sound zones, so sound residues remain in the noise-reduced environmental sound, and these residues can easily cause false wake-ups of the voice assistant.
To reduce false wake-ups of a voice assistant caused by noise residues, an embodiment of the disclosure provides a method for reducing false wake-up. The method is applied to an electronic device in which the voice assistant is installed; the electronic device may be any device with voice interaction capability, such as a vehicle, a mobile phone, or an intelligent interactive terminal. As shown in FIG. 1, the method comprises the following steps:
S101, collecting environmental audio, and reducing noise of the collected environmental audio.
Environmental audio may be captured by a microphone (mic) configured in the electronic device. For example, the environmental audio may include human voice, noise, echo, and/or text-to-speech (TTS) audio played through the loudspeaker.
Noise reduction is then performed on the collected environmental audio using a preset noise reduction algorithm, for example a linear filtering algorithm, a spectral subtraction algorithm, or a noise reduction algorithm based on a machine learning model; the embodiments of the present disclosure do not specifically limit this.
S102, for each audio slice in the noise-reduced environmental audio, performing voice activity detection on the audio slice to obtain an energy value of the audio slice.
The environmental audio collected by the electronic device in S101 consists of a plurality of audio frames, each audio frame includes a plurality of audio slices, and each audio slice has a preset length. For example, if a single audio frame is 160 milliseconds (ms) long and a single audio slice is 32 ms long, a single audio frame includes 5 audio slices.
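The frame and slice arithmetic above can be made concrete with a small sketch. It assumes 16 kHz mono PCM held in a Python list; the disclosure does not state a sample rate, so the 16 kHz figure and the helper names `samples_for` and `split_frame_into_slices` are hypothetical.

```python
SAMPLE_RATE_HZ = 16_000  # assumption: the disclosure does not state a sample rate
FRAME_MS = 160           # frame length from the example above
SLICE_MS = 32            # slice length from the example above


def samples_for(ms: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Number of samples covering `ms` milliseconds."""
    return sample_rate_hz * ms // 1000


def split_frame_into_slices(frame: list[int], slice_ms: int = SLICE_MS) -> list[list[int]]:
    """Cut one audio frame into consecutive, equal-length audio slices."""
    n = samples_for(slice_ms)
    if len(frame) % n != 0:
        raise ValueError("frame length must be a whole number of slices")
    return [frame[i:i + n] for i in range(0, len(frame), n)]


frame = [0] * samples_for(FRAME_MS)  # 2560 samples at 16 kHz
slices = split_frame_into_slices(frame)
print(len(slices), len(slices[0]))   # 5 512
```

At other sample rates only the per-slice sample count changes; the 160 / 32 = 5 ratio stays the same.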
A voice activity detection (VAD) algorithm may be applied to each audio slice of the noise-reduced environmental audio, in chronological order of acquisition, to obtain the energy value of the audio slice. For example, the average, the sum of squares, a weighted sum, or the like of the energy values of the sampling points within the audio slice may be calculated, and the result taken as the energy value of the audio slice.
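The three energy measures named above might be computed as follows. This is a sketch under two assumptions not fixed by the disclosure: samples are integers (e.g. 16-bit PCM) in a Python list, and the energy of one sampling point is its squared amplitude. The function names and the weighting are hypothetical.

```python
def mean_energy(slice_: list[int]) -> float:
    """Average per-sample energy (squared amplitude) over the slice."""
    if not slice_:
        raise ValueError("empty slice")
    return sum(s * s for s in slice_) / len(slice_)


def sum_of_squares_energy(slice_: list[int]) -> int:
    """Total energy: sum of squared amplitudes over the slice."""
    return sum(s * s for s in slice_)


def weighted_energy(slice_: list[int], weights: list[float]) -> float:
    """Weighted sum of per-sample energies; the weighting scheme is a hypothetical choice."""
    if len(weights) != len(slice_):
        raise ValueError("need exactly one weight per sample")
    return sum(w * s * s for w, s in zip(weights, slice_))


slice_ = [3, -4, 0, 0]
print(mean_energy(slice_))                            # 6.25
print(sum_of_squares_energy(slice_))                  # 25
print(weighted_energy(slice_, [1.0, 1.0, 0.5, 0.5]))  # 25.0
```

The mean is independent of slice length, which makes a single threshold easier to reuse if the slice length is ever changed.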
Alternatively, a VAD algorithm may be used to detect the amplitude of each sampling point within the audio slice, with the amplitudes representing the energy value of the audio slice. The energy value of the audio slice may also be determined by other VAD algorithms, which the embodiments of the present disclosure do not specifically limit.
S103, determining whether the energy value of the audio slice is lower than a specified threshold. If yes, S104 is executed.
It may be directly determined whether the energy value of the audio slice is below the specified threshold. Alternatively, it may be determined whether the amplitudes of all sampling points fall within a preset amplitude range, for example [-200, 200]: if so, the energy value of the audio slice is determined to be below the specified threshold; if not, it is determined not to be below the specified threshold. The specified threshold used by the electronic device may be preconfigured, for example by the manufacturer before shipping, or issued to the electronic device by the cloud.
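Both forms of the S103 decision can be sketched briefly. The [-200, 200] range is the example from the text; the threshold value itself is not given in the disclosure, so it is left as a parameter, and the function names are hypothetical.

```python
def below_threshold_by_energy(energy: float, threshold: float) -> bool:
    """Direct comparison of the slice's energy value with the specified threshold."""
    return energy < threshold


def below_threshold_by_amplitude(slice_: list[int], lo: int = -200, hi: int = 200) -> bool:
    """True when every sample lies inside [lo, hi], i.e. the slice counts as below threshold."""
    return all(lo <= s <= hi for s in slice_)


print(below_threshold_by_amplitude([10, -150, 199, 200]))  # True
print(below_threshold_by_amplitude([10, -201, 0]))         # False
```

The amplitude form needs no energy computation at all and stops at the first out-of-range sample.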
S104, replacing the audio slice with a mute segment, and sending the replaced audio slice to a wake-up engine of the voice assistant.
When the energy value of the audio slice is below the specified threshold, the volume of the audio slice is low, that is, the amplitude of the audio signal is small, so the sound it contains is most likely noise remaining after noise reduction. The audio slice can therefore be replaced with a silence segment of equal length, and the replaced audio slice, i.e. the silence segment, is sent to the wake-up engine.
Electronic devices are typically configured with multiple microphones, each of which captures environmental audio independently. In the embodiment of the disclosure, noise reduction and VAD detection can be performed on the environmental audio collected by each microphone, and each audio slice whose energy value is below the specified threshold is replaced with a mute segment rather than dropped. Because the replacement keeps the slice length unchanged, the wake-up engine can subsequently time-align the audio from the different microphones and determine, based on the aligned audio, whether the voice assistant needs to be woken up.
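Because each quiet slice is swapped for a mute segment of the same length, every microphone channel keeps the same number of samples, which is what keeps the channels aligned for the wake-up engine. A minimal sketch, assuming each channel is already noise-reduced and split into slices, and using mean squared amplitude as the energy value (an assumption); `gate_channel` and `gate_array` are hypothetical names.

```python
Slice = list[int]


def gate_channel(slices: list[Slice], threshold: float) -> list[Slice]:
    """Replace each slice whose mean energy is below `threshold` with equal-length silence."""
    out = []
    for s in slices:
        energy = sum(x * x for x in s) / len(s)
        out.append([0] * len(s) if energy < threshold else s)
    return out


def gate_array(channels: list[list[Slice]], threshold: float) -> list[list[Slice]]:
    """Process every microphone channel independently with the same threshold."""
    return [gate_channel(ch, threshold) for ch in channels]


mic_a = [[5, -5, 5, -5], [300, -300, 300, -300]]  # quiet slice, then loud slice
mic_b = [[400, -400, 400, -400], [2, 2, 2, 2]]    # loud slice, then quiet slice
out = gate_array([mic_a, mic_b], threshold=1000.0)
# Both channels still hold 2 slices of 4 samples each, so they stay time-aligned.
```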
The wake-up engine generally has a keep-alive mechanism: if it receives no audio for a period of time, problems such as the engine going down or shutting down automatically may occur. If no audio were sent to the wake-up engine whenever the energy value of an audio slice is below the specified threshold, such errors could occur and affect normal use of the voice assistant. The embodiment of the disclosure therefore replaces the audio slice with a mute segment and sends the mute segment to the wake-up engine, which both satisfies the keep-alive mechanism of the wake-up engine and avoids false wake-ups of the voice assistant caused by noise residues in the audio slice.
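The keep-alive argument can be illustrated with a toy simulation. `StubWakeEngine` is a hypothetical stand-in, not a real wake-up engine API, and the 1000 ms keep-alive timeout is an assumed value; the sketch only contrasts dropping quiet slices with forwarding mute segments.

```python
class StubWakeEngine:
    """Hypothetical wake-up engine that fails if no audio arrives within `keepalive_ms`."""

    def __init__(self, keepalive_ms: int) -> None:
        self.keepalive_ms = keepalive_ms
        self.last_audio_ms = 0
        self.timed_out = False

    def feed(self, now_ms: int, slice_: list[int]) -> None:
        self.last_audio_ms = now_ms  # any audio, including silence, resets the timer

    def tick(self, now_ms: int) -> None:
        if now_ms - self.last_audio_ms > self.keepalive_ms:
            self.timed_out = True


def simulate(send_silence: bool, n_slices: int = 100, slice_ms: int = 32,
             keepalive_ms: int = 1000) -> bool:
    """Feed a run of quiet slices; return True if the engine's keep-alive expired."""
    engine = StubWakeEngine(keepalive_ms)
    quiet = [3, -2, 4, -1]  # residual noise below the threshold
    for i in range(n_slices):
        now = (i + 1) * slice_ms
        if send_silence:
            engine.feed(now, [0] * len(quiet))  # replace with a mute segment
        # otherwise the quiet slice is simply dropped and nothing is sent
        engine.tick(now)
    return engine.timed_out


print(simulate(send_silence=True))   # False: keep-alive satisfied
print(simulate(send_silence=False))  # True: engine starved of audio
```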
The wake-up engine is installed in the electronic device. For convenience of description, the functional module in the electronic device that executes the method for reducing false wake-up provided by the embodiments of the disclosure is referred to as the error-reduction module; that is, the error-reduction module sends the replaced audio slice to the wake-up engine.
With this method, the embodiments of the disclosure perform voice activity detection on each audio slice of the noise-reduced environmental audio. When the energy value of an audio slice is below the specified threshold, the audio slice has low volume and is most likely noise remaining after noise reduction, so it is replaced with a mute segment before being sent to the wake-up engine. The embodiments of the disclosure thus reduce how much residual noise reaches the wake-up engine, and thereby reduce false wake-ups of the voice assistant caused by that residual noise.
This also differs from model-based noise reduction, in which, for each sound to be suppressed (for example, wind noise and tire friction noise while a vehicle is moving), a large number of audio samples of that sound are recorded in advance to train a model, and the trained models are then applied in turn to suppress each sound in the collected environmental audio. This case-by-case approach requires a model to be trained in advance for every type of sound that needs noise reduction, and because real acoustic environments are complex, the pre-recorded samples cannot cover all types of noise. It is therefore difficult to suppress every type of noise in the environment: noise residues still remain after noise reduction with this approach, and they may also cause false wake-ups of the voice assistant.
The embodiment of the disclosure does not need to distinguish which sounds are present in the noise-reduced environmental audio or to suppress each sound separately; it directly replaces the whole audio slice with a mute segment, which eliminates every kind of noise residue in the slice and reduces the probability of false wake-up of the voice assistant caused by noise residues.
Referring to FIG. 2, the upper curve represents the spectrum of an audio slice after noise reduction, and the lower straight line represents the spectrum of a mute segment. As FIG. 2 shows, although the spectral magnitude of the noise-reduced audio slice is close to 0, it is not exactly 0, so the energy of that slice is not 0 either, i.e. a sound residue is present. The spectral magnitude of the mute segment, by contrast, is exactly 0 and so is its energy, i.e. no sound residue is present. Replacing audio slices whose energy value is below the specified threshold with mute segments therefore avoids false wake-ups of the voice assistant caused by the various sound residues in those slices. FIG. 2 is merely an example of a mute segment and a noise-reduced audio slice; the spectrum of an actual noise-reduced audio slice is not limited thereto.
In the embodiment of the present disclosure, referring to FIG. 3, after S103 determines whether the energy value of the audio slice is below the specified threshold, if the energy value is instead higher than or equal to the specified threshold, the audio slice may be sent to the wake-up engine as is.
An energy value higher than or equal to the specified threshold indicates that the audio slice is loud, i.e. the amplitude of the audio signal is large. Since the audio slice has already been noise-reduced, most of the noise has been removed, so human voice is very likely present in the slice. To avoid interfering with the wake-up engine's normal wake-up of the voice assistant, the embodiments of the present disclosure send such an audio slice directly to the wake-up engine.
In the embodiment of the present disclosure, referring to FIG. 3, when S103 determines that the energy value of the audio slice is below the specified threshold, the error-reduction module in the electronic device may further perform the following steps before replacing the audio slice with the silence segment in S104:
S301, determining the time interval between the current audio slice and the most recent audio slice whose energy value was higher than the specified threshold.
For example, the time difference between the start time of the most recent audio slice whose energy value was higher than the specified threshold and the start time of the current audio slice may be calculated as the time interval. Alternatively, the time difference between the end time of that most recent audio slice and the start time of the current audio slice may be used.
The time interval between the two audio slices may also be calculated in other ways, which the embodiments of the present disclosure do not specifically limit.
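The two S301 interval definitions differ only in which edge of the last loud slice is used. A sketch with hypothetical names, using millisecond timestamps:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SliceSpan:
    start_ms: int
    end_ms: int


def interval_start_to_start(last_loud: SliceSpan, current: SliceSpan) -> int:
    """Start of the last loud slice to start of the current slice."""
    return current.start_ms - last_loud.start_ms


def interval_end_to_start(last_loud: SliceSpan, current: SliceSpan) -> int:
    """End of the last loud slice to start of the current slice."""
    return current.start_ms - last_loud.end_ms


last_loud = SliceSpan(320, 352)
current = SliceSpan(1920, 1952)
print(interval_start_to_start(last_loud, current))  # 1600
print(interval_end_to_start(last_loud, current))    # 1568
```

With fixed-length slices the two results always differ by exactly one slice length (32 ms here), so the choice mainly shifts the effective preset interval.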
S302, judging whether the time interval is larger than a preset interval. If yes, executing the step S104; if not, then S303 is performed.
If the time interval calculated in S301 is greater than the preset interval, then the audio slices have stayed below the specified threshold continuously since the last audio slice whose energy value was higher than the specified threshold, and most likely none of them contains human voice. The person has therefore most likely finished speaking, and the current audio slice can be replaced with a mute segment. For example, the preset interval is 1.5 seconds.
S303, the audio slice is sent to a wake-up engine.
The energy of the captured audio may vary while a person is speaking continuously. For example, volume differences such as quieter nasal sounds produced during normal speech, as well as differences in the distance and occlusion between the person and the microphone, may all cause the energy of the captured audio to vary while the person keeps speaking.
Since the most recent audio slice whose energy value was higher than the specified threshold very likely contains human voice, if the time interval calculated in S301 is not greater than the preset interval, the person has probably not finished speaking and the current audio slice may still contain human voice. To reduce the chance of replacing an audio slice that contains human voice with a mute segment, the audio slice can be sent directly to the wake-up engine without replacement.
With this method, after the most recent audio slice whose energy value was higher than the specified threshold, a low-energy audio slice is replaced with a mute segment only once the energy has stayed below the specified threshold for longer than the preset interval, which reduces the chance of mistakenly replacing an audio slice containing human voice with a mute segment.
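The S103, S301-S303 and S104 flow can be sketched as a small per-stream gate. Assumptions, all labeled in the code: mean squared amplitude as the energy value, the start-to-start interval of S301, and replacement of quiet slices that arrive before any loud slice has been seen (the disclosure does not cover that case). `QuietSliceGate` is a hypothetical name.

```python
from typing import Optional


class QuietSliceGate:
    """Sketch of S103, S301-S303 and S104 for one stream of noise-reduced slices."""

    def __init__(self, threshold: float, preset_interval_ms: int = 1500) -> None:
        self.threshold = threshold
        self.preset_interval_ms = preset_interval_ms  # 1.5 s, the example value in the text
        self.last_loud_start_ms: Optional[int] = None

    def process(self, start_ms: int, slice_: list[int]) -> list[int]:
        """Return the slice to forward to the wake-up engine."""
        energy = sum(x * x for x in slice_) / len(slice_)  # mean squared amplitude (assumption)
        if energy >= self.threshold:
            self.last_loud_start_ms = start_ms
            return slice_  # loud slice: forwarded unchanged
        # S301: start-to-start interval since the most recent loud slice.
        if (self.last_loud_start_ms is not None
                and start_ms - self.last_loud_start_ms <= self.preset_interval_ms):
            return slice_  # S302 "no" -> S303: the speaker may still be talking
        return [0] * len(slice_)  # S302 "yes" (or no loud slice yet) -> S104: mute segment


gate = QuietSliceGate(threshold=100.0)
loud, quiet = [50] * 4, [1] * 4
print(gate.process(0, loud))      # [50, 50, 50, 50]: energy 2500 >= 100
print(gate.process(32, quiet))    # [1, 1, 1, 1]: only 32 ms since the loud slice
print(gate.process(1600, quiet))  # [0, 0, 0, 0]: 1600 ms > 1500 ms
```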
In the embodiment of the disclosure, noise-reduced audio slices whose energy value is higher than the specified threshold are sent to the wake-up engine. Although such slices are likely to contain human voice, they may still contain none, and an audio slice that contains no human voice but has an energy value above the specified threshold may cause a false wake-up of the voice assistant. The error-reduction module in the electronic device may therefore also detect whether the voice assistant has been woken up by mistake.
This detection may be implemented as follows: obtain the audio segment that triggered the wake-up engine to wake up the voice assistant, and detect whether human voice is present in the audio segment; if not, determine that the wake-up is a false wake-up and send false wake-up information to the cloud. The false wake-up information indicates that the electronic device has been woken up by mistake.
The audio segment comprises at least one noise-reduced audio slice; that is, it may consist of consecutive noise-reduced audio slices, or of consecutive noise-reduced audio slices and silence segments. In the embodiment of the disclosure, each item the error-reduction module sends to the wake-up engine may be either a noise-reduced audio slice itself or a silence segment that replaced it. The wake-up engine may treat one or more received audio slices as one audio segment, for example every 3 seconds of received audio slices, and each time an audio segment is obtained, detect whether it can be used to wake up the voice assistant. Upon determining that the audio segment can be used to wake up the voice assistant, the wake-up engine wakes up the voice assistant and sends the audio segment to the error-reduction module.
Optionally, the wake-up engine may detect whether a specified wake word exists in the audio segment; if so, it determines that the audio segment can be used to wake up the voice assistant; otherwise, it determines that the audio segment cannot be used to wake up the voice assistant.
Alternatively, the wake-up engine may detect in other ways whether the audio segment can be used to wake up the voice assistant, which the embodiments of the present disclosure do not specifically limit.
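The wake-up engine's "every 3 seconds" grouping can be sketched as a streaming accumulator. With 32 ms slices, 3000 / 32 = 93.75, so this sketch rounds up to 94 whole slices (3008 ms); that rounding rule and the class name are assumptions, and the wake-word check itself is not modeled.

```python
import math
from typing import Optional


class SegmentAccumulator:
    """Groups forwarded slices (noise-reduced slices or mute segments) into fixed-duration segments."""

    def __init__(self, slice_ms: int = 32, segment_ms: int = 3000) -> None:
        # 3000 / 32 = 93.75, so round up to whole slices: 94 slices = 3008 ms (assumption).
        self.slices_per_segment = math.ceil(segment_ms / slice_ms)
        self._pending: list[list[int]] = []

    def push(self, slice_: list[int]) -> Optional[list[list[int]]]:
        """Add one slice; return a complete segment when one is ready, else None."""
        self._pending.append(slice_)
        if len(self._pending) < self.slices_per_segment:
            return None
        segment, self._pending = self._pending, []
        return segment


acc = SegmentAccumulator()
ready = [seg for seg in (acc.push([0] * 512) for _ in range(200)) if seg is not None]
print(acc.slices_per_segment, len(ready))  # 94 2
```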
After the error-reduction module receives, from the wake-up engine, the audio segment that triggered the wake-up of the voice assistant, it can detect whether human voice is present in the audio segment using a pre-trained voice recognition model. If not, the wake-up was not caused by a voice instruction from a person, so the wake-up is determined to be a false wake-up and false wake-up information is sent to the cloud.
Conversely, if human voice is present in the audio segment, the wake-up was most likely caused by a voice instruction from a person and is therefore a normal wake-up.
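The normal/false decision reduces to one branch around a voice detector. The pre-trained voice recognition model is not specified, so this sketch takes it as a caller-supplied function, as it does the reporting step; `audit_wake` and both callables are hypothetical.

```python
from typing import Callable, Sequence

Segment = Sequence[Sequence[int]]


def audit_wake(segment: Segment,
               has_human_voice: Callable[[Segment], bool],
               report_false_wake: Callable[[], None]) -> bool:
    """Classify a wake-up: True = normal wake-up, False = false wake-up (reported to the cloud)."""
    if has_human_voice(segment):
        return True
    report_false_wake()
    return False


reports: list[str] = []
normal = audit_wake([[0] * 4], lambda seg: True, lambda: reports.append("false wake-up"))
false_ = audit_wake([[0] * 4], lambda seg: False, lambda: reports.append("false wake-up"))
print(normal, false_, reports)  # True False ['false wake-up']
```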
In this way, false wake-ups of the voice assistant can be detected, and when one occurs, false wake-up information is sent to the cloud, so that the cloud can monitor and analyze the false wake-ups of each device, which makes it easier to optimize a device's false wake-up behavior promptly later on.
In the embodiment of the present disclosure, the false wake-up information sent by the electronic device to the cloud includes the specified threshold and the hardware parameters of the target device where the voice assistant is located, the target device being the electronic device to which the method for reducing false wake-up provided by the embodiments of the disclosure is applied.
The hardware parameters include the model of the device, the number of microphones, the spacing between the microphones, and so on. Taking a vehicle as an example of the electronic device, the model of the vehicle may be a sport utility vehicle (SUV), a sedan, a coach, or the like.
The model of the device may also have multiple levels. Taking a vehicle as an example, the model may be divided into two levels, for example: SUV-aa manufacturer, SUV-bb manufacturer, or sedan-aa manufacturer.
Using the hardware parameters, the cloud can group devices whose hardware parameters are identical, or fall within the same ranges, into one device type. Taking a vehicle as an example, the cloud can assign vehicles with the same vehicle model, the same number of microphones, and the same microphone spacing to the same device type; alternatively, it can assign vehicles with the same vehicle model, a number of microphones in the same range, and a microphone spacing in the same range to the same device type.
Optionally, the false wake-up information may further include the name of the noise reduction algorithm the target device uses to reduce noise in the environmental audio. The cloud can then assign devices whose hardware parameters are identical or within the same ranges, and which use the same noise reduction algorithm, to the same device type. Noise reduction algorithms include, for example, noise reduction based on a machine learning model, linear filtering, and spectral subtraction.
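The false wake-up information and the device-type grouping might be modeled as follows. The field names, the millimetre unit for microphone spacing, the `device_id` field (some identifier is needed to count per device, but the text does not name one), and bucketing spacing by integer division for the "same range" variant are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HardwareParams:
    model: str             # e.g. "SUV-aa manufacturer"
    mic_count: int
    mic_spacing_mm: float  # unit is an assumption


@dataclass(frozen=True)
class FalseWakeInfo:
    device_id: str                                   # hypothetical: lets the cloud count per device
    specified_threshold: float
    hardware: HardwareParams
    noise_reduction_algorithm: Optional[str] = None  # optional field from the text


def device_type(info: FalseWakeInfo,
                spacing_bucket_mm: Optional[float] = None,
                include_algorithm: bool = False) -> tuple:
    """Key under which the cloud groups devices into one device type."""
    hw = info.hardware
    spacing = (hw.mic_spacing_mm if spacing_bucket_mm is None
               else hw.mic_spacing_mm // spacing_bucket_mm)  # range-based variant
    key = (hw.model, hw.mic_count, spacing)
    if include_algorithm:
        key += (info.noise_reduction_algorithm,)
    return key


a = FalseWakeInfo("car-1", 120.0, HardwareParams("SUV-aa manufacturer", 4, 62.0), "spectral subtraction")
b = FalseWakeInfo("car-2", 150.0, HardwareParams("SUV-aa manufacturer", 4, 68.0), "linear filtering")
print(device_type(a) == device_type(b))                                          # False: 62 vs 68 mm
print(device_type(a, spacing_bucket_mm=20) == device_type(b, spacing_bucket_mm=20))  # True: same 20 mm bucket
```

Adding `include_algorithm=True` splits `a` and `b` again, because they report different noise reduction algorithms.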
After classifying the devices, the cloud can count the false wake-up times of each device in the current detection period according to the devices with the same device type in each detection period, so that the device with the lowest false wake-up times is counted, and a specified threshold used by the device is obtained. The specified threshold used by the device may then be synchronized to other devices of the same device type as the device.
Correspondingly, after the electronic device sends the false wake-up information to the cloud, the electronic device can also receive the statistical threshold sent by the cloud and update the designated threshold used by the target device to the statistical threshold. The statistical threshold is a specified threshold used by the cloud for each device of the target device type, wherein the device with the least false wake-up times counted in the current detection period is used by the specified threshold, and the target device type is the type to which the hardware parameter of the target device belongs.
Because the specified threshold used by the device with the fewest false wake-ups proved the most effective at suppressing false wake-ups, and devices of the same type have identical or similar hardware parameters, applying that threshold to the other devices of the same type is likely to reduce their false wake-ups as well. In other words, embodiments of the present disclosure can flexibly adjust the specified threshold used by an electronic device based on the false wake-up counts of different devices of the same device type, further lowering the probability of false wake-up.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, and disclosure of the environmental audio comply with the relevant laws and regulations and do not violate public order and good morals.
Based on the same inventive concept and corresponding to the method embodiments above, embodiments of the present disclosure further provide an apparatus for reducing false wake-up. As shown in Fig. 4, the apparatus includes a noise reduction module 401, a detection module 402, a determining module 403, and a sending module 404, where:
the noise reduction module 401 is configured to collect environmental audio and reduce noise in the collected environmental audio;
the detection module 402 is configured to perform voice activity detection on each audio slice of the environmental audio denoised by the noise reduction module 401, to obtain an energy value of the audio slice;
the determining module 403 is configured to determine whether the energy value of the audio slice detected by the detection module 402 is below a specified threshold; and
the sending module 404 is configured to, if the determining module 403 determines that it is, replace the audio slice with a silence segment and send the replaced audio slice to a wake-up engine of the voice assistant.
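The per-slice path through modules 402 to 404 can be sketched as below. The text only says that voice activity detection yields an energy value; computing it as mean-square energy in dB over samples normalized to [-1, 1] is an assumption made for illustration, as are the function names. Replacing a slice with zeros of the same length keeps the stream timing seen by the wake-up engine unchanged.

```python
import math
from typing import List, Sequence


def slice_energy_db(samples: Sequence[float]) -> float:
    """Mean-square energy of one slice in dB (assumed energy measure)."""
    if not samples:
        return float("-inf")
    mean_sq = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_sq) if mean_sq > 0 else float("-inf")


def gate_slice(samples: Sequence[float], threshold_db: float) -> List[float]:
    """Return the slice to forward to the wake-up engine."""
    if slice_energy_db(samples) < threshold_db:
        # Below the specified threshold: replace with a silence segment.
        return [0.0] * len(samples)
    # At or above the threshold: forward unchanged.
    return list(samples)
```

A slice of constant amplitude 0.001 has an energy of about -60 dB and is silenced at a -40 dB threshold, while a slice alternating between 0.5 and -0.5 (about -6 dB) passes through unchanged.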
In some embodiments of the present disclosure, the apparatus may further include:
the determining module 403, further configured to determine, before the audio slice is replaced with a silence segment, the time interval between the most recent audio slice whose energy value was above the specified threshold and the current audio slice;
a judging module, configured to judge whether the time interval is greater than a preset interval;
a calling module, configured to call the sending module to perform the step of replacing the audio slice with a silence segment if the judgment result of the judging module is yes;
the sending module 404 being further configured to send the audio slice to the wake-up engine if the judgment result of the judging module is no.
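The interval check above means a quiet slice is silenced only once enough time has passed since the last slice that was not below the threshold, which presumably keeps short low-energy gaps inside speech from being cut out. A minimal sketch, assuming fixed-length slices, a hypothetical `min_gap_ms` standing in for the preset interval, and that "no earlier loud slice" is treated as an unbounded interval. Following claim 3, a slice whose energy equals the threshold counts as loud.

```python
from typing import List, Optional, Sequence


class SliceGate:
    """Silence low-energy slices only after a quiet gap longer than min_gap_ms."""

    def __init__(self, threshold_db: float, min_gap_ms: float, slice_ms: float):
        self.threshold_db = threshold_db
        self.min_gap_ms = min_gap_ms  # the "preset interval" (value is an assumption)
        self.slice_ms = slice_ms      # fixed slice duration (assumption)
        self._now_ms = 0.0
        self._last_loud_ms: Optional[float] = None

    def process(self, samples: Sequence[float], energy_db: float) -> List[float]:
        t = self._now_ms
        self._now_ms += self.slice_ms
        if energy_db >= self.threshold_db:
            # Not below the threshold: remember when, and forward unchanged.
            self._last_loud_ms = t
            return list(samples)
        # Below the threshold: measure the interval since the last loud slice.
        gap = float("inf") if self._last_loud_ms is None else t - self._last_loud_ms
        if gap > self.min_gap_ms:
            return [0.0] * len(samples)  # replace with a silence segment
        return list(samples)  # recent loud slice: forward unchanged
```

With 20 ms slices and a 100 ms preset interval, quiet slices arriving 20 to 100 ms after a loud slice are forwarded, and the first quiet slice 120 ms after it is silenced.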
In some embodiments of the present disclosure, the sending module 404 is further configured to, after it is determined whether the energy value of the audio slice is below the specified threshold, send the audio slice to the wake-up engine if the energy value of the audio slice is greater than or equal to the specified threshold.
In some embodiments of the present disclosure, the apparatus may further include:
an acquisition module, configured to acquire the audio segment that triggered the wake-up engine to wake up the voice assistant, where the audio segment includes at least one denoised audio slice;
the detection module 402, further configured to detect whether human voice is present in the audio segment;
the sending module 404, further configured to, if the detection module 402 finds no human voice, determine that the wake-up was a false wake-up and send false wake-up information to the cloud.
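The false wake-up check can be sketched as a small handler invoked whenever the wake-up engine fires. The text does not say how human voice is detected, so the detector is passed in as a callable (in practice a dedicated speech classifier would be used); the callback names and the report's field names are hypothetical.

```python
from typing import Callable, List, Sequence

Slice = Sequence[float]


def handle_wake(segment: List[Slice],
                has_human_voice: Callable[[List[Slice]], bool],
                send_to_cloud: Callable[[dict], None],
                threshold_db: float,
                hardware: dict) -> bool:
    """Return True, and report to the cloud, if the wake-up is judged false.

    segment: the denoised slices that triggered the wake-up engine.
    has_human_voice: pluggable voice detector (hypothetical interface).
    """
    if has_human_voice(segment):
        return False  # genuine wake-up: nothing to report
    # No human voice in the triggering audio: treat it as a false wake-up.
    send_to_cloud({"specified_threshold": threshold_db, "hardware": hardware})
    return True
```

Keeping the detector and transport as parameters lets the same handler be tested with stubs, as in passing `lambda seg: False` and a list's `append` method.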
In some embodiments of the present disclosure, the false wake-up information includes the specified threshold and the hardware parameters of the target device on which the voice assistant runs; the apparatus may further include:
a receiving module, configured to receive a statistical threshold sent by the cloud after the false wake-up information is sent to the cloud, where the statistical threshold is the specified threshold used by whichever device of the target device type the cloud counted as having the fewest false wake-ups in the current detection period, and the target device type is the type to which the hardware parameters of the target device belong;
and an updating module, configured to update the specified threshold used by the target device to the statistical threshold received by the receiving module.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 5 illustrates a schematic block diagram of an example electronic device 500 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 5, the electronic device 500 includes a computing unit 501 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 502 or a computer program loaded from a storage unit 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data required for the operation of the electronic device 500 may also be stored. The computing unit 501, ROM 502, and RAM 503 are connected to each other by a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
A number of components in electronic device 500 are connected to I/O interface 505, including: an input unit 506 such as a keyboard, a mouse, etc.; an output unit 507 such as various types of displays, speakers, and the like; a storage unit 508 such as a magnetic disk, an optical disk, or the like; and a communication unit 509 such as a network card, modem, wireless communication transceiver, etc. The communication unit 509 allows the electronic device 500 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 501 may be any of a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 501 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 501 performs the various methods and processes described above, such as the method of reducing false wake-ups. For example, in some embodiments, the method of reducing false wake-ups may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 500 via the ROM 502 and/or the communication unit 509. When the computer program is loaded into the RAM 503 and executed by the computing unit 501, one or more steps of the method of reducing false wake-ups described above may be performed. Alternatively, in other embodiments, the computing unit 501 may be configured to perform the method of reducing false wake-ups by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose, and which may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flows shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions disclosed herein can be achieved; no limitation is imposed herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (13)

1. A method of reducing false wake-up, comprising:
collecting environmental audio, and reducing noise of the collected environmental audio;
for each audio slice in the noise-reduced environmental audio, performing voice activity detection on the audio slice to obtain an energy value of the audio slice;
determining whether an energy value of the audio slice is below a specified threshold;
if yes, replacing the audio slice with a silence segment, and sending the replaced audio slice to a wake-up engine of a voice assistant.
2. The method of claim 1, prior to said replacing the audio slice with a silence segment, said method further comprising:
determining a time interval between the most recent audio slice whose energy value is above the specified threshold and the audio slice;
judging whether the time interval is greater than a preset interval;
if yes, performing the step of replacing the audio slice with a silence segment;
if not, sending the audio slice to the wake-up engine.
3. The method of claim 1, after said determining whether the energy value of the audio slice is below a specified threshold, said method further comprising:
if the energy value of the audio slice is greater than or equal to the specified threshold, sending the audio slice to the wake-up engine.
4. The method of claim 3, further comprising:
acquiring an audio segment that triggered the wake-up engine to wake up the voice assistant, wherein the audio segment comprises at least one noise-reduced audio slice;
detecting whether human voice is present in the audio segment;
if not, determining that the wake-up is a false wake-up, and sending false wake-up information to the cloud.
5. The method of claim 4, wherein the false wake-up information comprises the specified threshold and the hardware parameters of the target device on which the voice assistant runs; after the sending of the false wake-up information to the cloud, the method further comprises:
receiving a statistical threshold sent by the cloud, wherein the statistical threshold is the specified threshold used by whichever device of a target device type the cloud counted as having the fewest false wake-ups in a current detection period, and the target device type is the type to which the hardware parameters of the target device belong;
updating the specified threshold used by the target device to the statistical threshold.
6. An apparatus for reducing false wake-up, comprising:
a noise reduction module, configured to collect environmental audio and reduce noise in the collected environmental audio;
a detection module, configured to perform voice activity detection on each audio slice in the environmental audio denoised by the noise reduction module, to obtain an energy value of the audio slice;
a determining module, configured to determine whether the energy value of the audio slice detected by the detection module is below a specified threshold;
and a sending module, configured to, if the determination result of the determining module is yes, replace the audio slice with a silence segment and send the replaced audio slice to a wake-up engine of a voice assistant.
7. The apparatus of claim 6, the apparatus further comprising:
the determining module is further configured to determine, before the audio slice is replaced with the silence segment, a time interval between the most recent audio slice whose energy value is above the specified threshold and the audio slice;
a judging module, configured to judge whether the time interval is greater than a preset interval;
a calling module, configured to call the sending module to perform the step of replacing the audio slice with a silence segment if the judgment result of the judging module is yes;
and the sending module is further configured to send the audio slice to the wake-up engine if the judgment result of the judging module is no.
8. The apparatus of claim 6, wherein
the sending module is further configured to, after it is determined whether the energy value of the audio slice is below the specified threshold, send the audio slice to the wake-up engine if the energy value of the audio slice is greater than or equal to the specified threshold.
9. The apparatus of claim 8, the apparatus further comprising:
an acquisition module, configured to acquire an audio segment that triggered the wake-up engine to wake up the voice assistant, wherein the audio segment comprises at least one noise-reduced audio slice;
the detection module is further configured to detect whether human voice is present in the audio segment;
and the sending module is further configured to, if the detection result of the detection module is no, determine that the wake-up is a false wake-up and send false wake-up information to the cloud.
10. The apparatus of claim 9, wherein the false wake-up information comprises the specified threshold and the hardware parameters of the target device on which the voice assistant runs; the apparatus further comprises:
a receiving module, configured to receive a statistical threshold sent by the cloud after the false wake-up information is sent to the cloud, wherein the statistical threshold is the specified threshold used by whichever device of a target device type the cloud counted as having the fewest false wake-ups in a current detection period, and the target device type is the type to which the hardware parameters of the target device belong;
and an updating module, configured to update the specified threshold used by the target device to the statistical threshold received by the receiving module.
11. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
12. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to perform the method of any one of claims 1-5.
13. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any of claims 1-5.
CN202310532234.XA 2023-05-11 2023-05-11 Method, device, electronic equipment and medium for reducing false wake-up Pending CN116486797A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310532234.XA CN116486797A (en) 2023-05-11 2023-05-11 Method, device, electronic equipment and medium for reducing false wake-up

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310532234.XA CN116486797A (en) 2023-05-11 2023-05-11 Method, device, electronic equipment and medium for reducing false wake-up

Publications (1)

Publication Number Publication Date
CN116486797A 2023-07-25

Family

ID=87225106

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310532234.XA Pending CN116486797A (en) 2023-05-11 2023-05-11 Method, device, electronic equipment and medium for reducing false wake-up

Country Status (1)

Country Link
CN (1) CN116486797A (en)

Similar Documents

Publication Publication Date Title
CN111210021B (en) Audio signal processing method, model training method and related device
CN109697984B (en) Method for reducing self-awakening of intelligent equipment
US20180174574A1 (en) Methods and systems for reducing false alarms in keyword detection
KR100631608B1 (en) Voice discrimination method
US20200202890A1 (en) Voice activity detection method
CN111754982A (en) Noise elimination method and device for voice call, electronic equipment and storage medium
US9558758B1 (en) User feedback on microphone placement
US20180158462A1 (en) Speaker identification
CN104575509A (en) Voice enhancement processing method and device
CN112581960A (en) Voice wake-up method and device, electronic equipment and readable storage medium
EP3010017A1 (en) Method and apparatus for separating speech data from background data in audio communication
CN110970051A (en) Voice data acquisition method, terminal and readable storage medium
US20150325252A1 (en) Method and device for eliminating noise, and mobile terminal
EP4057277A1 (en) Method and apparatus for noise reduction, electronic device, and storage medium
CN112562735B (en) Voice detection method, device, equipment and storage medium
CN112669867B (en) Debugging method and device of noise elimination algorithm and electronic equipment
CN113329372A (en) Method, apparatus, device, medium and product for vehicle-mounted call
CN110556128B (en) Voice activity detection method and device and computer readable storage medium
WO2024017110A1 (en) Voice noise reduction method, model training method, apparatus, device, medium, and product
CN113330513A (en) Voice information processing method and device
CN111739515B (en) Speech recognition method, equipment, electronic equipment, server and related system
CN116486797A (en) Method, device, electronic equipment and medium for reducing false wake-up
CN116705033A (en) System on chip for wireless intelligent audio equipment and wireless processing method
CN114333912B (en) Voice activation detection method, device, electronic equipment and storage medium
US20180108345A1 (en) Device and method for audio frame processing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination