CN114333793A - Filter model updating method, device, equipment, medium and computer program product

Info

Publication number: CN114333793A
Application number: CN202111554958.1A
Authority: CN
Inventors: 吴俊�; 李良斌; 陈孝良
Original assignee: Beijing SoundAI Technology Co Ltd
Current assignee: Beijing SoundAI Technology Co Ltd
Priority date: 2021-12-17
Filing date: 2021-12-17
Publication date: 2022-04-12

Abstract

The application discloses a filtering model updating method, apparatus, device, medium and computer program product. The method comprises: obtaining a wake-up score of a first voice frame signal of a wake-up word, wherein the wake-up score represents the likelihood that the first voice frame signal contains the wake-up word; calculating a wake-up trigger value according to a wake-up trigger threshold and a wake-up mean, wherein the wake-up mean is a mean calculated from the wake-up score distribution of historical voice frame signals of the wake-up word in a historical wake-up voice set; when the wake-up score is smaller than the wake-up trigger threshold, determining a reduction factor for an update step size of the filtering model according to the magnitude relationship among the wake-up trigger threshold, the wake-up mean, the wake-up score and the wake-up trigger value; and updating the filtering model according to the update step size and the reduction factor. The embodiments of the application can improve the accuracy of filtering model updates.

Description

Filter model updating method, device, equipment, medium and computer program product

Technical Field

The present application relates to the field of speech enhancement technologies, and in particular, to a method, an apparatus, a device, a medium, and a computer program product for updating a filtering model.

Background

With the advent of the mobile internet and artificial intelligence era, voice interaction has grown at an unprecedented pace in recent years, and voice wake-up technology, as a special kind of voice recognition technology, has become an important component of interaction between users and machines.

At present, after several rounds of optimization and iteration, voice wake-up technology in practical application scenarios uses a filtering model to filter out invalid and interfering sounds, such as noise, from the current voice frame signal before judging whether the current voice signal can wake up a device. The filtering model can update and iterate automatically while filtering voice frame signals, but the wake-up effect can still be poor because the filtering result of the filtering model is not ideal.

Therefore, how to improve the accuracy of the filter model update becomes a technical problem which needs to be solved urgently at present.

Disclosure of Invention

The filtering model updating method, apparatus, device, medium and computer program product provided by the embodiments of the present application aim to reduce the damage that filter coefficient updates cause to speech and to improve the accuracy of filtering model updates.

In a first aspect, an embodiment of the present application provides a filtering model updating method, including:

acquiring a wake-up score of a first voice frame signal of a wake-up word, wherein the wake-up score is used for representing the possibility that the first voice frame signal contains the wake-up word;

calculating a wake-up trigger value according to a wake-up trigger threshold and a wake-up mean, wherein the wake-up mean is a mean calculated from the wake-up score distribution of historical voice frame signals of the wake-up word in a historical wake-up voice set;

when the wake-up score is smaller than the wake-up trigger threshold, determining a reduction factor of an update step length of the filter model according to the magnitude relation among the wake-up trigger threshold, the wake-up mean, the wake-up score and the wake-up trigger value;

and updating the filtering model according to the updating step length and the reduction factor.

In some embodiments, when the wake-up score is smaller than the wake-up trigger threshold, determining a reduction factor of an update step size of a filter model according to a magnitude relationship between the wake-up trigger threshold, the wake-up mean, the wake-up score, and the wake-up trigger value may include:

when the wake-up trigger value is smaller than or equal to the wake-up score, determining a first preset value as the reduction factor;

when the wake-up mean is smaller than or equal to the wake-up score and the wake-up score is smaller than the wake-up trigger value, determining a second preset value as the reduction factor;

when the wake-up score is smaller than the wake-up mean, determining a third preset value as the reduction factor;

the first preset value is smaller than the second preset value, and the second preset value is smaller than the third preset value.
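The three cases above can be sketched as a small selection function. This is only an illustrative sketch: the function name, the parameter names and the default values for the second and third preset values are hypothetical (the description only suggests 0.1 as one possible first preset value), and returning `None` when the score reaches the threshold mirrors the embodiment in which the filtering model is not updated in that case.

```python
from typing import Optional


def select_reduction_factor(
    score: float,
    threshold: float,
    mean: float,
    trigger_value: float,
    first: float = 0.1,   # smallest preset; 0.1 is one value the description mentions
    second: float = 0.5,  # hypothetical value, chosen only for illustration
    third: float = 0.9,   # hypothetical value, chosen only for illustration
) -> Optional[float]:
    """Pick a reduction factor from the ordering of score, mean and trigger value."""
    if not first < second < third:
        raise ValueError("presets must satisfy first < second < third")
    if score >= threshold:
        # Wake-up is triggered: no update, so there is no factor to return.
        return None
    if trigger_value <= score:
        # Score is close to the threshold: smallest factor.
        return first
    if mean <= score:
        # Score lies between the mean and the trigger value: middle factor.
        return second
    # Score is below the mean: largest factor.
    return third
```

For instance, with a threshold of 1.0, a mean of 0.5 and a trigger value of 0.9, a score of 0.95 selects the first preset value, a score of 0.7 the second, and a score of 0.3 the third.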

In some embodiments, before obtaining the wake-up score of the first speech frame signal of the wake-up word, the method further includes:

for each frame of historical voice frame signal of the wake-up word in the historical wake-up voice set, acquiring a target historical wake-up score of each target historical voice frame signal that contains the wake-up word;

and computing the mean of the distribution of the target historical wake-up scores to obtain the wake-up mean.

In some embodiments, after updating the filtering model according to the update step size and the reduction factor, the method further includes:

filtering a second voice frame signal of the awakening word according to the updated filtering model to obtain a filtered second voice frame signal, wherein the second voice frame signal of the awakening word is a next frame signal of the first voice frame signal of the awakening word;

and sending the filtered second voice frame signal to a wake-up device, so that the wake-up device wakes up the target equipment according to the magnitude relationship between the wake-up score of the filtered second voice frame signal and the wake-up trigger threshold.

In some embodiments, the method further comprises:

not updating the filtering model when the wake-up score is greater than or equal to the wake-up trigger threshold.

In a second aspect, an embodiment of the present application provides a filtering model updating apparatus, where the apparatus includes:

an acquisition module, configured to acquire a wake-up score of a first voice frame signal of a wake-up word, wherein the wake-up score indicates the likelihood that the first voice frame signal contains the wake-up word;

a calculating module, configured to calculate a wake-up trigger value according to a wake-up trigger threshold and a wake-up mean, wherein the wake-up mean is a mean calculated from the wake-up score distribution of historical voice frame signals of the wake-up word in a historical wake-up voice set;

a determining module, configured to determine, when the wake-up score is smaller than the wake-up trigger threshold, a reduction factor for the update step size of the filtering model according to the magnitude relationship among the wake-up trigger threshold, the wake-up mean, the wake-up score and the wake-up trigger value;

and an updating module, configured to update the filtering model according to the update step size and the reduction factor.

In some embodiments, the determining module may include:

a first determining submodule, configured to determine a first preset value as the reduction factor when the wake-up trigger value is smaller than or equal to the wake-up score;

a second determining submodule, configured to determine a second preset value as the reduction factor when the wake-up mean is less than or equal to the wake-up score and the wake-up score is less than the wake-up trigger value;

a third determining submodule, configured to determine a third preset value as the reduction factor when the wake-up score is smaller than the wake-up mean.

In a third aspect, an embodiment of the present application provides a filtering model updating device, where the device includes: a processor and a memory storing computer program instructions;

the processor, when executing the computer program instructions, implements the filter model update method described in any embodiment of the present application.

In a fourth aspect, embodiments of the present application provide a computer storage medium having computer program instructions stored thereon, which when executed by a processor implement a filter model updating method as described in any of the embodiments of the present application.

In a fifth aspect, an embodiment of the present application provides a computer program product; when instructions of the computer program product are executed by a processor of an electronic device, the electronic device performs the filtering model updating method described in any embodiment of the present application.

According to the filtering model updating method, apparatus, device, medium and computer program product of the embodiments of the present application, the wake-up trigger value is calculated according to the wake-up trigger threshold and the wake-up mean. A reduction factor for the update step size of the filtering model is then determined according to the magnitude relationship among the wake-up trigger threshold, the wake-up mean, the wake-up score and the wake-up trigger value, and the filtering model is finally updated according to the update step size and the reduction factor. The wake-up mean is a mean calculated from the wake-up score distribution of the historical voice frame signals of the wake-up word in the historical wake-up voice set, and represents the average likelihood that a voice frame signal of the wake-up word can wake up the device. The wake-up trigger threshold is the value indicating that a voice frame signal of the wake-up word can wake up the device. The wake-up trigger value, calculated by combining the wake-up trigger threshold and the wake-up mean, represents the average proximity of a voice frame signal of the wake-up word to the event of waking up the device. The reduction factor therefore takes into account the likelihood that the first voice frame signal contains the wake-up word, the average proximity to the wake-up event, the average likelihood that a voice frame signal of the wake-up word can wake up the device, and the wake-up trigger threshold. In other words, the reduction factor is determined by how close the first voice frame signal of the wake-up word is to the event of waking up the device. Because the filtering model is updated according to the update step size and the reduction factor, it is updated to different degrees depending on how close the first voice frame signal of the wake-up word is to the wake-up event. This effectively avoids erroneous updates of the filtering model on voice frame signal segments that contain the wake-up word, reduces the damage that filter coefficient updates cause to the voice frame signal, and improves the accuracy of filtering model updates.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed to be used in the embodiments of the present application will be briefly described below, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

Fig. 1 is a schematic flowchart of a filtering model updating method according to an embodiment of the present disclosure;

Fig. 2 is a schematic flowchart of another filtering model updating method provided in an embodiment of the present application;

Fig. 3 is a schematic flowchart of another filtering model updating method provided in an embodiment of the present application;

Fig. 4 is a schematic flowchart of another filtering model updating method provided in an embodiment of the present application;

Fig. 5 is a schematic diagram of a filtering model updating apparatus according to an embodiment of the present application;

Fig. 6 is a schematic diagram of a filtering model updating apparatus according to an embodiment of the present application.

Detailed Description

Features and exemplary embodiments of various aspects of the present application will be described in detail below, and in order to make objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are intended to be illustrative only and are not intended to be limiting. It will be apparent to one skilled in the art that the present application may be practiced without some of these specific details. The following description of the embodiments is merely intended to provide a better understanding of the present application by illustrating examples thereof.

It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

Before explaining technical solutions provided by the embodiments of the present application, in order to facilitate understanding of the embodiments of the present application, specific terms are first introduced in the present application.

Wake-up trigger threshold: in voice wake-up, detecting the wake-up word triggers the wake-up of the device. The wake-up word is spread over multiple voice frame signals; that is, the voice frame signals corresponding to the wake-up word last for a period of time. The wake-up trigger itself happens in an instant, but throughout the duration of the wake-up word each voice frame signal has its own likelihood of containing the wake-up word. Accumulating these likelihoods over a period of time can produce the event of waking up the device. The value indicating that this event occurs is the wake-up trigger threshold.

As an example, assume that a wake-up word segment lasts F frames and that a value f is gradually accumulated over that time. When f is greater than or equal to the wake-up trigger threshold T, the device is woken up; when f is less than T, the device is not woken up.
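The accumulate-and-compare behaviour in this example can be sketched as follows. The sketch assumes plain summation of per-frame likelihoods as the accumulation rule, which is only one possibility, since the description leaves the accumulation method open; the function and parameter names are hypothetical.

```python
from typing import Iterable, Optional


def first_wake_frame(frame_likelihoods: Iterable[float], threshold: float) -> Optional[int]:
    """Return the 0-based index of the first frame at which the accumulated
    value reaches the wake-up trigger threshold, or None if it never does."""
    accumulated = 0.0
    for index, likelihood in enumerate(frame_likelihoods):
        # Assumption: accumulation is a running sum of per-frame likelihoods.
        accumulated += likelihood
        if accumulated >= threshold:
            return index
    return None
```

With per-frame likelihoods 0.25, 0.25, 0.5 and 0.25 and a threshold of 1.0, the accumulated value reaches the threshold at the third frame (index 2); with only the first two frames the device is not woken up.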

The current filtering model can filter out invalid and interfering sounds, such as noise, from the current voice frame signal, and can update and iterate automatically while filtering. Generally, whether the filtering model is updated depends on whether the device has been woken up: the model is updated while the device has not been woken up and is not updated once the device has been woken up.

However, the inventors found that when the current voice frame signal being filtered contains the wake-up word but the device has not yet been woken up, the filtering model is still updated; that is, the filtering model is updated in a one-size-fits-all manner. Such an update causes considerable damage to voice frame signals containing the wake-up word when they are filtered.

Therefore, the inventors propose to calculate the wake-up trigger value according to the wake-up trigger threshold and the wake-up mean, to determine a reduction factor for the update step size of the filtering model according to the magnitude relationship among the wake-up trigger threshold, the wake-up mean, the wake-up score and the wake-up trigger value, and finally to update the filtering model according to the update step size and the reduction factor. By considering how close the current voice frame signal of the wake-up word is to the event of waking up the device, the filtering model is updated to different degrees, a one-size-fits-all update is avoided, and the accuracy of filtering model updates is improved.

The filtering model updating method can be executed by a filtering model updating device.

The following describes an embodiment of the present application with reference to the drawings, and first describes a filtering model updating method provided by the embodiment of the present application.

Fig. 1 shows a schematic flow chart of a filtering model updating method provided in an embodiment of the present application, where the method includes:

S110, acquiring a wake-up score of a first voice frame signal of a wake-up word, wherein the wake-up score indicates the likelihood that the first voice frame signal contains the wake-up word.

S120, calculating a wake-up trigger value according to a wake-up trigger threshold and a wake-up mean, wherein the wake-up mean is a mean calculated from the wake-up score distribution of historical voice frame signals of the wake-up word in a historical wake-up voice set.

S130, when the wake-up score is smaller than the wake-up trigger threshold, determining a reduction factor for the update step size of the filtering model according to the magnitude relationship among the wake-up trigger threshold, the wake-up mean, the wake-up score and the wake-up trigger value.

S140, updating the filtering model according to the update step size and the reduction factor.

In the embodiment of the application, the wake-up trigger value is calculated according to the wake-up trigger threshold and the wake-up mean. A reduction factor for the update step size of the filtering model is then determined according to the magnitude relationship among the wake-up trigger threshold, the wake-up mean, the wake-up score and the wake-up trigger value, and the filtering model is finally updated according to the update step size and the reduction factor. The wake-up mean is a mean calculated from the wake-up score distribution of the historical voice frame signals of the wake-up word in the historical wake-up voice set, and represents the average likelihood that a voice frame signal of the wake-up word can wake up the device. The wake-up trigger threshold is the value indicating that a voice frame signal of the wake-up word can wake up the device. The wake-up trigger value, calculated by combining the wake-up trigger threshold and the wake-up mean, represents the average proximity of a voice frame signal of the wake-up word to the event of waking up the device. The reduction factor therefore takes into account the likelihood that the first voice frame signal contains the wake-up word, the average proximity to the wake-up event, the average likelihood that a voice frame signal of the wake-up word can wake up the device, and the wake-up trigger threshold. In other words, the reduction factor is determined by how close the first voice frame signal of the wake-up word is to the event of waking up the device. Because the filtering model is updated according to the update step size and the reduction factor, it is updated to different degrees depending on how close the first voice frame signal of the wake-up word is to the wake-up event. This effectively avoids erroneous updates of the filtering model on voice frame signal segments that contain the wake-up word, reduces the damage that filter coefficient updates cause to the voice frame signal, and improves the accuracy of filtering model updates.

In some embodiments, in S110, each voice frame signal of the wake-up word corresponds to a wake-up score. The first voice frame signal may be the current voice frame signal of the wake-up word. The wake-up score of the current voice frame signal is an accumulated score calculated by a wake-up device, which is independent of the filtering model updating apparatus, by combining the wake-up score of the previous voice frame signal with the likelihood that the current voice frame signal itself contains the wake-up word. The filtering model updating apparatus receives the wake-up score of the current voice frame signal from the wake-up device. The embodiments of the present application do not limit the accumulation method, which may be any mathematical calculation method.

The wake-up score indicates the likelihood that the current voice frame signal contains the wake-up word. Because it also depends on the likelihoods that the preceding voice frame signals contain the wake-up word, it is an accumulated score.

As an example, suppose the voice signal of the wake-up word has 1000 frames, of which the 200th to the 900th frames contain the wake-up word. The current voice frame signal is the 700th frame; the likelihoods that the 1st to 699th voice frame signals contain the wake-up word are j_1 to j_699, and the likelihood that the 700th voice frame signal contains the wake-up word is j_700. The wake-up score of the 700th frame may then be the result of any calculation over all of the values j_1 to j_700. For example, the wake-up score of the 700th frame may be j_1 + … + j_699 + j_700.
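For the summation variant of this example, the wake-up score of every frame is simply the running sum of the per-frame likelihoods. A minimal sketch, assuming summation as the accumulation rule (the description allows any calculation) and using a hypothetical function name:

```python
from itertools import accumulate
from typing import List, Sequence


def cumulative_wake_scores(frame_likelihoods: Sequence[float]) -> List[float]:
    """Wake-up score of frame k is j_1 + ... + j_k (summation is an assumption)."""
    return list(accumulate(frame_likelihoods))
```

The last element of the returned list is the wake-up score of the current (most recent) frame.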

In some embodiments, the wake-up score of a voice frame signal of the wake-up word is calculated differently for different voice wake-up technologies. The specific calculation methods are prior art and are not described in detail in the embodiments of the present application. Voice wake-up technologies include, but are not limited to, confidence-based voice wake-up, recognition-engine-based voice wake-up, garbage-word (filler) based voice wake-up, and deep-learning-based voice wake-up.

In order to improve the accuracy of updating the filtering model, in some embodiments, as shown in Fig. 2, which is a schematic flowchart of another filtering model updating method provided in an embodiment of the present application, the method further includes S210-S220 before acquiring the wake-up score of the first voice frame signal of the wake-up word:

S210, for each frame of historical voice frame signal of the wake-up word in the historical wake-up voice set, acquiring a target historical wake-up score of each target historical voice frame signal that contains the wake-up word.

In some embodiments, in S210, the target historical wake-up score is the wake-up score of each voice frame signal in the historical wake-up voice set that contains the wake-up word. For each historical voice frame signal containing the wake-up word in the historical wake-up voice set, the filtering model updating apparatus receives the wake-up score of that frame as calculated by the wake-up device.

S220, computing the mean of the distribution of the target historical wake-up scores to obtain the wake-up mean.

In some embodiments, in S220, when the historical wake-up voice set has not been updated, the wake-up mean used by the filtering model updating apparatus is the mean of the target historical wake-up score distribution that the apparatus computed previously; when the historical wake-up voice set has been updated, the filtering model updating apparatus repeats S210 and S220 to recompute the mean of the target historical wake-up score distribution.

In some embodiments, the specific method by which the filtering model updating apparatus computes the mean of the target historical wake-up score distribution is not limited in the embodiments of the present application. The wake-up score distribution of the voice frame signals containing the wake-up word differs across wake-up words and across historical wake-up voice sets, so the way the mean is computed may differ as well.

As an example, the filtering model updating apparatus may compute a weighted average of the wake-up scores of the voice frame signals containing the wake-up word in the historical wake-up voice set, and use this weighted average as the wake-up mean.
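A weighted average of this kind might be computed as in the sketch below. The description does not say how the weights are chosen, so the weights, the fallback to equal weights and the function name are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def wake_up_mean(scores: Sequence[float], weights: Optional[Sequence[float]] = None) -> float:
    """Weighted average of historical wake-up scores of frames containing the wake-up word."""
    if not scores:
        raise ValueError("at least one historical wake-up score is required")
    if weights is None:
        # Assumption: with no weights given, every frame counts equally.
        weights = [1.0] * len(scores)
    if len(weights) != len(scores):
        raise ValueError("scores and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(s * w for s, w in zip(scores, weights)) / total_weight
```

For scores 1.0 and 3.0 the unweighted result is 2.0; weighting the first score three times as heavily as the second gives 1.5.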

In the embodiment of the application, the wake-up mean is obtained by computing the mean of the target historical wake-up score distribution over the target historical voice frame signals containing the wake-up word in the historical wake-up voice set. Because it reflects the score distribution of most historical voice frame signals containing the wake-up word, it represents more accurately the average likelihood that a voice frame signal of the wake-up word wakes up the device. A corresponding wake-up mean is obtained for each historical wake-up voice set and each wake-up word, so the method adapts to different wake-up words and historical wake-up voice sets. With such a wake-up mean, the wake-up trigger value can be determined more accurately, and how close the first voice frame signal of the wake-up word is to the event of waking up the device can be judged more accurately from the wake-up trigger value, the wake-up mean, the wake-up trigger threshold and the wake-up score. This improves the accuracy of determining the reduction factor and, in turn, the accuracy of updating the filtering model.

In some embodiments, in S120, the filtering model updating apparatus calculates, according to the wake-up trigger threshold and the wake-up mean, a value that indicates the average proximity of a voice frame signal of the wake-up word to the event of waking up the device. This value is referred to as the wake-up trigger value. The embodiments of the present application do not limit how the wake-up trigger value is specifically calculated.

As an example, the filtering model updating apparatus may calculate the wake-up trigger value by Formula 1, where T is the wake-up trigger threshold, t is the wake-up trigger value, m is the wake-up mean, and r is the scaling factor.

$t = m + r \times (T - m)$    (Formula 1)

The value of r is adjusted according to different voice wake-up technologies and actual conditions, and the embodiment of the present application is not particularly limited.
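Reading Formula 1 as "trigger value = mean + r × (threshold − mean)", the calculation can be sketched as a one-line function. The function and parameter names are hypothetical, and the numeric values in the note below are chosen only for illustration.

```python
def wake_trigger_value(threshold: float, mean: float, r: float) -> float:
    """Formula 1: move from the wake-up mean toward the threshold by a fraction r."""
    return mean + r * (threshold - mean)
```

With a threshold of 1.0, a mean of 0.5 and r = 0.5, the trigger value is 0.75. Under this reading, r = 0 yields the wake-up mean itself and r = 1 yields the wake-up trigger threshold, so for r between 0 and 1 the trigger value lies between the two.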

In some embodiments, the value of r may be determined from the difference between the wake-up scores of the historical voice frame signals containing the wake-up word in the historical wake-up voice set and the wake-up mean. For example, the filtering model updating apparatus calculates the sum of the wake-up scores of the historical voice frame signals containing the wake-up word in the historical wake-up voice set, and then calculates the difference between this sum and the wake-up mean. When the difference is greater than or equal to a preset threshold, r takes a first preset value; when the difference is smaller than the preset threshold, r takes a second preset value, where the first preset value is greater than the second preset value. The embodiments of the present application do not limit the values of the preset threshold, the first preset value and the second preset value, which are adjusted according to the voice wake-up technology used and the actual situation.
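The selection of r described in this example might look like the sketch below. It interprets the difference as "sum of historical wake-up scores minus wake-up mean", which is one reading of the text; the function name and all parameter names are hypothetical, and no concrete preset values are implied.

```python
from typing import Sequence


def choose_scaling_factor(
    historical_scores: Sequence[float],
    wake_mean: float,
    preset_threshold: float,
    first_preset: float,
    second_preset: float,
) -> float:
    """Pick r from two presets based on (sum of scores - wake-up mean)."""
    if first_preset <= second_preset:
        raise ValueError("the first preset value must exceed the second")
    # Assumption: the difference is the score sum minus the wake-up mean.
    difference = sum(historical_scores) - wake_mean
    return first_preset if difference >= preset_threshold else second_preset
```

For example, scores 0.5, 0.5 and 1.0 with a mean of 0.5 give a difference of 1.5; with a preset threshold of 1.0 this selects the larger first preset value.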

To improve the accuracy of determining the reduction factor, in some embodiments, r may be 0.8.

In some embodiments, the wake-up trigger threshold is a value calculated from the historical voice frame signals of the wake-up word in the historical wake-up voice set, and indicates that a voice frame signal of the wake-up word can wake up the device. The wake-up trigger threshold is calculated differently for different voice wake-up technologies; the specific calculation methods are prior art and are not described in detail in the embodiments of the present application. Voice wake-up technologies include, but are not limited to, confidence-based voice wake-up, recognition-engine-based voice wake-up, garbage-word-based voice wake-up, deep-learning-based voice wake-up, and neural-network-based voice wake-up. Taking deep-learning-based voice wake-up as an example, the wake-up trigger threshold may be a confidence obtained by computing statistics over the posterior probabilities, output by a neural network, of the historical voice frame signals containing the wake-up word in the historical wake-up voice set.

In some embodiments, in S130, when the wake-up score of the current voice frame signal is smaller than the wake-up trigger threshold, the filtering model updating apparatus uses the different magnitude relationships among the wake-up trigger threshold, the wake-up mean, the wake-up score and the wake-up trigger value to represent how close the current voice frame signal of the wake-up word is to the event of waking up the device, and determines a different reduction factor for the update step size in each case.

In some embodiments, the value range of the reduction factor is (0, 1).

In some embodiments, the filtering model may include an adaptive filter, and the update of the filter coefficients of the adaptive filter depends on the filter coefficients of the previous frame and the update step size. The update step size of the adaptive filter is calculated differently for different adaptive filtering algorithms; the specific calculation methods are prior art and are not described in detail in the embodiments of the present application. Adaptive filtering algorithms include, but are not limited to, the Least Mean Square (LMS) algorithm, the Normalized Least Mean Square (NLMS) algorithm, the Affine Projection (AP) algorithm, derivatives of the LMS algorithm, derivatives of the NLMS algorithm, and algorithms based on the Partitioned Block Frequency Domain Adaptive Filter (PBFDAF).
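To illustrate how a reduction factor could enter a coefficient update, the sketch below applies it to a single NLMS step. This is an assumption-laden example and not the method of the application: the description does not state how the update step size and the reduction factor are combined, so multiplying them is an assumed choice, NLMS is picked only because it is one of the listed algorithms, and all names are hypothetical.

```python
from typing import List, Sequence, Tuple


def nlms_update(
    weights: Sequence[float],
    reference: Sequence[float],
    desired: float,
    step_size: float,
    reduction_factor: float,
    eps: float = 1e-8,
) -> Tuple[List[float], float]:
    """One NLMS step with the step size scaled by a reduction factor in (0, 1)."""
    if len(weights) != len(reference):
        raise ValueError("weights and reference must have the same length")
    if not 0.0 < reduction_factor < 1.0:
        raise ValueError("reduction_factor must lie in (0, 1)")
    # Filter output and a-priori error for the current sample.
    estimate = sum(w * x for w, x in zip(weights, reference))
    error = desired - estimate
    # Input energy used for normalisation; eps guards against division by zero.
    energy = sum(x * x for x in reference) + eps
    # Assumption: the effective step is the product of step size and reduction factor.
    effective_step = step_size * reduction_factor
    new_weights = [w + effective_step * error * x / energy for w, x in zip(weights, reference)]
    return new_weights, error
```

Under this assumed combination, a small reduction factor shrinks the coefficient change for frames that are close to triggering wake-up, while a factor near 1 leaves the update almost unchanged.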

In order to improve the accuracy of updating the filtering model, in some embodiments, when the wake-up score is smaller than the wake-up trigger threshold, determining a reduction factor of an update step size of the filtering model according to a magnitude relationship between the wake-up trigger threshold, the wake-up mean, the wake-up score, and the wake-up trigger value may include:

When the wake-up trigger value is less than or equal to the wake-up score, determining a first preset value as the reduction factor.

In some embodiments, with continued reference to the above embodiments, since the wake-up trigger value is calculated from the wake-up trigger threshold, when the wake-up trigger value is less than or equal to the wake-up score, the current voice frame signal of the wake-up word may be considered very close to the event that the device can be woken up: its wake-up score is close to the wake-up trigger threshold and wake-up is almost triggered. Alternatively, the current voice frame signal may contain the wake-up word, that is, it may lie within the wake-up word segment of the voice signal of the whole wake-up word. In this case, the filtering model updating apparatus uses a smaller first preset value as the reduction factor.

In some embodiments, the first preset value may be 0.1 in order to control the update of the filtering model more accurately.

It should be noted that the embodiments of the present application do not specifically limit the first preset value, which may be adjusted according to the actual situation.

When the wake-up mean is less than or equal to the wake-up score and the wake-up score is less than the wake-up trigger value, determining a second preset value as the reduction factor.

In some embodiments, with continued reference to the above embodiments, when the wake-up mean is less than or equal to the wake-up score and the wake-up score is less than the wake-up trigger value, the current voice frame signal of the wake-up word has not reached the wake-up trigger value, but its wake-up score is closer to the wake-up trigger value, that is, closer to the average proximity to the event that the device can be woken up. The current voice frame signal can therefore be considered to be in a moderate state with respect to the possibility of waking up the device. Alternatively, the current voice frame signal may be about to reach the voice frame signals containing the wake-up word, and is located at the front section of the wake-up word segment in the voice signal of the whole wake-up word. In this case, the filtering model updating apparatus uses a second preset value, larger than the first preset value, as the reduction factor.

In some embodiments, the second preset value may be 0.5 in order to control the update of the filtering model more accurately.

It should be noted that the embodiments of the present application do not specifically limit the second preset value, which may be adjusted according to the actual situation.

When the wake-up score is less than the wake-up mean, determining a third preset value as the reduction factor.

In some embodiments, with continued reference to the above embodiments, when the wake-up score is less than the wake-up mean, the wake-up score of the current voice frame signal of the wake-up word is below the average likelihood that voice frame signals of the wake-up word wake up the device, so the device will essentially not be woken up. In this case, the filtering model updating apparatus uses a third preset value, larger than the second preset value, as the reduction factor.

In some embodiments, the third preset value may be 1 in order to control the update of the filtering model more accurately.

In the embodiments of the present application, the reduction factor is determined according to the different magnitude relationships among the wake-up trigger threshold, the wake-up mean, the wake-up score and the wake-up trigger value, which takes into account how close the voice frame signal of the wake-up word is to the event that the device can be woken up. Because the first preset value is smaller than the second preset value and the second preset value is smaller than the third preset value, the update step size is reduced to a different degree for each degree of closeness. The update strength of the filtering model can thus be controlled according to how close the voice frame signal of the wake-up word is to the event that the device can be woken up, which improves the accuracy of updating the filtering model.
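The three cases above can be read as a simple tiered selection. The following is a minimal Python sketch under stated assumptions, not the applicant's implementation: the function name `reduction_factor` and its parameter names are hypothetical, the default presets 0.1, 0.5 and 1 are the example values given above, and the wake-up trigger value is taken as an input because its calculation is defined elsewhere in the description.

```python
def reduction_factor(score, mean, trigger_value, threshold,
                     first=0.1, second=0.5, third=1.0):
    """Choose the reduction factor k for a frame whose wake-up score
    is below the wake-up trigger threshold (hypothetical helper)."""
    if score >= threshold:
        # This case is handled separately: the filtering model is not updated.
        raise ValueError("wake-up score must be below the trigger threshold")
    if trigger_value <= score:
        # Frame is very close to waking the device: smallest update.
        return first
    if mean <= score:
        # Here mean <= score < trigger_value: moderate update.
        return second
    # Here score < mean: the device is essentially not woken, full step.
    return third
```

A caller would multiply the update step size by the returned value before updating the filter coefficients.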

In some embodiments, in S140, the filtering model updating apparatus multiplies the update step size by the reduction factor to obtain a reduced update step size, and updates the filter coefficients of the filtering model according to the reduced update step size, thereby updating the filtering model.

As an example, for an adaptive filter using the LMS algorithm, updating the filtering model means updating the filter coefficients of the adaptive filter, as shown in Formula 2.

$$W(n+1) = W(n) + k\,\mu\,x(n)\,e(n) \qquad \text{(Formula 2)}$$

Here, W(n+1) is the updated filter coefficient, W(n) is the filter coefficient used when filtering the current voice frame signal, k is the reduction factor, μ is the update step size, x(n) is the current input noise signal, and e(n) is the instantaneous error.
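To make Formula 2 concrete, the following is a minimal Python sketch of one coefficient update for real-valued filter taps. It is an illustration only: the function name `lms_update` is hypothetical, x(n) is assumed to be the vector of input samples aligned with the taps, and e(n) is assumed to be a scalar.

```python
def lms_update(w, x, e, mu, k):
    """Return W(n+1) = W(n) + k * mu * x(n) * e(n), tap by tap.

    w  : current filter coefficients W(n)
    x  : input samples aligned with the taps (assumed layout)
    e  : scalar error for the current sample
    mu : update step size
    k  : reduction factor applied to the step size
    """
    if len(w) != len(x):
        raise ValueError("w and x must have the same length")
    step = k * mu  # reduced update step size
    return [wi + step * xi * e for wi, xi in zip(w, x)]
```

With k equal to 1 the sketch reduces to the ordinary LMS update, and smaller values of k shrink each coefficient change proportionally.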

In order to improve the accuracy of updating the filtering model, in some embodiments, as shown in fig. 3, an embodiment of the present application provides a schematic flow chart of a filtering model updating method, where the method further includes S310:

S310, when the wake-up score is larger than or equal to the wake-up trigger threshold, the filtering model is not updated.

S110 to S140 are the same as S110 to S140 in the above embodiments, and for the sake of brevity, will not be described in detail here.

In some embodiments, in S310, when the wake-up score of the current voice frame signal of the wake-up word is greater than or equal to the wake-up trigger threshold, the wake-up score accumulated up to the current voice frame signal within the voice signal of the whole wake-up word indicates that the wake-up word can wake up the device. That is, the filtering model used to filter the current voice frame signal already preserves the wake-up word in the voice frame signal while filtering out invalid and interfering signals. The filtering model updating apparatus receives an identification signal indicating that the filtering model used to filter the current voice frame signal does not need to be updated and, in response to the identification signal, stops updating the filter coefficients of the filtering model.

In some embodiments, when the voice frame signal of the current frame is filtered using the filtering model of earlier frames, the filtering model updating apparatus may determine, after receiving the identification signal, whether the filter coefficients of the current filtering model are reasonable. If the current filter coefficients were updated after filtering a voice frame signal containing the wake-up word, the filtering model updating apparatus may discard the current filter coefficients according to the wake-up word length d carried in the received identification signal, and use the filter coefficients from d frames before the current ones as the most recently updated filter coefficients.
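One way to picture this rollback is a bounded history of per-frame coefficient snapshots. The Python sketch below is an assumption-laden illustration rather than the described apparatus: the class name `CoefficientHistory`, its methods and the `max_frames` bound are all hypothetical.

```python
from collections import deque


class CoefficientHistory:
    """Keeps recent per-frame filter coefficients so that updates made
    during the wake-up word can be discarded (hypothetical helper)."""

    def __init__(self, max_frames):
        self._history = deque(maxlen=max_frames)

    def push(self, coeffs):
        # Store a copy of the coefficients updated at the current frame.
        self._history.append(list(coeffs))

    def roll_back(self, d):
        """Discard the d most recent snapshots and return the coefficients
        from d frames before the current ones."""
        if d < 0 or d >= len(self._history):
            raise ValueError("not enough history to roll back d frames")
        for _ in range(d):
            self._history.pop()
        return list(self._history[-1])
```

After `roll_back(d)`, the returned coefficients are also the newest entry in the history, so later updates continue from them.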

In the embodiments of the present application, when the wake-up score is smaller than the wake-up trigger threshold, the reduction factor is determined according to the wake-up trigger threshold, the wake-up mean, the wake-up score and the wake-up trigger value, and the filtering model is then updated according to the update step size and the reduction factor, so that the degree of updating is controlled according to how close the first voice frame signal of the wake-up word is to the event that the device can be woken up. When the wake-up score is greater than or equal to the wake-up trigger threshold, the filtering model is not updated. Both the case in which the voice frame signal of the wake-up word wakes up the device and the various cases in which it cannot are thus taken into account, which improves the accuracy of updating the filtering model.
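Taken together, S130, S140 and S310 amount to one decision per frame. The following Python sketch combines them for an LMS-style update with real-valued taps. It is a sketch under assumptions: the function name `update_filter`, the parameter names and the `presets` tuple (using the example values 0.1, 0.5 and 1) are hypothetical, and the wake-up trigger value is supplied by the caller.

```python
def update_filter(w, x, e, mu, score, mean, trigger_value, threshold,
                  presets=(0.1, 0.5, 1.0)):
    """Return the filter coefficients to use after the current frame."""
    if score >= threshold:
        # S310: the frame can wake the device, so leave the model unchanged.
        return list(w)
    # S130: pick the reduction factor from the magnitude relationships.
    if score >= trigger_value:
        k = presets[0]
    elif score >= mean:
        k = presets[1]
    else:
        k = presets[2]
    # S140: update with the reduced step size k * mu.
    return [wi + k * mu * xi * e for wi, xi in zip(w, x)]
```

Returning a copy in the no-update branch keeps the caller's coefficient list untouched in every case.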

In existing schemes, when invalid signals, noise and other interfering signals are filtered out of the voice frame signals of a wake-up word, a delayed filtering algorithm is used in order to improve the recognition rate of the wake-up algorithm. Generally, two channels of signals are selected from a microphone array and adaptively filtered with an adaptive filter, and the updated filter coefficient h(x) is delayed, where the delay depends on empirical statistics of the wake-up word length. If the wake-up word length is d, the filtering model corresponding to the filter coefficient h(x-d) of frame x-d is used to filter the voice frame signal of frame x, and the filter coefficients are not updated once the device has been woken up.
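The delayed use of coefficients in the existing scheme can be sketched as a fixed-length delay line. The Python below is only an illustration of that idea: the class name `DelayedCoefficients` and its methods are hypothetical, and filling the line with initial coefficients for the first d frames is an assumption not stated in the description.

```python
from collections import deque


class DelayedCoefficients:
    """Delay line that yields h(x - d) once h(x) has been stored."""

    def __init__(self, d, initial):
        # Assumption: the first d frames are filtered with the initial taps.
        self._line = deque([list(initial) for _ in range(d + 1)],
                           maxlen=d + 1)

    def store(self, coeffs):
        # Record h(x); the oldest entry drops out automatically.
        self._line.append(list(coeffs))

    def for_filtering(self):
        # The oldest entry is h(x - d), d frames behind the newest one.
        return list(self._line[0])
```

Because the line always holds d + 1 entries, the coefficients returned for filtering never include updates made during the most recent d frames.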

However, the inventors have found that in the existing scheme, because the filtering model corresponding to the filter coefficients of frame x-d is used to filter the voice frame signal of frame x, the filter coefficients in use when the voice frame signal reaches the end of the wake-up word have not been affected by the wake-up word, so no signal cancellation occurs. In actual application scenarios, however, the updating of the filter coefficients and the waking of the device are performed separately and independently in the filtering model updating apparatus and the wake-up apparatus. When the filtering model updating apparatus updates the filtering model, the current voice frame signal may not yet have triggered wake-up in the wake-up apparatus even though it contains the wake-up word, and the filtering model updating apparatus still updates the filtering model. That is, in the prior-art scheme, the filtering model updating apparatus still updates on the wake-up word segment of the whole voice signal containing the wake-up word, so that subsequent wake-up or recognition operates on a strongly damaged voice frame signal, which lowers the recognition rate of the voice frame signal and ultimately results in a poor interactive experience.

Therefore, in order to improve the recognition rate of the speech frame signal of the wake-up word, in some embodiments, as shown in fig. 4, a flow diagram of another filtering model updating method provided in the embodiments of the present application is provided. After updating the filtering model according to the update step and the reduction factor, S410-S420 are further included:

S410, filtering a second voice frame signal of the wake-up word according to the updated filtering model to obtain a filtered second voice frame signal, wherein the second voice frame signal of the wake-up word is the next frame signal after the first voice frame signal of the wake-up word.

In some embodiments, in S410, the filtering model updating apparatus filters the second voice frame signal of the wake-up word with the updated filtering model.

As an example, it can be understood that, when the filtering model corresponding to the filter coefficients of frame x-d is used to filter the voice frame signal of frame x, the filtering model updated by the filtering model updating apparatus is the one corresponding to frame x-d+1, which is then used to filter the voice frame signal of frame x+1.

S420, sending the filtered second voice frame signal to a wake-up apparatus, so that the wake-up apparatus wakes up the target device according to the magnitude relationship between the wake-up score of the filtered second voice frame signal and the wake-up trigger threshold.

In some embodiments, in S420, the filtering model updating apparatus sends the filtered second voice frame signal to the wake-up apparatus. The wake-up apparatus calculates the wake-up score of the filtered second voice frame signal and compares it with the wake-up trigger threshold: when the wake-up score is greater than or equal to the wake-up trigger threshold, it wakes up the device corresponding to the wake-up word; when the wake-up score is lower than the wake-up trigger threshold, it does not wake up the device.

In the embodiments of the present application, the updated filtering model is used to filter the next frame signal after the first voice frame signal of the wake-up word, namely the second voice frame signal of the wake-up word, and the wake-up apparatus wakes up the target device according to the magnitude relationship between the wake-up score of the filtered second voice frame signal and the wake-up trigger threshold. Because the updated filtering model takes into account how close the first voice frame signal of the wake-up word is to the event that the device can be woken up, it remains effective while causing less filtering damage to the second voice frame signal, which improves the recognition rate of the voice frame signal and further improves the user's interactive experience.
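The flow of S410 and S420 can be sketched in Python as follows. This is a simplified illustration with several assumptions: the filtering model is treated as a plain FIR filter with zero initial state, the wake-up scorer is passed in as a hypothetical callable `score_fn` because the scoring method depends on the wake-up technology, and the function names are invented for this example.

```python
def fir_filter(coeffs, samples):
    """Filter one frame with FIR taps, assuming zero initial state."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for i, c in enumerate(coeffs):
            if n - i >= 0:
                acc += c * samples[n - i]
        out.append(acc)
    return out


def process_second_frame(coeffs, frame, score_fn, threshold):
    """S410: filter the next frame with the updated coefficients.
    S420: score the filtered frame and compare with the threshold."""
    filtered = fir_filter(coeffs, frame)
    score = score_fn(filtered)
    should_wake = score >= threshold
    return filtered, should_wake
```

In a real system the comparison in the second step would run in the separate wake-up apparatus; it is placed in one function here only to keep the sketch short.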

Based on the filtering model updating method provided by any one of the above embodiments, the present application also provides an embodiment of a filtering model updating apparatus, described below with reference to Fig. 5.

Fig. 5 shows a schematic diagram of a filtering model updating apparatus according to an embodiment of the present application. As shown in fig. 5, the apparatus may include:

an obtaining module 510, configured to obtain a wakeup score of a first speech frame signal of a wakeup word, where the wakeup score is used to indicate a possibility that the first speech frame signal contains the wakeup word;

a calculating module 520, configured to calculate a wake-up trigger value according to a wake-up trigger threshold and a wake-up mean, where the wake-up mean is a mean calculated according to the wake-up score distribution of historical voice frame signals of wake-up words in a historical wake-up voice set;

a determining module 530, configured to determine, when the wake-up score is smaller than the wake-up trigger threshold, a reduction factor of an update step size of the filtering model according to a size relationship between the wake-up trigger threshold, the wake-up mean, the wake-up score, and the wake-up trigger value;

an updating module 540, configured to update the filtering model according to the update step and the reduction factor.

The apparatus in the embodiments of the present application calculates the wake-up trigger value according to the wake-up trigger threshold and the wake-up mean, determines the reduction factor of the update step size of the filtering model according to the magnitude relationship among the wake-up trigger threshold, the wake-up mean, the wake-up score and the wake-up trigger value, and finally updates the filtering model according to the update step size and the reduction factor. The wake-up mean is a mean calculated from the wake-up score distribution of the historical voice frame signals of the wake-up word in the historical wake-up voice set, and represents the average likelihood that a voice frame signal of the wake-up word can wake up the device. The wake-up trigger threshold indicates that a voice frame signal of the wake-up word can wake up the device. The wake-up trigger value, calculated by combining the wake-up trigger threshold and the wake-up mean, may represent the average proximity of a voice frame signal of the wake-up word to the event that the device can be woken up. The reduction factor therefore comprehensively considers the likelihood that the first voice frame signal of the wake-up word contains the wake-up word, its proximity to the event that the device can be woken up, the average likelihood that a voice frame signal of the wake-up word can wake up the device, and the wake-up trigger threshold. That is, the reduction factor is determined according to how close the first voice frame signal of the wake-up word is to the event that the device can be woken up.
When the filtering model is then updated according to the update step size and the reduction factor, it is updated to a different degree depending on how close the first voice frame signal of the wake-up word is to the event that the device can be woken up. This effectively avoids erroneously updating the filtering model on voice frame signal segments that contain the wake-up word, reduces the damage that filter coefficient updates cause to the voice frame signal, and improves the accuracy of updating the filtering model.

In some embodiments, to improve the accuracy of updating the filtering model, the determining module 530 may include:

a first determining submodule, configured to determine a first preset value as the reduction factor when the wake-up trigger value is less than or equal to the wake-up score;

In the apparatus of the embodiments of the present application, the reduction factor is determined according to the different magnitude relationships among the wake-up trigger threshold, the wake-up mean, the wake-up score and the wake-up trigger value, which takes into account how close the voice frame signal of the wake-up word is to the event that the device can be woken up. Because the first preset value is smaller than the second preset value and the second preset value is smaller than the third preset value, the update step size is reduced to a different degree for each degree of closeness. The update strength of the filtering model can thus be controlled according to how close the voice frame signal of the wake-up word is to the event that the device can be woken up, which improves the accuracy of updating the filtering model.

In some embodiments, to improve the accuracy of updating the filtering model, the obtaining module 510 further includes the following submodules, which operate before the wake-up score of the first voice frame signal of the wake-up word is obtained:

an obtaining submodule, configured to obtain, for the historical voice frame signal of each frame of the wake-up word in the historical wake-up voice set, a target historical wake-up score of each target historical voice frame signal containing the wake-up word;

a statistics submodule, configured to compute the mean of the target historical wake-up scores to obtain the wake-up mean.

In the apparatus of the embodiments of the present application, the wake-up mean is obtained by compiling statistics on the target historical wake-up scores of every target historical voice frame signal containing the wake-up word in the historical wake-up voice set. Because it reflects the mean of the score distribution of most historical voice frame signals containing the wake-up word, it represents more accurately the average likelihood that a voice frame signal of the wake-up word wakes up the device. A corresponding wake-up mean can be obtained for each historical wake-up voice set and each wake-up word, so the approach adapts to different wake-up words and historical wake-up voice sets. With such a wake-up mean, the wake-up trigger value can be determined more accurately, and how close the first voice frame signal of the wake-up word is to the event that the device can be woken up can be judged more accurately from the wake-up trigger value, the wake-up mean, the wake-up trigger threshold and the wake-up score. This improves the accuracy of determining the reduction factor and, in turn, the accuracy of updating the filtering model.
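The statistics step can be illustrated with a short Python sketch. The data layout is an assumption made for this example: each historical utterance is a list of per-frame pairs `(score, contains_wake_word)`, and the function name `wake_up_mean` is hypothetical.

```python
from statistics import fmean


def wake_up_mean(history):
    """Mean wake-up score over all historical frames that contain the
    wake-up word (the target historical voice frame signals)."""
    target_scores = [
        score
        for utterance in history
        for score, contains_wake_word in utterance
        if contains_wake_word
    ]
    if not target_scores:
        raise ValueError("no historical frame contains the wake-up word")
    return fmean(target_scores)
```

Running the function once per wake-up word and per historical wake-up voice set gives the per-word, per-set mean that the paragraph above describes.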

In some embodiments, in order to improve the recognition rate of the voice frame signal of the wake-up word, the updating module 540 further includes the following submodules, which operate after the filtering model is updated according to the update step size and the reduction factor:

a filtering submodule, configured to filter a second voice frame signal of the wake-up word according to the updated filtering model to obtain a filtered second voice frame signal, wherein the second voice frame signal of the wake-up word is the next frame signal after the first voice frame signal of the wake-up word;

a sending submodule, configured to send the filtered second voice frame signal to a wake-up apparatus, so that the wake-up apparatus wakes up the target device according to the magnitude relationship between the wake-up score of the filtered second voice frame signal and the wake-up trigger threshold.

The apparatus in the embodiments of the present application uses the updated filtering model to filter the next frame signal after the first voice frame signal of the wake-up word, namely the second voice frame signal of the wake-up word, and the wake-up apparatus wakes up the target device according to the magnitude relationship between the wake-up score of the filtered second voice frame signal and the wake-up trigger threshold. Because the updated filtering model takes into account how close the first voice frame signal of the wake-up word is to the event that the device can be woken up, it remains effective while causing less filtering damage to the second voice frame signal, which improves the recognition rate of the voice frame signal and further improves the user's interactive experience.

In addition, in combination with the filtering model updating method in the foregoing embodiment, as shown in fig. 6, an embodiment of the present application may provide a filtering model updating apparatus, which may include a processor 610 and a memory 620 storing computer program instructions.

Specifically, the processor 610 may include a Central Processing Unit (CPU) or an Application Specific Integrated Circuit (ASIC), or may be configured as one or more integrated circuits that implement the embodiments of the present application.

Memory 620 may include mass storage for data or instructions. By way of example, and not limitation, memory 620 may include a Hard Disk Drive (HDD), a floppy disk drive, flash memory, an optical disk, a magneto-optical disk, magnetic tape, a Universal Serial Bus (USB) drive, or a combination of two or more of these. Memory 620 may include removable or non-removable (or fixed) media, where appropriate. The memory 620 may be internal or external to the filtering model updating device, where appropriate. In a particular embodiment, the memory 620 is a non-volatile solid-state memory. In certain embodiments, memory 620 comprises Read Only Memory (ROM). Where appropriate, the ROM may be mask-programmed ROM, Programmable ROM (PROM), Erasable PROM (EPROM), Electrically Erasable PROM (EEPROM), Electrically Alterable ROM (EAROM), flash memory, or a combination of two or more of these.

The processor 610 may implement any of the filter model updating methods in the above embodiments by reading and executing computer program instructions stored in the memory 620.

In one example, the electronic device can also include a communication interface 630 and a bus 640. As shown in fig. 6, the processor 610, the memory 620, and the communication interface 630 are connected via a bus 640 to complete communication therebetween.

The communication interface 630 is mainly used for implementing communication between the modules, apparatuses, units and/or devices in the embodiments of the present application.

The bus 640 includes hardware, software, or both to couple the components of the electronic device to one another. By way of example, and not limitation, a bus may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association Local Bus (VLB), or other suitable bus or a combination of two or more of these. Bus 640 may include one or more buses, where appropriate. Although specific buses are described and shown in the embodiments of the application, any suitable buses or interconnects are contemplated by the application.

When executing the computer program instructions, the processor of the filtering model updating device implements the filtering model updating method of any of the above embodiments.

In addition, in combination with the above-mentioned filter model updating method, embodiments of the present application may provide a computer storage medium, on which computer program instructions are stored, and when the computer program instructions are executed by a processor, the computer program instructions implement the filter model updating method according to any of the above-mentioned embodiments.

It is to be understood that the present application is not limited to the particular arrangements and instrumentality described above and shown in the attached drawings. A detailed description of known methods is omitted herein for the sake of brevity. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present application are not limited to the specific steps described and illustrated, and those skilled in the art can make various changes, modifications, and additions or change the order between the steps after comprehending the spirit of the present application.

The functional blocks shown in the above-described structural block diagrams may be implemented as hardware, software, firmware, or a combination thereof. When implemented in hardware, it may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, plug-in, function card, or the like. When implemented in software, the elements of the present application are the programs or code segments used to perform the required tasks. The program or code segments may be stored in a machine-readable medium or transmitted by a data signal carried in a carrier wave over a transmission medium or a communication link. A "machine-readable medium" may include any medium that can store or transfer information. Examples of a machine-readable medium include electronic circuits, semiconductor memory devices, ROM, flash memory, Erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, Radio Frequency (RF) links, and so forth. The code segments may be downloaded via computer networks such as the internet, intranet, etc.

It should also be noted that the exemplary embodiments mentioned in this application describe some methods or systems based on a series of steps or devices. However, the present application is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, may be performed in an order different from the order in the embodiments, or may be performed simultaneously.

Aspects of the present disclosure are described above with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, enable the implementation of the functions/acts specified in the flowchart and/or block diagram block or blocks. Such a processor may be, but is not limited to, a general purpose processor, a special purpose processor, an application specific processor, or a field programmable logic circuit. It will also be understood that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware for performing the specified functions or acts, or combinations of special purpose hardware and computer instructions.

As described above, only the specific embodiments of the present application are provided, and it can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the system, the module and the unit described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. It should be understood that the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the present application, and these modifications or substitutions should be covered within the scope of the present application.

Claims

1. A method for updating a filter model, comprising:

2. The method according to claim 1, wherein when the wake-up score is smaller than the wake-up trigger threshold, determining a reduction factor of an update step size of the filter model according to a magnitude relationship between the wake-up trigger threshold, the wake-up mean, the wake-up score and the wake-up trigger value includes:

3. The method of claim 1, further comprising, before obtaining the wakeup score of the first speech frame signal of the wakeup word:

4. The method of claim 1, further comprising, after updating the filtering model according to the update step size and the reduction factor:

5. The method according to claim 1 or 2, characterized in that the method further comprises:

6. A filtering model updating apparatus, comprising:

7. The apparatus according to claim 6, wherein the determining module specifically includes:

8. A filtering model updating apparatus, characterized in that the apparatus comprises: a processor and a memory storing computer program instructions;

the processor, when executing the computer program instructions, implements the filter model updating method of any of claims 1-5.

9. A computer storage medium having computer program instructions stored thereon which, when executed by a processor, implement the filter model updating method of any one of claims 1-5.

10. A computer program product, wherein instructions in the computer program product, when executed by a processor of an electronic device, cause the electronic device to perform the filter model updating method according to any one of claims 1-5.