CN110211599B - Application awakening method and device, storage medium and electronic equipment - Google Patents

Application awakening method and device, storage medium and electronic equipment

Info

Publication number
CN110211599B
CN110211599B (application CN201910478400.6A)
Authority
CN
China
Prior art keywords
audio data
preset
processor
filter coefficient
adaptive filter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910478400.6A
Other languages
Chinese (zh)
Other versions
CN110211599A (en)
Inventor
陈喆
刘耀勇
陈岩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN201910478400.6A
Publication of CN110211599A
Application granted
Publication of CN110211599B
Legal status: Active

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L15/26 Speech to text systems
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L2021/02082 Noise filtering, the noise being echo or reverberation of the speech
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02165 Two microphones, one receiving mainly the noise signal and the other one mainly the speech signal

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

Embodiments of the present application disclose an application wake-up method and apparatus, a storage medium, and an electronic device. The electronic device includes two microphones, through which two channels of audio data can be acquired, together with the background audio data being played during acquisition. Echo cancellation processing is then performed on the two channels of audio data according to the background audio data to remove self-noise; the two echo-cancelled channels are then beamformed to remove external noise and obtain enhanced audio data. A two-stage verification is then performed on the text feature and the voiceprint feature of the enhanced audio data, and the voice interaction application is woken up when both stages pass, enabling voice interaction between the electronic device and the user. In this way, the present application removes the interference of both self-noise and external noise, uses the two-stage verification to ensure verification accuracy, and thereby improves the wake-up rate of the voice interaction application.

Description

Application awakening method and device, storage medium and electronic equipment
Technical Field
The present application relates to the field of speech processing technologies, and in particular, to an application wake-up method, apparatus, storage medium, and electronic device.
Background
Currently, with the development of voice recognition technology, an electronic device (such as a mobile phone or a tablet computer) can perform voice interaction with a user through a running voice interaction application. For example, the user may say "I want to listen to a song", and the voice interaction application recognizes the user's speech, identifies the intention to listen to a song, and plays one. It can be understood that the premise of voice interaction between the user and the electronic device is waking up the voice interaction application; however, in actual use environments various noises often exist, so the wake-up rate of the voice interaction application is low.
Disclosure of Invention
Embodiments of the present application provide an application wake-up method and apparatus, a storage medium, and an electronic device, which can improve the wake-up rate of a voice interaction application.
In a first aspect, an embodiment of the present application provides an application wake-up method, which is applied to an electronic device, where the electronic device includes two microphones, and the application wake-up method includes:
acquiring two paths of audio data through the two microphones, and acquiring background audio data played in an audio acquisition period;
performing echo cancellation processing on the two paths of audio data according to the background audio data to obtain two paths of audio data after echo cancellation;
performing beam forming processing on the two paths of audio data after the echo cancellation to obtain enhanced audio data;
performing primary verification on the text characteristic and the voiceprint characteristic of the enhanced audio data, and performing secondary verification on the text characteristic and the voiceprint characteristic of the enhanced audio data after the primary verification is passed;
and if the secondary verification passes, awakening the voice interactive application.
In a second aspect, an embodiment of the present application provides an application waking device, which is applied to an electronic device, where the electronic device includes two microphones, and the application waking device includes:
the audio acquisition module is used for acquiring two paths of audio data through the two microphones and acquiring background audio data played in an audio acquisition period;
the echo cancellation module is used for carrying out echo cancellation processing on the two paths of audio data according to the background audio data to obtain two paths of audio data after echo cancellation;
the beam forming module is used for carrying out beam forming processing on the two paths of audio data after the echo cancellation to obtain enhanced audio data;
the audio verification module is used for performing primary verification on the text characteristic and the voiceprint characteristic of the enhanced audio data and performing secondary verification on the text characteristic and the voiceprint characteristic of the enhanced audio data after the primary verification is passed;
and the application awakening module is used for awakening the voice interaction application when the secondary verification is passed.
In a third aspect, the present application provides a storage medium, on which a computer program is stored, and when the computer program is run on an electronic device including two microphones, the electronic device is caused to execute the application wakeup method provided in the present application.
In a fourth aspect, an embodiment of the present application further provides an electronic device, where the electronic device includes a processor, a memory, and two microphones, the memory stores a computer program, and the processor is configured to execute the application wake-up method provided in the embodiments of the present application by calling the computer program stored in the memory.
In the embodiment of the application, the electronic equipment comprises two microphones, and the two microphones can acquire two paths of audio data and acquire background audio data played in an audio acquisition period; then, echo cancellation processing is carried out on the two paths of audio data according to the background audio data so as to eliminate self-noise; then, performing beam forming processing on the two paths of audio data after echo cancellation to eliminate external noise and obtain enhanced audio data; secondly, performing primary verification on the text characteristic and the voiceprint characteristic of the enhanced audio data, and performing secondary verification on the text characteristic and the voiceprint characteristic of the enhanced audio data after the primary verification is passed; and finally, if the secondary verification passes, awakening the voice interaction application so as to realize the voice interaction between the electronic equipment and the user. Therefore, the method and the device can eliminate the interference of self noise and external noise, ensure the verification accuracy by utilizing two-stage verification and achieve the purpose of improving the awakening rate of the voice interaction application.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a flowchart illustrating an application wake-up method according to an embodiment of the present disclosure.
Fig. 2 is a schematic diagram of the arrangement positions of two microphones in the embodiment of the present application.
Fig. 3 is a schematic flow chart of training a voiceprint feature extraction model in the embodiment of the present application.
Fig. 4 is a schematic diagram of a spectrogram extracted in the example of the present application.
Fig. 5 is another flowchart of an application wake-up method according to an embodiment of the present application.
Fig. 6 is a schematic structural diagram of an application wake-up apparatus according to an embodiment of the present application.
Fig. 7 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.
Fig. 8 is another schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Referring to the drawings, wherein like reference numbers refer to like elements, the principles of the present application are illustrated as being implemented in a suitable computing environment. The following description is based on illustrated embodiments of the application and should not be taken as limiting the application with respect to other embodiments that are not detailed herein.
The embodiment of the present application first provides an application wake-up method, where an execution main body of the application wake-up method may be an electronic device provided in the embodiment of the present application, the electronic device includes two microphones, and the electronic device may be a device with processing capability and configured with a processor, such as a smart phone, a tablet computer, a palmtop computer, a notebook computer, or a desktop computer.
Referring to fig. 1, fig. 1 is a flowchart illustrating an application wake-up method according to an embodiment of the present disclosure. The application wake-up method is applied to the electronic device provided by the present application, where the electronic device includes two microphones, as shown in fig. 1, a flow of the application wake-up method provided by the embodiment of the present application may be as follows:
in 101, two paths of audio data are acquired by two microphones, and background audio data played during audio acquisition is acquired.
For example, two microphones included in the electronic device are arranged back to back and separated by a preset distance, where the arrangement of the two microphones back to back means that sound pickup holes of the two microphones face opposite directions. For example, referring to fig. 2, the electronic device includes two microphones, which are a microphone 1 disposed on a lower side of the electronic device and a microphone 2 disposed on an upper side of the electronic device, respectively, wherein a sound-collecting hole of the microphone 1 faces downward, a sound-collecting hole of the microphone 2 faces upward, and a connection line between the microphone 2 and the microphone 1 is parallel to left/right sides of the electronic device. Furthermore, the two microphones included in the electronic device may be non-directional microphones (or, omni-directional microphones).
In the embodiment of the application, the electronic device can collect sound through the two back-to-back microphones while playing audio or video, thereby collecting two channels of audio data of the same duration. In addition, the electronic device also obtains the audio data played during audio acquisition, which may be independent audio data, such as a played audio file or song, or audio data attached to video data. It should be noted that, to distinguish the audio data obtained by sound acquisition from the audio data played during audio acquisition, the audio data played during audio acquisition is referred to as background audio data in the present application.
At 102, echo cancellation processing is performed on the two paths of audio data according to the background audio data, so as to obtain two paths of audio data after echo cancellation.
It should be noted that, during playing audio and video, the electronic device performs sound collection through two microphones, and will collect and obtain the sound of the playing background audio data, that is, echo (or self-noise). In the application, in order to eliminate echoes in the two collected audio data, echo cancellation processing is further performed on the two audio data by using an echo cancellation algorithm according to background audio data so as to eliminate echoes in the two audio data, and the two audio data after echo cancellation are obtained. It should be noted that, in the embodiment of the present application, there is no particular limitation on what echo cancellation algorithm is used, and a person skilled in the art may select the echo cancellation algorithm according to actual needs.
For example, the electronic device may perform anti-phase processing on the background audio data to obtain anti-phase background audio data, and then superimpose the anti-phase background audio data with the two paths of audio data respectively to eliminate echoes in the two paths of audio data, so as to obtain two paths of audio data after echo cancellation.
Put plainly, the echo cancellation processing above removes the self-noise carried in the audio data.
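The inverse-phase superposition example above can be sketched in a few lines. This is a minimal illustration, assuming the background audio and the microphone channels are already time-aligned, sampled at the same rate, and of equal length, and that the echo path has unit gain (real devices do not satisfy this, which is why the adaptive filter described below is used); all names are illustrative:

```python
import numpy as np

def cancel_echo_by_inversion(mic_channels, background):
    """Superimpose anti-phase background audio onto each microphone channel.

    mic_channels: list of 1-D numpy arrays, one per microphone.
    background:   1-D numpy array of the background audio being played.
    """
    inverted = -background                 # anti-phase background audio data
    return [channel + inverted for channel in mic_channels]
```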
In 103, the two paths of audio data after echo cancellation are processed by beamforming to obtain enhanced audio data.
After completing echo cancellation processing on the two paths of audio data and obtaining the two paths of audio data after echo cancellation, the electronic device further performs beam forming processing on the two paths of audio data after echo cancellation to obtain a path of audio data with a higher signal-to-noise ratio, and the audio data is recorded as enhanced audio data.
In colloquial terms, the beamforming process performed above eliminates external noise carried in the audio data. Therefore, the electronic device obtains the enhanced audio data with self-noise and external noise removed through echo cancellation processing and beam forming processing of the two paths of acquired audio data.
At 104, a primary check is performed on the text features and the voiceprint features of the enhanced audio data, and a secondary check is performed on the text features and the voiceprint features of the enhanced audio data after the primary check is passed.
As described above, compared with the two channels of originally collected audio data, the enhanced audio data has the self-noise and external noise removed and thus a higher signal-to-noise ratio. At this point, the electronic device further performs the two-stage verification on the text feature and the voiceprint feature of the enhanced audio data: it performs the primary verification based on the first wake-up algorithm, and if the primary verification passes, it performs the secondary verification based on the second wake-up algorithm.
It should be noted that, in the embodiment of the present application, both the primary verification and the secondary verification of the text feature and the voiceprint feature of the enhanced audio data verify whether the enhanced audio data includes the preset wake-up word spoken by a preset user (for example, the owner of the electronic device, or another user whom the owner has authorized to use the electronic device). If the enhanced audio data includes the preset wake-up word spoken by the preset user, the verification of the text feature and the voiceprint feature passes; otherwise it fails. For example, if the enhanced audio data includes the preset wake-up word and the word was spoken by the preset user, the verification passes. Conversely, if the enhanced audio data includes the preset wake-up word spoken by a user other than the preset user, or does not include the preset wake-up word at all, the verification fails.
In addition, it should be further noted that, in the embodiment of the present application, the first wake-up algorithm and the second wake-up algorithm adopted by the electronic device are different. For example, the first voice wake-up algorithm is a voice wake-up algorithm based on a gaussian mixture model, and the second voice wake-up algorithm is a voice wake-up algorithm based on a neural network.
In 105, if the secondary check passes, the voice interaction application is awakened.
The voice interaction application is what is commonly called a voice assistant, such as OPPO's voice assistant "Xiao Ou".
Based on the above description, it can be understood by those skilled in the art that when the secondary verification of the enhanced audio data passes, it indicates that a preset user currently speaks a preset wake-up word, and at this time, the voice interaction application is woken up, so as to implement voice interaction between the electronic device and the user.
As can be seen from the above, in the embodiment of the application, the electronic device may acquire two paths of audio data through two microphones, and acquire background audio data played during audio acquisition; then, echo cancellation processing is carried out on the two paths of audio data according to the background audio data so as to eliminate self-noise; then, performing beam forming processing on the two paths of audio data after echo cancellation to eliminate external noise and obtain enhanced audio data; secondly, performing primary verification on the text characteristic and the voiceprint characteristic of the enhanced audio data, and performing secondary verification on the text characteristic and the voiceprint characteristic of the enhanced audio data after the primary verification is passed; and finally, if the secondary verification passes, awakening the voice interaction application so as to realize the voice interaction between the electronic equipment and the user. Therefore, the method and the device can eliminate the interference of self noise and external noise, ensure the verification accuracy by utilizing two-stage verification and achieve the purpose of improving the awakening rate of the voice interaction application.
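The overall flow of steps 101 to 105 can be summarized in a structural sketch. The processing steps are passed in as callables because each one stands for a block described in the embodiments below; none of this is the patent's literal implementation:

```python
def wake_up_pipeline(mic1, mic2, background,
                     cancel_echo, beamform, primary_verify, secondary_verify,
                     wake):
    """Structural sketch of fig. 1: echo cancellation (102), beamforming
    (103), two-stage verification (104) and application wake-up (105)."""
    ch1 = cancel_echo(mic1, background)          # 102: remove self-noise
    ch2 = cancel_echo(mic2, background)
    enhanced = beamform(ch1, ch2)                # 103: remove external noise
    if primary_verify(enhanced) and secondary_verify(enhanced):   # 104
        wake()                                   # 105: wake the application
```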
In one embodiment, "echo cancellation processing two audio data according to background audio data" includes:
(1) obtaining an initial adaptive filter coefficient, and iteratively updating the initial adaptive filter coefficient according to background audio data and audio data to obtain a target adaptive filter coefficient;
(2) and performing echo cancellation processing on the audio data according to the target self-adaptive filter coefficient.
In the embodiment of the present application, when the electronic device performs echo cancellation processing on two paths of audio data according to background audio data, the following description will take echo cancellation processing on one path of audio data as an example.
The electronic equipment firstly acquires an initial adaptive filter coefficient, and then iteratively updates the initial adaptive filter coefficient according to background audio data and one path of audio data to obtain a target adaptive filter coefficient. Then, the electronic device estimates echo audio data carried in the audio data according to the target adaptive filter coefficient obtained by iterative update, so as to eliminate the echo audio data carried in the audio data, and complete echo cancellation processing on the audio data, as shown in the following formula:
X' = X - W^T · X;
where X' denotes the audio data after echo cancellation, X denotes the audio data before echo cancellation, W denotes the target adaptive filter coefficient, and T denotes transposition.
In one embodiment, "iteratively updating the initial adaptive filter coefficients according to the background audio data and the audio data to obtain the target adaptive filter coefficients" includes:
(1) obtaining the self-adaptive filter coefficient at the current moment according to the initial self-adaptive filter coefficient;
(2) estimating echo audio data carried in the audio data and corresponding to the current moment according to the coefficient of the adaptive filter at the current moment;
(3) acquiring error audio data at the current moment according to the background audio data and the echo audio data obtained by estimation;
(4) identifying the active part of the adaptive filter coefficient at the current time, updating the active part according to the error audio data at the current time, and adjusting the order of the adaptive filter coefficient at the current time to obtain the adaptive filter coefficient at the next time.
How to iteratively update the initial adaptive filter coefficients is described in one update process below.
The current time is not specific to a certain time, but refers to a time when the initial adaptive filter coefficient is updated once.
Taking the first update of the initial adaptive filter coefficient as an example, the electronic device takes the initial adaptive filter coefficient as the adaptive filter coefficient at the current time k. For example, the adaptive filter coefficient obtained at the current time k is W(k) = [w_0, w_1, w_2, ..., w_{L-1}]^T, with length L.
Then, according to the adaptive filter coefficient at the current time k, the electronic device estimates the echo audio data carried in the audio data at the current time, as shown in the following formula:

ŷ(k) = W(k)^T · x(k);

where ŷ(k) represents the estimated echo audio data corresponding to the current time k, and x(k) represents the portion of the audio data corresponding to the current time k.

Then, the electronic device obtains the error audio data at the current time k according to the portion of the background audio data corresponding to the current time k and the estimated echo audio data, as shown in the following formula:

e(k) = r(k) - ŷ(k);

where e(k) represents the error audio data at the current time k, and r(k) represents the portion of the background audio data corresponding to the current time k.
It should be noted that a larger filter order increases computational complexity, while a smaller filter order cannot model the full echo path, so the echo does not fully converge. In this application, most of the adaptive filter coefficients are 0 and only a small part plays a role in the iterative update, so only the active part of the adaptive filter needs to be iteratively updated, and the order of the adaptive filter can be adjusted in real time.
Correspondingly, in this embodiment of the present application, after acquiring the error audio data at the current time, the electronic device further identifies an active part of the adaptive filter coefficient at the current time k, so as to update the active part of the adaptive filter coefficient at the current time according to the error audio data at the current time, as shown in the following formula:
W(k+1) = W(k) + u·x(k)·e(k);

where u represents a preset convergence step size, which can be set by a person skilled in the art according to actual needs and is not specifically limited in the embodiments of the present application. It should be emphasized that when the adaptive filter coefficient W(k) at the current time k is updated, only its active part is updated. For example, if W(k) = [w_0, w_1, w_2, ..., w_{L-1}]^T and [w_0, w_1, w_2, ..., w_{L-3}] is determined to be the active part, the electronic device updates only [w_0, w_1, w_2, ..., w_{L-3}] according to the formula above.
In addition, the electronic device adjusts the order of the adaptive filter coefficient at the current time according to the identified active portion, thereby obtaining an adaptive filter coefficient W (k +1) at the next time.
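One iteration of the update described above can be sketched as follows, keeping the patent's notation (x(k) is the audio-data portion at time k, r(k) the background-audio portion, u the preset convergence step size); the function name and default values are illustrative:

```python
import numpy as np

def lms_iteration(W, x_k, r_k, u=0.01, active_len=None):
    """One update of the adaptive filter coefficient.

    W          : adaptive filter coefficient at time k, shape (L,)
    x_k        : the last L samples of the audio data at time k, shape (L,)
    r_k        : the background-audio sample corresponding to time k
    active_len : number of taps in the active part; only these are updated
    """
    y_hat = W @ x_k                     # estimated echo: y^(k) = W(k)^T x(k)
    e = r_k - y_hat                     # error audio data: e(k) = r(k) - y^(k)
    if active_len is None:
        active_len = len(W)
    W_next = W.copy()
    W_next[:active_len] += u * x_k[:active_len] * e   # update active part only
    return W_next, e
```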
In one embodiment, "identifying an active portion of adaptive filter coefficients for a current time instant" includes:
(1) dividing the adaptive filter coefficient at the current moment into a plurality of sub-filter coefficients with equal length;
(2) acquiring the average value and the variance of each sub-filter coefficient from back to front, and determining, as the active part, the first sub-filter coefficient whose average value is greater than a preset average value and whose variance is greater than a preset variance, together with all sub-filter coefficients before it;
adjusting the order of the adaptive filter coefficient at the current time comprises:
(3) judging whether the first sub-filter coefficient is the last sub-filter coefficient; if so, increasing the order of the adaptive filter coefficient at the current time, and otherwise decreasing the order of the adaptive filter coefficient at the current time.
In the embodiment of the present application, when identifying the active part of the adaptive filter coefficient at the current time, the electronic device first divides the adaptive filter coefficient at the current time into a plurality of sub-filter coefficients of equal length (the length being greater than 1). For example, the electronic device divides the adaptive filter coefficient W = [w_0, w_1, w_2, ..., w_{L-1}]^T at the current time into M sub-filter coefficients of equal length L/M, so that the m-th sub-filter coefficient is W_m = [w_{mL/M}, w_{mL/M+1}, w_{mL/M+2}, ..., w_{(m+1)L/M-1}]^T, where m takes values in [0, M-1].
Then, the electronic device obtains the average value and the variance of each sub-filter coefficient from back to front: first the average value and variance of the M-th sub-filter coefficient, then those of the (M-1)-th sub-filter coefficient, and so on, until the first sub-filter coefficient whose average value is greater than the preset average value and whose variance is greater than the preset variance is found. That sub-filter coefficient and the sub-filter coefficients before it are determined as the active part of the adaptive filter coefficient at the current moment.
The preset average value and the preset variance may be obtained by a person skilled in the art through experimental adjustment, which is not specifically limited in the embodiment of the present application, for example, in the embodiment of the present application, the preset average value may be 0.000065, and the preset variance may be 0.003.
In addition, when the order of the adaptive filter coefficient at the current time is adjusted, the electronic device may determine whether the first sub-filter coefficient is the last sub-filter coefficient, if so, it indicates that the order of the adaptive filter coefficient at the current time is insufficient, and increase the order of the adaptive filter coefficient at the current time, otherwise, it indicates that the order of the adaptive filter coefficient at the current time is sufficient, and may decrease the order of the adaptive filter coefficient at the current time.
In this embodiment, for the variation of increasing or decreasing the order, a person skilled in the art can take an empirical value according to actual needs, and the embodiment of the present application does not specifically limit this.
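The active-part identification and order adjustment just described might look like the following sketch. The thresholds reuse the illustrative figures from the text (preset average 0.000065, preset variance 0.003); the grow/shrink step of 64 taps is an assumption, since the text leaves the order-change amount to an empirical value:

```python
import numpy as np

def identify_active_and_adjust(W, M, mean_thr=0.000065, var_thr=0.003, step=64):
    """Split W into M equal-length sub-filter coefficients, scan from back
    to front for the first one whose mean and variance exceed the presets,
    then grow or shrink the filter order accordingly."""
    subs = np.array_split(W, M)                 # assumes len(W) divisible by M
    first_active = 0
    for m in range(M - 1, -1, -1):              # from back to front
        if subs[m].mean() > mean_thr and subs[m].var() > var_thr:
            first_active = m
            break
    active_len = (first_active + 1) * (len(W) // M)
    if first_active == M - 1:                   # last sub-filter is active:
        W = np.concatenate([W, np.zeros(step)]) # order insufficient, increase
    else:                                       # order sufficient, decrease
        W = W[:max(active_len, len(W) - step)]
    return W, active_len
```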
In an embodiment, the "performing beamforming processing on the two paths of audio data after echo cancellation to obtain enhanced audio data" includes:
using a preset beamforming algorithm, performing beamforming processing on the two echo-cancelled channels of audio data at each of a plurality of preset angles, to obtain a plurality of enhanced audio data.
In the embodiment of the present application, a plurality of preset angles are configured for the microphones of the electronic device. For example, during previous voice interactions with users, the electronic device counts the incoming-wave angles of user speech and obtains the incoming-wave angles whose usage probability reaches a preset probability, taking these as the plurality of preset angles.
The electronic device can then use the preset beamforming algorithm to perform beamforming processing on the two echo-cancelled channels of audio data at each of the preset angles, obtaining a plurality of enhanced audio data.
For example, assume that 3 preset angles are provided, namely θ1, θ2 and θ3, and that the GSC (generalized sidelobe canceller) algorithm is used for the beamforming processing. Since the GSC algorithm needs the beamforming angle to be estimated in advance, the electronic device takes θ1, θ2 and θ3 as the beamforming angles estimated for the GSC algorithm, and performs beamforming processing at θ1, θ2 and θ3 respectively, obtaining 3 channels of enhanced audio data.
As described above, in the embodiment of the present application, preset angles are used in place of estimated beamforming angles, so no time-consuming angle estimation is required and the overall efficiency of beamforming is improved.
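A full GSC implementation is beyond a short example, but the idea of steering toward a preset angle can be illustrated with a simplified two-microphone delay-and-sum sketch. The geometry (far-field source, microphones a known distance apart), the 16 kHz sample rate, and the example angles are all assumptions of this sketch, not the patent's parameters:

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def delay_and_sum(ch1, ch2, mic_distance_m, angle_deg, sample_rate=16000):
    """Compensate the inter-microphone arrival delay for a preset angle,
    then average the two channels. np.roll wraps samples around, which is
    acceptable only for this illustration."""
    delay_s = mic_distance_m * np.cos(np.deg2rad(angle_deg)) / SPEED_OF_SOUND
    shift = int(round(delay_s * sample_rate))
    return 0.5 * (ch1 + np.roll(ch2, -shift))

# one enhanced channel per preset angle, e.g. three illustrative angles:
# enhanced = [delay_and_sum(ch1, ch2, 0.15, a) for a in (30.0, 90.0, 150.0)]
```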
In one embodiment, "performing a primary check on text features and voiceprint features of the enhanced audio data" includes:
(1) extracting Mel frequency cepstrum coefficients of the enhanced audio data corresponding to each preset angle;
(2) calling a target voiceprint characteristic model related to a preset text to match the extracted mel frequency cepstrum coefficients;
(3) if matched Mel-frequency cepstrum coefficients exist, judging that the primary verification passes;
the target voiceprint feature model is obtained by a Gaussian mixture general background model related to a preset text in a self-adaptive mode according to the Mel frequency cepstrum coefficient of the preset audio data, and the preset audio data are audio data of a preset text spoken by a preset user.
The first-order wake-up algorithm is explained below.
It should be noted that, in the embodiment of the present application, a gaussian mixture general background model related to a preset text is trained in advance. The preset text is the above mentioned preset wake-up word. For example, audio data of a plurality of people (e.g., 200 people) who speak a preset wake-up word may be collected in advance, mel-frequency cepstrum coefficients of the audio data are extracted respectively, and a gaussian mixture general background model related to a preset text (i.e., the preset wake-up word) is obtained through training according to the mel-frequency cepstrum coefficients of the audio data.
Then, the Gaussian mixture general background model is further trained: adaptive processing (for example, maximum a posteriori (MAP) or maximum likelihood linear regression (MLLR) adaptation) is performed on the Gaussian mixture general background model according to the Mel-frequency cepstrum coefficients of preset audio data, where the preset audio data is audio data of the preset user speaking the preset text (i.e., the preset wake-up word). In this way, each Gaussian component of the Gaussian mixture general background model is shifted toward the Mel-frequency cepstrum coefficients of the preset user, so the model carries the voiceprint features of the preset user. The Gaussian mixture general background model carrying the voiceprint features of the preset user is recorded as the target voiceprint feature model.
Therefore, when performing the primary verification on the text feature and the voiceprint feature of the enhanced audio data, the electronic device extracts the Mel-frequency cepstrum coefficients of the enhanced audio data corresponding to each preset angle, and then calls the target voiceprint feature model related to the preset text to match each set of extracted coefficients: the extracted Mel-frequency cepstrum coefficients are input into the target voiceprint feature model, which recognizes them and outputs a score. When the output score reaches a preset threshold, the input Mel-frequency cepstrum coefficients are judged to match the target voiceprint feature model; otherwise they do not match. For example, in the embodiment of the present application, the output score of the target voiceprint feature model lies in the interval [0,1] and the preset threshold is configured as 0.28; that is, when the score corresponding to the input Mel-frequency cepstrum coefficients reaches 0.28, the electronic device determines that they match the target voiceprint feature model.
After the electronic equipment calls a target voiceprint feature model related to the preset text to match the extracted mel frequency cepstrum coefficients, if the matched mel frequency cepstrum coefficients exist, the electronic equipment judges that the primary check is passed.
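The primary verification loop might be sketched as below. The MFCC extraction uses librosa; the target voiceprint feature model is assumed to expose a score(features) method whose output is already normalized to the [0,1] interval mentioned above (how a GMM log-likelihood is mapped to that interval is model-specific and not shown here):

```python
import librosa

def primary_verification(enhanced_signals, target_model, sr=16000,
                         threshold=0.28):
    """Pass if the MFCCs of any preset-angle enhanced signal match the
    text-related target voiceprint feature model."""
    for signal in enhanced_signals:
        # one row per analysis frame, one column per cepstrum coefficient
        mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13).T
        if target_model.score(mfcc) >= threshold:
            return True
    return False
```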
In one embodiment, "performing a secondary check on text features and voiceprint features of enhanced audio data" includes:
(1) dividing the enhanced audio data corresponding to the preset angle into a plurality of sub audio data;
(2) extracting a voiceprint characteristic vector of each sub audio data according to a voiceprint characteristic extraction model related to a preset text;
(3) acquiring similarity between each voiceprint feature vector and a target voiceprint feature vector, wherein the target voiceprint feature vector is a voiceprint feature vector of preset audio data;
(4) according to the similarity corresponding to each sub audio data, verifying the text characteristic and the voiceprint characteristic of the enhanced audio data corresponding to the preset angle;
(5) if there is enhanced audio data corresponding to a preset angle that passes the verification, judging that the secondary verification passes.
The secondary wake-up algorithm is explained below.
In the embodiment of the present application, it is considered that the enhanced audio data may contain more than just the preset wake-up word: for example, if the preset wake-up word is "Xiao Ou Xiao Ou", the enhanced audio data may be "Hello, Xiao Ou Xiao Ou". Therefore, in the embodiment of the present application, the voice part is divided into a plurality of sub audio data according to the length of the preset wake-up word, where the length of each sub audio data is greater than or equal to the length of the preset wake-up word and two adjacent sub audio data have an overlapping part. The length of the overlapping part may be set by a person skilled in the art according to actual needs; for example, in the embodiment of the present application it is set to 25% of the length of the sub audio data.
It should be noted that in the embodiment of the present application, a voiceprint feature extraction model related to the preset text (i.e., the preset wake-up word) is also trained in advance. For example, in the embodiment of the present application a voiceprint feature extraction model based on a convolutional neural network is trained, as shown in fig. 3: audio data of multiple persons (for example, 200 persons) speaking the preset wake-up word is collected in advance; endpoint detection is performed on the audio data and the preset wake-up word part is segmented out; the segmented part is preprocessed (for example, high-pass filtered) and windowed, then Fourier transformed (for example, by short-time Fourier transform); the energy density is then calculated to generate a gray-scale spectrogram (as shown in fig. 4, where the horizontal axis represents time, the vertical axis represents frequency, and the gray level represents the energy value); finally, a convolutional neural network is trained on the generated spectrograms to produce the voiceprint feature extraction model related to the preset text. In addition, in the embodiment of the present application, the spectrogram of the audio data of the preset user speaking the preset wake-up word (i.e., the preset text) is extracted and input into the trained voiceprint feature extraction model; after passing through its convolution layers, pooling layers and fully connected layers, a corresponding group of feature vectors is output and recorded as the target voiceprint feature vector.
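The spectrogram pipeline described above (high-pass preprocessing, windowing, short-time Fourier transform, energy density mapped to gray levels) might be sketched as follows; the cutoff frequency, window length, and overlap are illustrative choices, not the patent's parameters:

```python
import numpy as np
from scipy.signal import butter, lfilter, stft

def gray_spectrogram(audio, sr=16000):
    """Gray-scale spectrogram: rows are frequency, columns are time,
    values in [0, 1] represent energy (cf. fig. 4)."""
    b, a = butter(4, 100 / (sr / 2), btype="highpass")   # preprocessing
    filtered = lfilter(b, a, audio)
    _, _, Z = stft(filtered, fs=sr, window="hann",
                   nperseg=400, noverlap=240)             # windowed STFT
    energy = np.abs(Z) ** 2                               # energy density
    db = 10.0 * np.log10(energy + 1e-10)
    return (db - db.min()) / (db.max() - db.min())        # map to gray levels
```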
Correspondingly, after the electronic device divides the enhanced audio data corresponding to the preset angle into a plurality of sub audio data, the spectrogram of each sub audio data is respectively extracted. For how to extract the spectrogram, details are not repeated here, and specific reference may be made to the above related description. After extracting the spectrogram of the sub-audio data, the electronic device inputs the spectrogram of the sub-audio data into a previously trained voiceprint feature extraction model, so as to extract a voiceprint feature vector of each sub-audio data.
After extracting the voiceprint feature vectors of the sub audio data, the electronic device obtains the similarity between the voiceprint feature vector of each sub audio data and the target voiceprint feature vector, and then verifies the text feature and the voiceprint feature of the enhanced audio data corresponding to the preset angle according to the similarities. For example, the electronic device may determine whether there is sub audio data whose voiceprint feature vector reaches a preset similarity with the target voiceprint feature vector (an empirical value may be chosen by a person skilled in the art according to actual needs, for example 75%); if there is, the text feature and the voiceprint feature of the enhanced audio data corresponding to the preset angle are determined to pass the verification.
After the electronic device completes the verification of the text features and voiceprint features of the enhanced audio data corresponding to each preset angle, if there is enhanced audio data corresponding to any preset angle that passes the verification, the electronic device judges that the secondary verification passes.
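The segmentation and per-segment similarity check can be sketched as below, using the 25% overlap and 75% preset similarity mentioned in the text; embed() stands for the CNN voiceprint feature extraction model and is an assumption, as is the use of cosine similarity (the text also mentions DTW and Euclidean distance):

```python
import numpy as np

def split_with_overlap(audio, seg_len, overlap=0.25):
    """Split audio into sub audio data of seg_len samples, with adjacent
    segments overlapping by the given fraction."""
    step = int(seg_len * (1 - overlap))
    return [audio[i:i + seg_len]
            for i in range(0, len(audio) - seg_len + 1, step)]

def cosine_similarity(v1, v2):
    return float(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))

def secondary_verification(enhanced, seg_len, embed, target_vec,
                           preset_similarity=0.75):
    """Pass if any sub audio data's voiceprint feature vector reaches the
    preset similarity with the target voiceprint feature vector."""
    return any(cosine_similarity(embed(seg), target_vec) >= preset_similarity
               for seg in split_with_overlap(enhanced, seg_len))
```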
In an embodiment, the verifying the text feature and the voiceprint feature of the enhanced audio data corresponding to the predetermined angle according to the similarity corresponding to each sub-audio data includes:
according to the similarity corresponding to each sub audio data and a preset identification function, verifying the text characteristic and the voiceprint characteristic of the enhanced audio data corresponding to the preset angle;
wherein the preset recognition function is γn = γn-1 + f(ln), where γn represents the recognition function state value corresponding to the n-th sub audio data and γn-1 represents the recognition function state value corresponding to the (n-1)-th sub audio data, with

f(ln) = a, if ln ≥ b; f(ln) = -a, otherwise;

where a is a correction value of the recognition function, b is the preset similarity, and ln is the similarity between the voiceprint feature vector of the n-th sub audio data and the target voiceprint feature vector. If there is a γn greater than the preset recognition function state value, it is judged that the text feature and the voiceprint feature of the enhanced audio data corresponding to the preset angle pass verification.
It should be noted that the value of a in the recognition function can be an empirical value according to actual needs by those skilled in the art, for example, a can be set to 1.
In addition, the value of b in the recognition function is positively correlated with the recognition rate of the voiceprint feature extraction model, and the value of b is determined according to the recognition rate of the voiceprint feature extraction model obtained through actual training.
In addition, the preset recognition function state value can also be obtained by a person skilled in the art according to actual needs, and the higher the value is, the higher the accuracy of verification on the voice part is.
Therefore, through the recognition function, the enhanced audio data can be verified accurately even when it contains information other than the preset wake-up word.
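A direct transcription of the recognition function might look like this; a = 1 follows the example in the text, while b and the preset state value are illustrative:

```python
def recognition_function_passes(similarities, a=1.0, b=0.75,
                                preset_state_value=2.0):
    """Accumulate gamma_n = gamma_{n-1} + f(l_n) over the sub audio data
    similarities, with f(l_n) = a when l_n >= b and -a otherwise; pass as
    soon as some gamma_n exceeds the preset recognition function state value."""
    gamma = 0.0
    for l_n in similarities:
        gamma += a if l_n >= b else -a
        if gamma > preset_state_value:
            return True
    return False
```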
Optionally, when obtaining the similarity between the voiceprint feature vector of each sub audio data and the target voiceprint feature vector, the similarity may be calculated according to a dynamic time warping algorithm.
Or, a feature distance between the voiceprint feature vector of each sub-audio data and the target voiceprint feature vector may be calculated as a similarity, and as to what feature distance is used to measure the similarity between the two vectors, no specific limitation is imposed in this embodiment of the application, for example, an euclidean distance may be used to measure the similarity between the voiceprint feature vector of the sub-audio data and the target voiceprint feature vector.
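For the dynamic time warping option, a plain O(n·m) DTW distance over two feature sequences can serve as the dissimilarity; treating the voiceprint features as frame-level sequences is an assumption of this sketch:

```python
import numpy as np

def dtw_distance(seq1, seq2):
    """Dynamic-time-warping distance between two feature sequences
    (one row per time step), with Euclidean distance as the step cost."""
    n, m = len(seq1), len(seq2)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(np.asarray(seq1[i - 1]) -
                                  np.asarray(seq2[j - 1]))
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])
```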
Fig. 5 is another flowchart of an application wake-up method according to an embodiment of the present application. The application wake-up method is applied to the electronic device provided by the present application, where the electronic device includes two microphones, as shown in fig. 5, a flow of the application wake-up method provided by the embodiment of the present application may be as follows:
in 201, the electronic device determines whether the electronic device is in an audio/video playing state based on a processor, if so, the electronic device proceeds to 202, and if not, the electronic device proceeds to 206.
In the embodiment of the application, the electronic device firstly judges whether the electronic device is in the audio and video playing state based on the processor, for example, taking an android system as an example, the electronic device receives an android internal message based on the processor, and judges whether the electronic device is in the audio and video playing state according to the android internal message.
In 202, the electronic device acquires two paths of audio data through two microphones, and acquires background audio data played during audio acquisition.
For example, two microphones included in the electronic device are arranged back to back and separated by a preset distance, where the arrangement of the two microphones back to back means that sound pickup holes of the two microphones face opposite directions. For example, referring to fig. 2, the electronic device includes two microphones, which are a microphone 1 disposed on a lower side of the electronic device and a microphone 2 disposed on an upper side of the electronic device, respectively, wherein a sound-collecting hole of the microphone 1 faces downward, a sound-collecting hole of the microphone 2 faces upward, and a connection line between the microphone 2 and the microphone 1 is parallel to left/right sides of the electronic device. Furthermore, the two microphones included in the electronic device may be non-directional microphones (or, omni-directional microphones).
In the embodiment of the application, the electronic device can collect sound through the two back-to-back microphones while playing audio or video, thereby collecting two channels of audio data of the same duration. In addition, the electronic device also obtains the audio data played during audio acquisition, which may be independent audio data, such as a played audio file or song, or audio data attached to video data. It should be noted that, to distinguish the audio data obtained by sound acquisition from the audio data played during audio acquisition, the audio data played during audio acquisition is referred to as background audio data in the present application.
At 203, the electronic device performs echo cancellation processing on the two paths of audio data based on the processor according to the background audio data to obtain two paths of audio data after echo cancellation.
It should be noted that, during playing audio and video, the electronic device performs sound collection through two microphones, and will collect and obtain the sound of the playing background audio data, that is, echo (or self-noise). In the application, in order to eliminate echoes in the two collected audio data, an echo cancellation algorithm is called based on the processor to perform echo cancellation processing on the two audio data further according to background audio data so as to eliminate echoes in the two audio data and obtain two audio data after echo cancellation. It should be noted that, in the embodiment of the present application, there is no particular limitation on what echo cancellation algorithm is used, and a person skilled in the art may select the echo cancellation algorithm according to actual needs.
For example, the electronic device may perform anti-phase processing on the background audio data based on the processor to obtain anti-phase background audio data, and then superimpose the anti-phase background audio data with the two paths of audio data respectively to eliminate echoes in the two paths of audio data, so as to obtain two paths of audio data after echo cancellation.
Put plainly, the echo cancellation processing above removes the self-noise carried in the audio data.
At 204, the electronic device performs beamforming processing on the two paths of audio data after echo cancellation based on the processor, so as to obtain enhanced audio data.
After the electronic device completes echo cancellation processing on the two paths of audio data to obtain two paths of audio data after echo cancellation, the electronic device further performs beam forming processing on the two paths of audio data after echo cancellation based on the processor to obtain one path of audio data with a higher signal-to-noise ratio, and the audio data is recorded as enhanced audio data.
In colloquial terms, the beamforming process performed above eliminates external noise carried in the audio data. Therefore, the electronic device obtains the enhanced audio data with self-noise and external noise removed through echo cancellation processing and beam forming processing of the two paths of acquired audio data.
In 205, the electronic device performs a primary check on the text feature and the voiceprint feature of the enhanced audio data based on the processor, performs a secondary check on the text feature and the voiceprint feature of the enhanced audio data based on the processor after the primary check is passed, and wakes up the voice interaction application based on the processor if the secondary check is passed.
As described above, compared with the two channels of originally collected audio data, the enhanced audio data has the self-noise and external noise removed and thus a higher signal-to-noise ratio. At this point, the electronic device further performs the two-stage verification on the text feature and the voiceprint feature of the enhanced audio data based on the processor: the processor calls the first wake-up algorithm for the primary verification, and if the primary verification passes, calls the second wake-up algorithm for the secondary verification.
It should be noted that, in the embodiment of the present application, both the primary verification and the secondary verification of the text feature and the voiceprint feature of the enhanced audio data verify whether the enhanced audio data includes the preset wake-up word spoken by a preset user (for example, the owner of the electronic device, or another user whom the owner has authorized to use the electronic device). If the enhanced audio data includes the preset wake-up word spoken by the preset user, the verification of the text feature and the voiceprint feature passes; otherwise it fails. For example, if the enhanced audio data includes the preset wake-up word and the word was spoken by the preset user, the verification passes. Conversely, if the enhanced audio data includes the preset wake-up word spoken by a user other than the preset user, or does not include the preset wake-up word at all, the verification fails.
In addition, it should be further noted that, in the embodiment of the present application, the first wake-up algorithm and the second wake-up algorithm adopted by the electronic device are different. For example, the first voice wake-up algorithm is a voice wake-up algorithm based on a gaussian mixture model, and the second voice wake-up algorithm is a voice wake-up algorithm based on a neural network.
At 206, the electronic device acquires a channel of audio data through any of the microphones.
When the electronic equipment does not play audio and video, sound collection is carried out through any microphone, and one path of audio data is obtained.
In 207, the electronic device performs a primary verification on the one path of audio data based on the dedicated voice recognition chip, and performs a secondary verification on the one path of audio data based on the processor after the primary verification passes.
The dedicated voice recognition chip is a dedicated chip designed for voice recognition, such as a digital signal processing chip designed for voice, an application specific integrated circuit chip designed for voice, and the like, and has lower power consumption than a general-purpose processor.
After the electronic device acquires the one path of audio data, it calls a third wake-up algorithm based on the dedicated voice recognition chip to verify the audio data; the third wake-up algorithm may verify both the text feature and the voiceprint feature of the audio data, or may verify only the text feature.
For example, the electronic device may extract the mel-frequency cepstrum coefficient of the aforementioned audio data based on a dedicated speech recognition chip; then, calling a Gaussian mixture general background model related to a preset text based on a special voice recognition chip to match the extracted Mel frequency cepstrum coefficient; if the matching is successful, the text characteristic check of the path of audio data is judged to be passed.
After the primary verification of the one path of audio data passes, the electronic device further performs a secondary verification on it based on the processor, wherein the processor calls the first wake-up algorithm or the second wake-up algorithm to verify the text feature and the voiceprint feature of the one path of audio data.
At 208, if the secondary check passes, the electronic device wakes up the voice interaction application based on the processor.
When the secondary verification of the audio data passes, the electronic equipment can wake up the voice interaction application based on the processor, and the voice interaction between the electronic equipment and the user is realized.
Referring to fig. 6, fig. 6 is a schematic structural diagram of an application wake-up apparatus according to an embodiment of the present application. The application wake-up apparatus can be applied to an electronic device that comprises two microphones. The application wake-up apparatus may include an audio acquisition module 401, an echo cancellation module 402, a beamforming module 403, an audio verification module 404, and an application wake-up module 405, wherein,
the audio acquisition module 401 is configured to acquire two paths of audio data through two microphones and acquire background audio data played during audio acquisition;
the echo cancellation module 402 is configured to perform echo cancellation processing on the two paths of audio data according to the background audio data to obtain two paths of audio data after echo cancellation;
a beam forming module 403, configured to perform beam forming processing on the two paths of audio data after echo cancellation to obtain enhanced audio data;
the audio verification module 404 is configured to perform primary verification on the text features and the voiceprint features of the enhanced audio data, and perform secondary verification on the text features and the voiceprint features of the enhanced audio data after the primary verification is passed;
and an application wake-up module 405, configured to wake up the voice interaction application when the secondary verification passes.
In an embodiment, when performing echo cancellation processing on the two paths of audio data according to the background audio data, the echo cancellation module 402 may be configured to:
obtaining an initial adaptive filter coefficient, and iteratively updating the initial adaptive filter coefficient according to background audio data and audio data to obtain a target adaptive filter coefficient;
and performing echo cancellation processing on the audio data according to the target adaptive filter coefficient.
In one embodiment, when iteratively updating the initial adaptive filter coefficients according to the background audio data and the audio data to obtain the target adaptive filter coefficients, the echo cancellation module 402 may be configured to:
obtaining the adaptive filter coefficient at the current moment according to the initial adaptive filter coefficient;

estimating the echo audio data carried in the audio data at the current moment according to the adaptive filter coefficient at the current moment;

acquiring the error audio data at the current moment according to the background audio data and the estimated echo audio data;

and identifying the active part of the adaptive filter coefficient at the current moment, updating that active part according to the error audio data at the current moment, and adjusting the order of the adaptive filter coefficient at the current moment to obtain the adaptive filter coefficient at the next moment.
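For concreteness, here is a minimal normalized-LMS (NLMS) sketch of this update loop; the application does not name a specific adaptation rule, and the filter order, step size mu, and regularizer eps below are illustrative assumptions. The order-adjustment step of the next embodiment is omitted here.

import numpy as np

def nlms_echo_cancel(mic: np.ndarray, ref: np.ndarray,
                     order: int = 256, mu: float = 0.5,
                     eps: float = 1e-8) -> np.ndarray:
    w = np.zeros(order)                   # initial adaptive filter coefficients
    out = np.zeros(len(mic))
    for n in range(order, len(mic)):
        x = ref[n - order:n][::-1]        # most recent reference (background) samples
        echo_est = w @ x                  # estimated echo at the current moment
        e = mic[n] - echo_est             # error audio data at the current moment
        w += mu * e * x / (x @ x + eps)   # coefficient update for the next moment
        out[n] = e                        # echo-cancelled output sample
    return out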
In an embodiment, in identifying the active portion of the adaptive filter coefficients at the current time, the echo cancellation module 402 may be configured to:
dividing the adaptive filter coefficient at the current moment into a plurality of sub-filter coefficients with equal length;
acquiring the mean value and the variance of each sub-filter coefficient from back to front, and determining, as the active part, the first sub-filter coefficient whose mean value is greater than a preset mean value and whose variance is greater than a preset variance, together with all sub-filter coefficients before it;
while adjusting the order of the adaptive filter coefficients at the current time, the echo cancellation module 402 may be configured to:
and judging whether that first qualifying sub-filter coefficient is the last sub-filter coefficient; if so, increasing the order of the adaptive filter coefficient at the current moment, and otherwise, reducing the order of the adaptive filter coefficient at the current moment.
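A minimal sketch of this active-part test, assuming a sub-filter block length and mean/variance thresholds that are purely illustrative:

import numpy as np

def split_active_part(w: np.ndarray, block: int = 32,
                      mean_th: float = 1e-4, var_th: float = 1e-6):
    n_blocks = len(w) // block
    first_hit = -1
    for i in range(n_blocks - 1, -1, -1):          # scan sub-filters back to front
        seg = np.abs(w[i * block:(i + 1) * block])
        if seg.mean() > mean_th and seg.var() > var_th:
            first_hit = i                          # first qualifying sub-filter
            break
    active = w[:(first_hit + 1) * block]           # that sub-filter and all before it
    grow_order = (first_hit == n_blocks - 1)       # qualifying sub-filter is the last one
    return active, grow_order                      # grow_order True -> increase the order

The grow_order flag corresponds to the order adjustment above: when the qualifying sub-filter is the last one, the echo path presumably extends beyond the current filter, so the order is increased; otherwise it can be reduced.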
In an embodiment, when performing beamforming on the two paths of audio data after echo cancellation to obtain enhanced audio data, the beamforming module 403 may be configured to:
and adopting a preset beamforming algorithm to respectively perform beamforming processing on the two paths of echo-cancelled audio data at a plurality of preset angles, to obtain a plurality of pieces of enhanced audio data.
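As one possible instance of the "preset beamforming algorithm", the sketch below applies plain delay-and-sum beamforming to a two-microphone array at several preset angles; the microphone spacing, sound speed, angle set, and the delay-and-sum choice itself are assumptions of the sketch.

import numpy as np

def delay_and_sum(ch0: np.ndarray, ch1: np.ndarray, sr: int,
                  angle_deg: float, spacing: float = 0.02,
                  c: float = 343.0) -> np.ndarray:
    # Inter-microphone delay for a far-field source at angle_deg.
    tau = spacing * np.cos(np.deg2rad(angle_deg)) / c
    shift = int(round(tau * sr))    # delay rounded to whole samples
    aligned = np.roll(ch1, -shift)  # align channel 1 to channel 0 (edge wrap ignored)
    return 0.5 * (ch0 + aligned)    # coherent sum -> enhanced audio data

def enhance_at_preset_angles(ch0, ch1, sr, angles=(0, 45, 90, 135, 180)):
    # One piece of enhanced audio data per preset angle.
    return {a: delay_and_sum(ch0, ch1, sr, a) for a in angles}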
In one embodiment, in performing the primary verification on the text feature and the voiceprint feature of the enhanced audio data, the audio verification module 404 may be configured to:
extracting Mel frequency cepstrum coefficients of the enhanced audio data corresponding to each preset angle;
calling a target voiceprint feature model related to the preset text to match the extracted Mel frequency cepstrum coefficients;

and if a matched Mel frequency cepstrum coefficient exists, judging that the primary check passes;
The target voiceprint feature model is obtained by adaptively updating a Gaussian mixture universal background model related to the preset text according to the Mel frequency cepstrum coefficients of preset audio data, the preset audio data being audio data of the preset text spoken by the preset user.
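The adaptation step could look like the following mean-only MAP adaptation sketch, which is the textbook way of deriving a speaker model from a GMM-UBM; the application does not spell out its adaptation rule, so the recipe, the scikit-learn usage, and the relevance factor r are assumptions.

import copy
import numpy as np
from sklearn.mixture import GaussianMixture

def map_adapt_means(ubm: GaussianMixture, enrol_mfcc: np.ndarray,
                    r: float = 16.0) -> GaussianMixture:
    post = ubm.predict_proba(enrol_mfcc)   # responsibilities, (n_frames, n_components)
    n_k = post.sum(axis=0)                 # soft frame counts per component
    f_k = post.T @ enrol_mfcc              # first-order statistics per component
    alpha = (n_k / (n_k + r))[:, None]     # per-component adaptation weight
    target = copy.deepcopy(ubm)
    # Blend enrolment means with UBM means -> adapted target voiceprint model.
    target.means_ = alpha * (f_k / np.maximum(n_k, 1e-10)[:, None]) \
        + (1.0 - alpha) * ubm.means_
    return target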
In one embodiment, in performing the secondary verification on the text feature and the voiceprint feature of the enhanced audio data, the audio verification module 404 may be configured to:
dividing the enhanced audio data corresponding to the preset angle into a plurality of sub audio data;
extracting a voiceprint characteristic vector of each sub audio data according to a voiceprint characteristic extraction model related to a preset text;
acquiring similarity between each voiceprint feature vector and a target voiceprint feature vector, wherein the target voiceprint feature vector is a voiceprint feature vector of preset audio data;
according to the similarity corresponding to each sub audio data, verifying the text characteristic and the voiceprint characteristic of the enhanced audio data corresponding to the preset angle;
and if there is enhanced audio data corresponding to a preset angle that passes the verification, judging that the secondary verification passes.
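A sketch of this segmentation-and-similarity step, where extract_embedding is a hypothetical hook for the text-related voiceprint feature extraction model, and the segment length and cosine similarity are illustrative choices:

import numpy as np

def secondary_check_similarities(audio: np.ndarray, sr: int,
                                 target_vec: np.ndarray,
                                 extract_embedding,       # hypothetical extractor hook
                                 seg_sec: float = 0.5) -> list:
    seg = int(seg_sec * sr)
    sims = []
    for start in range(0, len(audio) - seg + 1, seg):
        vec = extract_embedding(audio[start:start + seg])  # voiceprint feature vector
        cos = float(vec @ target_vec /
                    (np.linalg.norm(vec) * np.linalg.norm(target_vec)))
        sims.append(cos)   # per-segment similarity to the target voiceprint vector
    return sims            # fed to the recognition function of the next embodiment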
In an embodiment, when verifying the text feature and the voiceprint feature of the enhanced audio data corresponding to the preset angle according to the similarity corresponding to each piece of sub audio data, the audio verification module 404 may be configured to:
according to the similarity corresponding to each piece of sub audio data and a preset recognition function, verifying the text feature and the voiceprint feature of the enhanced audio data corresponding to the preset angle;

wherein the preset recognition function is γ_n = γ_{n-1} + f(l_n), in which γ_n represents the recognition function state value corresponding to the nth piece of sub audio data, γ_{n-1} represents the recognition function state value corresponding to the (n-1)th piece of sub audio data, a is a correction value of the recognition function, b is a preset similarity, and l_n is the similarity between the voiceprint feature vector of the nth piece of sub audio data and the target voiceprint feature vector; f(l_n) is a piecewise correction term defined in terms of a and b (its definition is given as an image, Figure BDA0002083017220000181, in the original publication). If there is a γ_n greater than the preset recognition function state value, it is judged that the text feature and the voiceprint feature of the enhanced audio data corresponding to the preset angle pass the verification.
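The accumulation can be sketched as below. The exact piecewise form of f(l_n) is only available as an image in the original publication; the sketch assumes the natural reading that f adds a when the segment similarity reaches the preset similarity b and subtracts a otherwise, with all parameter values illustrative.

def recognition_function_passes(sims, a=1.0, b=0.7, state_threshold=3.0):
    gamma = 0.0                         # initial recognition function state value
    for l_n in sims:
        gamma += a if l_n >= b else -a  # assumed piecewise f(l_n)
        if gamma > state_threshold:     # some gamma_n exceeds the preset state value
            return True
    return False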
In an embodiment, when obtaining the similarity between the voiceprint feature vector of each piece of sub audio data and the target voiceprint feature vector, the audio verification module 404 may be configured to:
calculating the similarity between the voiceprint feature vector of each piece of sub audio data and the target voiceprint feature vector according to a dynamic time warping algorithm;
or, calculating a feature distance between the voiceprint feature vector of each sub-audio data and the target voiceprint feature vector as a similarity.
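For the dynamic time warping option, a plain O(T1·T2) dynamic-programming DTW over sequences of frame vectors might look as follows; negating the accumulated distance so that larger values mean more similar is an assumption made to match how the text uses "similarity".

import numpy as np

def dtw_similarity(seq_a: np.ndarray, seq_b: np.ndarray) -> float:
    t1, t2 = len(seq_a), len(seq_b)
    D = np.full((t1 + 1, t2 + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, t1 + 1):
        for j in range(1, t2 + 1):
            cost = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])  # frame distance
            D[i, j] = cost + min(D[i - 1, j],                   # insertion
                                 D[i, j - 1],                   # deletion
                                 D[i - 1, j - 1])               # match
    return -float(D[t1, t2])   # negate so that a larger value means more similar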
The embodiment of the present application provides a storage medium, on which an instruction execution program is stored, and when the stored instruction execution program is executed on an electronic device provided in the embodiment of the present application, the electronic device is caused to execute the steps in the application wake-up method provided in the embodiment of the present application. The storage medium may be a magnetic disk, an optical disk, a Read Only Memory (ROM), a Random Access Memory (RAM), or the like.
Referring to fig. 7, the electronic device includes a processor 501, a memory 502, and a microphone 503.
The processor 501 in the present embodiment is a general purpose processor, such as an ARM architecture processor.
The memory 502 stores an instruction execution program. The memory may be a high-speed random access memory, or a non-volatile memory such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. Accordingly, the memory 502 may also include a memory controller to provide the processor 501 with access to the memory 502, so as to implement the following functions:
acquiring two paths of audio data through the two microphones, and acquiring background audio data played during the audio acquisition period;
performing echo cancellation processing on the two paths of audio data according to the background audio data to obtain two paths of audio data after echo cancellation;
performing beam forming processing on the two paths of audio data after echo cancellation to obtain enhanced audio data;
performing primary verification on the text characteristic and the voiceprint characteristic of the enhanced audio data, and performing secondary verification on the text characteristic and the voiceprint characteristic of the enhanced audio data after the primary verification is passed;
and if the secondary verification passes, awakening the voice interactive application.
Referring to fig. 8, fig. 8 is another schematic structural diagram of the electronic device according to an embodiment of the present application; the difference from the electronic device shown in fig. 7 is that the electronic device further includes components such as an input unit 504 and an output unit 505.
The input unit 504 may be used to receive input digits, character information, or user characteristic information (such as fingerprints), and to generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control.
The output unit 505, such as a display screen, may be used to display information input by the user or information provided to the user.
In this embodiment of the present application, the processor 501 in the electronic device loads instructions corresponding to the processes of one or more computer programs into the memory 502, and runs the computer programs stored in the memory 502 to implement various functions, as follows:
acquiring two paths of audio data through the two microphones, and acquiring background audio data played during the audio acquisition period;
performing echo cancellation processing on the two paths of audio data according to the background audio data to obtain two paths of audio data after echo cancellation;
performing beam forming processing on the two paths of audio data after echo cancellation to obtain enhanced audio data;
performing primary verification on the text characteristic and the voiceprint characteristic of the enhanced audio data, and performing secondary verification on the text characteristic and the voiceprint characteristic of the enhanced audio data after the primary verification is passed;
and if the secondary verification passes, awakening the voice interactive application.
In an embodiment, when performing echo cancellation processing on the two paths of audio data according to the background audio data, the processor 501 may perform:
obtaining an initial adaptive filter coefficient, and iteratively updating the initial adaptive filter coefficient according to background audio data and audio data to obtain a target adaptive filter coefficient;
and performing echo cancellation processing on the audio data according to the target adaptive filter coefficient.
In one embodiment, when iteratively updating the initial adaptive filter coefficients according to the background audio data and the audio data to obtain the target adaptive filter coefficients, the processor 501 may perform:
obtaining the adaptive filter coefficient at the current moment according to the initial adaptive filter coefficient;

estimating the echo audio data carried in the audio data at the current moment according to the adaptive filter coefficient at the current moment;

acquiring the error audio data at the current moment according to the background audio data and the estimated echo audio data;

and identifying the active part of the adaptive filter coefficient at the current moment, updating that active part according to the error audio data at the current moment, and adjusting the order of the adaptive filter coefficient at the current moment to obtain the adaptive filter coefficient at the next moment.
In an embodiment, in identifying the active portion of the adaptive filter coefficients at the current time, processor 501 may perform:
dividing the adaptive filter coefficient at the current moment into a plurality of sub-filter coefficients with equal length;
acquiring the mean value and the variance of each sub-filter coefficient from back to front, and determining, as the active part, the first sub-filter coefficient whose mean value is greater than a preset mean value and whose variance is greater than a preset variance, together with all sub-filter coefficients before it;
while adjusting the order of the adaptive filter coefficients at the current time, processor 501 may perform:
and judging whether that first qualifying sub-filter coefficient is the last sub-filter coefficient; if so, increasing the order of the adaptive filter coefficient at the current moment, and otherwise, reducing the order of the adaptive filter coefficient at the current moment.
In an embodiment, when performing beamforming processing on the two paths of audio data after echo cancellation to obtain enhanced audio data, the processor 501 may perform:
and adopting a preset beamforming algorithm to respectively perform beamforming processing on the two paths of echo-cancelled audio data at a plurality of preset angles, to obtain a plurality of pieces of enhanced audio data.
In one embodiment, in performing a primary check on the text feature and the voiceprint feature of the enhanced audio data, the processor 501 may perform:
extracting Mel frequency cepstrum coefficients of the enhanced audio data corresponding to each preset angle;
calling a target voiceprint feature model related to the preset text to match the extracted Mel frequency cepstrum coefficients;

and if a matched Mel frequency cepstrum coefficient exists, judging that the primary check passes;
The target voiceprint feature model is obtained by adaptively updating a Gaussian mixture universal background model related to the preset text according to the Mel frequency cepstrum coefficients of preset audio data, the preset audio data being audio data of the preset text spoken by the preset user.
In one embodiment, in performing the secondary verification on the text feature and the voiceprint feature of the enhanced audio data, the processor 501 may perform:
dividing the enhanced audio data corresponding to the preset angle into a plurality of sub audio data;
extracting a voiceprint characteristic vector of each sub audio data according to a voiceprint characteristic extraction model related to a preset text;
acquiring similarity between each voiceprint feature vector and a target voiceprint feature vector, wherein the target voiceprint feature vector is a voiceprint feature vector of preset audio data;
according to the similarity corresponding to each sub audio data, verifying the text characteristic and the voiceprint characteristic of the enhanced audio data corresponding to the preset angle;
and if there is enhanced audio data corresponding to a preset angle that passes the verification, judging that the secondary verification passes.
In an embodiment, when the text feature and the voiceprint feature of the enhanced audio data corresponding to the preset angle are checked according to the similarity corresponding to each sub audio data, the processor 501 may perform:
according to the similarity corresponding to each piece of sub audio data and a preset recognition function, verifying the text feature and the voiceprint feature of the enhanced audio data corresponding to the preset angle;

wherein the preset recognition function is γ_n = γ_{n-1} + f(l_n), in which γ_n represents the recognition function state value corresponding to the nth piece of sub audio data, γ_{n-1} represents the recognition function state value corresponding to the (n-1)th piece of sub audio data, a is a correction value of the recognition function, b is a preset similarity, and l_n is the similarity between the voiceprint feature vector of the nth piece of sub audio data and the target voiceprint feature vector; f(l_n) is a piecewise correction term defined in terms of a and b (its definition is given as an image, Figure BDA0002083017220000221, in the original publication). If there is a γ_n greater than the preset recognition function state value, it is judged that the text feature and the voiceprint feature of the enhanced audio data corresponding to the preset angle pass the verification.
In an embodiment, when obtaining the similarity between the voiceprint feature vector of each piece of sub audio data and the target voiceprint feature vector, the processor 501 may perform:
calculating the similarity between the voiceprint feature vector of each piece of sub audio data and the target voiceprint feature vector according to a dynamic time warping algorithm;
or, calculating a feature distance between the voiceprint feature vector of each sub-audio data and the target voiceprint feature vector as a similarity.
It should be noted that the electronic device provided in the embodiments of the present application and the application wake-up method in the foregoing embodiments belong to the same concept; any method provided in the application wake-up method embodiments may run on the electronic device, and its specific implementation process is described in detail in the application wake-up method embodiments, which is not repeated here.
It should be noted that, for the application wake-up method of the embodiments of the present application, a person of ordinary skill in the art can understand that all or part of the process of implementing the application wake-up method can be completed by controlling relevant hardware through a computer program. The computer program can be stored in a computer-readable storage medium, such as a memory of the electronic device, and be executed by the processor and the dedicated voice recognition chip in the electronic device; the execution process can include, for example, the processes of the embodiments of the application wake-up method. The storage medium may be a magnetic disk, an optical disk, a read-only memory, a random access memory, or the like.
The application wake-up method, the storage medium, and the electronic device provided in the embodiments of the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the above embodiments is only intended to help understand the method and its core ideas. Meanwhile, those skilled in the art may make changes to the specific embodiments and the application scope according to the ideas of the present application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (9)

1. An application wake-up method applied to an electronic device, wherein the electronic device comprises two microphones, the application wake-up method comprising:
when the electronic equipment is in an audio and video playing state, the processor acquires two paths of audio data through the two microphones and acquires background audio data played during audio acquisition;
the processor performs echo cancellation processing on the two paths of audio data according to the background audio data to obtain two paths of audio data after echo cancellation;
the processor performs beamforming processing on the two paths of audio data after echo cancellation at a plurality of preset angles respectively by adopting a preset beamforming algorithm to obtain enhanced audio data corresponding to each preset angle, wherein the preset angles are obtained according to statistical incoming-wave angles at which the use probability of a preset user reaches a preset probability;
the processor performs primary verification on the text characteristic and the voiceprint characteristic of the enhanced audio data corresponding to each preset angle, and performs secondary verification on the text characteristic and the voiceprint characteristic of the enhanced audio data after the primary verification passes;
and if the secondary verification passes, the processor wakes up the voice interaction application.
2. The application wake-up method according to claim 1, wherein the processor performs echo cancellation processing on the two paths of audio data according to the background audio data, and the echo cancellation processing includes:
the processor obtains an initial adaptive filter coefficient, and iteratively updates the initial adaptive filter coefficient according to the background audio data and the audio data to obtain a target adaptive filter coefficient;
and the processor performs echo cancellation processing on the audio data according to the target adaptive filter coefficient.
3. The application wake-up method according to claim 2, wherein the processor iteratively updates the initial adaptive filter coefficient according to the background audio data and the audio data to obtain the target adaptive filter coefficient, comprising:
the processor acquires the adaptive filter coefficient at the current moment according to the initial adaptive filter coefficient;
the processor estimates echo audio data carried in the audio data and corresponding to the current moment according to the adaptive filter coefficient of the current moment;
the processor acquires error audio data at the current moment according to the background audio data and the echo audio data;
and the processor identifies the active part of the adaptive filter coefficient at the current moment, updates the active part according to the error audio data, and adjusts the order of the adaptive filter coefficient at the current moment to obtain the adaptive filter coefficient at the next moment.
4. The application wakeup method according to claim 3, wherein the identifying, by the processor, the active portion of the adaptive filter coefficient at the current time comprises:
the processor divides the adaptive filter coefficient of the current moment into a plurality of sub-filter coefficients with equal length;
the processor obtains the mean value and the variance of each sub-filter coefficient from back to front, and determines, as the active part, the first sub-filter coefficient whose mean value is greater than a preset mean value and whose variance is greater than a preset variance, together with the sub-filter coefficients before it;
the adjusting the order of the adaptive filter coefficient at the current time includes:
and the processor judges whether the first qualifying sub-filter coefficient is the last sub-filter coefficient; if so, the order of the adaptive filter coefficient at the current moment is increased, and otherwise, the order of the adaptive filter coefficient at the current moment is reduced.
5. The application wake-up method according to any one of claims 1 to 4, wherein the processor performs primary verification on the text feature and the voiceprint feature of the enhanced audio data corresponding to each preset angle, including:
the processor extracts a Mel frequency cepstrum coefficient of the enhanced audio data corresponding to each preset angle;
the processor calls a target voiceprint feature model related to a preset text to match the extracted Mel frequency cepstrum coefficients;
if the matched mel frequency cepstrum coefficient exists, the processor judges that the primary check is passed;
wherein the target voiceprint feature model is obtained by adaptively updating a Gaussian mixture universal background model related to the preset text according to a Mel frequency cepstrum coefficient of preset audio data, and the preset audio data are audio data of the preset text spoken by a preset user.
6. The application wake-up method according to claim 5, wherein the secondary verification of the text feature and the voiceprint feature of the enhanced audio data after the primary verification comprises:
the processor divides the enhanced audio data passing the primary check into a plurality of sub audio data;
the processor extracts the voiceprint feature vectors of the sub audio data according to the voiceprint feature extraction model related to the preset text;
the processor obtains the similarity between each voiceprint feature vector and a target voiceprint feature vector, wherein the target voiceprint feature vector is the voiceprint feature vector of the preset audio data;
the processor checks the text characteristic and the voiceprint characteristic of the enhanced audio data which passes the primary check according to the corresponding similarity of each sub audio data;
and if the enhanced audio data passing the primary verification passes the verification again, the processor judges that the secondary verification passes.
7. An application wakeup apparatus applied to a processor of an electronic device, wherein the electronic device includes two microphones, the application wakeup apparatus comprising:
the audio acquisition module is used for acquiring two paths of audio data through the two microphones when the electronic equipment is in an audio and video playing state and acquiring background audio data played in an audio acquisition period;
the echo cancellation module is used for carrying out echo cancellation processing on the two paths of audio data according to the background audio data to obtain two paths of audio data after echo cancellation;
the beam forming module is used for performing beam forming processing on the two paths of audio data after echo cancellation at a plurality of preset angles respectively by adopting a preset beam forming algorithm to obtain enhanced audio data corresponding to each preset angle, wherein the preset angle is obtained according to a statistical incoming wave angle at which the use probability of a preset user reaches a preset probability;
the audio verification module is used for performing primary verification on the text characteristic and the voiceprint characteristic of the enhanced audio data corresponding to each preset angle and performing secondary verification on the text characteristic and the voiceprint characteristic of the enhanced audio data after the primary verification is passed;
and the application awakening module is used for awakening the voice interaction application when the secondary verification is passed.
8. An electronic device, comprising a processor, a memory, and two microphones, the memory storing a computer program, wherein the processor is adapted to execute the application wake-up method according to any one of claims 1 to 6 by invoking the computer program.
9. A storage medium, characterized in that, when a computer program stored in the storage medium is run on an electronic device comprising two microphones, the electronic device is caused to perform an application wake-up method according to any of claims 1 to 6.
CN201910478400.6A 2019-06-03 2019-06-03 Application awakening method and device, storage medium and electronic equipment Active CN110211599B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910478400.6A CN110211599B (en) 2019-06-03 2019-06-03 Application awakening method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910478400.6A CN110211599B (en) 2019-06-03 2019-06-03 Application awakening method and device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN110211599A CN110211599A (en) 2019-09-06
CN110211599B true CN110211599B (en) 2021-07-16

Family

ID=67790514

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910478400.6A Active CN110211599B (en) 2019-06-03 2019-06-03 Application awakening method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN110211599B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111048071B (en) * 2019-11-11 2023-05-30 京东科技信息技术有限公司 Voice data processing method, device, computer equipment and storage medium
CN111179931B (en) * 2020-01-03 2023-07-21 青岛海尔科技有限公司 Method and device for voice interaction and household appliance
CN112307161B (en) * 2020-02-26 2022-11-22 北京字节跳动网络技术有限公司 Method and apparatus for playing audio
CN111369992A (en) * 2020-02-27 2020-07-03 Oppo(重庆)智能科技有限公司 Instruction execution method and device, storage medium and electronic equipment
CN111755002B (en) * 2020-06-19 2021-08-10 北京百度网讯科技有限公司 Speech recognition device, electronic apparatus, and speech recognition method
CN112581972A (en) * 2020-10-22 2021-03-30 广东美的白色家电技术创新中心有限公司 Voice interaction method, related device and corresponding relation establishing method
CN115148197A (en) * 2021-03-31 2022-10-04 华为技术有限公司 Voice wake-up method, device, storage medium and system
CN115171703B (en) * 2022-05-30 2024-05-24 青岛海尔科技有限公司 Distributed voice awakening method and device, storage medium and electronic device


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10194259B1 (en) * 2018-02-28 2019-01-29 Bose Corporation Directional audio selection

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002374588A (en) * 2001-06-15 2002-12-26 Sony Corp Device and method for reducing acoustic noise
CN101763858A (en) * 2009-10-19 2010-06-30 瑞声声学科技(深圳)有限公司 Method for processing double-microphone signal
CN101917527A (en) * 2010-09-02 2010-12-15 杭州华三通信技术有限公司 Method and device of echo elimination
CN104520925A (en) * 2012-08-01 2015-04-15 杜比实验室特许公司 Percentile filtering of noise reduction gains
CN103680515A (en) * 2013-11-21 2014-03-26 苏州大学 Proportional adaptive filter coefficient vector updating method using coefficient reusing
CN105575395A (en) * 2014-10-14 2016-05-11 中兴通讯股份有限公司 Voice wake-up method and apparatus, terminal, and processing method thereof
US9842606B2 (en) * 2015-09-15 2017-12-12 Samsung Electronics Co., Ltd. Electronic device, method of cancelling acoustic echo thereof, and non-transitory computer readable medium
CN105654959A (en) * 2016-01-22 2016-06-08 韶关学院 Self-adaptive filtering coefficient updating method and device
CN107123430A (en) * 2017-04-12 2017-09-01 广州视源电子科技股份有限公司 Echo cancel method, device, meeting flat board and computer-readable storage medium
US10013995B1 (en) * 2017-05-10 2018-07-03 Cirrus Logic, Inc. Combined reference signal for acoustic echo cancellation
US20190074025A1 (en) * 2017-09-01 2019-03-07 Cirrus Logic International Semiconductor Ltd. Acoustic echo cancellation (aec) rate adaptation
CN107464565A (en) * 2017-09-20 2017-12-12 百度在线网络技术(北京)有限公司 A kind of far field voice awakening method and equipment
CN109218882A (en) * 2018-08-16 2019-01-15 歌尔科技有限公司 The ambient sound monitor method and earphone of earphone

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Wang Zhengteng et al., "Research on Echo Cancellation Methods Based on Prediction Residual and Adaptive Order", China Master's Theses Full-text Database (Electronic Journal), 2017-02-15, Sections 4.3-4.4 *
Wen Haoxiang et al., "Early-Iteration Statistical Model and Improved Algorithm for Adaptive Echo Cancellation", Journal of Data Acquisition and Processing, 2012-01-31, full text *

Also Published As

Publication number Publication date
CN110211599A (en) 2019-09-06

Similar Documents

Publication Publication Date Title
CN110211599B (en) Application awakening method and device, storage medium and electronic equipment
US11823679B2 (en) Method and system of audio false keyphrase rejection using speaker recognition
CN110021307B (en) Audio verification method and device, storage medium and electronic equipment
US11042616B2 (en) Detection of replay attack
CN110400571B (en) Audio processing method and device, storage medium and electronic equipment
CN110310623B (en) Sample generation method, model training method, device, medium, and electronic apparatus
US9633652B2 (en) Methods, systems, and circuits for speaker dependent voice recognition with a single lexicon
US20200227071A1 (en) Analysing speech signals
CN110232933B (en) Audio detection method and device, storage medium and electronic equipment
CN110600048B (en) Audio verification method and device, storage medium and electronic equipment
CN110556103A (en) Audio signal processing method, apparatus, system, device and storage medium
EP0822539A2 (en) Two-staged cohort selection for speaker verification system
TW201419270A (en) Method and apparatus for utterance verification
US11308946B2 (en) Methods and apparatus for ASR with embedded noise reduction
CN110223687B (en) Instruction execution method and device, storage medium and electronic equipment
US9953633B2 (en) Speaker dependent voiced sound pattern template mapping
CN110689887B (en) Audio verification method and device, storage medium and electronic equipment
US11081115B2 (en) Speaker recognition
CN113889091A (en) Voice recognition method and device, computer readable storage medium and electronic equipment
CN110992977B (en) Method and device for extracting target sound source
CN111369992A (en) Instruction execution method and device, storage medium and electronic equipment
WO2020015546A1 (en) Far-field speech recognition method, speech recognition model training method, and server
CN111192569B (en) Double-microphone voice feature extraction method and device, computer equipment and storage medium
CN114464188A (en) Voiceprint awakening algorithm based on distributed edge calculation
CN112509556B (en) Voice awakening method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant