CN110602624A

CN110602624A - Audio testing method and device, storage medium and electronic equipment

Info

Publication number: CN110602624A
Application number: CN201910818540.3A
Authority: CN
Inventors: 陈喆
Original assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Current assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date: 2019-08-30
Filing date: 2019-08-30
Publication date: 2019-12-20
Anticipated expiration: 2039-08-30
Also published as: CN110602624B

Abstract

The embodiments of the application disclose an audio testing method, an audio testing apparatus, a storage medium and an electronic device. The electronic device cyclically collects audio signals to be verified a preset number of times through a microphone, and each signal undergoes primary verification by a dedicated voice recognition chip and secondary verification by a processor. A preset counting application receives the first indication information that the dedicated voice recognition chip sends whenever the primary verification passes, thereby counting the chip's successful verifications and yielding a first counting result. The preset counting application likewise receives the second indication information that the processor sends whenever the secondary verification passes, thereby counting the processor's successful verifications and yielding a second counting result. Finally, a first wake-up rate of the dedicated voice recognition chip is obtained from the first counting result and the preset number of times, and a second wake-up rate of the processor is obtained from the first counting result and the second counting result, so that the wake-up rate of the electronic device is tested efficiently.

Description

Audio testing method and device, storage medium and electronic equipment

Technical Field

The present application relates to the field of audio testing, and in particular, to an audio testing method and apparatus, a storage medium, and an electronic device.

Background

Speech recognition is an important way for electronic devices such as smart phones and tablet computers to obtain the intention of users, and at present, a speech recognition function has become a standard configuration function of numerous electronic devices, for example, a user can speak a speech instruction to control the electronic device under the condition that the user is inconvenient to directly control the electronic device.

It should be noted that voice recognition can be divided into two procedures, waking up and recognizing: only after the electronic device has been woken up can it be controlled by voice. This makes the wake-up rate an important performance index of the electronic device, and obtaining that wake-up rate through efficient testing becomes correspondingly important.

Disclosure of Invention

The embodiment of the application provides an audio testing method, an audio testing apparatus, a storage medium and an electronic device, which can efficiently test the wake-up rate of the electronic device.

In a first aspect, an embodiment of the present application provides an audio testing method, which is applied to an electronic device, where the electronic device includes a microphone, a dedicated voice recognition chip and a processor, and the electronic device is placed in a pre-established testing environment, a voice playing device for playing testing voice is provided in the testing environment, the testing voice is a pure voice signal including a preset wake-up word, and the audio testing method includes:

acquiring audio through the microphone to obtain an audio signal to be verified, and providing the audio signal to be verified to the special voice recognition chip;

performing primary verification on the audio signal to be verified through the special voice recognition chip, providing the audio signal to be verified to the processor when the verification is passed, sending first indication information to a preset counting application, and indicating the preset counting application to count so as to obtain a first counting result corresponding to the special voice recognition chip;

performing secondary verification on the audio signal to be verified through the processor, sending second indication information to the preset counting application when the verification is passed, and indicating the preset counting application to count so as to obtain a second counting result corresponding to the processor;

judging whether the number of times of primary verification reaches a preset number, if not, acquiring the audio signal to be verified again through the microphone for verification, and if so, acquiring the first counting result and the second counting result;

and counting according to the first counting result and the preset times to obtain a first wake-up rate of the dedicated voice recognition chip, and counting according to the first counting result and the second counting result to obtain a second wake-up rate of the processor.

In a second aspect, an embodiment of the present application provides an audio test apparatus, which is applied to an electronic device, the electronic device includes a microphone, a dedicated voice recognition chip and a processor, and the electronic device is placed in a pre-built test environment, a voice playing device for playing test voice is provided in the test environment, the test voice is a pure voice signal including a preset wake-up word, and the audio test apparatus includes:

the audio acquisition module is used for acquiring audio through the microphone to obtain an audio signal to be verified and providing the audio signal to be verified to the special voice recognition chip;

the primary checking module is used for performing primary checking on the audio signal to be checked through the special voice recognition chip, providing the audio signal to be checked to the processor when the checking is passed, sending first indication information to a preset counting application, and indicating the preset counting application to count so as to obtain a first counting result corresponding to the special voice recognition chip;

the secondary checking module is used for carrying out secondary checking on the audio signal to be checked through the processor, sending second indication information to the preset counting application when the checking is passed, and indicating the preset counting application to count so as to obtain a second counting result corresponding to the processor;

the result acquisition module is used for judging whether the number of times of primary verification reaches a preset number; if not, acquiring the audio signal to be verified again through the microphone for verification; and if so, acquiring the first counting result and the second counting result;

and the counting statistics module is used for counting according to the first counting result and the preset times to obtain a first wake-up rate of the dedicated voice recognition chip, and counting according to the first counting result and the second counting result to obtain a second wake-up rate of the processor.

In a third aspect, an embodiment of the present application provides a storage medium, on which a computer program is stored, where the computer program is loaded by a processor and a dedicated speech recognition chip to execute an audio testing method provided by the embodiment of the present application.

In a fourth aspect, an embodiment of the present application further provides an electronic device, where the electronic device includes a microphone, a dedicated speech recognition chip, a processor, and a memory, where the memory stores a computer program, and the computer program is used to execute the audio test method provided in the embodiment of the present application when the computer program is called by the dedicated speech recognition chip and the processor.

In the embodiment of the application, audio signals to be verified are cyclically collected a preset number of times through the microphone, and each signal is used for primary verification by the dedicated voice recognition chip and secondary verification by the processor. The preset counting application receives the first indication information sent by the dedicated voice recognition chip when the primary verification passes, which counts the chip's successful verifications and yields the first counting result. The preset counting application also receives the second indication information sent by the processor when the secondary verification passes, which counts the processor's successful verifications and yields the second counting result. Finally, the first wake-up rate of the dedicated voice recognition chip is obtained from the first counting result and the preset number of times, and the second wake-up rate of the processor is obtained from the first counting result and the second counting result, so that the wake-up rate of the electronic device is tested efficiently.

Drawings

In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a schematic flow chart of an audio testing method according to an embodiment of the present application.

Fig. 2 is a schematic diagram of invoking a primary text verification model in the embodiment of the present application.

Fig. 3 is a schematic diagram of a test environment built in the embodiment of the present application.

Fig. 4 is another schematic flow chart of an audio testing method according to an embodiment of the present application.

Fig. 5 is a schematic structural diagram of an audio testing apparatus according to an embodiment of the present application.

Fig. 6 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.

Fig. 7 is another schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

Referring to the drawings, wherein like reference numbers refer to like elements, the principles of the present application are illustrated as being implemented in a suitable computing environment. The following description is by way of example of particular embodiments of the present application and should not be construed as limiting the other particular embodiments of the present application that are not detailed herein.

Embodiments of the present application provide an audio testing method, an audio testing apparatus, a storage medium, and an electronic device. The executing subject of the audio testing method may be the audio testing apparatus provided in the embodiments of the present application, or an electronic device integrated with the audio testing apparatus, and the audio testing apparatus may be implemented in hardware or software. The electronic device may be a computing device such as a laptop computer, a computer monitor including an embedded computer, a tablet computer, a cellular phone, a media player, or another handheld or portable electronic device; a smaller device (such as a wristwatch device, a hanging device, an earphone or headphone device, a device embedded in glasses or other devices worn on a user's head, or another wearable or miniature device); a television; a computer display not including an embedded computer; a gaming device; a navigation device; an embedded system (such as a system in which an electronic device having a display is installed in a kiosk or automobile); and the like.

As shown in fig. 1, the flow of the audio testing method provided by the embodiment of the present application may be as follows:

101, acquiring audio through a microphone to obtain an audio signal to be verified, and providing the audio signal to be verified to a special voice recognition chip.

In the embodiment of the application, a test environment for the audio test is built in advance. For example, to exclude external interference, a sound-insulated test environment can be built. A voice playing device for playing the test voice is provided in the test environment, and the test voice is a pure voice signal including a preset wake-up word. For example, the voice playing device can be an artificial head that plays the test voice in a loop at 5-second intervals. It should be noted that the preset wake-up word may be set by a person skilled in the art according to actual needs, which is not specifically limited in the embodiment of the present application; for example, the preset wake-up word may be set to "small europe".

Before the audio test is started, the electronic device under test is placed in the test environment, so that the voice playing device plays the test voice to simulate a real use scenario, the audio test is carried out on the electronic device, and the wake-up rate of the electronic device is determined.

It should be noted that the electronic device in the embodiment of the present application includes a microphone, a dedicated voice recognition chip and a processor. The dedicated voice recognition chip is a chip designed specifically for voice recognition, such as a digital signal processing chip or an application-specific integrated circuit chip designed for voice recognition; it has lower power consumption than a general-purpose processor but relatively weaker processing capability. Because the processing capability of the dedicated voice recognition chip is weaker than that of the processor, voice wake-up proceeds in two stages: the dedicated voice recognition chip first performs the primary verification, that is, a coarse verification, on the collected audio signal; when the primary verification passes, the processor performs the secondary verification on the same signal to ensure the accuracy of the overall verification; and when the secondary verification passes, the voice interaction application is woken up to carry out voice interaction with the user. A voice interaction application is also commonly called a voice assistant, such as "small europe".

When the audio test is carried out, the electronic equipment carries out audio acquisition through the arranged microphone, so that an audio signal corresponding to the test voice is acquired, and the audio signal is recorded as an audio signal to be verified.

The microphone provided in the electronic device may be an internal microphone or an external microphone (which may be a wired microphone or a wireless microphone). If the microphone is an analog microphone, the acquired analog audio signal to be verified needs to undergo analog-to-digital conversion to obtain a digitized audio signal to be verified for subsequent processing. It can be understood by those skilled in the art that if the microphone provided in the electronic device is a digital microphone, the digitized audio signal to be verified is collected directly and no analog-to-digital conversion is needed.

After the audio signal to be verified is acquired, the electronic equipment provides the acquired audio signal to be verified for the special voice recognition chip.

102, performing primary verification on the audio signal to be verified through the special voice recognition chip, providing the audio signal to be verified to the processor when the verification is passed, sending first indication information to a preset counting application, and indicating the preset counting application to count so as to obtain a first counting result corresponding to the special voice recognition chip.

The first-level verification of the audio signal to be verified includes verifying a text feature of the audio signal to be verified, or verifying a text feature and a voiceprint feature of the audio signal to be verified, which may be specifically set by a person of ordinary skill in the art according to an actual situation, for example, in the embodiment of the present application, only the special voice recognition chip is used to verify the text feature of the audio signal to be verified.

In plain terms, verifying the text feature means checking whether the audio signal to be verified includes the preset wake-up word. As long as the audio signal to be verified includes the preset wake-up word, the text feature passes the verification regardless of who speaks the preset wake-up word.

When the audio signal to be verified undergoes primary verification, the dedicated voice recognition chip can load a pre-trained primary wake-up model used for verifying whether an audio signal includes the preset wake-up word, and perform the primary verification on the audio signal to be verified through the primary wake-up model.

It should be noted that, in the embodiment of the present application, a message sending mechanism is further added to the dedicated voice recognition chip, so that the dedicated voice recognition chip sends the first indication information to the operating system of the electronic device when the primary verification of the audio signal to be verified is passed.

The following description takes as an example an electronic device that runs the Android system.

When the primary verification of the audio signal to be verified passes, the dedicated voice recognition chip sends the first indication information to the Android system of the electronic device.

In addition, a counting application is designed in advance in the embodiment of the present application; a person of ordinary skill in the art can implement it in any suitable programming language according to actual needs. So that it can learn whether the audio signal to be verified has passed the primary verification by the dedicated voice recognition chip, the preset counting application registers for the first indication information with the Android system in advance, and the Android system then pushes the first indication information to the preset counting application.

When the preset counting application receives the first indication information, it counts accordingly to obtain the first counting result corresponding to the dedicated voice recognition chip. For example, the preset counting application creates a first count value that corresponds to the dedicated voice recognition chip and has an initial value of zero, and adds one to the first count value each time the first indication information is received, that is, each time the acquired audio signal to be verified passes the primary verification by the dedicated voice recognition chip. The number of successful verifications by the dedicated voice recognition chip is thereby counted, and the first counting result is obtained.

In addition, when the primary verification of the audio signal to be verified is passed, the special voice recognition chip provides the audio signal to be verified, which is acquired by the microphone at this time, to the processor.

In addition, it should be noted that, because of the acquired audio signal itself and/or the primary wake-up model, the acquired audio signal to be verified may fail the primary verification. In that case, the first indication information is not sent, the acquired audio signal to be verified is discarded, and the process proceeds to 104.

103, performing secondary verification on the audio signal to be verified through the processor, sending second indication information to the preset counting application when the verification is passed, and indicating the preset counting application to count so as to obtain a second counting result corresponding to the processor.

The second-level verification of the audio signal to be verified includes verifying a text feature of the audio signal to be verified, or verifying a text feature and a voiceprint feature of the audio signal to be verified, which may be specifically set by a person of ordinary skill in the art according to an actual situation, for example, in the embodiment of the present application, the processor verifies the text feature and the voiceprint feature of the audio signal to be verified.

For example, when performing secondary verification on the audio signal to be verified, the processor may load a pre-trained secondary wake-up model for verifying whether the audio signal includes the preset wake-up word and whether the voiceprint feature matches the preset voiceprint feature, and perform secondary verification on the audio signal to be verified through the secondary wake-up model.

When the secondary verification of the audio signal to be verified passes, the processor sends the second indication information to the Android system of the electronic device.

Correspondingly, so that it can learn whether the audio signal to be verified has passed the secondary verification by the processor, the preset counting application registers for the second indication information with the Android system in advance, and the Android system then pushes the second indication information to the preset counting application. When the preset counting application receives the second indication information, it counts accordingly to obtain the second counting result corresponding to the processor. For example, the preset counting application creates a second count value that corresponds to the processor and has an initial value of zero, and adds one to the second count value each time the second indication information is received, that is, each time the acquired audio signal to be verified passes the secondary verification by the processor. The number of successful verifications by the processor is thereby counted, and the second counting result is obtained.

In addition, it should be noted that, because of the acquired audio signal itself and/or the secondary wake-up model, the acquired audio signal to be verified may fail the secondary verification. In that case, the second indication information is not sent, the acquired audio signal to be verified is discarded, and the process proceeds to 104.

104, judging whether the number of times of primary verification reaches a preset number; if not, acquiring the audio signal to be verified again through the microphone for verification; and if so, acquiring a first counting result and a second counting result.
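The register-then-push counting described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: `MessageBus` is a hypothetical stand-in for the operating system's message delivery, and the message names and class names are invented for the example.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

# Hypothetical message names; the source does not specify them.
FIRST_INDICATION = "first_indication"
SECOND_INDICATION = "second_indication"


class MessageBus:
    """Hypothetical stand-in for the operating system's message delivery."""

    def __init__(self) -> None:
        self._listeners: DefaultDict[str, List[Callable[[], None]]] = defaultdict(list)

    def register(self, message: str, listener: Callable[[], None]) -> None:
        # The counting application registers in advance for a message type.
        self._listeners[message].append(listener)

    def send(self, message: str) -> None:
        # The system pushes the message to every registered listener.
        for listener in self._listeners[message]:
            listener()


class CountingApp:
    """Keeps one count value per indication type, each starting at zero."""

    def __init__(self, bus: MessageBus) -> None:
        self.first_count = 0
        self.second_count = 0
        bus.register(FIRST_INDICATION, self._on_first)
        bus.register(SECOND_INDICATION, self._on_second)

    def _on_first(self) -> None:
        self.first_count += 1  # primary verification passed once more

    def _on_second(self) -> None:
        self.second_count += 1  # secondary verification passed once more


bus = MessageBus()
app = CountingApp(bus)
bus.send(FIRST_INDICATION)   # chip: primary verification passed
bus.send(SECOND_INDICATION)  # processor: secondary verification passed
bus.send(FIRST_INDICATION)   # chip passed again; processor did not
print(app.first_count, app.second_count)
```

Keeping the counters inside a separate application means neither the chip nor the processor has to track totals; each only reports a pass.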

It should be noted that, the electronic device also counts the number of times of performing the primary verification, and after the secondary verification of the audio signal to be verified is completed through the processor each time, determines whether the number of times of performing the primary verification reaches a preset number of times. The preset number can be set by one skilled in the art according to actual needs, for example, the preset number can be set to 100.

And when the number of times of the primary verification reaches a preset number, the electronic equipment acquires a first counting result and a second counting result from a preset counting application.

When the number of times of the primary verification has not reached the preset number, the electronic device collects the audio signal to be verified again through the microphone and repeats the verification until the preset number is reached.

105, counting according to the first counting result and the preset times to obtain a first wake-up rate of the dedicated voice recognition chip, and counting according to the first counting result and the second counting result to obtain a second wake-up rate of the processor.
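The loop formed by steps 101 to 104 can be sketched as follows. This is a simplified sketch under assumptions: the function name `run_wake_test` and its callable parameters are hypothetical, the two verifiers are plain callables rather than trained wake-up models, and the counts are kept locally instead of in a separate counting application.

```python
from typing import Callable, Tuple


def run_wake_test(
    capture: Callable[[], bytes],
    primary_check: Callable[[bytes], bool],
    secondary_check: Callable[[bytes], bool],
    preset_times: int,
) -> Tuple[int, int]:
    """Run the two-stage verification until the primary check has run preset_times times."""
    first_count = 0    # passes of the primary verification
    second_count = 0   # passes of the secondary verification
    primary_runs = 0   # how many primary verifications have been performed
    while primary_runs < preset_times:      # step 104: preset number reached?
        signal = capture()                  # step 101: collect audio
        primary_runs += 1
        if not primary_check(signal):       # step 102: primary verification
            continue                        # signal discarded, back to 104
        first_count += 1
        if secondary_check(signal):         # step 103: secondary verification
            second_count += 1
    return first_count, second_count


# Deterministic stand-ins for the microphone and the two models (illustrative only).
samples = iter([b"wake-ok", b"noise", b"wake-bad-voice", b"wake-ok"])
first, second = run_wake_test(
    capture=lambda: next(samples),
    primary_check=lambda s: s.startswith(b"wake"),
    secondary_check=lambda s: s.endswith(b"ok"),
    preset_times=4,
)
print(first, second)
```

Note that the secondary check only ever sees signals that passed the primary check, which is why the second wake-up rate is later computed against the first counting result rather than the preset number.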

For example, the preset number of times may be set to 100, that is, the dedicated voice recognition chip needs to perform 100 primary verifications. Assuming that the first counting result is 99, that is, 99 of the 100 primary verifications passed, the first wake-up rate of the dedicated voice recognition chip is 99/100 = 99%. Assuming that the second counting result is 98: as described above, the secondary verification is performed only when the primary verification passes, so 98 of the 99 secondary verifications passed, and the second wake-up rate of the processor is 98/99 ≈ 99%. In addition, the overall wake-up rate of the dedicated voice recognition chip and the processor can be computed from the second counting result and the preset number of times and recorded as a third wake-up rate, which is 98/100 = 98%.

Therefore, the embodiment of the application cyclically collects audio signals to be verified a preset number of times through the microphone, and each signal is used for primary verification by the dedicated voice recognition chip and secondary verification by the processor. The preset counting application receives the first indication information sent by the dedicated voice recognition chip when the primary verification passes, which counts the chip's successful verifications and yields the first counting result. The preset counting application also receives the second indication information sent by the processor when the secondary verification passes, which counts the processor's successful verifications and yields the second counting result. Finally, the first wake-up rate of the dedicated voice recognition chip is obtained from the first counting result and the preset number of times, and the second wake-up rate of the processor is obtained from the first counting result and the second counting result, so that the wake-up rate of the electronic device is tested efficiently.
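The arithmetic in this example (99/100, 98/99 and 98/100) can be sketched as a small helper. The function name `wake_rates` is hypothetical, and returning 0.0 for the second rate when no primary verification passed is an assumption the source does not address.

```python
from typing import Tuple


def wake_rates(first_count: int, second_count: int, preset_times: int) -> Tuple[float, float, float]:
    """Return (first, second, third) wake-up rates as fractions between 0 and 1."""
    if preset_times <= 0:
        raise ValueError("preset_times must be positive")
    # First rate: primary passes out of all primary verifications.
    first_rate = first_count / preset_times
    # Second rate: secondary passes out of the signals that reached the processor.
    # Assumption: report 0.0 when nothing reached the processor.
    second_rate = second_count / first_count if first_count else 0.0
    # Third (overall) rate: secondary passes out of all attempts.
    third_rate = second_count / preset_times
    return first_rate, second_rate, third_rate


first_rate, second_rate, third_rate = wake_rates(
    first_count=99, second_count=98, preset_times=100
)
print(f"{first_rate:.2%} {second_rate:.2%} {third_rate:.2%}")
```

The third rate is the product of the first two (99/100 × 98/99 = 98/100), which gives a quick consistency check on the counters.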

In an embodiment, the second indication information includes second text indication information and second voiceprint indication information, the second counting result includes a second text counting result and a second voiceprint counting result, and the second wake-up rate includes a second text wake-up rate and a second voiceprint wake-up rate. "Performing secondary verification on the audio signal to be verified through the processor" includes:

(1) calling a pre-trained secondary text verification model corresponding to a preset awakening word through a processor, verifying whether the audio signal to be verified comprises the preset awakening word, if the verification is passed, sending second text indication information to a preset counting application, and indicating the preset counting application to count so as to obtain a second text counting result corresponding to the processor;

(2) calling a pre-trained secondary voiceprint check model corresponding to the test voice through a processor, checking whether the voiceprint features of the audio signal to be checked are matched with the voiceprint features of the test voice, if the check is passed, sending second voiceprint indication information to a preset counting application, and indicating the preset counting application to count so as to obtain a second voiceprint counting result corresponding to the processor;

"Counting according to the first counting result and the second counting result to obtain a second wake-up rate of the processor" includes:

(3) counting according to the second text counting result and the first counting result to obtain the second text wake-up rate, and counting according to the second voiceprint counting result and the second text counting result to obtain the second voiceprint wake-up rate.

In the embodiment of the present application, the secondary verification performed by the processor is described as an example, where the secondary verification includes verification of a text feature and a voiceprint feature.

When the audio signal to be verified is subjected to secondary verification through the processor, firstly, a pre-trained secondary text verification model corresponding to the preset awakening words is called through the processor, and whether the audio signal to be verified comprises the preset awakening words is verified through the secondary text verification model.

For example, the secondary text verification model may be trained with a scoring function as its constraint, where the scoring function maps a vector to a numerical value. A person skilled in the art may select a suitable function as the scoring function according to actual needs, which is not limited in this embodiment of the present application.

When the secondary text verification model is used for verifying whether the audio signal to be verified comprises the preset awakening words, firstly, the feature vector capable of representing the audio signal to be verified is extracted, the feature vector is input into the secondary text verification model for grading, and the corresponding grading score is obtained. And then, comparing the score with a discrimination score corresponding to the secondary text verification model, and if the score reaches the discrimination score corresponding to the secondary text verification model, judging that the audio signal to be verified comprises a preset awakening word.
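The score-and-compare step can be sketched as follows. The source leaves the scoring function open, so the weighted sum passed through a logistic function used here is purely an assumption, as are the names `score` and `contains_wake_word` and all the numbers.

```python
import math
from typing import Sequence


def score(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Map a feature vector to a single value in (0, 1). Assumed form: logistic of a weighted sum."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    z = sum(f * w for f, w in zip(features, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def contains_wake_word(
    features: Sequence[float],
    weights: Sequence[float],
    bias: float,
    discrimination_score: float,
) -> bool:
    """Pass the verification when the score reaches the discrimination score."""
    return score(features, weights, bias) >= discrimination_score


# Illustrative numbers only; a real model would supply learned weights.
weights = [0.5, 0.5]
passed = contains_wake_word([1.0, 2.0], weights, bias=0.0, discrimination_score=0.8)
rejected = contains_wake_word([0.0, 0.0], weights, bias=0.0, discrimination_score=0.8)
print(passed, rejected)
```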

When the audio signal to be verified includes the preset wake-up word, the processor sends the second text indication information to the Android system. Correspondingly, so that it can learn whether the audio signal to be verified has passed the processor's text feature verification, the preset counting application registers for the second text indication information with the Android system in advance, and the Android system then pushes the second text indication information to the preset counting application. When the preset counting application receives the second text indication information, it counts accordingly to obtain the second text counting result corresponding to the processor. For example, the preset counting application creates a second text count value that corresponds to the processor and has an initial value of zero, and adds one to the second text count value each time the second text indication information is received, that is, each time the acquired audio signal to be verified passes the processor's text feature verification. The number of successful text feature verifications by the processor is thereby counted, and the second text counting result is obtained.

In addition, when the text features of the audio signal to be verified pass the verification, the electronic equipment calls a pre-trained secondary voiceprint verification model corresponding to the test voice through the processor, and verifies whether the voiceprint features of the audio signal to be verified are matched with the voiceprint features of the test voice through the secondary voiceprint verification model.

For example, the secondary voiceprint verification model can be obtained by further training the secondary text verification model on the test voice. When the secondary voiceprint verification model is used to verify whether the voiceprint features of the audio signal to be verified match the voiceprint features of the test voice, a feature vector capable of representing the audio signal to be verified is first extracted and input into the secondary voiceprint verification model for scoring, so that a corresponding score is obtained. The score is then compared with the discrimination score corresponding to the secondary voiceprint verification model, and if the score reaches that discrimination score, it is judged that the voiceprint features of the audio signal to be verified match the voiceprint features of the test voice.

When the voiceprint features of the audio signal to be verified match the voiceprint features of the test voice, the processor sends second voiceprint indication information to the Android system. Correspondingly, so that the result of the processor's voiceprint feature verification can be known, the preset counting application registers for the second voiceprint indication information with the Android system in advance, so that the Android system can push the second voiceprint indication information to the preset counting application. On receiving the second voiceprint indication information, the preset counting application counts according to it to obtain a second voiceprint counting result corresponding to the processor. For example, the preset counting application creates a second voiceprint count value corresponding to the processor with an initial value of zero, and adds one to this value each time the second voiceprint indication information is received, that is, each time the processor's voiceprint feature verification of the acquired audio signal to be verified passes. The number of successful voiceprint feature verifications by the processor is thereby counted, giving the second voiceprint counting result.

Furthermore, if the text feature or the voiceprint feature of the audio signal to be verified fails verification, the flow proceeds to 104.

In an embodiment, before "obtaining the audio signal to be verified by performing audio acquisition through a microphone", the method further includes:

(1) acquiring a pre-trained general verification model corresponding to a preset awakening word, and setting the general verification model as a secondary text verification model;

(2) carrying out audio acquisition through a microphone to obtain a sample audio signal;

(3) and extracting acoustic features of the sample audio signal, carrying out self-adaptive processing on the acoustic features based on the general verification model, and setting the general verification model after the self-adaptive processing as a secondary voiceprint verification model.

For example, before the audio test is started, sample audio signals of the preset wake-up word spoken by a plurality of people (e.g., 200 people) may be collected in advance. Acoustic features (e.g., mel-frequency cepstral coefficients) of these sample audio signals are then extracted, and a general verification model corresponding to the preset wake-up word is trained on the extracted acoustic features. Since the general verification model is trained on a large number of audio signals that are unrelated to any specific person (i.e., user), it only fits the distribution of acoustic features of people in general and does not represent any specific person.

In the embodiment of the application, before the audio test is started, a pre-trained general verification model corresponding to the preset awakening word is obtained, and the general verification model is set as a secondary text verification model.

In addition, the electronic device also performs audio acquisition through its microphone to acquire an audio signal corresponding to the test voice, which is recorded as a sample audio signal. The electronic device then extracts the acoustic features of the sample audio signal, performs adaptive processing on the acoustic features based on the general verification model, and sets the general verification model after the adaptive processing as the secondary voiceprint verification model. The adaptive processing can be implemented with a maximum a posteriori (MAP) estimation algorithm.
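As an illustration of the adaptive processing, the following sketch applies a MAP update to a single mean vector. The embodiment does not specify the structure of the general verification model, so the single-Gaussian mean, the function name `map_adapt_mean`, and the relevance factor of 16 are all assumptions made for this example only.

```python
def map_adapt_mean(prior_mean, samples, relevance=16.0):
    """MAP estimate of a mean vector, using the general model's mean as the prior.

    relevance acts as a prior pseudo-count (assumed value): the more
    sample frames are available, the further the mean moves toward them.
    """
    n = len(samples)
    if n == 0:
        # No adaptation data: keep the general model unchanged.
        return list(prior_mean)
    dim = len(prior_mean)
    sample_mean = [sum(s[d] for s in samples) / n for d in range(dim)]
    alpha = n / (n + relevance)  # weight given to the sample data
    return [alpha * sample_mean[d] + (1 - alpha) * prior_mean[d]
            for d in range(dim)]


general_mean = [0.0, 0.0]            # mean of the general (speaker-independent) model
frames = [[2.0, 4.0]] * 16           # acoustic feature frames of the sample audio signal
adapted_mean = map_adapt_mean(general_mean, frames)
print(adapted_mean)
```

With 16 frames and a relevance factor of 16, the adapted mean lies halfway between the general model and the sample data, which shows how the adapted model comes to carry the voiceprint of the test voice while staying anchored to the general model.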

In an embodiment, a noise playing device is further disposed in the test scene, and the noise playing device is configured to play sample noise of a preset scene.

In the embodiment of the application, the noise playing device is further arranged to play the sample noise of the preset scene, so that the wake-up rate of the electronic device in the preset scene can be tested. For example, the wake-up rate of the electronic device in a subway scene can be tested by playing the sample noise of the subway scene through the noise playing device.

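In an embodiment, "performing audio acquisition through a microphone to obtain an audio signal to be verified" includes:

(1) acquiring a first decibel value of the voice playing device playing the test voice and a second decibel value of the noise playing device playing the sample noise;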

(2) and when the first decibel value and the second decibel value meet the preset test condition, audio acquisition is carried out through the microphone to obtain an audio signal to be verified.

In the embodiment of the present application, in order to ensure normal operation of the audio test, a certain signal-to-noise ratio during the test needs to be ensured.

For example, before the test is started, a decibel meter is placed at the same position as the electronic device. A first decibel value of the voice playing device playing the test voice and a second decibel value of the noise playing device playing the sample noise are obtained through the decibel meter, and the ratio of the first decibel value to the second decibel value is then calculated and used as the signal-to-noise ratio of the test environment.

Correspondingly, the preset test condition can be set to the state that the signal-to-noise ratio of the test environment reaches the preset signal-to-noise ratio, and for the value of the preset signal-to-noise ratio, a person skilled in the art can take the value according to actual needs.

After the electronic equipment calculates the signal-to-noise ratio of the test environment according to the first decibel value and the second decibel value, whether the signal-to-noise ratio reaches a preset signal-to-noise ratio is judged, if yes, audio acquisition is carried out through a microphone to obtain an audio signal to be verified, and audio test is started.
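The precondition check can be sketched as follows. The function names and the numeric values are hypothetical; the signal-to-noise ratio is computed as the ratio of the two decibel readings because that is how the embodiment defines it.

```python
def environment_snr(first_db, second_db):
    """Signal-to-noise ratio of the test environment, defined in the text
    as the ratio of the first decibel value to the second decibel value."""
    if second_db <= 0:
        raise ValueError("the noise reading must be positive")
    return first_db / second_db


def ready_to_test(first_db, second_db, preset_snr):
    # The test starts only once the measured ratio reaches the preset value.
    return environment_snr(first_db, second_db) >= preset_snr


# Hypothetical readings: test voice at 70 dB, sample noise at 50 dB.
print(ready_to_test(70.0, 50.0, preset_snr=1.2))
print(ready_to_test(50.0, 50.0, preset_snr=1.2))
```

The value of `preset_snr` is left to the person skilled in the art, as stated above; 1.2 is used here only to make the example concrete.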

In one embodiment, "performing a primary verification on an audio signal to be verified through a dedicated voice recognition chip" includes:

(1) calling a pre-trained scene classification model through a special voice recognition chip to perform scene classification on the audio signal to be verified to obtain a scene classification result;

(2) and calling a pre-trained primary text verification model corresponding to the scene classification result through a special voice recognition chip to verify whether the audio signal to be verified comprises a preset awakening word.

In the embodiment of the present application, the description takes as an example the case in which the primary verification performed by the dedicated voice recognition chip includes verification of the text feature.

It should be noted that, in the embodiment of the present application, a scene classification model is trained in advance by using a machine learning algorithm according to sample audio signals of different known scenes, and the scene classification model can be used to classify the scene where the electronic device is located.

Because the test environment is provided with not only the voice playing device but also the noise playing device, the audio signal to be verified collected by the electronic device can be regarded as being composed of two parts, namely a part corresponding to the test voice and a part corresponding to the sample noise. Correspondingly, when the audio signal to be verified is subjected to primary verification through the special voice recognition chip, a pre-trained scene classification model is called through the special voice recognition chip, and the audio signal to be verified is classified by utilizing the scene classification model to obtain a scene classification result. The scene classification result describes the scene simulated by the noise playing device through playing the sample noise.

It should be noted that, in the embodiment of the present application, a primary text verification model set is preset in the electronic device, where the primary text verification model set includes a plurality of primary text verification models which are obtained by training in different scenes in advance and correspond to the preset wake-up words, so as to be suitable for the special voice recognition chip to load in different scenes, and thus, whether the acquired audio signal to be verified includes the preset wake-up words is verified more flexibly and accurately.

Correspondingly, after the scene classification result corresponding to the audio signal to be verified is obtained, the electronic device calls a primary text verification model corresponding to the scene classification result from the primary text verification model set through the special voice recognition chip, verifies whether the audio signal to be verified includes the preset awakening word through the primary text verification model, and if yes, judges that the audio signal to be verified passes the primary verification.

For example, referring to fig. 2, the primary text verification model set includes four primary text verification models: a primary text verification model A suitable for audio verification in scene A, a primary text verification model B suitable for audio verification in scene B, a primary text verification model C suitable for audio verification in scene C, and a primary text verification model D suitable for audio verification in scene D. If the scene classification result indicates that the scene corresponding to the audio signal to be verified is scene A, the electronic device loads the primary text verification model A from the primary text verification model set through the dedicated voice recognition chip; if the scene classification result indicates that the scene corresponding to the audio signal to be verified is scene B, the electronic device loads the primary text verification model B from the primary text verification model set through the dedicated voice recognition chip, and so on.
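The selection of a primary text verification model from the scene classification result amounts to a lookup, sketched below. The dictionary `PRIMARY_TEXT_MODELS`, the placeholder model names, and the function `select_primary_model` are hypothetical, and the handling of an unknown scene is an assumption, since the embodiment does not describe that case.

```python
# Hypothetical model set: one primary text verification model per scene.
PRIMARY_TEXT_MODELS = {
    "A": "primary_text_model_A",
    "B": "primary_text_model_B",
    "C": "primary_text_model_C",
    "D": "primary_text_model_D",
}


def select_primary_model(scene):
    """Return the primary text verification model matching the scene classification result."""
    try:
        return PRIMARY_TEXT_MODELS[scene]
    except KeyError:
        # Assumed behaviour: reject scenes that have no trained model.
        raise ValueError(f"no primary text verification model for scene {scene!r}")


print(select_primary_model("B"))
```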

Referring to fig. 3 and 4 in combination, fig. 3 is a schematic diagram of a test environment for performing an audio test in the embodiment of the present application, and as shown in fig. 3, a sound-proof test environment is first established, and a dummy head is set in the test environment as a voice playing device for playing test voice, and a speaker is set as a noise playing device for playing sample noise, and in addition, a computer is also set in the test environment as a main control device for performing play control on the dummy head and the speaker. The person skilled in the art determines the placement position of the electronic device in the test environment according to actual needs, and places the electronic device at the determined placement position.

The electronic device comprises a dedicated voice recognition chip and a processor. During voice wake-up, the dedicated voice recognition chip first performs primary verification, that is, a rough verification, on the collected audio signal. When the primary verification is passed, the processor performs secondary verification on the collected audio signal to ensure the accuracy of the overall verification. When the secondary verification is passed, the voice interaction application is then awakened, so that voice interaction with the user is realized. The voice interaction application is commonly called a voice assistant, such as "kohma" or the like.

Under the control of a computer, the artificial head circularly plays a pure voice signal including a preset awakening word every 5 seconds, the pure voice signal is recorded as a test voice, a loudspeaker continuously plays sample noise, and a preset scene is simulated, so that the awakening rate of the electronic equipment under the preset scene is verified.

Before starting to carry out audio test, the decibel meter is placed at the same position of the electronic equipment, the electronic equipment obtains a first decibel value of the artificial head playing test voice through the decibel meter and obtains a second decibel value of the loudspeaker playing sample noise, a corresponding signal-to-noise ratio is obtained through calculation according to the first decibel value and the second decibel value, when the signal-to-noise ratio does not reach a preset signal-to-noise ratio, the electronic equipment sends indication information to a computer, the computer adjusts the playing volume of the artificial head and/or the loudspeaker, and when the signal-to-noise ratio reaches the preset signal-to-noise ratio, audio test is carried out according to an audio test flow shown in figure 4:

201, the dedicated voice recognition chip performs audio acquisition through the microphone to obtain an audio signal to be verified.

202, the dedicated voice recognition chip loads the primary text wake-up model to verify the audio signal to be verified; if the verification is passed, the flow proceeds to 203, and if the verification fails, the flow proceeds to 208.

203, the dedicated voice recognition chip provides the audio signal to be verified to the processor, sends first indication information to the preset counting application, and instructs the preset counting application to count so as to obtain a first counting result corresponding to the dedicated voice recognition chip.

204, the processor calls the secondary text wake-up model to verify the audio signal to be verified; if the verification is passed, the flow proceeds to 205, and if the verification fails, the flow proceeds to 208.

205, the processor sends second text indication information to the preset counting application, and instructs the preset counting application to count so as to obtain a second text counting result corresponding to the processor.

206, the processor calls the voiceprint wake-up model to verify the audio signal to be verified; if the verification is passed, the flow proceeds to 207, and if the verification fails, the flow proceeds to 208.

207, the processor sends second voiceprint indication information to the preset counting application, and instructs the preset counting application to count so as to obtain a second voiceprint counting result corresponding to the processor.

208, the processor determines whether the number of verifications performed by the dedicated voice recognition chip has reached the preset number; if so, the flow proceeds to 209, otherwise the flow returns to 201.

209, the processor obtains a first wake-up rate of the dedicated voice recognition chip according to the first counting result and the preset number of times, obtains a second text wake-up rate of the processor according to the second text counting result and the first counting result, and obtains a second voiceprint wake-up rate of the processor according to the second voiceprint counting result and the second text counting result.
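The flow of steps 201 to 209 can be sketched as a loop. This is a simplified illustration under stated assumptions: the verification models are replaced by hypothetical callables, the counting application is reduced to three local counters, each rate is taken as the quotient of the two counting results named in step 209, and a rate with a zero denominator is reported as 0.0, a case the embodiment does not address.

```python
def run_audio_test(preset_times, capture, primary, secondary_text, secondary_voiceprint):
    """Run the three-stage verification preset_times times and return the wake-up rates."""
    first = text = voiceprint = 0
    for _ in range(preset_times):          # 208: repeat until the preset number is reached
        audio = capture()                  # 201: acquire the audio signal to be verified
        if not primary(audio):             # 202: primary verification on the dedicated chip
            continue
        first += 1                         # 203: first counting result
        if not secondary_text(audio):      # 204: secondary text verification on the processor
            continue
        text += 1                          # 205: second text counting result
        if secondary_voiceprint(audio):    # 206: secondary voiceprint verification
            voiceprint += 1                # 207: second voiceprint counting result

    def rate(passed, attempted):
        # Assumed convention: report 0.0 when nothing reached this stage.
        return passed / attempted if attempted else 0.0

    # 209: each rate is relative to the number of signals that reached the stage.
    return {
        "first": rate(first, preset_times),
        "second_text": rate(text, first),
        "second_voiceprint": rate(voiceprint, text),
    }


# Hypothetical stand-ins: signals 0..9, with progressively stricter verifiers.
samples = iter(range(10))
rates = run_audio_test(
    preset_times=10,
    capture=lambda: next(samples),
    primary=lambda x: x < 8,
    secondary_text=lambda x: x < 6,
    secondary_voiceprint=lambda x: x < 3,
)
print(rates)
```

In this example 8 of 10 signals pass the primary verification, 6 of those 8 pass the text verification, and 3 of those 6 pass the voiceprint verification, so each stage's wake-up rate is measured only against the signals that actually reached it.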

Referring to fig. 5, fig. 5 is a schematic structural diagram of an audio testing apparatus according to an embodiment of the present disclosure. The audio testing apparatus can be applied to an electronic device that includes a microphone, a dedicated voice recognition chip and a processor. The electronic device is placed in a pre-built test environment in which a voice playing device for playing test voice is provided, and the test voice is a pure voice signal including a preset wake-up word. The audio testing apparatus may include an audio acquisition module 301, a primary verification module 302, a secondary verification module 303, a result acquisition module 304, and a count statistics module 305, wherein,

the audio acquisition module 301 is configured to acquire an audio signal to be verified through a microphone, and provide the audio signal to be verified to the dedicated voice recognition chip;

the primary checking module 302 is configured to perform primary checking on an audio signal to be checked through the dedicated voice recognition chip, provide the audio signal to be checked to the processor when the checking is passed, send first indication information to a preset counting application, and indicate the preset counting application to count so as to obtain a first counting result corresponding to the dedicated voice recognition chip;

the secondary checking module 303 is configured to perform secondary checking on the audio signal to be checked through the processor, send second indication information to the preset counting application when the checking is passed, and indicate the preset counting application to count so as to obtain a second counting result of the corresponding processor;

a result obtaining module 304, configured to determine whether the number of times of performing the primary verification reaches a preset number, otherwise, instruct the audio acquisition module 301 to acquire the audio signal to be verified again through the microphone for verification, and if so, obtain a first counting result and a second counting result;

the count statistics module 305 is configured to obtain a first wake-up rate of the dedicated voice recognition chip according to the first counting result and the preset number of times, and to obtain a second wake-up rate of the processor according to the first counting result and the second counting result.

In an embodiment, the second indication information includes second text indication information and second voiceprint indication information, the second counting result includes a second text counting result and a second voiceprint counting result, the second wake-up rate includes a second text wake-up rate and a second voiceprint wake-up rate, and when the secondary verification is performed on the audio signal to be verified through the processor, the secondary verification module 303 is configured to:

calling a pre-trained secondary text verification model corresponding to a preset awakening word through a processor, verifying whether the audio signal to be verified comprises the preset awakening word, if the verification is passed, sending second text indication information to a preset counting application, and indicating the preset counting application to count so as to obtain a second text counting result corresponding to the processor;

calling a pre-trained secondary voiceprint check model corresponding to the test voice through a processor, checking whether the voiceprint features of the audio signal to be checked are matched with the voiceprint features of the test voice, if the check is passed, sending second voiceprint indication information to a preset counting application, and indicating the preset counting application to count so as to obtain a second voiceprint counting result corresponding to the processor;

when the second wake-up rate of the processor is obtained according to the first counting result and the second counting result, the count statistics module 305 is configured to:

obtain the second text wake-up rate according to the second text counting result and the first counting result, and obtain the second voiceprint wake-up rate according to the second voiceprint counting result and the second text counting result.

In an embodiment, the audio testing apparatus further includes a model training module, before the audio acquisition is performed through the microphone to obtain the audio signal to be verified, configured to:

acquiring a pre-trained general verification model corresponding to a preset awakening word, and setting the general verification model as a secondary text verification model;

carrying out audio acquisition through a microphone to obtain a sample audio signal;

and extracting acoustic features of the sample audio signal, carrying out self-adaptive processing on the acoustic features based on the general verification model, and setting the general verification model after the self-adaptive processing as a secondary voiceprint verification model.

In an embodiment, before the audio acquisition by the microphone obtains the audio signal to be verified, the audio acquisition module 301 is further configured to:

acquiring a first decibel value of the voice playing device playing the test voice and a second decibel value of the noise playing device playing the sample noise;

and when the first decibel value and the second decibel value meet the preset test condition, audio acquisition is carried out through the microphone to obtain an audio signal to be verified.

In an embodiment, when performing a primary verification on the audio signal to be verified through the dedicated speech recognition chip, the primary verification module 302 is configured to:

calling a pre-trained scene classification model through a special voice recognition chip to perform scene classification on the audio signal to be verified to obtain a scene classification result;

and calling a pre-trained primary text verification model corresponding to the scene classification result through a special voice recognition chip to verify whether the audio signal to be verified comprises a preset awakening word or not, and judging that the audio signal passes primary verification if the audio signal to be verified comprises the preset awakening word.

In an embodiment, the result obtaining module 304 is further configured to, when the primary verification or the secondary verification fails, determine whether the number of times of performing the primary verification reaches a preset number, if so, obtain a first counting result and a second counting result, otherwise, instruct the audio acquisition module 301 to acquire the audio signal to be verified through the microphone again for verification.

It should be noted that the audio testing apparatus provided in the embodiment of the present application and the audio testing method in the above embodiment belong to the same concept, and any method provided in the embodiment of the audio testing method can be run on the audio testing apparatus. The specific implementation process is described in detail in the embodiment of the audio testing method and is not repeated here.

Embodiments of the present application further provide a storage medium, on which a computer program is stored, and when the stored computer program is executed on an electronic device provided in an embodiment of the present application, the electronic device is caused to perform the steps in the audio testing method provided in the embodiment of the present application. The storage medium may be a magnetic disk, an optical disk, a Read Only Memory (ROM), a Random Access Memory (RAM), or the like.

Referring to fig. 6, the electronic device includes a processor 401, a memory 402, a microphone 403, and a dedicated voice recognition chip 404.

The processor 401 in the embodiment of the present application is a general-purpose processor, such as an ARM architecture processor.

The dedicated speech recognition chip 404 is a dedicated chip designed for speech recognition, such as a digital signal processing chip designed for speech recognition or an application specific integrated circuit chip designed for speech recognition, which has lower power consumption but relatively weaker processing capability than the general-purpose processor 401.

The memory 402 stores a computer program. The memory 402 may be a high-speed random access memory, or a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage device. Accordingly, the memory 402 may also include a memory controller to provide the processor 401 and the dedicated speech recognition chip 404 with access to the memory 402.

The electronic device is placed in a pre-built test environment in which a voice playing device for playing test voice is provided, and the test voice is a pure voice signal including a preset wake-up word.

The processor 401 and the dedicated speech recognition chip 404 are adapted to perform, by calling the computer program in the memory 402, the following:

the special voice recognition chip 404 acquires audio through the microphone 403 to obtain an audio signal to be verified;

the special voice recognition chip 404 performs primary verification on the audio signal to be verified, provides the audio signal to be verified to the processor 401 when the verification is passed, sends first indication information to a preset counting application, and indicates the preset counting application to count so as to obtain a first counting result corresponding to the special voice recognition chip 404;

the processor 401 performs secondary verification on the audio signal to be verified, and sends second indication information to the preset counting application when the verification is passed, and the preset counting application is indicated to count so as to obtain a second counting result corresponding to the processor 401;

the processor 401 determines whether the number of times of primary verification reaches a preset number, if so, obtains a first counting result and a second counting result, otherwise, instructs the special voice recognition chip 404 to acquire the audio signal to be verified again through the microphone for verification;

the processor 401 obtains a first wake-up rate of the dedicated speech recognition chip 404 according to the first counting result and the preset number of times, and obtains a second wake-up rate of the processor 401 according to the first counting result and the second counting result.

Referring to fig. 7, fig. 7 is another schematic structural diagram of the electronic device according to the embodiment of the present disclosure, and the difference from the electronic device shown in fig. 6 is that the electronic device further includes components such as an input unit 405 and an output unit 406.

The input unit 405 may be used to receive input numbers, character information, or user characteristic information (such as a fingerprint), and to generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control.

The output unit 406 may be used to display information input by the user or information provided to the user, such as a screen.

In the embodiment of the present application, the processor 401 and the dedicated speech recognition chip 404 are used to execute, by calling the computer program in the memory 402:

In an embodiment, the second indication information includes second text indication information and second voiceprint indication information, the second counting result includes a second text counting result and a second voiceprint counting result, the second wake-up rate includes a second text wake-up rate and a second voiceprint wake-up rate, and when performing secondary verification on the audio signal to be verified, the processor 401 is configured to perform:

calling a pre-trained secondary text verification model corresponding to a preset awakening word, verifying whether the audio signal to be verified comprises the preset awakening word, if the verification is passed, sending second text indication information to a preset counting application, and indicating the preset counting application to count so as to obtain a second text counting result corresponding to the processor 401;

calling a pre-trained secondary voiceprint check model corresponding to the test voice, checking whether the voiceprint features of the audio signal to be checked are matched with the voiceprint features of the test voice, if the check is passed, sending second voiceprint indication information to a preset counting application, and indicating the preset counting application to count so as to obtain a second voiceprint counting result corresponding to the processor 401;

when the second wake-up rate of the processor 401 is obtained according to the statistics of the first counting result and the second counting result, the processor 401 is configured to perform:

In an embodiment, before the audio signal to be verified is obtained by audio acquisition through a microphone, the processor 401 is further configured to perform:

acquiring audio through a microphone 403 to obtain a sample audio signal;

In an embodiment, before the audio signal to be verified is obtained by audio acquisition through a microphone, the dedicated speech recognition chip 404 is further configured to perform:

when the first decibel value and the second decibel value satisfy the preset test condition, audio acquisition is performed through the microphone 403 to obtain an audio signal to be verified.

In one embodiment, when performing a primary verification on the audio signal to be verified, the dedicated speech recognition chip 404 is configured to perform:

calling a pre-trained scene classification model to perform scene classification on the audio signal to be verified to obtain a scene classification result;

and calling a pre-trained primary text verification model corresponding to the scene classification result to verify whether the audio signal to be verified comprises a preset awakening word, and if so, judging that the audio signal to be verified passes the primary verification.

In an embodiment, when the primary verification or the secondary verification fails, the processor 401 proceeds to execute a judgment to determine whether the number of times of performing the primary verification reaches a preset number of times, if so, obtains a first counting result and a second counting result, otherwise, instructs the dedicated voice recognition chip 404 to acquire the audio signal to be verified through the microphone again for verification.

It should be noted that the electronic device provided in the embodiment of the present application and the audio testing method in the above embodiment belong to the same concept, and any method provided in the embodiment of the audio testing method may be run on the electronic device. The specific implementation process is described in detail in the embodiment of the audio testing method and is not repeated here.

It should be noted that, for the audio testing method of the embodiment of the present application, it can be understood by a person skilled in the art that all or part of the process of implementing the audio testing method of the embodiment of the present application can be completed by controlling the relevant hardware through a computer program, where the computer program can be stored in a computer readable storage medium, such as a memory of an electronic device, and executed by a processor and a dedicated voice recognition chip in the electronic device, and the process of executing the process can include, for example, the process of the embodiment of the audio testing method. The storage medium may be a magnetic disk, an optical disk, a read-only memory, a random access memory, etc.

The audio testing method, the storage medium and the electronic device provided by the embodiments of the present application are described in detail above. Specific examples are used herein to explain the principle and the implementation of the present application, and the description of the above embodiments is only intended to help understand the method and the core idea of the present application. Meanwhile, those skilled in the art may, according to the idea of the present application, vary the specific embodiments and the application scope. In summary, the content of this specification should not be construed as a limitation on the present application.

Claims

1. An audio testing method, applied to an electronic device, characterized in that the electronic device comprises a microphone, a dedicated voice recognition chip and a processor, the electronic device is placed in a pre-built test environment, a voice playing device for playing a test voice is provided in the test environment, the test voice is a clean voice signal including a preset awakening word, and the audio testing method comprises:

2. The audio testing method according to claim 1, wherein the second indication information includes second text indication information and second voiceprint indication information, the second counting result includes a second text counting result and a second voiceprint counting result, the second wake-up rate includes a second text wake-up rate and a second voiceprint wake-up rate, and the secondary verification of the audio signal to be verified by the processor includes:

calling a pre-trained secondary text verification model corresponding to the preset awakening word through the processor, verifying whether the audio signal to be verified comprises the preset awakening word, if so, sending second text indication information to the preset counting application, and indicating the preset counting application to count so as to obtain a second text counting result corresponding to the processor;

calling a pre-trained secondary voiceprint verification model corresponding to the test voice through the processor, verifying whether the voiceprint features of the audio signal to be verified are matched with the voiceprint features of the test voice, if so, sending second voiceprint indication information to the preset counting application, and indicating the preset counting application to count so as to obtain a second voiceprint counting result corresponding to the processor;

the obtaining a second wake-up rate of the processor according to the statistics of the first counting result and the second counting result includes:

and counting according to the second text counting result and the first counting result to obtain the second text wake-up rate, and counting according to the second voiceprint counting result and the second text counting result to obtain the second voiceprint wake-up rate.
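As a worked illustration of the statistics named above, each wake-up rate can be read as the ratio of one counting result to the counting result of the stage before it. That ratio reading is an assumption, since the text only says the rates are obtained "according to" the paired results. The function names and the choice to return 0.0 for an empty denominator are hypothetical.

```python
def rate(passed: int, total: int) -> float:
    """Fraction of `total` that passed; 0.0 when there is nothing to divide by."""
    return passed / total if total else 0.0


def wake_rates(
    preset_times: int,
    first_count: int,
    second_text_count: int,
    second_voiceprint_count: int,
) -> dict:
    return {
        # Dedicated voice recognition chip: primary passes over all attempts.
        "first": rate(first_count, preset_times),
        # Processor, text check: passes over signals that passed primary verification.
        "second_text": rate(second_text_count, first_count),
        # Processor, voiceprint check: passes over signals that passed the text check.
        "second_voiceprint": rate(second_voiceprint_count, second_text_count),
    }
```

For example, with 100 preset attempts, 80 primary passes, 60 text passes and 45 voiceprint passes, the three rates are 0.8, 0.75 and 0.75 respectively.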

3. The audio testing method of claim 2, wherein before the obtaining of the audio signal to be verified through the audio collection by the microphone, the method further comprises:

acquiring a pre-trained general verification model corresponding to the preset awakening word, and setting the general verification model as the secondary text verification model;

carrying out audio acquisition through the microphone to obtain a sample audio signal;

and extracting the acoustic features of the sample audio signal, carrying out self-adaptive processing on the acoustic features based on the general verification model, and setting the general verification model after the self-adaptive processing as the secondary voiceprint verification model.

4. The audio testing method according to any one of claims 1 to 3, wherein a noise playing device is further disposed in the test environment, and the noise playing device is configured to play sample noise of a preset scene.

5. The audio testing method according to claim 4, wherein before the audio acquisition by the microphone to obtain the audio signal to be verified, the method further comprises:

and when the first decibel value and the second decibel value meet preset test conditions, audio acquisition is carried out through the microphone to obtain an audio signal to be verified.

6. The audio testing method according to claim 5, wherein the performing of the primary verification on the audio signal to be verified by the dedicated voice recognition chip comprises:

calling a pre-trained scene classification model through the special voice recognition chip to perform scene classification on the audio signal to be verified to obtain a scene classification result;

calling a pre-trained primary text verification model corresponding to the scene classification result through the special voice recognition chip to verify whether the audio signal to be verified comprises the preset awakening words or not, and judging that the audio signal to be verified passes the primary verification if the audio signal to be verified comprises the preset awakening words.

7. The audio testing method according to any one of claims 1 to 3, further comprising: when the primary verification or the secondary verification fails, proceeding to judge whether the number of times the primary verification has been performed reaches the preset number of times.

8. An audio testing device, applied to an electronic device, characterized in that the electronic device comprises a microphone, a dedicated voice recognition chip and a processor, the electronic device is placed in a pre-built test environment, a voice playing device for playing a test voice is provided in the test environment, the test voice is a clean voice signal including a preset awakening word, and the audio testing device comprises:

9. An electronic device comprising a microphone, a dedicated voice recognition chip, a processor and a memory, the memory having a computer program stored therein, the power consumption of the dedicated voice recognition chip being less than the power consumption of the processor, wherein the computer program, when invoked by the dedicated voice recognition chip and the processor, is used to perform the audio testing method of any one of claims 1 to 7.

10. A storage medium having a computer program stored thereon, characterized in that the computer program, when loaded by a processor and a dedicated voice recognition chip, is used to perform the audio testing method according to any one of claims 1 to 7.