CN115474146A - Voice test system, method and device - Google Patents

Voice test system, method and device

Info

Publication number
CN115474146A
Authority
CN
China
Prior art keywords
voice
audio
tested
test
preset event
Prior art date
Legal status
Pending
Application number
CN202211037946.6A
Other languages
Chinese (zh)
Inventor
郑永萍
郑立娟
车婷婷
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202211037946.6A
Publication of CN115474146A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R 29/00 Monitoring arrangements; Testing arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F 16/68 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/686 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title or artist information, time, location or usage information, user ratings
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/01 Assessment or evaluation of speech recognition systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L 25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L 25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L 25/60 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for measuring the quality of voice signals
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/223 Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Library & Information Science (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Otolaryngology (AREA)
  • Data Mining & Analysis (AREA)
  • Quality & Reliability (AREA)
  • Monitoring And Testing Of Exchanges (AREA)

Abstract

The disclosure provides a voice test system, method and device, and relates to the technical field of speech. The specific implementation scheme is as follows: the voice test module sends the audio to be tested to the first chip to execute the voice interaction function of the device under test and records the cumulative sending frame number of the audio to be tested; in response to a preset event generated by the voice interaction function, it reads the cumulative sending frame number to determine a trigger time point and sends the trigger time point to the computing device; the computing device determines the response delay of the preset event based on the trigger time point. In this way, the voice test module can automatically inject the audio to be tested into the first chip, and the response delay of the preset event is determined automatically from the trigger time point of the preset event. The whole flow from audio injection to test result is automated, which improves test efficiency. The voice test module can complete an automated test simply by mounting or dismounting the chip of the device under test and integrating the corresponding voice interaction function, so the scheme is extensible.

Description

Voice test system, method and device
Technical Field
The present disclosure relates to the field of data processing technology, and more particularly, to the field of speech technology.
Background
Voice interaction is an important area of human-computer interaction. Because of its convenience, voice interaction technology is widely applied in various products, such as vehicle-mounted terminals, smart phones and smart home appliances. Voice interaction relies on a voice interaction function. Whenever this function is significantly improved, the related products need a voice test to determine whether the voice interaction function can still meet the requirements.
Disclosure of Invention
The present disclosure provides a system, method and apparatus for voice testing.
According to an aspect of the present disclosure, a voice test system is provided, including a voice test module and a computing device, where the voice test module is provided with the first chip adopted by the device under test, wherein:
the voice test module is used for sending the audio to be tested to the first chip to execute the voice interaction function and recording the cumulative sending frame number of the audio to be tested; reading the cumulative sending frame number in response to a preset event generated by the voice interaction function; determining a trigger time point of the preset event based on the cumulative sending frame number; and sending the trigger time point of the preset event to the computing device;
and the computing device is used for determining the response delay of the preset event based on the trigger time point and the annotated time point of the preset event in the audio to be tested.
According to another aspect of the present disclosure, there is provided a voice testing method, including:
sending the audio to be tested to a first chip to execute the voice interaction function of the equipment to be tested, and recording the cumulative sending frame number of the audio to be tested;
responding to a preset event generated by a voice interaction function, reading an accumulated sending frame number, and determining a trigger time point of the preset event based on the accumulated sending frame number;
and sending the trigger time point of the preset event to the computing device, so that the computing device determines the response delay of the preset event based on the trigger time point and the annotated time point of the preset event in the audio to be tested.
According to another aspect of the present disclosure, there is provided a voice testing method, including:
receiving a trigger time point of a preset event, wherein the trigger time point is sent by a voice test module in response to the preset event of the voice interaction function; the voice test module is provided with a first chip for executing a voice interaction function;
and determining the response delay of the preset event based on the triggering time point and the marking time point of the preset event in the audio to be tested.
According to another aspect of the present disclosure, there is provided a voice test apparatus including:
the first determining module is used for sending the audio to be tested to the first chip so as to execute the voice interaction function of the equipment to be tested and recording the cumulative sending frame number of the audio to be tested;
the second determining module is used for responding to a preset event generated by the voice interaction function, reading the accumulated sending frame number and determining the trigger time point of the preset event based on the accumulated sending frame number;
and the sending module is used for sending the trigger time point of the preset event to the computing device, so that the computing device determines the response delay of the preset event based on the trigger time point and the annotated time point of the preset event in the audio to be tested.
According to another aspect of the present disclosure, there is provided a voice test apparatus including:
the time point receiving module is used for receiving a trigger time point of a preset event, and the trigger time point is sent by the voice testing module in response to the preset event of the voice interaction function; the voice test module is provided with a first chip for executing a voice interaction function;
and the delay determining module is used for determining the response delay of the preset event based on the triggering time point and the marking time point of the preset event in the audio to be tested.
According to another aspect of the present disclosure, there is provided a voice test module, including:
a first chip;
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the voice testing methods of the present disclosure.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the voice testing method of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to execute a voice testing method according to any one of the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method of speech testing according to any of the embodiments of the present disclosure.
In this disclosure, the voice test module can automatically inject the audio to be tested into the first chip, automatically acquire the cumulative sending frame number on the preset event, and then automatically determine the response delay of the preset event. The full process from audio injection to test result is automated, thereby improving test efficiency. The voice test module can complete an automated test simply by mounting or dismounting the chip of the device under test and integrating the corresponding voice interaction function, so the scheme is extensible.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram of a voice testing system according to an embodiment of the present disclosure;
FIG. 2 is a flow chart of a voice testing method based on a voice testing system according to another embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a voice testing system according to another embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a voice test system performing a wake-up test according to another embodiment of the present disclosure;
FIG. 5 is a diagram illustrating a first packet response time test performed by a voice test system according to another embodiment of the present disclosure;
FIG. 6 is a schematic diagram of a speech test system performing a hard delay test according to another embodiment of the present disclosure;
FIG. 7 is a schematic block diagram of a speech testing system according to another embodiment of the present disclosure;
FIG. 8 is a flow chart illustrating a voice testing method based on a voice testing system according to another embodiment of the present disclosure;
FIG. 9 is a schematic diagram of a voice testing apparatus according to an embodiment of the present disclosure;
FIG. 10 is a schematic diagram of a voice testing apparatus according to another embodiment of the present disclosure;
FIG. 11 is a block diagram of an electronic device for implementing a voice testing method of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, it will be recognized by those of ordinary skill in the art that various changes and modifications may be made to the embodiments described herein without departing from the scope of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The terms "first," "second," and the like in this disclosure are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. Furthermore, the terms "comprises" and "comprising," as well as any variations thereof, are intended to cover a non-exclusive inclusion, such as a list of steps or elements. The methods, systems, articles, or apparatus need not be limited to the explicitly listed steps or elements, but may include other steps or elements not expressly listed or inherent to such processes, methods, articles, or apparatus.
When the voice interaction function of a device is tested in the related art, the corresponding device under test is usually required on site. Taking smart speakers as an example, to test speaker model A, the model A speaker has to be transported to a dedicated test scene; to test speaker model B, the model B speaker likewise has to be transported to a dedicated test scene. Moreover, the test scene has to be built specifically for the voice interaction function. The voice test is therefore cumbersome.
In view of this, according to the first aspect of the present disclosure, a voice test system 100 is provided for a device under test whose voice interaction function needs to be tested. As shown in fig. 1, the voice test system 100 includes a voice test module 101 and a computing device 102. The voice test module 101 is provided with the first chip 1011 used by the device under test and is equipped with the voice interaction function of the device under test. The voice test module 101 can therefore execute the voice interaction function based on the first chip 1011 of the device under test, so as to implement the voice test of the device under test.
As shown in fig. 2, a schematic flow chart of a voice testing method based on the voice testing system 100 includes:
s201, the voice testing module 101 sends the audio to be tested to the first chip 1011 to execute the voice interaction function of the device to be tested, and records the cumulative sending frame number of the audio to be tested.
S202, the voice test module 101 responds to a preset event generated by the voice interaction function and reads the accumulated sending frame number.
In general, to avoid spurious responses from a voice interaction device, the device needs to be woken up first before a normal conversation can be held with it. The voice test in the embodiment of the present disclosure therefore includes a wake-up test and a voice recognition test. The wake-up test checks whether a voice interaction device can wake up correctly and in time. The voice recognition test checks whether the voice interaction device can converse correctly after waking up.
In general, the wake-up test requires the audio to be tested to carry a wake-up word. When the wake-up word is detected, the device under test executes the wake-up operation. Therefore, in the wake-up test scenario the preset event may be the wake-up operation; that is, the occurrence of the preset event of the wake-up test may be determined by detecting the wake-up operation.
Similarly, in the voice recognition test, the audio to be tested is sent by the device under test to the cloud for analysis and recognition, and the recognition result is then returned. Correspondingly, in the embodiment of the present disclosure the voice test module 101 sends the audio to be tested to the cloud and then receives the recognition result. Therefore, in the voice recognition test scene, the preset event can be determined based on how the voice recognition result is returned. For example, the voice recognition result includes a plurality of response packets, and the voice recognition result is returned to the first chip 1011 packet by packet. When the first packet response time needs to be determined (i.e., the delay from the beginning of speaking to the beginning of receiving the recognition result), receiving the first response packet can be taken as the preset event. When the hard delay needs to be determined (i.e., the delay from the end of the speech to the completion of the recognition), receiving the last response packet is taken as the preset event.
It should be noted that the preset event may be determined based on the test requirement, and the embodiment of the disclosure does not limit this.
S203, the voice testing module 101 determines the trigger time point of the preset event based on the accumulated sending frame number; the trigger time point for the preset event is sent to the computing device 102.
S204, the computing device 102 determines a response delay of the preset event based on the trigger time point and a time point of the preset event marked in the audio to be tested.
In the embodiment of the present disclosure, the voice test module 101 integrates the first chip 1011 of the device under test and can execute the voice interaction function based on the first chip 1011 to implement the voice test of the device under test. The voice test module 101 can automatically inject the audio to be tested into the first chip 1011, automatically acquire the cumulative sending frame number on the preset event, and determine the trigger time point of the preset event, after which the computing device 102 automatically determines the response delay of the preset event. Therefore, based on the computing device 102 and the voice test module 101, the whole process from audio injection to test result is automated, and the test efficiency is improved. In addition, in the embodiment of the present disclosure, the voice test module 101 can complete the automated test simply by mounting or detaching the chip of the corresponding device under test and integrating the voice interaction function of the corresponding device under test, so the voice test method based on this system has good extensibility.
Voice testing in the related art requires a person to speak into the microphone. Taking the aforementioned speaker A as an example, during the test a person has to speak the audio to be tested, which is captured by the microphone of speaker A and then passed to the chip of speaker A to execute the voice interaction function of speaker A. This not only consumes human resources, but also makes it difficult to guarantee that the audio is homologous when a problem occurs and the test needs to be reproduced. For example, the person who speaks during the retest may differ from the person who spoke before, so the audio comes from different sources. Even the same person is in different states at different times, so the produced audio differs and strict homology is difficult to guarantee.
Therefore, in order to ensure that the audio used for the same device under test is homologous, the embodiment of the present disclosure uses pre-recorded audio as a test set and performs an injection test on the voice test module 101.
In some possible embodiments, as shown in fig. 3, the audio in the test set may be captured by the sampling chip 104. The sampling chip 104 may use multiple microphones to collect audio. For different devices under test, only the audio channels required by each device need to be extracted from the audio collected by the sampling chip 104. For example, the sampling chip 104 collects eight microphone channels and a reference channel. For speaker A, which needs two microphone channels, two microphone channels and the reference channel are extracted from the audio data collected by the sampling chip 104. For speaker B, which needs three microphone channels, three microphone channels and the reference channel are extracted from the audio data collected by the sampling chip 104.
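As an illustrative sketch only (not part of the original text), the channel extraction described above could be done as follows, assuming the sampling chip's recording is stored as a multi-channel WAV file whose last channel is the reference signal; the file layout, library choice and function name are assumptions.

```python
import numpy as np
import soundfile as sf  # assumed library for reading/writing multi-channel WAV files


def extract_channels(src_wav: str, dst_wav: str, num_mics: int) -> None:
    """Extract the first num_mics microphone channels plus the reference channel
    from the shared recording (assumed layout: 8 mic channels + 1 reference)."""
    data, rate = sf.read(src_wav)            # data shape: (samples, channels)
    mics = data[:, :num_mics]                # e.g. 2 channels for speaker A, 3 for speaker B
    reference = data[:, -1:]                 # assumption: reference signal is the last channel
    sf.write(dst_wav, np.hstack([mics, reference]), rate)


# Hypothetical usage: derive the audio for a two-microphone device from the shared capture.
# extract_channels("capture_8mic_ref.wav", "speaker_a_2mic_ref.wav", num_mics=2)
```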
Of course, if different devices under test use different microphones, the audio data may be collected with the sampling chip 104 and the corresponding microphones. When the performance difference between the microphones is small, or its influence on the test result is slight, the audio does not need to be re-collected for each microphone. In implementation, a reasonable choice can be made according to the actual situation.
To implement the injection test, in some possible embodiments, as shown in fig. 3, the voice test module 101 further includes a memory 1012 and a second chip 1013. The second chip 1013 is a low-power chip relative to the first chip 1011 and mainly integrates a test program; the test program integrates a control function for the first chip 1011, which mainly controls the first chip 1011 to execute the voice interaction function. For example, the second chip 1013 integrates the SDK (Software Development Kit) of the voice interaction function. When the voice interaction functions of different devices under test differ, the first chip 1011 can be driven simply by updating the SDK in the test program. If different devices under test adopt different chips, the chip of the device under test can be detachably mounted on the voice test module 101.
To implement the injection test, as shown in fig. 3, the memory 1012 in the voice test module 101 is used to store the test set; the second chip 1013 of the voice test module 101 is configured to read the audio to be tested from the test set and send the audio to be tested to the first chip 1011; the first chip 1011 is used to record the cumulative sending frame number of the audio to be tested and to generate the preset event based on the voice interaction function; the second chip 1013 in the voice test module 101 is further configured to, in response to the preset event, read the cumulative sending frame number of the audio to be tested from the first chip 1011 and determine the trigger time point of the preset event based on the cumulative sending frame number; and the trigger time point of the preset event is sent to the computing device 102.
In the embodiment of the disclosure, the pre-recorded test set ensures that the test audio is homologous whenever the test is performed. Once the audio has been recorded, the audio injected into the first chip 1011 is not mixed with any extraneous noise, in particular noise from the test site, and there is no strict requirement on the test environment, so the accuracy of the test result is ensured. In addition, compared with having the first chip 1011 read the test set itself, using the second chip 1013 to control the test reduces the functional modification of the first chip 1011 and makes the test easier to carry out.
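For intuition, a minimal sketch of the control loop that the second chip's test program could follow is given below. It only mirrors the flow described in this embodiment; the frame size, the interface of the first chip and the serial link are assumptions, not the patent's actual firmware.

```python
FRAME_MS = 10        # assumed playing duration of one audio frame
FRAME_BYTES = 320    # assumed frame size (10 ms of 16 kHz, 16-bit mono audio)


def run_injection_test(audio_bytes: bytes, first_chip, serial_link) -> None:
    """Inject the audio to be tested frame by frame into the first chip and report
    the trigger time point of the preset event to the computing device."""
    for offset in range(0, len(audio_bytes), FRAME_BYTES):
        first_chip.send_frame(audio_bytes[offset:offset + FRAME_BYTES])  # hypothetical API
        if first_chip.preset_event_pending():             # e.g. wake-up interrupt observed
            frames = first_chip.read_cumulative_frames()  # counter kept by the first chip
            trigger_ms = frames * FRAME_MS                # time point on the audio's own time axis
            serial_link.send({"event": "preset", "trigger_ms": trigger_ms})
            break
    first_chip.reset_cumulative_frames()                  # reset before the next audio (see S805)
```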
In some possible embodiments, after obtaining the cumulative sending frame number, the voice test module 101 determines the trigger time point of the preset event based on the cumulative sending frame number and the playing duration of a single audio frame. For example, if the playing duration of a single audio frame is 10 ms and the cumulative sending frame number is 5 frames, the trigger time point of the preset event is 50 ms. That is, the trigger time point of the preset event is relative to the time axis of the test audio itself: it is the time from the starting playing time point of the audio to be tested to the moment the preset event occurs.
In the embodiment of the disclosure, the trigger time point of the preset event can be determined quickly and accurately based on the cumulative sending frame number at the preset event. Because the trigger time point is derived precisely from the cumulative sending frame number, no erroneous trigger time data is produced and no erroneous data has to be cleaned up, which improves test efficiency.
For ease of understanding, the cumulative number of transmission frames is described in detail below for different test cases:
Case 1: when the voice wake-up capability is tested, the voice test module 101 monitors the wake-up signal triggered by the first chip 1011 based on the audio to be tested; when the wake-up signal is detected, the cumulative sending frame number of the audio to be tested is acquired.
as shown in fig. 4, the audio to be tested is infused with a frame of audio to be tested to the second chip 1013 through the first chip 1011, the second chip 1013 accumulates the accumulated sending frame number of the audio to be tested in real time, and compresses the audio data of the audio to be tested in real time and sends the compressed audio data to the first chip 1011. The first chip 1011 sends the compressed packet of the audio to be detected to the cloud for detection in real time. The cloud detects the received audio data in real time and returns the identification result to the second chip 1013 in real time. The second chip 1013 sends the recognition result to the first chip 1011 in real time, and when the first chip 1011 determines that the recognition result contains a wakeup word, it determines that a wakeup operation needs to be performed. Thus, the first chip 1011 transmits an interrupt signal to wake up the voice interactive function. When the second chip 1013 detects the interrupt signal, it is determined that a preset event is generated. Based on the preset event, the second chip 1013 reads the cumulative transmission frame number from the first chip 1011, determines the trigger time point of the preset event based on the cumulative transmission frame number, and transmits to the computing device 102.
The computing device 102 determines wake-up time point 1 based on the cumulative sending frame number, and then compares wake-up time point 1 with wake-up time point 2 annotated in the audio to be tested to obtain the wake-up delay (i.e. the wake-up response time). That is, the time difference between wake-up time point 1 and wake-up time point 2 is taken as the wake-up delay.
Case 2: when the voice recognition capability is tested, the reception by the first chip 1011 of the voice recognition result of the audio to be tested is monitored; when the first response packet of the voice recognition result is received, the cumulative sending frame number of the audio to be tested is acquired.
As shown in fig. 5, the second chip 1013 transmits the audio to be tested, compressed by the first chip 1011, to the cloud in real time; the cloud analyzes and recognizes the audio to be tested and then returns the recognition result to the second chip 1013 packet by packet. When the first response packet is received, the second chip 1013 determines that the preset event has been generated, reads the cumulative sending frame number from the first chip 1011, and determines the trigger time point of the preset event based on the cumulative sending frame number.
Of course, in other embodiments, the second chip 1013 may send the data packets returned from the cloud to the first chip 1011, and the first chip 1011 parses each data packet to determine whether it is a response packet. If it is, the first chip 1011 reports a first-response-packet-received event to the second chip 1013, so that the second chip 1013 determines that the preset event has occurred, reads the cumulative sending frame number from the first chip 1011, determines the trigger time point of the preset event based on the cumulative sending frame number, and sends the trigger time point to the computing device 102.
The computing device 102 compares the trigger time point of the preset event with the speech starting point annotated in the audio to be tested to obtain the first packet response time. Here, the speech starting point is the time point at which speaking starts in the audio to be tested.
Case 3: when the last response packet of the voice recognition result is received, the cumulative sending frame number of the audio to be tested is acquired.
Similarly, as shown in fig. 6, the cloud sends the recognition result to the second chip 1013 of the voice test module packet by packet. The last response packet can then be identified by the second chip 1013 in the same way as in the method shown in fig. 5. When it is determined that the last response packet has been received, the second chip 1013 reads the cumulative sending frame number from the first chip 1011, determines the trigger time point of the preset event based on the cumulative sending frame number, and sends it to the computing device 102.
Of course, in other embodiments, the first chip 1011 may determine that the last response packet is received.
The computing device 102 determines the time of receipt of the last response packet based on the accumulated number of transmitted frames, and then calculates the time difference with the end point of the speech marked in the audio to be tested as a hard delay. The end point of the speech is the time point when the speech is ended in the audio to be tested.
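To make the three delay indicators concrete, a minimal sketch of the computation on the computing device side might look like the following; only the arithmetic is taken from the text, the function and parameter names are assumptions.

```python
def wake_up_delay(trigger_ms: float, annotated_wake_ms: float) -> float:
    """Wake-up delay: trigger time of the wake-up event minus the annotated wake-up point."""
    return trigger_ms - annotated_wake_ms


def first_packet_response_time(trigger_ms: float, speech_start_ms: float) -> float:
    """First packet response time: from the annotated start of speech to the first response packet."""
    return trigger_ms - speech_start_ms


def hard_delay(trigger_ms: float, speech_end_ms: float) -> float:
    """Hard delay: from the annotated end of speech to the last response packet."""
    return trigger_ms - speech_end_ms


# Example: a wake-up trigger at 500 ms against an annotated wake-up point at 420 ms
# gives a wake-up delay of 80 ms.
```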
In the embodiment of the disclosure, for the wake-up and voice recognition capabilities, different trigger events are defined according to the capability under test, the cumulative sending frame number is obtained at the precise moment, and an accurate trigger time point can then be obtained for each preset event, which improves the accuracy of the test.
Furthermore, in the wake-up capability test, in addition to the wake-up response time, the computing device 102 may also count the wake-up accuracy and false wake-ups. For example, when the wake-up response time shows that the wake-up occurred earlier than the annotated wake-up time point, the wake-up is a false wake-up; on this basis, the false wake-up rate over multiple wake-up tests can be counted. The wake-up accuracy is understood as the correct wake-up rate, which can be expressed as the ratio of the number of correctly woken tests to the total number of wake-up tests.
Similarly, in addition to the first packet response time and the hard delay, the voice recognition capability test in the embodiment of the present disclosure has other test indicators, including:
Character accuracy: each character in the voice recognition result is compared with the corresponding character in the audio to be tested, and the character recognition accuracy is obtained.
Sentence accuracy: for each sentence, if a recognition error occurs, including inserted characters, missing characters or wrong characters, the sentence is counted as not correctly recognized. The proportion of correctly recognized sentences among all sentences can then be counted as the sentence accuracy, a statistical indicator used to measure the voice interaction function of the device under test.
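A possible way to compute the character accuracy and the sentence accuracy described above is sketched here; using an edit distance to count inserted, missing and wrong characters is an assumption about the exact formula.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance counting inserted, missing and wrong characters."""
    prev = list(range(len(hyp) + 1))
    for i, rc in enumerate(ref, 1):
        cur = [i]
        for j, hc in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (rc != hc)))
        prev = cur
    return prev[-1]


def character_accuracy(references: list[str], hypotheses: list[str]) -> float:
    total = sum(len(r) for r in references)
    errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return 1.0 - errors / total if total else 0.0


def sentence_accuracy(references: list[str], hypotheses: list[str]) -> float:
    correct = sum(r == h for r, h in zip(references, hypotheses))
    return correct / len(references) if references else 0.0
```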
Of course, it should be noted that other voice test indexes are also applicable to the embodiment of the present disclosure, and are not limited thereto.
In order to facilitate the implementation of the management function of the test, the following scheme is further provided in the embodiment of the present disclosure.
In some possible embodiments, the computing device can support testers in flexibly controlling the test tasks of the voice test module 101. For example, the computing device 102 may generate an audio test instruction based on an input operation of the tester, where the audio test instruction includes the audio identifier of the audio to be tested. The computing device 102 then sends the audio test instruction to the voice test module 101; the voice test module 101 reads the audio to be tested from the test set based on the audio identifier in the audio test instruction. The voice test module 101 then feeds the audio to be tested to the first chip 1011 to perform the subsequent voice test.
The embodiment of the present disclosure implements accurate definition of the audio to be tested based on the audio test instruction, so as to flexibly allocate the test task to the voice test module 101, and enable the voice test module 101 to accurately complete the test based on the test task.
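A hedged sketch of what the audio test instruction exchanged between the computing device 102 and the voice test module 101 could look like; the JSON framing, field names and serial parameters are assumptions made here for illustration, since the text only requires the instruction to carry the audio identifier(s).

```python
import json

import serial  # pyserial, assumed as the transport for the serial-port link described below


def send_audio_test_instruction(port: str, audio_ids: list[str]) -> None:
    """Send an audio test instruction listing the audio identifiers to be tested."""
    instruction = {"type": "audio_test", "audio_ids": audio_ids}
    with serial.Serial(port, baudrate=115200, timeout=1) as link:
        link.write((json.dumps(instruction) + "\n").encode("utf-8"))


# Hypothetical usage: ask one voice test module to test two wake-up clips.
# send_audio_test_instruction("/dev/ttyUSB0", ["wakeup_1m_001", "wakeup_3m_002"])
```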
To further facilitate management of the voice test module 101, in the embodiment of the disclosure, the computing device 102 may also maintain the state of the voice test module 101. Such as:
under the condition that the voice test module 101 has no voice test task, marking the voice test module 101 as an idle state;
under the condition that the voice test module 101 executes the voice test task, the voice test module 101 is marked as executing;
in case of a failure of the voice test module 101, the voice test module 101 is marked as unusable.
By defining these different states for the voice test module 101, its working condition can be known conveniently and corresponding tasks can be distributed to it conveniently.
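A minimal sketch of how the computing device 102 might track these states; the three enum values mirror the states listed above, while the registry class and its methods are assumptions.

```python
from enum import Enum


class ModuleState(Enum):
    IDLE = "idle"                 # no voice test task assigned
    EXECUTING = "executing"       # currently running a voice test task
    UNAVAILABLE = "unavailable"   # the module has a fault


class ModuleRegistry:
    """Tracks the state of every connected voice test module."""

    def __init__(self) -> None:
        self._states: dict[str, ModuleState] = {}

    def set_state(self, module_id: str, state: ModuleState) -> None:
        self._states[module_id] = state

    def idle_modules(self) -> list[str]:
        """Modules that can currently be assigned a test task."""
        return [m for m, s in self._states.items() if s is ModuleState.IDLE]
```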
For example, as shown in fig. 7, the same computing device 102 can perform voice tests on multiple voice test modules 101 at the same time. The connected voice test modules 101 may be the same or different; both cases are applicable to the embodiments of the present disclosure. The state of each voice test module 101 is maintained in the computing device 102, so that the tester can know which voice test modules 101 have failed and which are executing test tasks, and can see from the idle state which voice test modules 101 can be assigned tasks. In practice, the computing device 102 communicates with each voice test module 101 through a serial port. The tester can distribute a voice test task to a given voice test module 101 in fig. 7 through the serial port, for example by sending a voice test instruction over the serial port. The voice test instruction may include the audio identifier of one audio to be tested, or the audio identifiers of multiple audios to be tested. The voice test module 101 executes the test task based on the voice test instruction; if the instruction includes multiple audios to be tested, the module tests them one by one, determines the trigger time point of the preset event based on the cumulative sending frame number, and sends it to the computing device 102 through the serial port. The test indicators are then statistically analyzed by the computing device 102.
The tester can distribute tasks not only to that voice test module but also to any other idle voice test module 101, so the same computing device 102 can control multiple voice test modules to run tests and a single tester can complete the tests of multiple voice test modules 101, which reduces labor cost and improves test efficiency.
In the embodiment of the present disclosure, by maintaining the state of the voice testing module 101, it is convenient to understand the working condition of the voice testing module 101, and it is convenient to manage the voice testing module 101, thereby improving the testing efficiency.
In addition, the embodiment of the present disclosure also allows the state of the voice test module 101 to be modified as needed. This may be implemented by the computing device 102 modifying the state of the voice test module 101 in response to a user operation, with the voice test module 101 executing the corresponding operation based on the state change. For example, a tester may manually modify the state of a voice test module 101 through the computing device 102 when needed. If the state is modified to unavailable, the voice test module 101 terminates the test. The tester can later set the voice test module 101 back to the idle state and re-assign the test task to it.
For another example, when the voice test module 101 is not available, it indicates that the test is problematic. After the problem is solved, the tester can change the voice test module 101 to an idle state, so that the voice test module 101 can perform the voice test task again.
In summary, the present disclosure supports flexible updating of the state of the voice test module 101, and can improve the efficiency of test management.
In addition, the embodiment of the disclosure can also provide an alarm capability, so that a tester can learn of key information even when not at the test site. For example, when the voice test module 101 fails, the computing device 102 performs an alarm operation. Examples of failures include a hardware failure in the voice test module 101, no corresponding audio to be tested in the test set in the memory 1012, an error in the audio file of the audio to be tested, and no wake-up for a long time.
Some faults of the voice test module 101 may be reported to the computing device 102 by the voice test module 101 itself, and some may be detected by the computing device 102. For example, when the audio to be tested cannot be sent to the cloud, the voice test module 101 may report the fault to the computing device 102. As another example, after sending the audio to be tested, the voice test module 101 outputs the voice recognition result of the audio to be tested to the computing device 102; when the computing device 102 cannot obtain the voice recognition result for a long time, or the voice recognition result does not include the wake-up word, the computing device 102 determines that the voice test module has a fault. However the fault is discovered, the computing device 102 can perform the alarm operation. For example, the alarm information may be sent to the tester through an instant messenger or by mail, or an audible alarm may be sounded through an audio player built into the computing device 102. The embodiments of the present disclosure do not limit the specific alarm mode.
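As a rough sketch of the kind of fault detection and alarm flow described here (computing device side); the timeout, the result format and the notification callback are all assumptions.

```python
import time


def wait_for_recognition_result(read_result, timeout_s: float = 30.0):
    """Poll for a recognition result; return it, or None if none arrives in time."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        result = read_result()      # hypothetical callback reading from the serial link
        if result is not None:
            return result
        time.sleep(0.5)
    return None


def check_and_alarm(read_result, notify) -> None:
    """Raise an alarm if no usable recognition result is obtained from the module."""
    result = wait_for_recognition_result(read_result)
    if result is None or "wake_word" not in result:   # assumed result format
        notify("voice test module fault: no usable recognition result")  # IM, mail or buzzer
```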
In conclusion, the embodiment of the present disclosure supports the alarm operation when a fault occurs, so that problems with test tasks are known in time, ensuring orderly operation of the system.
To facilitate understanding of the overall testing process, the following description is made with reference to fig. 8, which, as shown in fig. 8, includes:
s801, the sampling chip 104 collects a test set, and writes the test set into an SD Card (Secure Digital Memory Card/SD Card) of the voice test module 101.
The test set contains audio data for different test scenes, such as wake-up tests and voice recognition tests. Each test scene can be further subdivided into sub-scenes, for example by distance: wake-up tests and voice recognition tests at 1 meter, 3 meters and 5 meters from the device under test.
Besides sub-scenes at different distances, the test set can also include internal-noise and external-noise scene tests. The difference between internal and external noise is whether the noise originates from inside or outside the device. For example, in a wake-up test where the user speaks against ambient noise, the noise comes from outside the device under test; this is the external-noise case. When the device under test plays music while the user speaks, the captured audio contains the audio content of the device under test itself; this is the internal-noise case.
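One possible (assumed, not specified in the text) way to organize the test set and its annotations so that the annotated time points used later in S807 can be looked up per audio:

```python
# Hypothetical manifest entry for one audio in the test set; the field names are
# assumptions, but each audio carries the annotated time points used in S807.
test_set_manifest = [
    {
        "audio_id": "wakeup_1m_ext_noise_001",
        "scene": "wake_up",            # wake-up test or voice recognition test
        "distance_m": 1,               # 1 / 3 / 5 meter sub-scene
        "noise": "external",           # internal-noise or external-noise scene
        "annotations": {
            "wake_word_ms": 1200,      # annotated wake-up time point
            "speech_start_ms": 800,    # annotated start of speaking
            "speech_end_ms": 2400,     # annotated end of speaking
        },
    },
]
```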
S802, in response to a programming instruction, the voice test module 101 writes the test program.
The test program comprises test cases of different test scenes and also integrates the SDK of the voice interaction function.
S803, the computing device 102 responds to the user operation, generates an audio test instruction and sends the audio test instruction to the voice test module 101.
S804, the voice test module 101 reads the audio to be tested from the SD card based on the audio test command, sends the audio to be tested to the first chip 1011 of the voice test module 101, and records the cumulative sending frame number.
S805, the voice testing module 101 reads the cumulative sending frame number based on the preset event generated by the voice interaction function, and determines the product of the cumulative sending frame number and the single-frame sending duration as the trigger time point of the preset event.
It should be noted that, when it is determined that the test of one audio to be tested is completed, the computing device 102 resets the cumulative sending frame number recorded by the first chip 1011. That is, after the test of each audio to be tested finishes, the cumulative sending frame number needs to be reset so that it is ready for the test of the next audio to be tested.
In addition, in the embodiment of the present disclosure, the first chip 1011 and the second chip 1013 of the voice test module 101 may implement communication based on an SPI (Serial Peripheral Interface).
S806, the voice testing module 101 sends the trigger time point of the preset event to the computing device 102.
S807, the computing device 102 determines a response delay based on the trigger time point of the preset event and the annotation time point of the preset event in the audio to be tested.
For the wake-up scenario and the voice recognition scenario, the statistical test indicators have already been described above and are not repeated here.
And S808, the computing equipment 102 sends alarm information to the terminal equipment under the condition that the voice test module 101 is determined to have a fault.
In summary, in the embodiment of the present disclosure, based on the voice test module 101 and the computing device 102, the voice test can be automatically performed, so as to obtain a plurality of voice test indexes, and implement full-process management of the voice test.
In addition, in order to ensure that the voice test can run normally, in the embodiment of the present disclosure, before each test starts, or when a new device under test or a new voice interaction function needs to be tested, a test verification may be performed first, and the full test is carried out only after the verification passes.
The test verification includes the following:
1. Data transmission consistency check: verify whether the data transmitted by the second chip 1013 to the first chip 1011 is consistent. In implementation, the first packet of data of each frame on the first chip 1011 and on the second chip 1013 can be printed and compared; if they match, the data transmission consistency check passes. Experiments show that the data in the embodiment of the disclosure is completely consistent and that no frame loss occurs even after long injection tests, so the voice test method provided by the embodiment of the disclosure is stable and reliable.
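A sketch of the data transmission consistency check described in step 1; comparing the first packet of every frame on both sides follows the text, while the captured log format assumed here is hypothetical.

```python
def consistency_check(sent_first_packets: list[bytes], received_first_packets: list[bytes]) -> bool:
    """Compare the first packet of each frame as sent by the second chip and as
    received by the first chip; any mismatch or missing frame fails the check."""
    if len(sent_first_packets) != len(received_first_packets):  # length mismatch implies frame loss
        return False
    return all(sent == received
               for sent, received in zip(sent_first_packets, received_first_packets))


# Hypothetical usage with per-frame first packets captured from both chips' debug output:
# ok = consistency_check(second_chip_packets, first_chip_packets)
```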
2. Test result accuracy check: the method provided by the embodiment of the disclosure is compared against standard test results; the comparison shows that the error rate of the method is extremely low, so the method is accurate and feasible.
Based on the same technical concept, the embodiment of the present disclosure further provides a voice testing method, which is applied to the voice testing module 101, and the execution method of the voice testing module 101 has been described in the foregoing, for example, the related methods of the voice testing module 101 in fig. 2 and fig. 8 are executed, and are not described herein again.
Similarly, based on the same technical concept, the embodiment of the present disclosure further provides a voice testing method, which is applied to the foregoing computing device 102, and the execution method of the computing device 102 is already described in the foregoing, for example, the related method of the computing device 102 in fig. 2 and fig. 8 is executed, and is not repeated here.
Based on the same technical concept, an embodiment of the present disclosure further provides a voice testing apparatus 900, as shown in fig. 9, which is a schematic structural diagram of the apparatus, including:
a first determining module 901, configured to send the audio to be tested to a first chip to execute a voice interaction function of the device to be tested, and record an accumulated sending frame number of the audio to be tested;
a second determining module 902, configured to read an accumulated sending frame number in response to a preset event generated by the voice interaction function, and determine a trigger time point of the preset event based on the accumulated sending frame number;
a sending module 903, configured to send the trigger time point of the preset event to the computing device, so that the computing device determines a response delay of the preset event based on the trigger time point and a labeled time point of the preset event in the audio to be detected.
In some embodiments, the second determining module is specifically configured to:
acquiring the accumulated sending frame number based on at least one of the following methods:
the method comprises the steps that 1, under the condition of testing voice awakening capacity, an awakening signal triggered by a first chip based on audio to be tested is monitored; under the condition of monitoring the awakening signal, acquiring the cumulative sending frame number of the audio to be detected;
the method 2 includes monitoring the receiving condition of the voice recognition result of the audio to be tested under the condition of testing the voice recognition capability; acquiring the cumulative sending frame number of the audio to be detected under the condition of receiving the first response packet of the voice recognition result;
and 3, acquiring the cumulative sending frame number of the audio to be detected under the condition of receiving the last response packet of the voice recognition result.
In some embodiments, the second determining module is specifically configured to: and determining the trigger time point of the preset event based on the accumulated sending frame number and the playing time length of the single-frame audio.
In some embodiments, the apparatus further comprises:
the audio receiving module is used for receiving an audio test instruction before the determining module sends the audio to be tested to the first chip, wherein the audio test instruction comprises an audio identifier of the audio to be tested;
the first determining module is further used for reading the audio to be tested from the test set based on the audio identification.
Based on the same technical concept, an embodiment of the present disclosure further provides a voice testing apparatus 1000, as shown in fig. 10, including:
a time point receiving module 1001, configured to receive a trigger time point of a preset event, where the trigger time point is sent by the voice test module in response to the preset event of the voice interaction function; the voice test module is provided with a first chip for executing a voice interaction function;
the delay determining module 1002 is configured to determine a response delay of a preset event based on a trigger time point and a time point of a preset event marked in the audio to be detected.
In some embodiments, further comprising:
the instruction sending module is used for sending the audio test instruction to the voice test module; the audio test instruction comprises an audio identifier of the audio to be tested.
In some embodiments, the system further comprises a state management module configured to:
under the condition that the voice test module has no voice test task, marking the voice test module as an idle state;
under the condition that the voice test module executes the voice test task, marking the voice test module as executing;
and marking the voice test module as unavailable under the condition that the voice test module has a fault.
In some embodiments, the state management module is further configured to modify the state of the voice test module in response to a user operation.
In some embodiments, further comprising:
and the warning module is used for executing warning operation under the condition that the voice test module fails.
For a description of specific functions and examples of each module and sub-module of the apparatus in the embodiment of the present disclosure, reference may be made to the description of corresponding steps in the foregoing method embodiments, and details are not repeated here.
In the technical scheme of the disclosure, the acquisition, storage, application and the like of the personal information of the related user all accord with the regulations of related laws and regulations, and do not violate the good customs of the public order.
According to an embodiment of the present disclosure, the present disclosure also provides a voice test module, an electronic device, a readable storage medium, and a computer program product.
The voice test module comprises:
a first chip;
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform any one of the methods described above.
FIG. 11 shows a schematic block diagram of an example electronic device 1100 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 11, the device 1100 comprises a computing unit 1101, which may perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1102 or a computer program loaded from a storage unit 1108 into a Random Access Memory (RAM) 1103. In the RAM 1103, various programs and data necessary for the operation of the device 1100 may also be stored. The computing unit 1101, the ROM 1102, and the RAM 1103 are connected to each other by a bus 1104. An input/output (I/O) interface 1105 is also connected to bus 1104.
A number of components in device 1100 connect to I/O interface 1105, including: an input unit 1106 such as a keyboard, a mouse, and the like; an output unit 1107 such as various types of displays, speakers, and the like; a storage unit 1108, such as a magnetic disk, optical disk, or the like; and a communication unit 1109 such as a network card, a modem, a wireless communication transceiver, and the like. The communication unit 1109 allows the device 1100 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 1101 can be a variety of general purpose and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 1101 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and the like. The computing unit 1101 performs the respective methods and processes described above, such as the voice test method. For example, in some embodiments, the voice test method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 1108. In some embodiments, part or all of the computer program may be loaded and/or installed onto device 1100 via ROM 1102 and/or communication unit 1109. When the computer program is loaded into RAM 1103 and executed by the computing unit 1101, one or more steps of the voice test method described above may be performed. Alternatively, in other embodiments, the computing unit 1101 may be configured to perform the voice test method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special- or general-purpose and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM or flash memory), an optical fiber, a Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain. The server in the embodiments of the present disclosure may be implemented as a cloud server.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in a different order, which is not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (30)

1. A voice test system, comprising a voice test module and a computing device, wherein a first chip adopted by a device to be tested is provided in the voice test module, wherein:
the voice test module is provided with a voice interaction function of the device to be tested, and is used for: sending audio to be tested to the first chip so as to execute the voice interaction function, and recording the accumulated sending frame number of the audio to be tested; reading the accumulated sending frame number in response to a preset event generated by the voice interaction function; determining a trigger time point of the preset event based on the accumulated sending frame number; and sending the trigger time point of the preset event to the computing device; and
the computing device is used for determining a response delay of the preset event based on the trigger time point and a marking time point of the preset event in the audio to be tested.
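For illustration only, the division of work described in claim 1 can be sketched in a few lines of Python; every name below and the 10 ms frame length are assumptions made for the example, not details of the claimed system.

    # Illustrative sketch only; names and the frame length are assumptions.
    FRAME_DURATION_S = 0.01  # assumed playing time of one audio frame (10 ms)

    def trigger_time_point(accumulated_frames_sent: int) -> float:
        # Module side: offset into the audio to be tested at which the preset event was observed.
        return accumulated_frames_sent * FRAME_DURATION_S

    def response_delay(trigger_time_s: float, marking_time_s: float) -> float:
        # Computing-device side: difference between the trigger time point reported by the
        # module and the marking time point of the preset event in the audio to be tested.
        return trigger_time_s - marking_time_s

    # Example: the event fires after 312 frames; the wake-up word ends at 2.95 s in the test audio.
    print(response_delay(trigger_time_point(312), 2.95))  # ≈ 0.17 s, i.e. about 170 ms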
2. The system of claim 1, wherein the voice test module further comprises a memory and a second chip, wherein:
the memory is used for storing a test set;
the second chip is used for reading the audio to be tested from the test set and sending the audio to be tested to the first chip;
the first chip is used for recording the accumulated sending frame number of the audio to be tested and generating a preset event based on the voice interaction function;
the second chip is further used for: in response to the preset event, reading the accumulated sending frame number of the audio to be tested from the first chip; determining the trigger time point of the preset event based on the accumulated sending frame number; and sending the trigger time point of the preset event to the computing device.
3. The system of claim 1 or 2, wherein:
the computing device is further used for sending an audio test instruction to the voice test module, the audio test instruction comprising an audio identifier of the audio to be tested; and
the voice test module is used for reading the audio to be tested from the test set based on the audio identifier in the audio test instruction.
4. The system of any one of claims 1-3, wherein the computing device is further used for:
under the condition that the voice test module has no voice test task, marking the voice test module as idle;
under the condition that the voice test module executes a voice test task, marking the voice test module as executing;
and under the condition that the voice test module fails, marking the voice test module as unavailable.
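As a sketch only, the three module states in claim 4 could be tracked on the computing device as shown below; the enum values and the helper function are assumptions for illustration, not the claimed implementation.

    from enum import Enum

    class ModuleState(Enum):
        IDLE = "idle"                # no voice test task assigned to the module
        EXECUTING = "executing"      # the module is currently running a voice test task
        UNAVAILABLE = "unavailable"  # a module failure has been detected

    def mark_module_state(has_task: bool, failed: bool) -> ModuleState:
        # Hypothetical helper: derive the state to mark for a voice test module.
        if failed:
            return ModuleState.UNAVAILABLE
        return ModuleState.EXECUTING if has_task else ModuleState.IDLE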
5. The system of claim 4, wherein the computing device is further used for modifying the state of the voice test module in response to a user operation; and
the voice test module executes a corresponding operation based on the state change.
6. The system of any one of claims 1-5, wherein the computing device is further used for performing an alarm operation in the event of a failure of the voice test module.
7. The system according to any one of claims 1-6, wherein the voice test module is specifically configured to obtain the accumulated sending frame number based on at least one of the following methods:
method 1: under the condition of testing voice wake-up capability, monitoring a wake-up signal triggered by the first chip based on the audio to be tested, and acquiring the accumulated sending frame number of the audio to be tested under the condition that the wake-up signal is monitored;
method 2: under the condition of testing voice recognition capability, monitoring reception of a voice recognition result of the audio to be tested, and acquiring the accumulated sending frame number of the audio to be tested under the condition that a first response packet of the voice recognition result is received; and
method 3: acquiring the accumulated sending frame number of the audio to be tested under the condition that a last response packet of the voice recognition result is received.
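The three acquisition methods in claim 7 amount to snapshotting the same frame counter on different trigger conditions; the sketch below is illustrative only, and every name in it is an assumption rather than part of the claimed system.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class FrameCounter:
        frames_sent: int = 0  # accumulated frames of the audio to be tested sent to the first chip

    def on_wakeup_signal(counter: FrameCounter) -> int:
        # Method 1: snapshot the counter as soon as the first chip raises its wake-up signal.
        return counter.frames_sent

    def on_recognition_packet(counter: FrameCounter, is_first: bool, is_last: bool,
                              use_first_packet: bool) -> Optional[int]:
        # Methods 2 and 3: snapshot the counter on the first response packet of the voice
        # recognition result, or on the last one, depending on the delay being measured.
        if (use_first_packet and is_first) or (not use_first_packet and is_last):
            return counter.frames_sent
        return None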
8. The system of any one of claims 1-7, wherein the voice test module is specifically configured to determine the trigger time point of the preset event based on the accumulated sending frame number and the playing time length of a single frame of audio, and send the trigger time point of the preset event to the computing device.
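As a worked example under assumed audio parameters only: with 16 kHz, 16-bit mono audio sent in frames of 160 samples, the playing time of a single frame is 160 / 16000 = 0.01 s, so an accumulated count of 250 frames places the trigger time point at 250 × 0.01 = 2.5 s into the audio to be tested.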
9. A voice testing method, comprising:
sending audio to be tested to a first chip to execute a voice interaction function of a device to be tested, and recording the accumulated sending frame number of the audio to be tested;
in response to a preset event generated by the voice interaction function, reading the accumulated sending frame number, and determining a trigger time point of the preset event based on the accumulated sending frame number; and
sending the trigger time point of the preset event to a computing device, so that the computing device determines a response delay of the preset event based on the trigger time point and a marking time point of the preset event in the audio to be tested.
10. The method of claim 9, wherein the reading the accumulated sending frame number in response to a preset event generated by the voice interaction function comprises:
acquiring the accumulated sending frame number based on at least one of the following methods:
method 1: under the condition of testing voice wake-up capability, monitoring a wake-up signal triggered by the first chip based on the audio to be tested, and acquiring the accumulated sending frame number of the audio to be tested under the condition that the wake-up signal is monitored;
method 2: under the condition of testing voice recognition capability, monitoring reception of a voice recognition result of the audio to be tested, and acquiring the accumulated sending frame number of the audio to be tested under the condition that a first response packet of the voice recognition result is received; and
method 3: acquiring the accumulated sending frame number of the audio to be tested under the condition that a last response packet of the voice recognition result is received.
11. The method of claim 9 or 10, wherein the determining a trigger time point of the preset event based on the accumulated sending frame number comprises:
determining the trigger time point of the preset event based on the accumulated sending frame number and the playing time length of a single frame of audio.
12. The method according to any one of claims 9-11, wherein before the sending the audio to be tested to the first chip, the method further comprises:
receiving an audio test instruction, wherein the audio test instruction comprises an audio identifier of the audio to be tested;
and reading the audio to be tested from a test set based on the audio identifier.
13. A voice testing method, comprising:
receiving a trigger time point of a preset event, wherein the trigger time point is sent by a voice test module in response to the preset event of a voice interaction function, and the voice test module is provided with a first chip for executing the voice interaction function; and
determining a response delay of the preset event based on the trigger time point and a marking time point of the preset event in the audio to be tested.
14. The method of claim 13, further comprising:
sending an audio test instruction to the voice test module, wherein the audio test instruction comprises an audio identifier of the audio to be tested.
15. The method of claim 13 or 14, further comprising:
under the condition that the voice test module has no voice test task, marking the voice test module as idle;
under the condition that the voice test module executes the voice test task, marking the voice test module as executing;
and under the condition that the voice test module fails, marking the voice test module as unavailable.
16. The method of claim 15, further comprising:
in response to a user operation, modifying the state of the voice test module.
17. The method according to any one of claims 13-16, further comprising:
executing an alarm operation under the condition that the voice test module fails.
18. A voice testing apparatus comprising:
a first determining module, configured to send audio to be tested to a first chip so as to execute a voice interaction function of a device to be tested, and record the accumulated sending frame number of the audio to be tested;
a second determining module, configured to read the accumulated sending frame number in response to a preset event generated by the voice interaction function, and determine a trigger time point of the preset event based on the accumulated sending frame number; and
a sending module, configured to send the trigger time point of the preset event to a computing device, so that the computing device determines a response delay of the preset event based on the trigger time point and a marking time point of the preset event in the audio to be tested.
19. The apparatus of claim 18, wherein the second determining module is specifically configured to:
acquiring the accumulated sending frame number based on at least one of the following methods:
method 1: under the condition of testing voice wake-up capability, monitoring a wake-up signal triggered by the first chip based on the audio to be tested, and acquiring the accumulated sending frame number of the audio to be tested under the condition that the wake-up signal is monitored;
method 2: under the condition of testing voice recognition capability, monitoring reception of a voice recognition result of the audio to be tested, and acquiring the accumulated sending frame number of the audio to be tested under the condition that a first response packet of the voice recognition result is received; and
method 3: acquiring the accumulated sending frame number of the audio to be tested under the condition that a last response packet of the voice recognition result is received.
20. The apparatus according to claim 18 or 19, wherein the second determining module is specifically configured to determine the trigger time point of the preset event based on the accumulated sending frame number and the playing time length of a single frame of audio.
21. The apparatus of any one of claims 18-20, further comprising:
an audio receiving module, configured to receive an audio test instruction before the first determining module sends the audio to be tested to the first chip, wherein the audio test instruction comprises an audio identifier of the audio to be tested;
the first determining module is further configured to read the audio to be tested from a test set based on the audio identifier.
22. A voice testing apparatus comprising:
a voice interaction module, configured to receive a trigger time point of a preset event sent by a voice test module in response to the preset event of a voice interaction function, wherein the voice test module is provided with a first chip for executing the voice interaction function; and
a delay determining module, configured to determine a response delay of the preset event based on the trigger time point and a marking time point of the preset event in the audio to be tested.
23. The apparatus of claim 22, further comprising:
an instruction sending module, configured to send an audio test instruction to the voice test module, wherein the audio test instruction comprises an audio identifier of the audio to be tested.
24. The apparatus of claim 22 or 23, further comprising a state management module for:
under the condition that the voice test module has no voice test task, marking the voice test module as idle;
under the condition that the voice test module executes the voice test task, marking the voice test module as executing;
and under the condition that the voice test module fails, marking the voice test module as unavailable.
25. The apparatus of claim 24, wherein the state management module is further configured to modify the state of the voice test module in response to a user operation.
26. The apparatus of any one of claims 22-25, further comprising:
an alarm module, configured to execute an alarm operation under the condition that the voice test module fails.
27. A voice test module, comprising:
a first chip;
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 9-12.
28. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 13-17.
29. A non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 9-17.
30. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 9-17.
CN202211037946.6A 2022-08-26 2022-08-26 Voice test system, method and device Pending CN115474146A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211037946.6A CN115474146A (en) 2022-08-26 2022-08-26 Voice test system, method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211037946.6A CN115474146A (en) 2022-08-26 2022-08-26 Voice test system, method and device

Publications (1)

Publication Number Publication Date
CN115474146A true CN115474146A (en) 2022-12-13

Family

ID=84369361

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211037946.6A Pending CN115474146A (en) 2022-08-26 2022-08-26 Voice test system, method and device

Country Status (1)

Country Link
CN (1) CN115474146A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110211567A (en) * 2019-05-13 2019-09-06 中国信息通信研究院 Voice recognition terminal evaluation system and method
US20190385588A1 (en) * 2019-07-18 2019-12-19 Lg Electronics Inc. Method for providing speech and intelligent computing device controlling speech providing apparatus
CN111785268A (en) * 2020-06-30 2020-10-16 北京声智科技有限公司 Method and device for testing voice interaction response speed and electronic equipment
CN113489846A (en) * 2021-06-30 2021-10-08 未鲲(上海)科技服务有限公司 Voice interaction testing method, device, equipment and computer storage medium
CN114039890A (en) * 2021-11-04 2022-02-11 国家工业信息安全发展研究中心 Voice recognition time delay testing method, system and storage medium
CN114360530A (en) * 2021-11-30 2022-04-15 北京罗克维尔斯科技有限公司 Voice test method and device, computer equipment and storage medium
CN114495976A (en) * 2021-12-27 2022-05-13 北京百度网讯科技有限公司 Voice test method, device, system, electronic equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHANG Di; YANG Tingting; SHA Tong: "Research on the Standardization of Intelligent Terminal Voice Assistants", 广东通信技术 (Guangdong Communication Technology), no. 12, 15 December 2019 (2019-12-15) *

Similar Documents

Publication Publication Date Title
CN112436972B (en) Data processing method, device, network equipment and computer readable storage medium
CN103441861A (en) Method and device for generating data records
CN115547396B (en) Test method and device for eMMC
CN110674034A (en) Health examination method and device, electronic equipment and storage medium
CN104809054B (en) Realize the method and system of program test
CN109840178A (en) A kind of method, mobile terminal and the device of monitoring journal information
CN114546830A (en) Regression testing method, regression testing device, electronic equipment and storage medium
CN111464637B (en) Unmanned vehicle data processing method, device, equipment and medium
CN114244821A (en) Data processing method, device, equipment, electronic equipment and storage medium
CN115474146A (en) Voice test system, method and device
CN107342917B (en) Method and apparatus for detecting network device performance
CN115687406A (en) Sampling method, device and equipment of call chain data and storage medium
CN114093392A (en) Audio labeling method, device, equipment and storage medium
CN115437865A (en) Method, device, equipment and medium for testing abnormal power failure of hard disk
CN110569182B (en) Crash rate calculation method and device, computer equipment and storage medium
CN112631843A (en) Equipment testing method and device, electronic equipment, readable medium and product
CN113127001B (en) Method, device, equipment and medium for monitoring code compiling process
CN114721895B (en) Verification method, platform, equipment and medium for design to be tested
CN109710521A (en) Multimedia application performance test methods, device, computer equipment and storage medium
CN113225228B (en) Data processing method and device
CN115357493A (en) Test method, test device, electronic equipment and storage medium
CN115756993A (en) System operation detection method and device, electronic equipment and storage medium
CN115982052A (en) Data processing method
CN115080372A (en) Information processing method and related device
CN117116297A (en) Test method, test system, test processing device, test electronic device and test storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination