CN111696523B - Accuracy testing method and device of voice recognition engine and electronic equipment - Google Patents

Accuracy testing method and device of voice recognition engine and electronic equipment

Info

Publication number
CN111696523B
CN111696523B (application number CN201910184695.6A)
Authority
CN
China
Prior art keywords
test
voice recognition
recognition engine
data
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910184695.6A
Other languages
Chinese (zh)
Other versions
CN111696523A (en)
Inventor
肖搏文
林芊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Volkswagen Mobvoi Beijing Information Technology Co Ltd
Original Assignee
Volkswagen Mobvoi Beijing Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Volkswagen Mobvoi Beijing Information Technology Co Ltd
Priority to CN201910184695.6A
Publication of CN111696523A
Application granted
Publication of CN111696523B
Legal status: Active

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/01: Assessment or evaluation of speech recognition systems
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses an accuracy testing method and device of a voice recognition engine and electronic equipment. The method comprises the following steps: obtaining a test data set comprising at least one piece of test data; generating an expected test result according to the test data set; acquiring configuration data, and setting configuration parameters of a voice recognition engine according to the configuration data; inputting the test data set into the voice recognition engine, and performing voice recognition on the test data to obtain an actual test result; and generating a test report according to the actual test result and the expected test result. The invention realizes automatic accuracy testing of the voice recognition engine, with high detection efficiency and low cost, and can effectively improve the reliability of the accuracy test result of the voice recognition engine.

Description

Accuracy testing method and device of voice recognition engine and electronic equipment
Technical Field
The present invention relates to the field of speech recognition technologies, and in particular, to a method and an apparatus for testing accuracy of a speech recognition engine, and an electronic device.
Background
Currently, with the development of speech recognition technology, more and more speech recognition engines are applied to the electronic devices people use in daily life. An existing speech recognition engine can perform speech recognition and semantic understanding on a user's voice and control the electronic device to execute preset operations according to the recognition and understanding results. However, because the working environment of the speech recognition engine and the related hardware and software are complex, errors easily occur in use and affect accuracy, so the speech recognition engine needs regular accuracy testing. The existing accuracy test of a speech recognition engine is generally manual, and manual testing has various defects: the test process is unstable, testers may pronounce non-standardly, and testers may judge carelessly, all of which affect the reliability of the accuracy test result of the speech recognition engine; in addition, manual testing consumes a lot of manpower, is inefficient, and is costly.
Disclosure of Invention
Therefore, the invention aims to provide a method, a device and an electronic device for testing the accuracy of a voice recognition engine, which can realize the automatic accuracy test of the voice recognition engine, have high detection efficiency and lower cost, and can effectively improve the reliability of the accuracy test result of the voice recognition engine.
Based on the above object, the present invention provides a method for testing the accuracy of a speech recognition engine, comprising:
obtaining a test data set comprising at least one piece of test data;
generating an expected test result according to the test data set;
acquiring configuration data, and setting configuration parameters of a voice recognition engine according to the configuration data;
inputting the test data set into the voice recognition engine, and performing voice recognition on the test data to obtain an actual test result;
and generating a test report according to the actual test result and the expected test result.
In other embodiments of the invention, the test data is: audio data collected under an application environment of the speech recognition engine; when there are at least two pieces of test data, they are all collected from the same sound source.
In some other embodiments of the present invention, the acquiring a test data set including at least one piece of test data includes:
invoking a preset voiceprint template, and generating at least one piece of audio data based on the voiceprint template;
the at least one piece of audio data is acquired as the at least one piece of test data.
In some other embodiments of the present invention, the setting the configuration parameters of the speech recognition engine according to the configuration data includes:
analyzing the configuration data to obtain language configuration information; setting languages used by the voice recognition engine according to the language configuration information;
analyzing the configuration data to obtain working mode configuration information; setting the working mode of the voice recognition engine according to the working mode configuration information, and calling a preset voice recognition word stock according to the corresponding working mode.
In some other embodiments of the present invention, the inputting the test data set into the speech recognition engine and performing speech recognition on the test data to obtain an actual test result includes:
inputting the test data in the test data set through an audio data input interface of the voice recognition engine, and performing voice recognition;
recording a voice recognition result, a semantic recognition result, an operation type and an operation parameter of the test data;
and establishing a table, and storing the test data and the corresponding voice recognition result, semantic recognition result, operation type and operation parameter in the form of table items respectively to obtain the actual test result.
In some other embodiments of the present invention, the generating a test report according to the actual test result and the expected test result includes:
integrating the actual test result and the expected test result into a table;
comparing the actual test result with the corresponding table item in the expected test result, and generating a comparison result;
and highlighting the table entries with different comparison results.
In other embodiments of the present invention, the method for testing accuracy of a speech recognition engine further includes:
and sending the test report to a pre-associated test report receiving terminal.
In other embodiments of the present invention, the method for testing accuracy of a speech recognition engine further includes:
receiving update data;
and updating the test data set and the configuration data according to the update data.
Based on the same inventive concept, the invention also provides an accuracy testing device of a speech recognition engine, comprising:
an acquisition module for acquiring a test data set comprising at least one piece of test data;
the generation module is used for generating expected test results according to the test data set;
the configuration module is used for acquiring configuration data and setting configuration parameters of the voice recognition engine according to the configuration data;
the test module is used for inputting the test data set into the voice recognition engine, and carrying out voice recognition on the test data to obtain an actual test result;
and the result processing module is used for generating a test report according to the actual test result and the expected test result.
Based on the same inventive concept, the invention also provides an electronic device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the program to implement the method according to any one of the embodiments.
From the above, it can be seen that the accuracy testing method, device and electronic equipment for a speech recognition engine provided by the invention test the speech recognition engine with a generated test data set, avoiding the adverse effects of test environment factors and subjective human factors introduced by the manual testing of the prior art; meanwhile, the configuration parameters of the speech recognition engine are set through configuration data, so the test runs automatically and human participation and intervention are reduced to the greatest extent. Automatic accuracy testing of the speech recognition engine is thus achieved, detection efficiency is high, cost is low, and the reliability of the accuracy test result of the speech recognition engine can be effectively improved.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of an embodiment of a method for testing accuracy of a speech recognition engine according to the present invention;
FIG. 2 is a flow chart illustrating a method for testing accuracy of a speech recognition engine according to another embodiment of the present invention;
FIG. 3 is a flowchart illustrating a method for testing accuracy of a speech recognition engine according to another embodiment of the present invention.
Detailed Description
The present invention will be further described in detail below with reference to specific embodiments and with reference to the accompanying drawings, in order to make the objects, technical solutions and advantages of the present invention more apparent.
The traditional way to test the accuracy of a speech recognition engine is manual testing. Although this approach can play a testing role to some extent, subjective factors of the testers interfere with the test process and make the test result inaccurate. The method, device and electronic equipment provided by the invention test the accuracy of the speech recognition engine fully automatically, greatly reducing test errors caused by tester subjectivity; detection efficiency is high, cost is low, and the reliability of the accuracy test result of the speech recognition engine can be effectively improved.
In one embodiment of the present invention, a method for testing accuracy of a speech recognition engine, referring to fig. 1, includes:
step 101: obtaining a test data set comprising at least one piece of test data;
in this embodiment, a test data set including at least one piece of test data is obtained according to a test requirement, where the test requirement refers to a certain rule that needs to be recognized by a speech recognition engine or a certain requirement that needs to be recognized by a speech recognition engine by a user, for example, through the user speech recognition engine recognition: searching for a place, or opening, closing a program, etc. Specifically, for example, the test requirements are: searching for a sea hall in a starching area, and generating a test data set including at least one piece of test data corresponding to the sea hall may be: the sea hall in the sea lake area is searched through the audio file recorded by the professional.
In this embodiment, the test data set including at least one piece of test data can be obtained by directly collecting existing test data according to the test requirement. The test data are audio data collected in the application environment of the speech recognition engine; when there are at least two pieces of test data, they are all collected from the same sound source. Of course, in other embodiments of the invention, the test data may also be collected from different sound sources. A minimal sketch of loading such a pre-collected data set follows.
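The following Python sketch illustrates this acquisition step under stated assumptions: the directory layout, the manifest file name and the TestItem fields are illustrative placeholders, not details taken from the patent.

```python
import json
from dataclasses import dataclass
from pathlib import Path

@dataclass
class TestItem:
    audio_path: Path   # one piece of test data (a pre-collected .wav file)
    requirement: str   # the test requirement the recorded utterance expresses

def load_test_data_set(data_dir: str, manifest: str = "manifest.json") -> list[TestItem]:
    """Collect every .wav file under data_dir as one piece of test data."""
    root = Path(data_dir)
    requirements = json.loads((root / manifest).read_text(encoding="utf-8"))
    return [TestItem(audio_path=p, requirement=requirements.get(p.name, ""))
            for p in sorted(root.glob("*.wav"))]
```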
In this embodiment, the test data may also be generated according to the test requirement; in that case, obtaining a test data set including at least one piece of test data includes: invoking a preset voiceprint template, and generating at least one piece of audio data based on the voiceprint template; and acquiring the at least one piece of audio data as the at least one piece of test data.
Specifically, when the at least one piece of audio data is generated through the voiceprint template, the test requirement is input into a computer or similar equipment, which controls a device holding the voiceprint template to automatically generate at least one piece of audio data as test data; the whole generation process is fully automatic and involves no human participation. Generating test data automatically from a voiceprint template further automates the whole test process, and audio data generated from the same voiceprint template are not subject to unnecessary disturbance from unstable human factors, so detection efficiency is high and cost is low.
The voiceprint template is a preset program which is used for extracting voiceprint characteristics of a specific user and generating audio data with the voiceprint characteristics based on the extracted voiceprint characteristics. Voiceprint features can be formed by analyzing the voice of a particular user by algorithms conventional in the art, which are not limited in this application. In addition, the selection of a specific user depends on the voice parameters such as the language, tone and the like of the voice required by the test method of the present invention, and those skilled in the art can reasonably set and select according to the implementation needs, which is not limited in this application.
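As a structural sketch only: the patent does not specify how the voiceprint template synthesizes audio, so the VoiceprintTemplate interface below is a hypothetical placeholder that a concrete implementation (for example, a TTS model conditioned on the extracted voiceprint features) would have to fill in.

```python
from pathlib import Path
from typing import Protocol

class VoiceprintTemplate(Protocol):
    """Hypothetical interface: synthesizes audio carrying a preset user's voiceprint."""
    def synthesize(self, text: str) -> bytes:
        """Return WAV-encoded audio of `text` spoken with the template's voiceprint."""
        ...

def generate_test_data(template: VoiceprintTemplate, requirements: list[str],
                       out_dir: str = "generated_audio") -> list[Path]:
    """Generate one piece of audio test data per test requirement, fully automatically."""
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    paths = []
    for i, text in enumerate(requirements):
        wav_bytes = template.synthesize(text)      # no human speaker involved
        path = out / f"case_{i:03d}.wav"
        path.write_bytes(wav_bytes)
        paths.append(path)
    return paths
```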
It should be noted that in other embodiments of the present invention, the test data set may be an audio file recorded by a professional for meeting the test requirements. In particular, the audio files recorded by the professional can be used as both existing test data and on-site generated test data, depending on the specific application scenario, personnel configuration, etc.
When an audio file recorded by a professional is used, the file contains not only the utterance expressing the test requirement but also background sound from the surrounding environment, so the voice of a user in a specific use environment is reproduced more faithfully and the accuracy test of the speech recognition engine better reflects the actual environment. For example, if a professional records, inside a car, an audio file asking to navigate to a certain place, the file includes not only the utterance "navigate to a certain place" but also background sound specific to that environment, such as the engine sound of the car.
Of course, the audio data generated by the voiceprint template or the audio file recorded by the professional can be stored in a device with a storage function such as a computer for later steps.
Step 102: generating an expected test result according to the test data set;
Specifically, the expected test results are generated according to the test data set; in this embodiment, the expected results obtained from the test data set are shown in table 1:
TABLE 1 expected test results
As can be seen from table 1, for audio file 1 of the test data set, the expected speech recognition result (query) is "turn off the air conditioner", and the domain corresponding to this query is the air-conditioner domain cns.air-control. For audio file 2 of the test data set, the expected speech recognition result (query) is "adjust player to rock mode", the corresponding domain is the player domain cns.equalizer, the semantic recognition result is the adjustment switch operation, and the operation parameter is the rock mode {'mode': ['rock']};
the other audio files, such as audio file 3, are analyzed in the same way, which will not be repeated here.
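A minimal sketch of one row of such an expected-result table, assuming illustrative field names (query, domain, intent, slot) derived from the examples above; the schema is not mandated by the patent.

```python
from dataclasses import dataclass, field

@dataclass
class ExpectedResult:
    audio_file: str                    # which piece of test data this row belongs to
    query: str                         # expected speech recognition result
    domain: str                        # expected operation type / domain, e.g. "cns.equalizer"
    intent: str                        # expected semantic recognition result
    slot: dict = field(default_factory=dict)   # expected operation parameters

# Rows mirroring the examples discussed above (values are illustrative).
EXPECTED = [
    ExpectedResult("audio_1.wav", "turn off the air conditioner", "cns.air-control", "close"),
    ExpectedResult("audio_2.wav", "adjust player to rock mode", "cns.equalizer", "adjust",
                   {"mode": ["rock"]}),
]
```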
Step 103: acquiring configuration data, and setting configuration parameters of a voice recognition engine according to the configuration data;
Specifically, the configuration data is determined and acquired according to the test requirement; which configuration data is determined and acquired differs with the test requirement. The configuration data may include configuration information such as language, working mode and audio type; after the corresponding configuration information is obtained, the corresponding configuration parameters of the speech recognition engine can be set. The configuration parameters are the parameters the speech recognition engine uses for the specific test requirement. In this embodiment, setting the configuration parameters of the speech recognition engine according to the configuration data includes:
analyzing the configuration data to obtain language configuration information, and setting the language used by the speech recognition engine according to the language configuration information, where the language can be Chinese, English, German, etc. For example, if the test requirement is "turn off the air conditioner" as described above, after recognition the language used by the speech recognition engine is set to Chinese; if the test requirement is "Turn off the air conditioner" in English, after recognition the language used by the speech engine is set to English. Other languages are recognized in the same way and are not repeated here. This arrangement enables the speech recognition engine to respond promptly to the language to be recognized and switch to the corresponding language, so that the test can run automatically.
analyzing the configuration data to obtain working mode configuration information, setting the working mode of the speech recognition engine according to the working mode configuration information, and calling the preset speech recognition word stock corresponding to that working mode. The working mode includes an offline (local) mode and an online (server) mode. Specifically, in the offline mode the preset speech recognition word stock that is called is a local word stock, i.e. a word stock pre-stored on the equipment that tests the speech recognition engine; no network connection is needed, the accuracy test of the speech engine can be completed with this word stock alone, and because the local word stock does not lose entries for external reasons it can be used widely. In the online mode the preset word stock that is called is a server word stock, i.e. during the test a series of word stocks provided on the Internet is obtained in real time over the network; the speech recognition word stock used is therefore large and can be updated in time. Of course, in the online mode the local word stock can also be used at the same time when needed.
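A minimal sketch of this configuration step, assuming a JSON configuration file and hypothetical engine setters (set_language, set_mode, load_lexicon) that are not defined by the patent.

```python
import json
from pathlib import Path

def configure_engine(engine, config_path: str) -> None:
    """Parse the configuration data and apply it to the speech recognition engine (hypothetical API)."""
    cfg = json.loads(Path(config_path).read_text(encoding="utf-8"))

    # Language configuration information -> language used by the engine
    # (e.g. Chinese, English, German).
    engine.set_language(cfg.get("language", "zh-CN"))

    # Working mode configuration information -> offline (local) or online (server) mode,
    # plus the preset speech recognition word stock matching that mode.
    mode = cfg.get("mode", "offline")
    engine.set_mode(mode)
    if mode == "offline":
        engine.load_lexicon(cfg.get("local_lexicon", "lexicon/local.bin"))
    else:
        engine.load_lexicon(cfg.get("server_lexicon_url", "https://example.com/lexicon"))
```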
Step 104: inputting the test data set into a voice recognition engine, and performing voice recognition on the test data to obtain an actual test result;
in this embodiment, inputting the test data set into the speech recognition engine, and performing speech recognition on the test data to obtain an actual test result, including:
inputting test data in the test data set through an audio data input interface of a voice recognition engine, and performing voice recognition;
recording a voice recognition result, a semantic recognition result, an operation type and an operation parameter of the test data;
and establishing a table, and respectively storing the test data, the corresponding voice recognition result, the semantic recognition result, the operation type and the operation parameters in the form of table items so as to obtain an actual test result.
The table can be created with any visual table-generating software; its purpose is to display the actual test result (and likewise the expected test result and the test report described below) to the user in a simple and intuitive manner and to store the data conveniently. In this embodiment an excel table is taken as the example; obviously the excel table is only one alternative and does not limit the specific means used to create the table.
Specifically, the following table 2 shows:
TABLE 2 comparison of actual test results with expected results
In this embodiment, as shown in the table above, in the actual test result the input file, i.e. the test data, is "turn off the air conditioner.wav", the speech recognition result is "turn on the fan", the domain is "cns.air-control", the semantic recognition result is "open", and the operation parameter Slot is { }; in the expected result the input file is likewise "turn off the air conditioner.wav", the speech recognition result is "turn off the air conditioner", the domain is "cns.air-control", the semantic recognition result is "close", and the operation parameter Slot is { }. In other embodiments of the present invention, an excel table is also built as described above, and the results are stored in the form of excel table entries to obtain the actual test result.
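A minimal sketch of step 104, assuming a hypothetical engine.recognize() call whose returned fields mirror the columns recorded above; the patent only specifies which fields must be recorded, not the engine API.

```python
from pathlib import Path

RESULT_COLUMNS = ["input file", "recognition result", "domain",
                  "semantic result", "operation parameters"]

def run_test(engine, audio_files: list[Path]) -> list[list]:
    """Feed each piece of test data through the engine and record the actual results as table rows."""
    rows = []
    for wav in audio_files:
        audio = wav.read_bytes()
        # engine.recognize() stands in for the engine's audio data input interface;
        # the returned field names below are assumptions.
        result = engine.recognize(audio)
        rows.append([wav.name,
                     result["query"],    # voice recognition result
                     result["domain"],   # operation type / domain
                     result["intent"],   # semantic recognition result
                     result["slot"]])    # operation parameters
    return rows
```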
Step 105: and generating a test report according to the actual test result and the expected test result.
Specifically, in this embodiment, generating a test report according to an actual test result and an expected test result includes:
integrating the actual test result and the expected test result into an excel table;
comparing the actual test result with the corresponding excel table item in the expected test result, and generating a comparison result;
and highlighting excel table entries with different comparison results.
Specifically, the following table 3 shows:
table 3 generation of comparison results table
As can be seen from table 3, in this embodiment whether the speech recognition, the semantic recognition and so on are wrong is determined by comparing each excel table entry of the actual test result with the corresponding entry of the expected test result: if every entry is the same, the speech recognition and the semantic recognition are both correct and the speech recognition engine recognized the test data correctly; otherwise the speech recognition engine recognized the test data incorrectly.
Specifically, in the present embodiment, there are mainly the following cases that cause the speech recognition engine to be wrong: firstly, the voice recognition is wrong, the semantic recognition is correct, and the operation parameters and the like are correct; secondly, the voice recognition is correct, the semantic recognition is incorrect, and the operation parameters and the like are correct; and thirdly, the voice recognition is correct, the semantic recognition is correct, and the operation parameters and the like are wrong. Of course, there are also recognition errors in two or more of speech recognition, semantic recognition, operation parameters, and the like, resulting in a speech recognition engine recognition error.
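A minimal sketch of step 105, assuming the openpyxl package is available for writing the excel report; the column layout continues the run_test sketch above and is illustrative.

```python
from openpyxl import Workbook
from openpyxl.styles import PatternFill

# Light red fill used to highlight entries whose comparison result differs.
MISMATCH = PatternFill(start_color="FFFFC7CE", end_color="FFFFC7CE", fill_type="solid")

def write_report(actual_rows, expected_rows, columns, path="test_report.xlsx"):
    """Integrate actual and expected results into one excel table and highlight differences."""
    wb = Workbook()
    ws = wb.active
    ws.append([f"actual {c}" for c in columns] + [f"expected {c}" for c in columns])
    for act, exp in zip(actual_rows, expected_rows):
        ws.append([str(v) for v in act] + [str(v) for v in exp])
        row = ws.max_row
        for col, (a, e) in enumerate(zip(act, exp), start=1):
            if str(a) != str(e):   # comparison result differs -> highlight both entries
                ws.cell(row=row, column=col).fill = MISMATCH
                ws.cell(row=row, column=col + len(columns)).fill = MISMATCH
    wb.save(path)
    return path
```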
In addition, in still another embodiment of the present invention, only the comparison result table may be generated for the user: in some alternative embodiments the actual test result and the expected result are not presented as an excel table, and the comparison result table is instead generated after the electronic device compares the results internally in the background. This arrangement is simpler and more intuitive and helps the user obtain the comparison result directly. Of course, in the present embodiment the table of actual and expected results is generated first and the comparison result table is generated afterwards, so the recognition process of the whole speech recognition engine is presented to the user more completely and the user can follow it in real time. The two embodiments differ in how the results are presented, each has its own advantages, and the choice can be made according to the specific application scenario, user requirements and the like.
Specifically, after the test report is obtained, it is sent to the pre-associated test report receiving terminal, and the accuracy of the test is counted, as shown in table 4:
Table 4  Statistics of test accuracy

Total     Pass      Fail      Error
526       446       80        0
100%      84.79%    15.21%    0.00%
The statistical results in table 4 are then summarized and sent to the relevant engineers in the form of a file, for example by mail, which ensures the reliability of the test results and completes the automatic accuracy test of the speech recognition engine.
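A minimal sketch of this statistics-and-delivery step using only the Python standard library; the SMTP host, sender address and outcome labels are placeholders, not values taken from the patent.

```python
import smtplib
from email.message import EmailMessage

def summarize(outcomes: list[str]) -> dict:
    """outcomes holds one of 'pass', 'fail' or 'error' per test case (cf. table 4)."""
    stats = {k: outcomes.count(k) for k in ("pass", "fail", "error")}
    stats["total"] = len(outcomes)
    stats["pass_rate"] = (f"{100 * stats['pass'] / stats['total']:.2f}%"
                          if stats["total"] else "n/a")
    return stats

def mail_report(stats: dict, report_path: str, to_addr: str) -> None:
    """Send the summarized statistics and the excel report to an engineer by mail."""
    msg = EmailMessage()
    msg["Subject"] = "Speech recognition engine accuracy test report"
    msg["From"] = "asr-test@example.com"            # placeholder sender address
    msg["To"] = to_addr
    msg.set_content(f"Total: {stats['total']}  Pass: {stats['pass']}  "
                    f"Fail: {stats['fail']}  Error: {stats['error']}  "
                    f"Pass rate: {stats['pass_rate']}")
    with open(report_path, "rb") as fh:
        msg.add_attachment(fh.read(), maintype="application",
                           subtype="octet-stream", filename="test_report.xlsx")
    with smtplib.SMTP("smtp.example.com") as smtp:  # placeholder SMTP host
        smtp.send_message(msg)
```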
In some preferred embodiments of the present invention, referring to fig. 2, the steps further include:
step 106: and sending the test report to a pre-associated test report receiving terminal.
Specifically, after the test report is obtained, it is sent to the pre-associated test report receiving terminal and stored in corresponding storage equipment for backup and later data analysis.
In some preferred embodiments of the present invention, referring to fig. 3, the steps further include:
step 107: receiving update data; and updating the test data set and the configuration data according to the update data.
The method provided by the invention can be run more than once: to test again, it is only necessary to acquire the corresponding different test data sets and configuration parameters according to the new test requirements and the update data.
It should be noted that step 106 and step 107 are both preferred embodiments of the present invention; other embodiments of the present invention require neither the above order nor steps 106 and 107 themselves, which are determined by the specific application scenario and the staffing configuration.
Based on the same inventive concept, the invention also provides an accuracy testing device of the voice recognition engine, comprising:
an acquisition module for acquiring a test data set comprising at least one piece of test data;
the generation module is used for generating expected test results according to the test data set;
the configuration module is used for acquiring configuration data and setting configuration parameters of the voice recognition engine according to the configuration data;
the test module is used for inputting the test data set into the voice recognition engine, and carrying out voice recognition on the test data to obtain an actual test result;
and the result processing module is used for generating a test report according to the actual test result and the expected test result.
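A structural sketch of how the five modules listed above might be composed into one test device; the module interfaces are assumptions derived from the method steps, since the patent does not prescribe a concrete API.

```python
class AccuracyTestDevice:
    """Composes the five modules of the accuracy testing apparatus described above."""

    def __init__(self, acquisition, generation, configuration, test, result_processing):
        self.acquisition = acquisition              # acquires the test data set
        self.generation = generation                # generates the expected test results
        self.configuration = configuration          # applies configuration data to the engine
        self.test = test                            # runs recognition, records actual results
        self.result_processing = result_processing  # compares results and builds the report

    def run(self, engine, data_dir: str, config_path: str) -> str:
        data_set = self.acquisition.load(data_dir)
        expected = self.generation.expected_results(data_set)
        self.configuration.apply(engine, config_path)
        actual = self.test.run(engine, data_set)
        return self.result_processing.report(actual, expected)
```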
Based on the same inventive concept, the invention also provides an electronic device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the program to implement the method in any one of the above embodiments.
The device of the foregoing embodiment is configured to implement the corresponding method in the foregoing embodiment, and has the beneficial effects of the corresponding method embodiment, which is not described herein.
Those of ordinary skill in the art will appreciate that: the discussion of any of the embodiments above is merely exemplary and is not intended to suggest that the scope of the disclosure, including the claims, is limited to these examples; the technical features of the above embodiments or in the different embodiments may also be combined within the idea of the invention, the steps may be implemented in any order and there are many other variations of the different aspects of the invention as described above, which are not provided in detail for the sake of brevity.
Additionally, well-known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown within the provided figures, in order to simplify the illustration and discussion, and so as not to obscure the invention. Furthermore, the devices may be shown in block diagram form in order to avoid obscuring the invention, and also in view of the fact that specifics with respect to implementation of such block diagram devices are highly dependent upon the platform within which the present invention is to be implemented (i.e., such specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the invention, it should be apparent to one skilled in the art that the invention can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative in nature and not as restrictive.
While the invention has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of those embodiments will be apparent to those skilled in the art in light of the foregoing description. For example, other memory architectures (e.g., dynamic RAM (DRAM)) may use the embodiments discussed.
The embodiments of the invention are intended to embrace all such alternatives, modifications and variances which fall within the broad scope of the appended claims. Therefore, any omission, modification, equivalent replacement, improvement, etc. of the present invention should be included in the scope of the present invention.

Claims (8)

1. A method for testing accuracy of a speech recognition engine, comprising:
obtaining a test data set comprising at least one piece of test data;
generating an expected test result according to the test data set;
acquiring configuration data according to the test requirements;
setting configuration parameters of a voice recognition engine according to the configuration data;
inputting the test data set into the voice recognition engine, and performing voice recognition on the test data to obtain an actual test result;
generating a test report according to the actual test result and the expected test result;
wherein the acquiring a test data set comprising at least one piece of test data comprises:
calling a preset voiceprint template according to the test requirement, and generating at least one piece of audio data based on the voiceprint template;
acquiring the at least one piece of audio data as the at least one piece of test data;
the setting the configuration parameters of the voice recognition engine according to the configuration data comprises the following steps:
analyzing the configuration data to obtain language configuration information; setting languages used by the voice recognition engine according to the language configuration information;
analyzing the configuration data to obtain working mode configuration information; setting the working mode of the voice recognition engine according to the working mode configuration information, and calling a preset voice recognition word stock according to the corresponding working mode.
2. The method of claim 1, wherein the test data is: audio data collected under an application environment of the speech recognition engine; when at least two pieces of test data exist, at least two pieces of test data are all collected from the same sound source.
3. The method for testing the accuracy of a speech recognition engine according to claim 1, wherein inputting the test data set into the speech recognition engine, performing speech recognition on the test data, and obtaining an actual test result comprises:
inputting the test data in the test data set through an audio data input interface of the voice recognition engine, and performing voice recognition;
recording a voice recognition result, a semantic recognition result, an operation type and an operation parameter of the test data;
and establishing a table, and storing the test data and the corresponding voice recognition result, semantic recognition result, operation type and operation parameter in the form of table items respectively to obtain the actual test result.
4. A method of testing the accuracy of a speech recognition engine according to claim 3, wherein said generating a test report based on said actual test result and said expected test result comprises:
integrating the actual test result and the expected test result into a table;
comparing the actual test result with the corresponding table item in the expected test result, and generating a comparison result;
and highlighting the table entries with different comparison results.
5. The method for testing the accuracy of a speech recognition engine of claim 1, further comprising:
and sending the test report to a pre-associated test report receiving terminal.
6. The method for testing the accuracy of a speech recognition engine of claim 1, further comprising:
receiving update data;
and updating the test data set and the configuration data according to the update data.
7. An accuracy testing apparatus of a speech recognition engine, comprising:
a first acquisition module for acquiring a test data set including at least one piece of test data;
the generation module is used for generating expected test results according to the test data set;
the second acquisition module is used for acquiring configuration data according to the test requirements;
the configuration module is used for setting configuration parameters of the voice recognition engine according to the configuration data;
the test module is used for inputting the test data set into the voice recognition engine, and carrying out voice recognition on the test data to obtain an actual test result;
the result processing module is used for generating a test report according to the actual test result and the expected test result;
the first acquisition module is used for: calling a preset voiceprint template according to the test requirement, and generating at least one piece of audio data based on the voiceprint template; acquiring the at least one piece of audio data as the at least one piece of test data;
the second acquisition module is used for: analyzing the configuration data to obtain language configuration information; setting languages used by the voice recognition engine according to the language configuration information; analyzing the configuration data to obtain working mode configuration information; setting the working mode of the voice recognition engine according to the working mode configuration information, and calling a preset voice recognition word stock according to the corresponding working mode.
8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any one of claims 1 to 6 when the program is executed by the processor.
CN201910184695.6A 2019-03-12 2019-03-12 Accuracy testing method and device of voice recognition engine and electronic equipment Active CN111696523B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910184695.6A CN111696523B (en) 2019-03-12 2019-03-12 Accuracy testing method and device of voice recognition engine and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910184695.6A CN111696523B (en) 2019-03-12 2019-03-12 Accuracy testing method and device of voice recognition engine and electronic equipment

Publications (2)

Publication Number Publication Date
CN111696523A CN111696523A (en) 2020-09-22
CN111696523B (en) 2024-03-01

Family

ID=72474836

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910184695.6A Active CN111696523B (en) 2019-03-12 2019-03-12 Accuracy testing method and device of voice recognition engine and electronic equipment

Country Status (1)

Country Link
CN (1) CN111696523B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112417109B (en) * 2020-10-26 2023-08-01 问问智能信息科技有限公司 Method and device for testing man-machine dialogue system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103745731A (en) * 2013-12-31 2014-04-23 安徽科大讯飞信息科技股份有限公司 Automatic voice recognition effect testing system and automatic voice recognition effect testing method
CN106548772A (en) * 2017-01-16 2017-03-29 上海智臻智能网络科技股份有限公司 Speech recognition test system and method
CN107086040A (en) * 2017-06-23 2017-08-22 歌尔股份有限公司 Speech recognition capabilities method of testing and device
CN108538296A (en) * 2017-03-01 2018-09-14 广东神马搜索科技有限公司 Speech recognition test method and test terminal
CN109448701A (en) * 2018-09-19 2019-03-08 易诚博睿(南京)科技有限公司 A kind of intelligent sound recognizes the result statistical system and method for semantic understanding

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7181392B2 (en) * 2002-07-16 2007-02-20 International Business Machines Corporation Determining speech recognition accuracy
KR101700099B1 (en) * 2016-10-11 2017-01-31 미디어젠(주) Hybrid speech recognition Composite Performance Auto Evaluation system


Also Published As

Publication number Publication date
CN111696523A (en) 2020-09-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant