CN116665648A - Voice recognition test method and device - Google Patents


Info

Publication number
CN116665648A
CN116665648A
Authority
CN
China
Prior art keywords
voice
voice recognition
test
active
stream
Legal status
Pending
Application number
CN202210152341.5A
Other languages
Chinese (zh)
Inventor
周丽君
蒋宁
周迅溢
王洪斌
吴海英
郝征鹏
Current Assignee
Mashang Xiaofei Finance Co Ltd
Original Assignee
Mashang Xiaofei Finance Co Ltd
Application filed by Mashang Xiaofei Finance Co Ltd
Priority to CN202210152341.5A
Publication of CN116665648A


Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/01 - Assessment or evaluation of speech recognition systems
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The embodiment of the specification provides a voice recognition test method and a device, wherein the voice recognition test method comprises the following steps: receiving a voice recognition request carrying a voice stream sent by a client according to a concurrency control parameter; responding to the voice recognition request, performing active voice detection on the voice stream based on active voice detection parameters, and extracting active voice fragments from the voice stream according to the detection result; respectively inputting the active voice fragments into each voice recognition algorithm to be tested for a voice recognition test; and sending a voice recognition test result of each voice recognition algorithm under the concurrency control parameter to the client. According to the embodiment of the application, the quality of the voice used as the test input is improved through the active voice fragments, so that the accuracy of the voice recognition test is improved and the efficiency of the voice recognition test is increased, and the voice recognition stability of each voice recognition algorithm under different concurrency control parameters is verified by means of the concurrency control parameter.

Description

Voice recognition test method and device
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a method and apparatus for testing speech recognition.
Background
With the continuous development of speech recognition technology, it is gradually entering various fields, such as smart home and automotive electronics. Before a speech recognition system is formally put into use, its performance, stability and other aspects need to be tested to ensure that the system can operate normally. In current practice, most speech recognition systems are tested manually, so the test cost is high.
Disclosure of Invention
In a first aspect, an embodiment of the present application provides a method for testing speech recognition, which is applied to a server, and includes:
receiving a voice recognition request carrying a voice stream sent by a client according to a concurrency control parameter, wherein the concurrency control parameter is used for controlling the frequency at which the client sends voice recognition requests to the server;
responding to the voice recognition request, carrying out active voice detection on the voice stream based on the active voice detection parameters, and extracting active voice fragments in the voice stream according to the detection result;
respectively inputting the active voice fragments into each voice recognition algorithm to be tested to perform voice recognition test;
and sending a voice recognition test result of each voice recognition algorithm under the concurrency control parameter to the client.
In a second aspect, an embodiment of the present application provides a method for testing speech recognition, which is applied to a client, and includes:
reading a voice file under a target path based on a first starting parameter, and converting the read voice file into a voice stream;
sending a voice recognition request carrying the voice stream to a server according to the concurrency control parameter;
receiving a voice recognition test result, under the concurrency control parameter, of each voice recognition algorithm to be tested, sent by the server in response to the voice recognition request;
and generating a test log according to the voice recognition test result, and calculating a test index of each voice recognition algorithm under the concurrency control parameter according to the test log and the annotation file corresponding to the voice file.
In a third aspect, an embodiment of the present application provides a speech recognition testing apparatus, which operates on a server, including:
a recognition request receiving module, which is used for receiving a voice recognition request carrying a voice stream sent by a client according to a concurrency control parameter, wherein the concurrency control parameter is used for controlling the frequency at which the client sends voice recognition requests to the server;
the active voice segment extraction module is used for responding to the voice recognition request, carrying out active voice detection on the voice stream based on the active voice detection parameters, and extracting active voice segments in the voice stream according to the detection result;
The voice recognition test module is used for respectively inputting the active voice fragments into each voice recognition algorithm to be tested to carry out voice recognition test;
and the recognition test result transmitting module is used for transmitting the voice recognition test result of each voice recognition algorithm under the concurrency control parameter to the client.
In a fourth aspect, an embodiment of the present application provides a speech recognition testing apparatus, running on a client, including:
the voice file reading module is used for reading the voice file under the target path based on the first starting parameter and converting the read voice file into a voice stream;
the voice recognition request sending module is used for sending a voice recognition request carrying the voice stream to the server according to the concurrency control parameter;
the recognition test result receiving module is used for receiving a voice recognition test result, under the concurrency control parameter, of each voice recognition algorithm to be tested, sent by the server in response to the voice recognition request;
and the test log generation module is used for generating a test log according to the voice recognition test result, so that a test index of each voice recognition algorithm under the concurrency control parameter is calculated according to the test log and the annotation file corresponding to the voice file.
In a fifth aspect, an embodiment of the present application provides a speech recognition test apparatus, including: a processor; and a memory configured to store computer-executable instructions that, when executed, cause the processor to perform the speech recognition test method of the first aspect.
In a sixth aspect, an embodiment of the present application provides a speech recognition test apparatus, including: a processor; and a memory configured to store computer-executable instructions that, when executed, cause the processor to perform the speech recognition test method of the second aspect.
In a seventh aspect, embodiments of the present application provide a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the speech recognition test method of the first aspect.
In an eighth aspect, embodiments of the present application provide a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the speech recognition test method according to the second aspect.
It can be seen that in the embodiment of the present application, in the process of performing a voice recognition test in cooperation with the client, active voice detection is performed on the voice stream carried in the voice recognition request sent by the client according to the concurrency control parameter, active voice fragments are extracted from the voice stream, and the active voice fragments are input into each voice recognition algorithm to be tested for the voice recognition test. The quality of the voice used as test input is thus improved through the active voice fragments, which improves the accuracy of the voice recognition test, saves test time, and further improves the efficiency of the voice recognition test. In addition, the frequency at which the client sends voice recognition requests to the server is controlled by updating the concurrency control parameter configured at the client, and the test result of each voice recognition algorithm under the sending frequencies controlled by different concurrency control parameters is tested, so as to verify the voice recognition stability of each voice recognition algorithm under different concurrency control parameters.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required in the embodiments or the description of the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments described in the present specification, and other drawings can be obtained from these drawings by a person skilled in the art without inventive effort.
FIG. 1 is a process flow diagram of a speech recognition testing method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a speech recognition test process according to an embodiment of the present application;
FIG. 3 is a schematic diagram of another speech recognition testing process according to an embodiment of the present application;
FIG. 4 is a processing timing chart of a speech recognition test method applied to customer service speech scene according to an embodiment of the present application;
FIG. 5 is a processing timing chart of a voice recognition test method applied to a telephone traffic voice scene according to an embodiment of the present application;
FIG. 6 is a flowchart illustrating another method for testing speech recognition according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a speech recognition testing apparatus according to an embodiment of the present application;
FIG. 8 is a schematic diagram of another speech recognition testing apparatus according to an embodiment of the present application;
FIG. 9 is a schematic structural diagram of a voice recognition test apparatus according to an embodiment of the present application;
FIG. 10 is a schematic structural diagram of another voice recognition test apparatus according to an embodiment of the present application.
Detailed Description
In order to enable those skilled in the art to better understand the technical solutions of the embodiments of the present application, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present specification, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present application without making any inventive effort, shall fall within the scope of the present application.
In practical applications, an additional tool is often needed when testing the performance or stability of a speech recognition system, for example FreeSWITCH (a telephony softswitch solution) is used to provide voice driving. In this process, a tester needs to know how to configure and operate FreeSWITCH, so the test efficiency is low. In addition, during the voice recognition test, the voice recognition system is often tested directly with the acquired voice files, so the voice quality of the test input cannot be guaranteed.
In view of this, in the voice recognition test method provided in this embodiment, during the voice recognition test performed in cooperation with the client, active voice detection is performed on the voice stream carried in the voice recognition request sent by the client according to the concurrency control parameter, active voice fragments are extracted from the voice stream according to the detection result, and the extracted active voice fragments are input into each voice recognition algorithm to be tested for the voice recognition test. The voice quality in the voice recognition test process is thus controlled by extracting the active voice fragments from the voice stream, the accuracy of the voice recognition test is improved, and the test time of the voice recognition test is saved, thereby improving the efficiency of the voice recognition test. In addition, the voice recognition test is performed on different voice recognition algorithms under the concurrency control parameter configured at the client; by updating the concurrency control parameter, different frequencies at which the client sends voice recognition requests to the server are obtained, and the test results of the voice recognition algorithms under different concurrency control parameters are tested, so as to verify the voice recognition stability of each voice recognition algorithm under different concurrency control parameters.
Referring to fig. 1, the voice recognition testing method provided in the present embodiment specifically includes steps S102 to S108.
Step S102, a voice recognition request carrying a voice stream sent by a client according to the concurrency control parameter is received.
The concurrency control parameter in this embodiment refers to the sleep time inserted between consecutive voice recognition requests sent by the client to the server; that is, after sending one voice recognition request, the client sleeps for the configured sleep time and then sends the next voice recognition request. The concurrency control parameter is set in the configuration file of the client; as shown in fig. 2, the concurrency control parameter is -t, for example t is 100 ms. Optionally, the concurrency control parameter is used for controlling the frequency at which the client sends voice recognition requests to the server.
Specifically, the concurrency control parameter is used for controlling the frequency at which the client sends voice recognition requests to the server, and it affects the concurrency number counted on the server, where the concurrency number refers to the number of voice recognition requests that the server has received from the client but that have not yet been responded to by the voice recognition algorithms to be tested. For example, if t is 1000 ms, 1 voice recognition request is sent per second and the concurrency number is 2; if t is 100 ms, 10 voice recognition requests are sent per second and the concurrency number is 30; that is, the smaller the value of the concurrency control parameter t, the larger the concurrency number.
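As a rough, non-limiting illustration of the relationship between the sleep interval and the concurrency number, the following Python sketch estimates the concurrency number from the request rate and an assumed average response time of the voice recognition algorithms (Little's law); the 3-second response time is a hypothetical value chosen only so that t = 100 ms roughly reproduces the example concurrency number of 30 above, and is not a value taken from this embodiment.

def estimated_concurrency(sleep_interval_ms: float, avg_response_time_s: float) -> float:
    """Estimate the number of received-but-unanswered requests on the server.

    Little's law: concurrency is approximately the arrival rate multiplied by
    the average time a request waits for a response; the arrival rate here is
    one request per sleep interval.
    """
    requests_per_second = 1000.0 / sleep_interval_ms
    return requests_per_second * avg_response_time_s

# Hypothetical 3 s average response time, for illustration only.
print(estimated_concurrency(100, 3.0))   # ~30, matching the t = 100 ms example above
print(estimated_concurrency(1000, 3.0))  # ~3; the t = 1000 ms example above gives 2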
In addition, before the voice recognition test, a first start parameter and a second start parameter are also set in the configuration file of the client. On this basis, the client automatically reads the voice file under the target path using the first start parameter and converts it into a voice stream, sends voice recognition requests carrying the voice stream according to the concurrency control parameter, and, after detecting that the concurrency control parameter has been updated, cyclically reads the voice file based on the second start parameter and converts the read voice file into a voice stream. By updating the concurrency control parameter, the voice recognition test effect under different concurrency control parameters is calculated, so that the stability of the voice recognition system is verified.
As shown in fig. 2, the first start parameter is -f, which enables the client to read the voice file under the target path and convert it into a voice stream; the second start parameter is -p, which enables the client to cyclically and repeatedly read the voice file under the target path and convert it into a voice stream; the concurrency control parameter is -t, which controls the frequency at which the client sends voice recognition requests to the server and thereby controls the concurrency number, and by updating the concurrency control parameter, the test effect of each voice recognition algorithm to be tested under different concurrency numbers can be counted.
In a specific execution process, with the first start parameter and the concurrency control parameter set in the configuration file of the client, as shown in fig. 2, the client automatically reads the voice file under the target path based on the first start parameter, converts the read voice file into a voice stream, and then sends a voice recognition request carrying the voice stream according to the concurrency control parameter. Converting the voice file in this way facilitates the subsequent voice recognition test by the server, improving test efficiency and convenience. Correspondingly, the server receives the voice recognition request carrying the voice stream sent by the client according to the concurrency control parameter. In addition, the client can also directly read a voice stream based on the first start parameter so as to adapt to various voice recognition test scenarios.
It should be noted that the number of voice files is not fixed and may be one or more. If there are multiple voice files, the client needs to process them in batches; in this case the client sequentially reads the voice files under the target path based on the first start parameter and sequentially converts the read voice files into voice streams. The processing of a single voice file is similar: the client reads the single voice file under the target path based on the first start parameter and converts it into a voice stream.
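For illustration only, the following Python sketch mimics the client-side behaviour described above: reading the voice files under a target path, converting each file into a stream of audio chunks, and sending one voice recognition request per file with a sleep interval between requests. The function names, chunk size and transport callback are assumptions of the sketch; the actual client in this embodiment is a command-line tool driven by the -f, -p and -t parameters.

import os
import time
import wave

def read_voice_files(target_path: str):
    """Sequentially yield the .wav files under the target path (the -f behaviour)."""
    for name in sorted(os.listdir(target_path)):
        if name.endswith(".wav"):
            yield os.path.join(target_path, name)

def to_voice_stream(wav_path: str, chunk_ms: int = 20):
    """Convert a voice file into a stream of fixed-duration audio chunks."""
    with wave.open(wav_path, "rb") as wav:
        frames_per_chunk = int(wav.getframerate() * chunk_ms / 1000)
        while True:
            chunk = wav.readframes(frames_per_chunk)
            if not chunk:
                break
            yield chunk

def run_client(target_path: str, sleep_ms: int, send_request):
    """Send one voice recognition request per file, sleeping between requests (-t)."""
    for wav_path in read_voice_files(target_path):
        send_request(wav_path, to_voice_stream(wav_path))
        time.sleep(sleep_ms / 1000.0)  # the concurrency control parameter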
Step S104, responding to the voice recognition request, carrying out active voice detection on the voice stream based on the active voice detection parameters, and extracting active voice fragments in the voice stream according to the detection result.
The active voice detection parameters in this embodiment refer to the parameters involved in voice activity detection (VAD); optionally, the active voice detection parameters are set in the configuration file of the server. The active voice detection parameters comprise 3 sub-parameters, namely vad_mod (the active voice detection mode), vad-speech-timeout (the speech timeout) and vad-silence-timeout (the silence timeout). For example, in fig. 2, these 3 sub-parameters, vad_mod, vad-speech-timeout and vad-silence-timeout, are set in the configuration file of the server.
In the voice recognition test process, the active voice detection parameters are set in the configuration file of the server, and the quality of the voice stream in the voice recognition test process is controlled, so that the accuracy of the voice recognition test is improved.
Optionally, a channel parameter is further set in the configuration file of the server, where the channel parameter is used to control whether the server receives the voice recognition requests sent by the client. The channel parameter controls the maximum concurrency number and thereby controls whether the server receives a voice recognition request sent by the client; for example, the channel parameter is max-channel-count, and a parameter value of 40 limits the maximum concurrency to 40.
In practical applications, the speed at which each voice recognition algorithm to be tested responds to voice recognition requests is limited. To avoid slow or abnormal responses of the voice recognition algorithms caused by a backlog of voice recognition requests, the channel parameter can be set in the configuration file of the server to control the maximum concurrency number and thereby control whether the server receives the voice recognition requests sent by the client.
In an optional implementation provided in this embodiment, whether the server receives the voice recognition request sent by the client is controlled as follows:
if the number of requests to be responded to is greater than or equal to the parameter value of the channel parameter, receiving of voice recognition requests sent by the client is suspended, where the number of requests to be responded to is the number of requests that have been received but not yet responded to by the voice recognition algorithms;
if the number of requests to be responded to is smaller than the parameter value of the channel parameter, the voice recognition request sent by the client is received.
It should be added that the parameter value of the channel parameter can also be adjusted and updated in the voice recognition test process to test and obtain the maximum supportable concurrency number of each voice recognition algorithm to be tested, and obtain the maximum test capacity of each voice recognition algorithm.
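The admission control described above can be sketched, purely as an illustration, with an in-memory counter of unanswered requests; max_channel_count and the counter update hooks are illustrative names and are not taken from the actual server implementation of this embodiment.

class RequestGate:
    """Suspend or allow receiving of requests based on the channel parameter."""

    def __init__(self, max_channel_count: int = 40):
        self.max_channel_count = max_channel_count  # channel parameter value
        self.pending = 0  # requests received but not yet responded to by any algorithm

    def can_receive(self) -> bool:
        # Receiving is suspended while the pending requests reach the limit.
        return self.pending < self.max_channel_count

    def on_request_received(self) -> None:
        self.pending += 1

    def on_request_responded(self) -> None:
        self.pending -= 1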
The server receives a voice recognition request carrying a voice stream sent by a client according to the concurrency control parameters, responds to the voice recognition request, carries out active voice detection on the voice stream based on the active voice detection parameters set in the configuration file, and extracts active voice fragments in the voice stream according to the detection result.
In practical applications, a voice stream that does not meet the test requirements may degrade the effect of the voice recognition test. Therefore, in order to improve the effect of the voice recognition test, active voice detection is performed on the voice stream, and active voice fragments are extracted from the voice stream and input into each voice recognition algorithm to be tested for the voice recognition test. In an alternative implementation provided in this embodiment, the active voice segments are extracted from the voice stream as follows:
detecting critical voice points in the voice stream, and judging whether a local voice stream between two continuous critical voice points in the voice stream is an active voice stream or not;
if yes, extracting the local voice stream from the voice stream to serve as the active voice fragment;
if not, no processing is performed.
The voice stream contains both an active voice stream and an inactive voice stream. Optionally, the active voice stream is a local voice stream whose voice duration exceeds the active voice duration threshold and/or a local voice stream of continuous active voice; specifically, the active voice stream may be a local voice stream whose voice duration exceeds the active voice duration threshold, or a local voice stream of continuous active voice, or a local voice stream whose voice duration exceeds the active voice duration threshold and that is continuous active voice.
For example, if the active voice duration threshold is set to 150 ms, a local voice stream between two consecutive critical voice points is considered an active voice stream only if its voice duration exceeds 150 ms and the voice is continuous and uninterrupted. A critical voice point is a critical point that distinguishes active voice from inactive voice in the voice stream; for example, if the criterion for distinguishing active voice from inactive voice is a voice value greater than 0 dB, the critical voice points in the voice stream are the points where the voice value equals 0 dB.
Specifically, in order to improve the voice recognition test effect and the accuracy and effectiveness of the voice recognition test, the critical voice points in the voice stream can be detected, and it is judged whether the voice duration of the local voice stream between two consecutive critical voice points exceeds the active voice duration threshold; if not, no processing is performed; if yes, it is further judged whether the local voice stream is continuous active voice, and if so, the local voice stream is extracted from the voice stream as an active voice segment; otherwise, no processing is performed.
It should be noted that the active speech segment may be one active speech segment, or may be a plurality of active speech segments, where the plurality of active speech segments form a segment set.
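Purely as an illustration of the extraction logic above, the sketch below scans a frame-level activity mask for critical points and keeps only the spans that are continuously active and long enough; the 150 ms duration threshold is the example value mentioned in this embodiment, while the mask representation and frame length are assumptions of the sketch.

from typing import List, Tuple

def extract_active_segments(
    frame_is_active: List[bool],
    frame_ms: int = 20,
    min_duration_ms: int = 150,
) -> List[Tuple[int, int]]:
    """Return (start_ms, end_ms) spans of active voice segments.

    frame_is_active marks each fixed-size frame as active or inactive; a
    transition between the two states corresponds to a critical voice point.
    A span is kept only if it is continuously active and its duration exceeds
    the active voice duration threshold.
    """
    segments = []
    start = None
    for i, active in enumerate(frame_is_active + [False]):  # sentinel closes the last span
        if active and start is None:
            start = i                      # critical point: inactive -> active
        elif not active and start is not None:
            duration_ms = (i - start) * frame_ms
            if duration_ms > min_duration_ms:
                segments.append((start * frame_ms, i * frame_ms))
            start = None                   # critical point: active -> inactive
    return segments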
Step S106, the active voice fragments are respectively input into each voice recognition algorithm to be tested to carry out voice recognition test.
Having received the voice recognition request carrying the voice stream sent by the client according to the concurrency control parameter, responded to the request, performed active voice detection on the voice stream based on the active voice detection parameters, and extracted the active voice segments from the voice stream according to the detection result, this step performs the voice recognition test on each voice recognition algorithm to be tested using the active voice segments, so as to obtain the voice recognition test result of each voice recognition algorithm under the concurrency control parameter.
In specific implementation, the above-mentioned local voice stream meeting the requirements of the active voice stream is extracted from the voice stream as an active voice segment, and on the basis of this, the active voice segment is respectively input into each voice recognition algorithm to be tested to perform a voice recognition test.
(1) And inputting a first active voice fragment in a fragment set formed by the active voice fragments into each voice recognition algorithm to perform voice recognition test.
Specifically, in order to improve the efficiency of the speech recognition test, the first active speech fragment in the fragment set formed by the active speech fragments may be simultaneously input into each speech recognition algorithm to be tested to perform the speech recognition test.
(2) If a first test result of any one of the voice recognition algorithms performing the voice recognition test on the first active voice segment is obtained, the second active voice segment in the segment set is input into that voice recognition algorithm for the voice recognition test.
Specifically, because the response duration and the voice recognition rate differ between the voice recognition algorithms, the time at which each voice recognition algorithm outputs its test result may differ. After the first test result of any voice recognition algorithm performing the voice recognition test on the first active voice segment is obtained, the second active voice segment in the segment set is input into that voice recognition algorithm for the voice recognition test, so as to obtain the second test result of that voice recognition algorithm performing the voice recognition test on the second active voice segment.
It should be noted that "any one voice recognition algorithm" above refers to any of the voice recognition algorithms to be tested; that is, as soon as the first test result of the voice recognition test on the first active voice segment is obtained from one or more of the voice recognition algorithms to be tested, the second active voice segment in the segment set is input into that one or those voice recognition algorithms for the voice recognition test. The first test result and the second test result of any voice recognition algorithm are then combined to obtain the voice recognition test result of that voice recognition algorithm under the current concurrency control parameter.
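The per-algorithm pipelining described above can be sketched as follows, under assumed interfaces: the first segment is submitted to every algorithm, and each algorithm receives the next segment as soon as it has returned its previous result, so a slow algorithm never delays a fast one. The recognize callables and the thread-based scheduling are assumptions of the sketch, not part of this embodiment.

from concurrent.futures import ThreadPoolExecutor

def test_algorithms(algorithms: dict, segments: list) -> dict:
    """Feed the segment set to each voice recognition algorithm independently.

    algorithms maps an algorithm name to a callable recognize(segment) -> text.
    Each algorithm moves on to the next segment only after producing the test
    result for the previous one, so the per-algorithm order is preserved while
    the algorithms themselves run concurrently.
    """
    def run_one(recognize):
        results = []
        for segment in segments:          # segment i+1 starts only after result i
            results.append(recognize(segment))
        return results

    with ThreadPoolExecutor(max_workers=len(algorithms)) as pool:
        futures = {name: pool.submit(run_one, fn) for name, fn in algorithms.items()}
        return {name: future.result() for name, future in futures.items()}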
Step S108, sending the voice recognition test result of each voice recognition algorithm under the concurrency control parameter to the client.
On the basis that the active voice segments are respectively input into each voice recognition algorithm to be tested for the voice recognition test, the voice recognition test results output by each voice recognition algorithm are obtained. Correspondingly, the client receives the voice recognition test result of each voice recognition algorithm to be tested under the current concurrency control parameter sent by the server, and generates a test log according to the voice recognition test result, so as to calculate a test index of each voice recognition algorithm under the current concurrency control parameter according to the test log and the annotation file corresponding to the voice file.
The test indexes include the word error rate and the sentence error rate; any one or more of them may be used, as well as other test indexes besides the word error rate and the sentence error rate. By calculating multiple test indexes, the voice recognition test becomes flexible enough to adapt to various voice recognition test scenarios.
For example, in fig. 2, the client reads the voice file under the target path based on the first start parameter, converts the read voice file into a voice stream, and sends a voice recognition request carrying the voice stream to the server according to the concurrency control parameter. Correspondingly, the server receives the voice recognition request carrying the voice stream sent by the client according to the concurrency control parameter, and, in response to the voice recognition request, performs active voice detection on the voice stream carried in the voice recognition request based on the active voice detection parameters, extracts the active voice fragments from the voice stream according to the detection result, inputs the active voice fragments into each voice recognition algorithm to be tested for the voice recognition test, obtains the voice recognition test results, and sends them to the client. The client receives the voice recognition test results sent by the server, generates a test log according to the voice recognition test results, and calculates the test indexes of each voice recognition algorithm under the concurrency control parameter in combination with the annotation file. It should be noted that the test indexes may be calculated by the client or by a test tool.
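As a hedged illustration of the index calculation, the sketch below computes a word error rate and a sentence error rate from pairs of annotation (reference) texts and recognized texts using edit distance; this is the standard definition of these indexes rather than code taken from this embodiment, and the whitespace tokenisation is an assumption of the sketch.

def edit_distance(ref: list, hyp: list) -> int:
    """Levenshtein distance between reference and hypothesis token lists."""
    d = [[i + j if i * j == 0 else 0 for j in range(len(hyp) + 1)] for i in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d[len(ref)][len(hyp)]

def word_and_sentence_error_rate(pairs):
    """pairs: iterable of (reference_text, recognized_text) taken from the test log."""
    total_words = total_errors = wrong_sentences = sentence_count = 0
    for ref_text, hyp_text in pairs:
        ref, hyp = ref_text.split(), hyp_text.split()
        errors = edit_distance(ref, hyp)
        total_words += len(ref)
        total_errors += errors
        wrong_sentences += 1 if errors else 0
        sentence_count += 1
    return total_errors / total_words, wrong_sentences / sentence_count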
On this basis, in order to count the influence of different concurrency control parameters on each voice recognition algorithm and to test the test effect of each voice recognition algorithm under different concurrency control parameters, the client updates the concurrency control parameter and cyclically performs the voice recognition test. Specifically, the client cyclically performs the voice recognition test in the following manner:
if the concurrency control parameter is detected to be updated, reading a second starting parameter set in the configuration file of the client, wherein the second starting parameter is used for controlling the client to read the voice file under the target path based on the first starting parameter, and converting the read voice file into a voice stream.
Further, after obtaining the test index of each voice recognition algorithm under the tested concurrency control parameter, the client tests and evaluates each voice recognition algorithm according to the test index under the tested concurrency control parameter. Specifically, the client performs test evaluation on each voice recognition algorithm by the following manner:
reading the test indexes of each voice recognition algorithm under the tested concurrency control parameters, where the tested concurrency control parameters include the original concurrency control parameter and the updated concurrency control parameters;
and performing the test evaluation on each voice recognition algorithm according to the test indexes under the tested concurrency control parameters.
The tested concurrency control parameters refer to all concurrency control parameters that have been tested, and the performance and stability of each voice recognition algorithm are evaluated according to the test indexes of each voice recognition algorithm under the tested concurrency control parameters.
It should be noted that, updating the concurrency control parameter may be performed by the client, or may be performed by the test tool, and performing test evaluation on each speech recognition algorithm may be performed by the client, or may be performed by the test tool. As shown in fig. 2, the client updates the current concurrency control parameter, and re-performs the voice recognition test according to the updated concurrency control parameter, calculates the test index of each voice recognition algorithm under the updated concurrency control parameter, and performs test evaluation on each voice recognition algorithm.
It should be added that, in the process of updating the concurrency control parameter, the number of updates is not fixed. Since the test effect of each voice recognition algorithm under different concurrency control parameters needs to be counted, the concurrency control parameter is updated multiple times, and there are accordingly multiple test indexes of each voice recognition algorithm for the updated concurrency control parameters. Specifically, a voice recognition test is performed for each of the updated concurrency control parameters, the test indexes of each voice recognition algorithm for each concurrency control parameter are calculated, and each voice recognition algorithm is evaluated according to the test indexes under each concurrency control parameter.
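A rough sketch of this outer test loop, under assumed function names: the client sweeps over a set of concurrency control parameter values, reruns the voice recognition test for each value, and collects the per-algorithm test indexes that the final evaluation is based on.

def sweep_concurrency_parameters(sleep_intervals_ms, run_recognition_test, compute_indexes):
    """Run the voice recognition test once per concurrency control parameter value.

    run_recognition_test(t_ms) performs one full test pass and returns a test log;
    compute_indexes(log) returns a mapping such as {algorithm_name: {"wer": ..., "ser": ...}}.
    """
    results = {}
    for t_ms in sleep_intervals_ms:            # e.g. [500, 200, 100]
        log = run_recognition_test(t_ms)
        results[t_ms] = compute_indexes(log)   # test indexes per algorithm for this -t value
    return results                              # basis for the stability evaluation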
In a specific voice recognition test process, since the test needs to be performed multiple times to obtain the test index of each voice recognition algorithm under each concurrency control parameter, the voice recognition test method provided above can describe any one pass of the test process: the first pass, or any subsequent pass other than the first (a non-first voice recognition test).
In an actual application scenario, the server needs to interface with the voice recognition algorithms of different manufacturers, so the voice recognition algorithms of different manufacturers need to be custom-developed in the server to meet different functional requirements. After development is completed, functional tests and stability tests are performed on the server, and when a new voice recognition algorithm needs to be tested, functional tests, stability tests and the like also need to be performed on it. These test requirements can be met by adapting the client.
For example, when performing a functional test on the server and a manufacturer's voice recognition algorithm, the following commands are used to make the client read the specified voice files:
(1) ./unimrcpclient -f ./test1.wav; representing reading voice file 1;
(2) ./unimrcpclient -f ./test2.wav; representing reading voice file 2;
(3) ./unimrcpclient -f ./test3.wav; representing reading voice file 3.
Specifically, the client reads the specified voice file and converts it into a voice stream, the voice recognition test is performed on the converted voice stream to obtain a voice recognition test result, and the test indexes are then calculated by combining the voice recognition test result with the annotation file. If the test indexes meet the index thresholds, the server and the voice recognition algorithm meet the functional requirements; if not, they do not meet the functional requirements.
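A minimal sketch of the pass/fail decision described above, assuming hypothetical index names and threshold values that are not taken from this embodiment:

def functional_test_passed(indexes: dict, thresholds: dict) -> bool:
    """Check every test index against its threshold (lower is better for error rates)."""
    return all(indexes[name] <= limit for name, limit in thresholds.items())

# Hypothetical thresholds, for illustration only.
print(functional_test_passed({"wer": 0.08, "ser": 0.20}, {"wer": 0.10, "ser": 0.25}))  # True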
The voice recognition test method provided in this embodiment is further described below by taking its application to a customer service voice scene as an example. Referring to fig. 4, the voice recognition test method applied to the customer service voice scene specifically includes the following steps.
In an actual customer service voice scenario, the server usually needs to interface with the voice recognition algorithms of different manufacturers, and because these algorithms expose different external interfaces, the server needs to be custom-developed according to the requirements of each manufacturer. The customer service voice recognition system is custom-developed according to different functional requirements, and after development is completed, performance tests and stability tests are performed on it to ensure that the service can run normally. The performance and stability of a voice recognition algorithm also need to be tested when interfacing with a new manufacturer's algorithm. The following is the specific test procedure of the voice recognition test method applied to customer service voice scenes.
Step S408, a voice recognition request carrying a customer service voice stream sent by the client according to the concurrency control parameter is received.
Wherein the concurrency control parameter is set to-t, e.g., t1 is 120ms.
Step S410, in response to the voice recognition request, the critical voice points in the customer service voice stream are detected based on the active voice detection parameters.
Step S412, judging whether the local customer service voice stream between two continuous critical voice points in the customer service voice stream is an active customer service voice stream;
if yes, go to step S414 to step S420, and step S430 to step S432;
if not, no processing is performed.
The active customer service voice stream is a local customer service voice stream with voice duration exceeding an active voice duration threshold (such as 150 ms) and/or a local customer service voice stream of continuous active customer service voice.
The 3 sub-parameters included in the active voice detection parameters are vad_mod (the active voice detection mode), vad-speech-timeout and vad-silence-timeout; for example, the parameter values of the active voice detection parameters are set to 3, 100 ms and 300 ms, respectively.
Step S414, extracting local customer service voice stream from the customer service voice stream as the active customer service voice segment.
Step S416, the first customer service voice segment in the segment set composed of the active customer service voice segments is input into each voice recognition algorithm to be tested for voice recognition test.
Step S418, if a first test result of any one of the voice recognition algorithms performing the voice recognition test on the first customer service voice segment is obtained, the second customer service voice segment in the segment set is input into that voice recognition algorithm for the voice recognition test, and a second test result is obtained.
After the first test result and the second test result of any voice recognition algorithm are obtained, they are combined to obtain the voice recognition test result of that voice recognition algorithm.
Step S420, the voice recognition test result of each voice recognition algorithm under the concurrency control parameter is sent to the client.
Step S430, performing voice recognition test on each voice recognition algorithm according to the customer service voice stream carried in the voice recognition request.
Step S432, sending the second speech recognition test result under the updated concurrency control parameter to the client.
Specifically, the counted concurrency number on the server is controlled by the concurrency control parameters, and the test effect under different concurrency numbers is counted by updating the concurrency control parameters, so that the stability of the server and each voice recognition algorithm to be tested is verified under the conditions of high concurrency number and long-time working.
For example, the following commands are set in the configuration file of the client to count the test effect under different concurrency numbers and to verify the stability of the server and each voice recognition algorithm to be tested under high concurrency and long-running conditions:
(1) ./unimrcpclient -f ./data/test/ -p 1 -t 500ms
Description: cyclically read the voice files under the ./data/test/ path; the sleep interval between voice recognition requests, i.e. the parameter value of the concurrency control parameter, is 500 ms, so 2 voice recognition requests are sent per second and the concurrency number is 5;
(2) ./unimrcpclient -f ./data/test/ -p 1 -t 200ms
Description: cyclically read the voice files under the ./data/test/ path; the sleep interval between voice recognition requests, i.e. the parameter value of the concurrency control parameter, is 200 ms, and the concurrency number of voice recognition request sending is 20;
(3) ./unimrcpclient -f ./data/test/ -p 1 -t 100ms
Description: cyclically read the voice files under the ./data/test/ path; the sleep interval between voice recognition requests, i.e. the parameter value of the concurrency control parameter, is 100 ms, so 10 voice recognition requests are sent per second and the concurrency number is 30.
The second starting parameter is-p, and the second starting parameter is used for controlling the client to read the customer service voice file in the target path/data/test/based on the first starting parameter-f circulation and converting the customer service voice file into a customer service voice stream.
It can be seen that the smaller the parameter value of the concurrency control parameter, i.e. the shorter the sleep interval, the larger the concurrency number. The performance of each voice recognition algorithm under different concurrency numbers is tested by updating the concurrency control parameter, and when the concurrency number is large (i.e. the parameter value of the concurrency control parameter is set small), the working stability of the server and each voice recognition algorithm under high concurrency and long-running conditions can be evaluated.
The voice recognition test method provided in this embodiment is further described below by taking its application to a telephone traffic voice scene as an example. Referring to fig. 5, the voice recognition test method applied to the telephone traffic voice scene specifically includes the following steps.
Step S508, receiving the voice recognition request carrying the telephone traffic voice stream sent by the client according to the concurrency control parameter.
Step S510, responding to the voice recognition request, carrying out active voice detection on the traffic voice stream based on the active voice detection parameters, and extracting the active traffic voice fragments in the traffic voice stream according to the detection result.
The 3 sub-parameters included in the active voice detection parameters are vad_mod (the active voice detection mode), vad-speech-timeout (the speech timeout) and vad-silence-timeout (the silence timeout); for example, the parameter values of the active voice detection parameters are set to 3, 100 ms and 300 ms, respectively.
Step S512, the active telephone traffic voice fragments are respectively input into each voice recognition algorithm to be tested for voice recognition test.
Step S514, the voice recognition test result of each voice recognition algorithm under the concurrency control parameter is sent to the client.
Step S520, performing voice recognition test on each voice recognition algorithm according to the voice stream carried in the voice recognition request, obtaining a second voice recognition test result under the updated concurrency control parameter, and sending the second voice recognition test result to the client.
The implementation processes of the voice recognition test method for the two application scenes of customer service voice and telephone traffic voice described above are executed by the server; the corresponding client-side implementation processes for these two application scenes are executed by the client, and the two cooperate with each other during execution, so reference may be made to the corresponding content of the present method embodiment.
Referring to fig. 6, the voice recognition testing method provided in the present embodiment specifically includes steps S602 to S608.
The voice recognition test method provided in this embodiment is applied to the client, while the voice recognition test method provided in the above method embodiment is applied to the server, and the two cooperate with each other during execution. Therefore, when reading this embodiment, reference may be made to the corresponding content of the above method embodiment, which is not repeated here.
Step S602, reading the voice file under the target path based on the first starting parameter, and converting the read voice file into a voice stream.
The purpose of setting the first start parameter in this embodiment is to enable the client to automatically read the voice file stored in the target path and convert the voice file into a voice stream. As shown in fig. 2, the first starting parameter is-f, so as to implement reading of the voice file and conversion of the voice stream under the target path.
In this embodiment, before the voice recognition test, a first start parameter is set in the configuration file of the client, and a second start parameter is also set in the configuration file. On this basis, the client automatically reads the voice file stored under the target path based on the first start parameter and converts it into a voice stream, and after detecting that the concurrency control parameter has been updated, cyclically reads the voice file based on the second start parameter and converts the read voice file into a voice stream. As shown in fig. 2, the second start parameter is -p, so that the voice file can be repeatedly read and converted into a voice stream in an endless loop, which facilitates the subsequent voice recognition test by the server and improves test efficiency and convenience. In addition, the client can also directly read a voice stream based on the first start parameter so as to adapt to various voice recognition test scenarios.
It should be noted that the number of voice files is not fixed and may be one or more. If there are multiple voice files, the client needs to process them in batches; in this case the client sequentially reads the voice files under the target path based on the first start parameter and sequentially converts the read voice files into voice streams. The processing of a single voice file is similar: the client reads the single voice file under the target path based on the first start parameter and converts it into a voice stream.
Step S604, a voice recognition request carrying the voice stream is sent to a server according to the concurrency control parameter.
The concurrency control parameter in this embodiment refers to the sleep time inserted between consecutive voice recognition requests sent by the client to the server; that is, after sending one voice recognition request, the client sleeps for the configured sleep time and then sends the next voice recognition request. The concurrency control parameter is set in the configuration file of the client; as shown in fig. 2, the concurrency control parameter is -t, for example t is 100 ms.
Specifically, the concurrency control parameter is used for controlling the frequency at which the client sends voice recognition requests to the server, and it affects the concurrency number counted on the server, where the concurrency number refers to the number of voice recognition requests that the server has received from the client but that have not yet been responded to by the voice recognition algorithms to be tested. For example, if t is 1000 ms, 1 voice recognition request is sent per second and the concurrency number is 2; if t is 100 ms, 10 voice recognition requests are sent per second and the concurrency number is 30; that is, the smaller the value of the concurrency control parameter t, the larger the concurrency number.
The client has read the voice file under the target path based on the first start parameter and converted it into a voice stream; in this step, a voice recognition request carrying the voice stream is sent to the server according to the concurrency control parameter.
Correspondingly, the server receives the voice recognition request carrying the voice stream sent by the client according to the concurrency control parameter, and, in response to the voice recognition request, performs active voice detection on the voice stream based on the active voice detection parameters. By introducing the active voice detection parameters into the voice recognition test process, the quality of the voice stream is controlled and the accuracy of the voice recognition test is improved. The server then extracts the active voice fragments from the voice stream according to the detection result, inputs the extracted active voice fragments into each voice recognition algorithm to be tested for the voice recognition test, obtains the voice recognition test result of each voice recognition algorithm for the concurrency control parameter, and sends the voice recognition test result to the client.
The active voice detection parameters refer to the parameters involved in voice activity detection (VAD), and the active voice detection parameters are set in the configuration file of the server. The active voice detection parameters comprise 3 sub-parameters, namely vad_mod (the active voice detection mode), vad-speech-timeout (the speech timeout) and vad-silence-timeout (the silence timeout). For example, in fig. 2, these 3 sub-parameters are set in the configuration file of the server.
It should be added that a channel parameter is further set in the configuration file of the server, and the channel parameter is used to control whether the server receives the voice recognition requests sent by the client. The channel parameter controls the maximum concurrency number and thereby controls whether the server receives a voice recognition request sent by the client; for example, the channel parameter is max-channel-count, and a parameter value of 40 limits the maximum concurrency to 40.
In practical applications, the speed at which each voice recognition algorithm to be tested responds to voice recognition requests is limited. To avoid slow or abnormal responses of the voice recognition algorithms caused by a backlog of voice recognition requests, the channel parameter can be set in the configuration file of the server to control the maximum concurrency number and thereby control whether the server receives the voice recognition requests sent by the client.
Specifically, whether the server receives the voice recognition request sent by the client is controlled as follows:
if the number of requests to be responded to is greater than or equal to the parameter value of the channel parameter, receiving of voice recognition requests sent by the client is suspended, where the number of requests to be responded to is the number of requests that have been received but not yet responded to by the voice recognition algorithms;
if the number of requests to be responded to is smaller than the parameter value of the channel parameter, the voice recognition request sent by the client is received.
It should be added that the parameter value of the channel parameter can also be adjusted and updated in the voice recognition test process to test and obtain the maximum supportable concurrency number of each voice recognition algorithm to be tested, and obtain the maximum test capacity of each voice recognition algorithm.
In practical applications, a voice stream that does not meet the test requirements may degrade the effect of the voice recognition test. Therefore, in order to improve the effect of the voice recognition test, the server performs active voice detection on the voice stream, extracts the active voice fragments from the voice stream, and inputs them into each voice recognition algorithm to be tested for the voice recognition test. Specifically, the active voice fragments are extracted from the voice stream as follows:
detecting critical voice points in the voice stream, and judging whether a local voice stream between two continuous critical voice points in the voice stream is an active voice stream or not;
if yes, extracting the local voice stream from the voice stream to serve as the active voice fragment;
if not, no processing is performed.
The voice stream contains both an active voice stream and an inactive voice stream, and the active voice stream is a local voice stream whose voice duration exceeds the active voice duration threshold and/or a local voice stream of continuous active voice; specifically, the active voice stream may be a local voice stream whose voice duration exceeds the active voice duration threshold, or a local voice stream of continuous active voice, or a local voice stream whose voice duration exceeds the active voice duration threshold and that is continuous active voice.
For example, if the active voice duration threshold is set to 150 ms, the local voice stream between two consecutive critical voice points is regarded as an active voice stream only when its voice duration exceeds 150 ms and the speech is continuous and uninterrupted. A critical voice point is the point that separates active voice from inactive voice in the voice stream; for example, if the criterion for distinguishing active voice from inactive voice is a level greater than 0 dB, the critical voice points are the points in the voice stream where the level equals 0 dB.
Specifically, in order to improve the voice recognition test effect and the accuracy and effectiveness of the test, the server detects critical voice points in the voice stream and, based on these points, judges whether the voice duration of the local voice stream between two consecutive critical voice points exceeds the active voice duration threshold. If not, no processing is performed; if so, the server further judges whether the local voice stream is continuous active voice, and if it is, the server extracts the local voice stream from the voice stream as an active voice fragment, otherwise no processing is performed.
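The following Python sketch illustrates one way such extraction could be realized, using frame energy as the active/inactive criterion; the frame length, the energy threshold standing in for the 0 dB criterion, the 150 ms duration threshold and all names are illustrative assumptions, not the concrete detection used by the server.

# Hypothetical sketch of active-speech extraction: frames above an energy threshold
# are treated as active; a run of active frames longer than the active-voice duration
# threshold becomes one active speech segment.
def extract_active_segments(samples, sample_rate, frame_ms=20,
                            energy_threshold=1e-4, min_duration_ms=150):
    frame_len = int(sample_rate * frame_ms / 1000)
    min_frames = max(1, min_duration_ms // frame_ms)
    segments, run_start = [], None
    n_frames = len(samples) // frame_len
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = sum(x * x for x in frame) / frame_len
        active = energy > energy_threshold
        if active and run_start is None:
            run_start = i                      # critical point: inactive -> active
        elif not active and run_start is not None:
            if i - run_start >= min_frames:    # continuous active speech, long enough
                segments.append((run_start * frame_len, i * frame_len))
            run_start = None
    if run_start is not None and n_frames - run_start >= min_frames:
        segments.append((run_start * frame_len, n_frames * frame_len))
    return segments  # list of (start_sample, end_sample) pairs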
It should be noted that there may be a single active speech segment or a plurality of active speech segments; a plurality of active speech segments form a segment set.
On this basis, the server performs the voice recognition test on each voice recognition algorithm to be tested with the active voice fragments and obtains the voice recognition test result of each algorithm under the concurrency control parameter. Extracting the active voice fragments from the voice stream reduces the influence of inactive voice on the test accuracy, thereby improving the accuracy of the voice recognition test, shortening the test time and improving the test efficiency.
In specific implementation, the local voice streams that meet the requirements of an active voice stream are extracted from the voice stream as active voice segments; on this basis, the active voice segments are respectively input into each voice recognition algorithm to be tested for the voice recognition test, as follows.
(1) And inputting a first active voice fragment in the fragment set formed by the active voice fragments into each voice recognition algorithm to perform voice recognition test.
Specifically, in order to improve the efficiency of the speech recognition test, the server may input the first active speech segment in the segment set formed by the active speech segments into each speech recognition algorithm to be tested at the same time to perform the speech recognition test.
(2) If a first test result of any one of the voice recognition algorithms performing the voice recognition test on the first active voice fragment is obtained, input a second active voice fragment in the fragment set into that voice recognition algorithm for the voice recognition test.
Specifically, because the response duration and the recognition rate differ among the voice recognition algorithms, the algorithms may output their test results at different times. After the server obtains the first test result of any one voice recognition algorithm for the first active voice segment, it inputs the second active voice segment in the segment set into that algorithm, so as to obtain the second test result of that algorithm for the second active voice segment.
It should be noted that 'any one voice recognition algorithm' refers to any of the voice recognition algorithms to be tested: as soon as the first test result of one or more of the algorithms for the first active voice segment is obtained, the second active voice segment in the segment set is input into that algorithm (or those algorithms) for the voice recognition test. The first test result and the second test result of an algorithm are combined to obtain the voice recognition test result of that algorithm under the current concurrency control parameter.
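The pipelined feeding described above can be illustrated with the following Python sketch, in which each algorithm under test is driven independently and receives its next active voice segment only after its previous result has been returned; the recognize callables and all names are placeholders for the vendor algorithms, not a real API.

# Hypothetical sketch: feed the ordered set of active speech segments to each speech
# recognition algorithm under test, advancing each algorithm to its next segment as
# soon as its previous result is returned.
from concurrent.futures import ThreadPoolExecutor

def run_recognition_test(segments, algorithms):
    results = {name: [] for name in algorithms}

    def drive(name, recognize):
        for segment in segments:           # next segment only after the previous result
            results[name].append(recognize(segment))

    with ThreadPoolExecutor(max_workers=len(algorithms)) as pool:
        futures = [pool.submit(drive, name, fn) for name, fn in algorithms.items()]
        for f in futures:
            f.result()
    return results

# Example usage with placeholder algorithms:
# results = run_recognition_test(segments, {"algo_a": algo_a, "algo_b": algo_b})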
After the server obtains the voice recognition test result of each voice recognition algorithm under the current concurrency control parameter, it sends the voice recognition test result to the client.
Step S606, receiving a voice recognition test result of each voice recognition algorithm to be tested under the concurrency control parameter, which is sent by the server in response to the voice recognition request.
The client sends a voice recognition request carrying a voice stream to the server according to the concurrency control parameter. The server responds to the voice recognition request, performs active voice detection on the voice stream based on the active voice detection parameters, extracts active voice fragments from the voice stream according to the detection result, inputs the active voice fragments into each voice recognition algorithm to be tested for the voice recognition test, obtains the voice recognition test result output by each algorithm under the concurrency control parameter, and sends the voice recognition test results to the client.
Step S608, generating a test log according to the voice recognition test result, so as to calculate a test index of each voice recognition algorithm under the concurrency control parameter according to the test log and the markup file corresponding to the voice file.
The client receives the voice recognition test result of each voice recognition algorithm to be tested under the concurrency control parameter, which is sent by the server in response to the voice recognition request. On this basis, the client generates a test log from the voice recognition test result so as to calculate, from the test log and the markup file corresponding to the voice file, the test index of each voice recognition algorithm under the concurrency control parameter; the markup file is the annotation file corresponding to the voice file.
The test indexes include the word error rate and the sentence error rate; they may include either or both of these, or other test indexes besides the word error rate and the sentence error rate. Calculating multiple kinds of test indexes makes the voice recognition test flexible enough to adapt to various test scenarios.
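As an illustration only, the two indexes could be computed as follows; the sketch assumes whitespace tokenization and that the recognized texts (from the test log) and the reference texts (from the markup file) are aligned utterance by utterance.

# Hypothetical sketch: word error rate as the edit distance between the recognized
# token sequence and the reference token sequence, divided by the reference length;
# sentence error rate counts whole utterances that differ.
def edit_distance(ref, hyp):
    d = [[i + j if i * j == 0 else 0 for j in range(len(hyp) + 1)]
         for i in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d[len(ref)][len(hyp)]

def word_error_rate(references, hypotheses):
    errors = sum(edit_distance(r.split(), h.split())
                 for r, h in zip(references, hypotheses))
    total = sum(len(r.split()) for r in references)
    return errors / total if total else 0.0

def sentence_error_rate(references, hypotheses):
    wrong = sum(1 for r, h in zip(references, hypotheses) if r.strip() != h.strip())
    return wrong / len(references) if references else 0.0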
For example, in fig. 2, the client reads the voice file under the target path based on the first start parameter and converts the read voice file into a voice stream, then sends a voice recognition request carrying the voice stream to the server according to the concurrency control parameter. Correspondingly, the server receives the voice recognition request, performs active voice detection on the carried voice stream based on the active voice detection parameters, extracts active voice fragments from the voice stream according to the detection result, inputs the active voice fragments into each voice recognition algorithm to be tested for the voice recognition test, obtains the voice recognition test results and sends them to the client. The client receives the voice recognition test results, generates a test log from them, and calculates the test index of each voice recognition algorithm under the concurrency control parameter in combination with the markup file. It should be noted that the test index may be calculated by the client or by the test tool.
On this basis, in order to measure the influence of different concurrency control parameters on each voice recognition algorithm and to test each algorithm under different concurrency control parameters, the concurrency control parameter is updated and the voice recognition test is performed cyclically. In an alternative implementation provided in this embodiment, the voice recognition test is performed cyclically in the following manner:
if it is detected that the concurrency control parameter is updated, a second starting parameter set in the configuration file of the client is read, where the second starting parameter is used to control the client to read, based on the first starting parameter, the voice file under the target path and convert the read voice file into the voice stream.
Further, after obtaining the test index of each voice recognition algorithm under the tested concurrency control parameters, the client evaluates each voice recognition algorithm according to the test indexes under the tested concurrency control parameters. In an alternative implementation provided in this embodiment, each speech recognition algorithm is evaluated in the following manner:
reading the test indexes of each voice recognition algorithm under the tested concurrency control parameters, where the tested concurrency control parameters comprise the concurrency control parameter and the updated concurrency control parameters;
And carrying out test evaluation on each voice recognition algorithm according to the test index under the tested concurrency control parameters.
The tested concurrency control parameters refer to all concurrency control parameters that have been tested; the performance and stability of each voice recognition algorithm are evaluated according to its test indexes under the tested concurrency control parameters.
It should be noted that updating the concurrency control parameter may be performed by the client or by the test tool, and the test evaluation of each speech recognition algorithm may likewise be performed by the client or by the test tool. As shown in fig. 2, the client updates the current concurrency control parameter, re-performs the voice recognition test according to the updated parameter, calculates the test index of each voice recognition algorithm under the updated parameter, and evaluates each voice recognition algorithm.
It should be added that the number of times the concurrency control parameter is updated is not fixed. Since the test effect of each voice recognition algorithm under different concurrency control parameters needs to be counted, the concurrency control parameter is updated several times, so each voice recognition algorithm has several test indexes for the updated concurrency control parameters. Specifically, a voice recognition test is performed for each of the updated concurrency control parameters, the test index of each voice recognition algorithm is calculated for each parameter value, and each voice recognition algorithm is evaluated according to its test indexes over all parameter values.
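A possible outer loop over the concurrency control parameter values is sketched below; run_one_round is an assumed helper standing in for one complete client/server test round and is not part of the described system.

# Hypothetical sketch of the outer test loop: run one speech recognition test per
# concurrency control parameter value, collect the test index for every algorithm,
# then gather the indexes per algorithm over all tested values for evaluation.
def sweep_concurrency_parameters(parameter_values, run_one_round):
    indices = {}                       # value -> {algorithm name: word error rate}
    for value in parameter_values:     # e.g. sleep intervals such as "500ms", "200ms"
        indices[value] = run_one_round(value)
    evaluation = {}
    for value, per_algorithm in indices.items():
        for name, wer in per_algorithm.items():
            evaluation.setdefault(name, []).append((value, wer))
    return evaluation                  # per-algorithm results over all tested values

# Example: evaluation = sweep_concurrency_parameters(["500ms", "200ms", "100ms"], run_one_round)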
In a specific voice recognition test process, the test needs to be performed multiple times to obtain the test index of each voice recognition algorithm under each concurrency control parameter. The voice recognition test method provided above therefore applies to any single round of the test process: the first round, or any subsequent round other than the first.
In an actual application scenario, the server needs to interface with voice recognition algorithms from different manufacturers, so these algorithms are customized and developed on the server to meet different functional requirements. After development is completed, functional and stability tests are performed on the server, and when a new voice recognition algorithm needs to be tested, functional and stability tests are likewise required for the new algorithm. These test requirements can be met by adapting the client.
For example, when performing a functional test on the server and a vendor's voice recognition algorithm, the following commands make the client read a specified voice file:
(1) ./unimrcpclient -f ./test1.wav: read voice file 1;
(2) ./unimrcpclient -f ./test2.wav: read voice file 2;
(3) ./unimrcpclient -f ./test3.wav: read voice file 3.
Specifically, the client reads the specified voice file and converts it into a voice stream, the voice recognition test is performed on the converted voice stream to obtain a voice recognition test result, and the test index is then calculated by combining the voice recognition test result with the annotation file. If the test index meets the index threshold, the server and the voice recognition algorithm meet the functional requirements; if the test index does not meet the index threshold, they do not.
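The pass/fail decision can be expressed, for illustration, as a simple comparison of each computed test index against its configured threshold; the index names and threshold values below are assumptions.

# Hypothetical sketch of the pass/fail decision: a functional test passes when every
# computed test index meets its configured threshold (lower error rates are better).
def meets_functional_requirements(test_indices, thresholds):
    # test_indices and thresholds map index names (e.g. "wer", "ser") to values.
    return all(test_indices[name] <= limit for name, limit in thresholds.items())

# Example usage with illustrative values:
# ok = meets_functional_requirements({"wer": 0.05, "ser": 0.03}, {"wer": 0.08, "ser": 0.05})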
The voice recognition test method provided in this embodiment is further described below by taking its application to a customer service voice scene as an example. Referring to fig. 4, the voice recognition test method applied to the customer service voice scene includes the following steps.
In an actual customer service voice scenario, the server usually needs to interface with voice recognition algorithms of different manufacturers; because the algorithms of different manufacturers expose different interfaces, the server is customized and developed according to each manufacturer's requirements. The customer service voice recognition system is customized and developed according to the different functional requirements, and after development is completed, performance and stability tests are performed on it to ensure that the service runs normally. The performance and stability of a new manufacturer's voice recognition algorithm also need to be tested when interfacing with it. The specific test procedure of the voice recognition test method applied to the customer service voice scene is as follows.
Step S402, a customer service voice file stored in the target path is read based on the first starting parameter.
For example, if the first startup parameter is -f and the target path is set to ./data/test/, the client automatically reads the customer service voice files under the ./data/test/ path.
Step S404, converting the read customer service voice file into a customer service voice stream.
Step S406, a voice recognition request carrying the customer service voice stream is sent to the server according to the concurrency control parameter.
The concurrency control parameter is -t; for example, its value t1 is 120 ms.
Step S422, a test log is generated according to the voice recognition test result.
Step S424, calculating the word error rate of each voice recognition algorithm according to the test log and the labeling file.
For example, the speech recognition algorithm to be tested includes a first speech recognition algorithm and a second speech recognition algorithm, the word error rate of the first speech recognition algorithm is calculated to be 5%, and the word error rate of the second speech recognition algorithm is calculated to be 8%.
Step S426, if it is detected that the concurrency control parameter is updated, the second starting parameter set in the configuration file of the client is read, and based on the second starting parameter, the customer service voice file is read again according to the first starting parameter and converted into a customer service voice stream.
Specifically, the frequency at which the client sends voice recognition requests to the server is controlled by the concurrency control parameter, which in turn controls the concurrency number counted on the server; by updating the concurrency control parameter, the test effect of each voice recognition algorithm under different concurrency numbers is counted, so as to verify the stability of the server and of each voice recognition algorithm to be tested under high concurrency and long-duration operation.
For example, the following commands are set in the configuration file of the client to count the test effect of each voice recognition algorithm under different concurrency numbers and to verify the stability of the server and of each voice recognition algorithm to be tested under high concurrency and long-duration operation:
(1) ./unimrcpclient -f ./data/test/ -p 1 -t 500ms
Description: cyclically read the voice files under the ./data/test/ path; the sleep interval between voice recognition requests, i.e. the parameter value of the concurrency control parameter, is 500 ms, so 2 voice recognition requests are sent per second and the concurrency number is 5;
(2) ./unimrcpclient -f ./data/test/ -p 1 -t 200ms
Description: cyclically read the voice files under the ./data/test/ path; the sleep interval between voice recognition requests, i.e. the parameter value of the concurrency control parameter, is 200 ms, and the concurrency number of voice recognition request transmission is 20;
(3) ./unimrcpclient -f ./data/test/ -p 1 -t 100ms
Description: cyclically read the voice files under the ./data/test/ path; the sleep interval between voice recognition requests, i.e. the parameter value of the concurrency control parameter, is 100 ms, so 10 voice recognition requests are sent per second and the concurrency number is 30.
The second starting parameter is -p; it controls the client to cyclically read, based on the first starting parameter -f, the customer service voice files under the target path ./data/test/ and convert them into the customer service voice stream.
It can be seen that the smaller the parameter value of the concurrency control parameter, i.e. the shorter the sleep interval, the larger the concurrency number. The performance of each voice recognition algorithm under different concurrency numbers is tested by updating the concurrency control parameter; when the parameter value of the concurrency control parameter is set to a small value, the working stability of the server and of each voice recognition algorithm under high concurrency and long-duration operation can be evaluated.
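For illustration, the client-side send loop implied by the -p and -t parameters might look as follows; send_request is an assumed placeholder for the actual MRCP client call, and mapping -t directly to a sleep in seconds is a simplification.

# Hypothetical sketch of the client send loop: the concurrency control parameter is
# treated as a sleep interval between successive recognition requests, and the loop
# flag makes the client re-read the voice files under the target path.
import os
import time

def client_send_loop(target_path, sleep_interval_s, loop, send_request):
    while True:
        for name in sorted(os.listdir(target_path)):
            if not name.endswith(".wav"):
                continue
            send_request(os.path.join(target_path, name))   # one recognition request
            time.sleep(sleep_interval_s)                     # e.g. 0.5 for "-t 500ms"
        if not loop:                                         # "-p 1" keeps looping
            break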
Step S428, a voice recognition request carrying the customer service voice stream is sent to the server according to the updated concurrency control parameter.
For example, the value t2 of the updated concurrency control parameter is 150 ms.
Step S434, a second test log is generated according to the second speech recognition test result.
Step S436, calculating a second word error rate of each voice recognition algorithm under the updated concurrency control parameters according to the second test log and the labeling file.
Following the above example, the second word error rate of the first speech recognition algorithm is calculated to be 4%, and the second word error rate of the second speech recognition algorithm is calculated to be 5%.
Step S438, the word error rate of each voice recognition algorithm under the tested concurrency control parameters is read, and the test evaluation is carried out on each voice recognition algorithm according to the read word error rate.
The tested concurrency control parameters comprise the concurrency control parameter and the updated concurrency control parameter.
Following the above example, each voice recognition algorithm is evaluated according to its word error rate under the tested concurrency control parameters.
The read word error rate comprises 5% of the word error rate of the first voice recognition algorithm, 8% of the word error rate of the second voice recognition algorithm, 4% of the second word error rate of the first voice recognition algorithm and 5% of the second word error rate of the second voice recognition algorithm.
The voice recognition test method provided in this embodiment is further described below by taking its application to a test tool scene as an example. Referring to fig. 5, the voice recognition test method applied to the test tool scene includes the following steps.
Step S502, based on the first starting parameter, the telephone traffic voice file is read under the target path.
Step S504, the read traffic voice file is converted into a traffic voice stream.
Step S506, a voice recognition request carrying the telephone traffic voice stream is sent to the server according to the concurrency control parameter.
Step S516, a test log is generated according to the voice recognition test result.
As shown in fig. 3, the test tool reads the voice recognition test result from the test log, calculates the test index according to the voice recognition test result and the annotation file, and updates the concurrency control parameter.
For example, the sentence error rate of the first speech recognition algorithm is calculated to be 2.5%, and the sentence error rate of the second speech recognition algorithm is calculated to be 3%.
Based on the second starting parameter set in the configuration file, the client reads the telephone traffic voice file under the target path again according to the first starting parameter and converts the read telephone traffic voice file into a telephone traffic voice stream.
Step S518, a voice recognition request carrying the telephone traffic voice stream is sent to the server according to the updated concurrency control parameters.
Step S522, a second test log is generated according to the second voice recognition test result.
As shown in fig. 3, the test tool obtains the second test log and calculates a second test index by combining the second test log with the annotation file. The test tool then evaluates the first voice recognition algorithm and the second voice recognition algorithm according to the test indexes of each algorithm under the tested concurrency control parameters (the test index under the original concurrency control parameter and the second test index under the updated concurrency control parameter).
Following the above example, the test tool evaluates the first voice recognition algorithm and the second voice recognition algorithm according to their test indexes under the tested concurrency control parameters: the sentence error rate of the first voice recognition algorithm is 2.5% and that of the second voice recognition algorithm is 3%, while the second sentence error rate of the first voice recognition algorithm is 2.3% and that of the second voice recognition algorithm is 2.8%.
The embodiment of a voice recognition testing device provided in the present specification is as follows:
in the foregoing embodiments, a voice recognition testing method applied to a server is provided, and a voice recognition testing device running on the server is also provided correspondingly, which is described below with reference to the accompanying drawings.
Referring to fig. 7, a schematic diagram of a voice recognition testing apparatus according to the present embodiment is shown.
Since the apparatus embodiments correspond to the method embodiments, the description is relatively simple, and the relevant portions should be referred to the corresponding descriptions of the method embodiments provided above. The device embodiments described below are merely illustrative.
The present embodiment provides a speech recognition testing apparatus, including:
the recognition request receiving module 702 is configured to receive a voice recognition request carrying a voice stream sent by a client according to a concurrency control parameter, where the concurrency control parameter is used to control a frequency of sending the voice recognition request to the server;
an active speech segment extraction module 704, configured to respond to the speech recognition request, perform active speech detection on the speech stream based on an active speech detection parameter, and extract an active speech segment in the speech stream according to a detection result;
the voice recognition testing module 706 is configured to input the active voice segments into respective voice recognition algorithms to be tested for performing a voice recognition test;
and the recognition test result sending module 708 is configured to send a speech recognition test result of each speech recognition algorithm under the concurrency control parameter to the client.
Another embodiment of a speech recognition testing apparatus provided in the present specification is as follows:
in the foregoing embodiments, a voice recognition testing method applied to a client is provided, and a voice recognition testing device running on the client is also provided correspondingly, which is described below with reference to the accompanying drawings.
Referring to fig. 8, a schematic diagram of a voice recognition testing apparatus according to the present embodiment is shown.
Since the apparatus embodiments correspond to the method embodiments, the description is relatively simple, and the relevant portions should be referred to the corresponding descriptions of the method embodiments provided above. The device embodiments described below are merely illustrative.
The present embodiment provides a speech recognition testing apparatus, including:
a voice file reading module 802, configured to read a voice file under a target path based on a first start parameter, and convert the read voice file into a voice stream;
a voice recognition request sending module 804, configured to send a voice recognition request carrying the voice stream to a server according to the concurrency control parameter;
a recognition test result receiving module 806, configured to receive a speech recognition test result of each speech recognition algorithm to be tested under the concurrency control parameter, which is sent by the server in response to the speech recognition request;
and the test log generating module 808 is configured to generate a test log according to the voice recognition test result, so as to calculate a test index of each voice recognition algorithm under the concurrency control parameter according to the test log and a markup file corresponding to the voice file.
An embodiment of a speech recognition test apparatus provided in the present specification is as follows:
corresponding to the above-described voice recognition testing method applied to the server, based on the same technical concept, the embodiment of the application further provides a voice recognition testing device, where the voice recognition testing device is used to execute the above-provided testing method, and fig. 9 is a schematic structural diagram of a voice recognition testing device provided by the embodiment of the application.
The voice recognition test device provided in this embodiment includes:
as shown in fig. 9, the speech recognition test apparatus may have a relatively large difference due to different configurations or performances, and may include one or more processors 901 and a memory 902, where the memory 902 may store one or more storage applications or data. Wherein the memory 902 may be transient storage or persistent storage. The application programs stored in the memory 902 may include one or more modules (not shown in the figures), each of which may include a series of computer-executable instructions in the speech recognition test apparatus. Still further, the processor 901 may be arranged to communicate with the memory 902 and execute a series of computer executable instructions in the memory 902 on the speech recognition test apparatus. The speech recognition test apparatus may also include one or more power supplies 903, one or more wired or wireless network interfaces 904, one or more input/output interfaces 905, one or more keyboards 906, and the like.
In a particular embodiment, the speech recognition test apparatus includes a memory and one or more programs, wherein the one or more programs are stored in the memory and may include one or more modules, each module may include a series of computer-executable instructions for the speech recognition test apparatus, and the one or more programs are configured to be executed by one or more processors and comprise computer-executable instructions for:
receiving a voice recognition request carrying a voice stream sent by a client according to a concurrency control parameter, wherein the concurrency control parameter is used for controlling the frequency at which the client sends the voice recognition request to the server;
responding to the voice recognition request, carrying out active voice detection on the voice stream based on the active voice detection parameters, and extracting active voice fragments in the voice stream according to the detection result;
respectively inputting the active voice fragments into each voice recognition algorithm to be tested to perform voice recognition test;
and sending a voice recognition test result of each voice recognition algorithm under the concurrency control parameter to the client.
Another embodiment of a speech recognition test apparatus provided in the present specification is as follows:
corresponding to the above-described voice recognition test method applied to the client, based on the same technical concept, the embodiment of the application further provides a voice recognition test device, where the voice recognition test device is used to execute the above-provided test method, and fig. 10 is a schematic structural diagram of a voice recognition test device provided by the embodiment of the application.
The voice recognition test device provided in this embodiment includes:
as shown in fig. 10, the speech recognition test apparatus may have a relatively large difference due to different configurations or performances, and may include one or more processors 1001 and a memory 1002, where the memory 1002 may store one or more storage applications or data. Wherein the memory 1002 may be transient storage or persistent storage. The application program stored in memory 1002 may include one or more modules (not shown in the figures), each of which may include a series of computer-executable instructions in the speech recognition test apparatus. Still further, the processor 1001 may be configured to communicate with the memory 1002 and execute a series of computer executable instructions in the memory 1002 on the speech recognition test apparatus. The speech recognition test apparatus may also include one or more power supplies 1003, one or more wired or wireless network interfaces 1004, one or more input/output interfaces 1005, one or more keyboards 1006, etc.
In a particular embodiment, the speech recognition test apparatus includes a memory and one or more programs, wherein the one or more programs are stored in the memory and may include one or more modules, each module may include a series of computer-executable instructions for the speech recognition test apparatus, and the one or more programs are configured to be executed by one or more processors and comprise computer-executable instructions for:
reading a voice file under a target path based on a first starting parameter, and converting the read voice file into a voice stream;
sending a voice recognition request carrying the voice stream to a server according to the concurrency control parameter;
receiving a voice recognition test result of each voice recognition algorithm to be tested, which is sent by the server in response to the voice recognition request, under the concurrency control parameter;
and generating a test log according to the voice recognition test result, and calculating test indexes of each voice recognition algorithm under the concurrency control parameters according to the test log and the annotation files corresponding to the voice files.
An embodiment of a computer-readable storage medium provided in the present specification is as follows:
corresponding to the above-described voice recognition testing method applied to the server, the embodiment of the application further provides a computer readable storage medium based on the same technical concept.
The present embodiment provides a computer-readable storage medium for storing computer-executable instructions that, when executed by a processor, implement the following flow:
receiving a voice recognition request carrying a voice stream sent by a client according to a concurrency control parameter, wherein the concurrency control parameter is used for controlling the frequency at which the client sends the voice recognition request to the server;
responding to the voice recognition request, carrying out active voice detection on the voice stream based on the active voice detection parameters, and extracting active voice fragments in the voice stream according to the detection result;
respectively inputting the active voice fragments into each voice recognition algorithm to be tested to perform voice recognition test;
and sending a voice recognition test result of each voice recognition algorithm under the concurrency control parameter to the client.
It should be noted that, in the present specification, the embodiments related to the computer readable storage medium and the embodiments related to the voice recognition testing method in the present specification are based on the same inventive concept, so that the specific implementation of the embodiments may refer to the implementation of the corresponding method, and the repetition is omitted.
Another computer-readable storage medium embodiment provided in this specification is as follows:
corresponding to the above-described voice recognition testing method applied to the client, the embodiment of the application further provides a computer readable storage medium based on the same technical concept.
The present embodiment provides a computer-readable storage medium for storing computer-executable instructions that, when executed by a processor, implement the following flow:
reading a voice file under a target path based on a first starting parameter, and converting the read voice file into a voice stream;
sending a voice recognition request carrying the voice stream to a server according to the concurrency control parameter;
receiving a voice recognition test result of each voice recognition algorithm to be tested, which is sent by the server in response to the voice recognition request, under the concurrency control parameter;
and generating a test log according to the voice recognition test result, and calculating test indexes of each voice recognition algorithm under the concurrency control parameters according to the test log and the annotation files corresponding to the voice files.
It should be noted that, in the present specification, the embodiments related to the computer readable storage medium and the embodiments related to the voice recognition testing method in the present specification are based on the same inventive concept, so that the specific implementation of the embodiments may refer to the implementation of the corresponding method, and the repetition is omitted.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present description can take the form of a computer program product on one or more computer-readable storage media (including, but not limited to, magnetic disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The present description is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the specification. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable test apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable test apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable test apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable test apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, Random Access Memory (RAM) and/or nonvolatile memory, such as Read-Only Memory (ROM) or flash memory (flash RAM). Memory is an example of computer-readable media.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium which can be used to store information that can be accessed by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article or apparatus that comprises the element.
Embodiments of the application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. One or more embodiments of the specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
In this specification, each embodiment is described in a progressive manner; identical and similar parts of the embodiments may be referred to one another, and each embodiment mainly describes its differences from the other embodiments. In particular, since the system embodiments are substantially similar to the method embodiments, their description is relatively brief, and for relevant parts reference may be made to the description of the method embodiments.
The foregoing description is by way of example only and is not intended to limit the present disclosure. Various modifications and changes may occur to those skilled in the art. Any modifications, equivalent substitutions, improvements, etc. that fall within the spirit and principles of the present document are intended to be included within the scope of the claims of the present document.

Claims (15)

1. A method for testing speech recognition, applied to a server, the method comprising:
receiving a voice recognition request carrying a voice stream sent by a client according to a concurrency control parameter, wherein the concurrency control parameter is used for controlling the frequency at which the client sends the voice recognition request to the server;
responding to the voice recognition request, carrying out active voice detection on the voice stream based on the active voice detection parameters, and extracting active voice fragments in the voice stream according to the detection result;
respectively inputting the active voice fragments into each voice recognition algorithm to be tested to perform voice recognition test;
and sending a voice recognition test result of each voice recognition algorithm under the concurrency control parameter to the client.
2. The method of claim 1, wherein the concurrency control parameter is set in a configuration file of the client;
the active voice detection parameters are set in a configuration file of the server.
3. The method according to claim 1, wherein the step of inputting the active speech segments into the respective speech recognition algorithms to be tested for speech recognition testing comprises:
Inputting a first active voice fragment in a fragment set formed by the active voice fragments into each voice recognition algorithm to perform voice recognition test;
if a first test result of any one of the voice recognition algorithms performing the voice recognition test on the first active voice segment is obtained, inputting a second active voice segment in the segment set into said any one voice recognition algorithm for performing the voice recognition test.
4. The method of claim 1, wherein the performing active voice detection on the voice stream and extracting active voice segments in the voice stream according to the detection result comprises:
detecting critical voice points in the voice stream, and judging whether a local voice stream between two continuous critical voice points in the voice stream is an active voice stream or not;
if yes, extracting the local voice stream from the voice stream as the active voice fragment.
5. The method of claim 4, wherein the active voice stream is a local voice stream having a voice duration exceeding an active voice duration threshold and/or a local voice stream of continuous active voice.
6. The method according to claim 2, wherein a channel parameter is further set in the configuration file of the server, and the channel parameter is used for controlling whether the server receives the voice recognition request sent by the client;
The method further comprises the steps of:
if the parameter value of the channel parameter is greater than or equal to the number of requests to be responded to, suspending receiving the voice recognition request sent by the client, where the number of requests to be responded to is the number of requests that have been received but not yet responded to by the voice recognition algorithms;
and if the parameter value of the channel parameter is smaller than the number of the requests to be responded, receiving a voice recognition request sent by the client.
7. A method for testing speech recognition, applied to a client, the method comprising:
reading a voice file under a target path based on a first starting parameter, and converting the read voice file into a voice stream;
sending a voice recognition request carrying the voice stream to a server according to the concurrency control parameter;
receiving a voice recognition test result of each voice recognition algorithm to be tested, which is sent by the server in response to the voice recognition request, under the concurrency control parameter;
and generating a test log according to the voice recognition test result, and calculating test indexes of each voice recognition algorithm under the concurrency control parameters according to the test log and the annotation files corresponding to the voice files.
8. The method of claim 7, wherein the method further comprises:
and if the concurrency control parameter is detected to be updated, reading a second starting parameter set in the configuration file of the client, wherein the second starting parameter is used for controlling the client to read the voice file under the target path based on the first starting parameter, and converting the read voice file into the voice stream.
9. The method of claim 8, wherein the method further comprises:
reading test indexes of each voice recognition algorithm under the tested concurrency control parameters; the measured concurrency control parameters comprise the concurrency control parameters and updated concurrency control parameters;
and carrying out test evaluation on each voice recognition algorithm according to the test index under the tested concurrency control parameters.
10. A speech recognition testing apparatus, operable on a server, the apparatus comprising:
the recognition request receiving module is used for receiving a voice recognition request carrying a voice stream, which is sent by a client according to a concurrency control parameter, wherein the concurrency control parameter is used for controlling the frequency of sending the voice recognition request to the server;
The active voice segment extraction module is used for responding to the voice recognition request, carrying out active voice detection on the voice stream based on the active voice detection parameters, and extracting active voice segments in the voice stream according to the detection result;
the voice recognition test module is used for respectively inputting the active voice fragments into each voice recognition algorithm to be tested to carry out voice recognition test;
and the recognition test result transmitting module is used for transmitting the voice recognition test result of each voice recognition algorithm under the concurrency control parameter to the client.
11. A speech recognition testing apparatus, operable on a client, the apparatus comprising:
the voice file reading module is used for reading the voice file under the target path based on the first starting parameter and converting the read voice file into a voice stream;
the voice recognition request sending module is used for sending a voice recognition request carrying the voice stream to the server according to the concurrency control parameter;
the recognition test result receiving module is used for receiving the voice recognition test result of each voice recognition algorithm to be tested, which is sent by the server in response to the voice recognition request, under the concurrency control parameter;
And the test log generation module is used for generating a test log according to the voice recognition test result so as to calculate test indexes of each voice recognition algorithm under the concurrence control parameters according to the test log and the annotation files corresponding to the voice files.
12. A speech recognition testing apparatus, the apparatus comprising:
a processor; and a memory configured to store computer-executable instructions that, when executed, cause the processor to perform the speech recognition testing method of any of claims 1-6.
13. A speech recognition testing apparatus, the apparatus comprising:
a processor; and a memory configured to store computer-executable instructions that, when executed, cause the processor to perform the speech recognition testing method of any of claims 7-9.
14. A computer readable storage medium storing computer executable instructions which, when executed by a processor, implement the speech recognition test method of any one of claims 1-6.
15. A computer readable storage medium storing computer executable instructions which, when executed by a processor, implement the speech recognition test method of any one of claims 7-9.
CN202210152341.5A 2022-02-18 2022-02-18 Voice recognition test method and device Pending CN116665648A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210152341.5A CN116665648A (en) 2022-02-18 2022-02-18 Voice recognition test method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210152341.5A CN116665648A (en) 2022-02-18 2022-02-18 Voice recognition test method and device

Publications (1)

Publication Number Publication Date
CN116665648A true CN116665648A (en) 2023-08-29

Family

ID=87714106

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210152341.5A Pending CN116665648A (en) 2022-02-18 2022-02-18 Voice recognition test method and device

Country Status (1)

Country Link
CN (1) CN116665648A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination