CN114840422A - Test method, test device, electronic equipment and storage medium

Test method, test device, electronic equipment and storage medium

Info

Publication number
CN114840422A
Authority
CN
China
Prior art keywords
target
model
test
application
test script
Prior art date
Legal status
Pending
Application number
CN202210467969.4A
Other languages
Chinese (zh)
Inventor
董斌
戴美
朱云峰
张致远
Current Assignee
China Telecom Corp Ltd
Original Assignee
China Telecom Corp Ltd
Priority date
Filing date
Publication date
Application filed by China Telecom Corp Ltd
Priority to CN202210467969.4A
Publication of CN114840422A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 Error detection; Error correction; Monitoring
    • G06F 11/36 Preventing errors by testing or debugging software
    • G06F 11/3668 Software testing
    • G06F 11/3672 Test management
    • G06F 11/3688 Test management for test execution, e.g. scheduling of test suites
    • G06F 11/3684 Test management for test design, e.g. generating new test cases

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The embodiments of the present application provide a test method, a test device, an electronic device and a storage medium. The test method comprises the following steps: determining at least one target AI model to be called by an application to be tested; configuring, based on the service operation flow of the application to be tested, a test script for performing a collaborative test on the at least one target AI model; and executing the test script, and determining the running state of the at least one target AI model based on test data generated in the process of executing the test script. According to the embodiments of the present application, AI models can be tested automatically in combination with the application: by executing the test script configured based on the service operation flow of the application to be tested, the at least one target AI model to be called by the application is automatically tested in a cooperative manner, which makes the test process simpler, more convenient and more efficient. Moreover, because the cooperative test of a plurality of target AI models takes the influence factors among different target AI models into account, the accuracy of the test result is higher.

Description

Test method, test device, electronic equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a testing method, an apparatus, an electronic device, and a storage medium.
Background
AI (Artificial Intelligence) is the discipline of studying how computers can simulate certain human thinking processes and intelligent behaviors (such as learning, reasoning, thinking and planning). It covers both the principles by which a computer achieves intelligence and the building of computers whose intelligence resembles that of the human brain, so that computers can support higher-level applications. AI can be applied in various scenarios such as fingerprint recognition, voiceprint recognition, face recognition, speech recognition and semantic recognition.
At present, AI models are generally deployed on a cloud resource pool, and an application calls the AI models on the cloud resource pool at runtime to perform the relevant operations. However, over time the running state of an AI model may become abnormal, which reduces the recognition accuracy of the AI model.
In the prior art, only the running state of the hardware of the cloud resource pool can be monitored. An AI model deployed on the cloud resource pool runs as an application process, and its running state is usually determined by periodic manual analysis of the AI model. However, this approach is complex and inefficient.
Disclosure of Invention
In view of the foregoing problems, embodiments of the present application provide a testing method, an apparatus, an electronic device, and a storage medium, which can automatically test the running state of an AI model.
According to an aspect of embodiments of the present application, there is provided a test method, the method including:
determining at least one target artificial intelligence AI model to be called by the application to be tested;
configuring a test script for performing a collaborative test on the at least one target AI model based on the service operation flow of the application to be tested;
and executing the test script, and determining the running state of the at least one target AI model based on the test data generated in the process of executing the test script.
Optionally, after configuring a test script for performing a collaborative test on the at least one target AI model, the method further includes: configuring an execution policy of the test script, the execution policy including at least one of: a start execution time point, an execution cycle and a number of executions; executing the test script comprises: executing the test script according to the execution policy.
Optionally, in a case that the execution policy includes the start execution time point, the start execution time point is a time point at which the application to be tested is in an idle state.
Optionally, before configuring a test script for performing a collaborative test on the at least one target AI model, the method further includes: configuring a target sample to be used by the at least one target AI model, the target sample comprising sample input data, expected output data of each target AI model and expected output data of the application to be tested; and configuring a test script for performing a collaborative test on the at least one target AI model based on the service operation flow of the application to be tested includes: configuring, based on the service operation flow of the application to be tested, the execution sequence of each target AI model, information of a calling interface, information of input data, information of actual output data and information of expected output data, as well as information of actual output data and information of expected output data of the application to be tested; the input data of the first executed target AI model is the sample input data, and the input data of a later executed target AI model is the output data of the previously executed target AI model.
Optionally, configuring a target sample to be used by the at least one target AI model includes: determining, based on the service operation flow of the application to be tested, the first executed target AI model among the at least one target AI model; and configuring a sample to be used by the first executed target AI model and using the sample as the target sample; the target sample traverses a plurality of business scenarios of the first executed target AI model.
Optionally, determining the running state of the at least one target AI model based on the test data generated during the execution of the test script includes: determining an operating state of the at least one target AI model based on test data of the target AI model generated during execution of the test script.
Optionally, determining the operating state of the at least one target AI model based on the test data of the target AI model generated during the execution of the test script includes: for each target AI model, acquiring interface call data of the current target AI model generated in the process of executing the test script, the interface call data being used to indicate whether the interface call succeeded; and in a case that the interface call data of the current target AI model indicates that the interface call failed, determining that the running state of the current target AI model is abnormal.
Optionally, determining the operating state of the at least one target AI model based on the test data of the target AI model generated during the execution of the test script includes: acquiring actual output data of the first executed target AI model generated in the process of executing the test script, and determining that the running state of the first executed target AI model is abnormal under the condition that the error between the actual output data and expected output data of the first executed target AI model exceeds a first preset error; and for each target AI model except the first executed target AI model, calculating a comprehensive error based on an error between actual output data and expected output data of the current target AI model and an error between actual output data and expected output data of the target AI model executed before the current target AI model, which are generated in the process of executing the test script, and determining that the running state of the current target AI model is abnormal under the condition that the comprehensive error exceeds a second preset error.
Optionally, determining the running state of the at least one target AI model based on the test data generated during the execution of the test script includes: and determining the running state of the at least one target AI model based on the test data of the application to be tested, which is generated in the process of executing the test script.
Optionally, determining the running state of the at least one target AI model based on the test data of the application to be tested generated during the execution of the test script includes: acquiring actual output data of the application to be tested, which is generated in the process of executing the test script; and under the condition that the error between the actual output data and the expected output data of the application to be tested exceeds a third preset error, determining that a target AI model with abnormal operation state exists in the at least one target AI model.
Optionally, the method further comprises: and determining the identification accuracy of the at least one target AI model based on the test data generated in the process of executing the test script for multiple times.
According to another aspect of embodiments of the present application, there is provided a test apparatus, the apparatus including:
the determining module is used for determining at least one target artificial intelligence AI model to be called by the application to be tested;
the first configuration module is used for configuring a test script for performing a collaborative test on the at least one target AI model based on the service operation process of the application to be tested;
and the test module is used for executing the test script and determining the running state of the at least one target AI model based on the test data generated in the process of executing the test script.
Optionally, the apparatus further comprises: a second configuration module, configured to configure an execution policy of the test script, where the execution policy includes at least one of: a start execution time point, an execution cycle and a number of executions; the test module is specifically configured to execute the test script according to the execution policy.
Optionally, in a case that the execution policy includes the start execution time point, the start execution time point is a time point at which the application to be tested is in an idle state.
Optionally, the apparatus further comprises: a third configuration module, configured to configure a target sample that needs to be used by the at least one target AI model, where the target sample includes sample input data, expected output data of each target AI model, and expected output data of the application to be tested; the first configuration module is specifically configured to configure, based on the service operation flow of the application to be tested, an execution sequence of each target AI model, information of a call interface, information of input data, information of actual output data, information of expected output data, and information of actual output data and information of expected output data of the application to be tested; the input data of the first executed target AI model is the sample input data, and the input data of the latter executed target AI model is the output data of the former executed target AI model.
Optionally, the third configuration module is specifically configured to determine, based on the service operation flow of the application to be tested, the first executed target AI model among the at least one target AI model; configure a sample to be used by the first executed target AI model and use the sample as the target sample; the target sample traverses a plurality of business scenarios of the first executed target AI model.
Optionally, the test module comprises: a first testing unit, configured to determine an operating state of the at least one target AI model based on the test data of the target AI model generated during execution of the test script.
Optionally, the first test unit comprises: the first obtaining subunit is configured to obtain, for each target AI model, interface call data of a current target AI model generated in the process of executing the test script, where the interface call data is used to indicate whether interface call is successful; and the first determining subunit is used for determining that the running state of the current target AI model is abnormal under the condition that the interface calling data of the current target AI model indicates that the interface calling fails.
Optionally, the first test unit comprises: a second determining subunit, configured to obtain actual output data of the first executed target AI model generated during execution of the test script, and determine that an operating state of the first executed target AI model is abnormal when an error between the actual output data and expected output data of the first executed target AI model exceeds a first preset error; and a third determining subunit, configured to calculate, for each target AI model other than the first executed target AI model, a composite error based on an error between actual output data and expected output data of the current target AI model, and an error between actual output data and expected output data of a target AI model executed before the current target AI model, which are generated during execution of the test script, and determine that the operating state of the current target AI model is abnormal if the composite error exceeds a second preset error.
Optionally, the test module comprises: and the second testing unit is used for determining the running state of the at least one target AI model based on the test data of the application to be tested, which is generated in the process of executing the test script.
Optionally, the second test unit comprises: the second acquisition subunit is used for acquiring actual output data of the application to be tested, which is generated in the process of executing the test script; and the fourth determining subunit is used for determining that a target AI model with an abnormal operation state exists in the at least one target AI model when the error between the actual output data and the expected output data of the application to be tested exceeds a third preset error.
Optionally, the test module is further configured to determine an identification accuracy of the at least one target AI model based on test data generated during multiple executions of the test script.
According to another aspect of embodiments of the present application, there is provided an electronic device including: one or more processors; and one or more computer-readable storage media having instructions stored thereon; the instructions, when executed by the one or more processors, cause the processors to perform a testing method as described in any one of the above.
According to another aspect of embodiments of the present application, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, causes the processor to perform a test method as defined in any one of the above.
In the embodiments of the present application, at least one target AI model to be called by the application to be tested is determined; a test script for performing a collaborative test on the at least one target AI model is configured based on the service operation flow of the application to be tested; the test script is executed; and the running state of the at least one target AI model is determined based on the test data generated in the process of executing the test script. Therefore, in the embodiments of the present application, AI models can be tested automatically in combination with the application: based on the service operation flow of the application to be tested, a test script for performing a cooperative test on the at least one target AI model to be called by the application can be configured, so that the at least one target AI model is automatically tested in a cooperative manner by executing the test script. This makes the test process simpler, more convenient and more efficient, and because the cooperative test of a plurality of target AI models takes the influence factors among different target AI models into account, the accuracy of the test result is higher.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed for describing the embodiments are briefly introduced below. It is apparent that the drawings in the following description are only some drawings of the present application, and that those skilled in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a schematic structural diagram of a system according to an embodiment of the present application.
FIG. 2 is a flow chart illustrating steps of a testing method according to an embodiment of the present application.
FIG. 3 is a flow chart of steps of another testing method of an embodiment of the present application.
FIG. 4 is a schematic diagram of a configuration test script according to an embodiment of the present application.
Fig. 5 is a block diagram of a test apparatus according to an embodiment of the present application.
Fig. 6 is a block diagram of another test apparatus according to an embodiment of the present application.
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings. It is obvious that the described embodiments are only a part of the embodiments of the present application, not all of them. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort shall fall within the protection scope of the present application.
Referring to fig. 1, a system structure diagram of the embodiment of the present application is shown.
As shown in fig. 1, in the embodiment of the present application, the testing apparatus may be communicatively connected to an AI model (also referred to as an AI capability engine) and an intelligent application. The testing device can call the corresponding AI model and the intelligent application according to the service operation flow of the intelligent application.
The AI models may include a speech recognition model (such as an ASR (Automatic Speech Recognition) model), a semantic recognition model (such as an NLP (Natural Language Processing) model), a voiceprint recognition model (such as an SPR (Speaker Recognition) model), a face recognition model, a fingerprint recognition model, and so on, as shown in fig. 1. The smart applications may include smart application 1, smart application 2, ..., smart application n shown in fig. 1.
The test method in the embodiment of the present application is performed by the test apparatus shown in fig. 1, and is described in detail below.
Referring to fig. 2, a flow chart of steps of a testing method of an embodiment of the present application is shown.
As shown in fig. 2, the testing method may include the steps of:
step 201, at least one target AI model to be called by the application to be tested is determined.
The application to be tested may be any of the intelligent applications shown in fig. 1. The intelligent application can be any APP (application) or the like that needs to invoke the AI model.
Illustratively, for the application to be tested, the at least one target AI model to be called by the application may be determined based on the service operation flow of the application to be tested. Specifically, the AI models included in the service operation flow of the application to be tested are the target AI models that need to be called by the application. Depending on the actual situation, there may be one or more target AI models.
For example, a service operation flow of a certain application to be tested includes a process of calling a speech recognition model to perform speech recognition on speech data and then calling a semantic recognition model to perform semantic recognition on a speech recognition result, so that a target AI model to be called by the application to be tested is the speech recognition model and the semantic recognition model.
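As a minimal sketch of this determination, the business operation flow is assumed below to be an ordered list of node records; the structure and names are illustrative assumptions, not defined by the patent.

```python
# Assumed representation of a business operation flow: an ordered list of node
# descriptors. The target AI models are the nodes of kind "ai_model".
business_flow = [
    {"kind": "ai_model", "name": "speech_recognition"},
    {"kind": "ai_model", "name": "semantic_recognition"},
    {"kind": "application_output", "name": "app_under_test"},
]

def determine_target_ai_models(flow):
    """Return the AI models to be called by the application, in execution order."""
    return [node["name"] for node in flow if node["kind"] == "ai_model"]

print(determine_target_ai_models(business_flow))
# ['speech_recognition', 'semantic_recognition']
```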
Step 202, configuring a test script for performing a collaborative test on the at least one target AI model based on the service operation flow of the application to be tested.
In this embodiment, the to-be-tested application is taken as a whole, and a test script for performing a collaborative test on at least one target AI model that needs to be called by the to-be-tested application is configured, so that the at least one target AI model can be tested together. The specific process will be discussed in detail in the following examples.
Step 203, executing the test script, and determining the running state of the at least one target AI model based on the test data generated in the process of executing the test script.
And performing cooperative test on at least one target AI model to be called by the application to be tested by executing the test script. The operating state of the at least one target AI model may be determined based on test data generated during the test. The specific process will be discussed in detail in the following examples.
In the embodiment of the application, the AI models can be automatically tested in combination with the application, and based on the service operation flow of the application to be tested, the test script for performing the cooperative test on the at least one target AI model to be called by the application to be tested can be configured, so that the at least one target AI model to be called by the application to be tested is automatically subjected to the cooperative test by executing the test script, the test process is simpler, more convenient and more efficient, and the influence factors among different target AI models are considered in the cooperative test of the plurality of target AI models, so that the accuracy of the test result is higher.
Referring to FIG. 3, a flow chart of steps of another testing method of an embodiment of the present application is shown.
As shown in fig. 3, the testing method may include the steps of:
step 301, at least one target AI model to be called by the application to be tested is determined.
Step 302, configuring a target sample needed to be used by the at least one target AI model.
Illustratively, the process of configuring the target samples to be used by the at least one target AI model may include: determining a first executed target AI model in the at least one target AI model based on the service operation process of the application to be tested; and configuring a sample needed to be used by the first executed target AI model to be used as the target sample. The testing device may provide a sample database into which the target samples may be stored.
Based on the business operation flow of the application to be tested, at least one target AI model to be called by the application to be tested and the execution sequence of each target AI model in the business operation flow can be determined, and a sample required to be used by the first executed target AI model in the business operation flow is determined as a target sample required to be used by the at least one target AI model.
The target sample can traverse the various business scenarios of the first executed target AI model, covering the service types and leaf nodes of the application to be tested, which increases the diversity of the target sample and improves the test accuracy.
The target sample may include sample input data, expected output data of each target AI model, and expected output data of the application to be tested. The sample input data differs according to the type of the first executed AI model. Illustratively, if the first executed AI model is a speech recognition model, the sample input data is speech data (e.g., recorded data); if the first executed AI model is a semantic recognition model, the sample input data is text data; if the first executed AI model is a face recognition model, the sample input data is image data, and so on.
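A hypothetical layout of one target sample record is sketched below; the field names and values are assumptions used only for illustration.

```python
# Assumed sample record: the sample input matches the type of the first executed
# model (here: recorded speech), and expected outputs are stored per target AI
# model and for the application under test.
target_sample = {
    "sample_input": "samples/package_inquiry_001.wav",
    "expected_output": {
        "speech_recognition": "how much data is left in my package",
        "semantic_recognition": "PKG_CODE_123",   # Code of the communication package
        "application": "Your package has 2 GB of data remaining this month",
    },
}
```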
Step 303, configuring a test script for performing a collaborative test on the at least one target AI model based on the service operation flow of the application to be tested.
Illustratively, the process of configuring a test script for performing a collaborative test on the at least one target AI model based on the service operation flow of the application to be tested may include: and configuring the execution sequence of each target AI model, the information of a calling interface, the information of input data, the information of actual output data and the information of expected output data, and the information of the actual output data and the information of the expected output data of the application to be tested based on the service operation flow of the application to be tested. Wherein the input data of the first executed target AI model is the sample input data, and starting from the second executed target AI model, the input data of the latter executed target AI model is the output data of the former executed target AI model.
First, a data template is defined according to the interface of each AI model and the interface of each application. The data template content may include: information of a calling interface of the AI model (including an IP (Internet Protocol) address of the calling interface of the AI model, etc.), information of input data (including a location, a file name, etc. of input data of the AI model), information of actual output data (including a location, a file name, etc. of actual output data of the AI model), and information of expected output data (including a location, a file name, etc. of expected output data of the AI model), and information of a calling interface of an application (including an IP address, etc. of the calling interface of the application), information of actual output data (including a location, a file name, etc. of actual output data of the application), and information of expected output data (including a location, a file name, etc. of expected output data of the application).
The test script is then configured through a graphical interface, which is implemented in a building-block manner. In the test script configuration interface, the flow of the test script is defined based on the service operation flow of the application to be tested, and each node in the flow of the test script is configured according to the data template. The flow of the test script may include: the execution sequence of each target AI model, the information of the calling interface, the information of the input data, the information of the actual output data and the information of the expected output data, as well as the information of the actual output data and the information of the expected output data of the application to be tested.
Illustratively, in consideration of differences of interfaces of AI models of different types and manufacturers, the testing apparatus may provide an interface calling module, and the interface calling module may package a calling interface of an AI model as a standard calling interface and provide a visual standard calling interface.
The following examples are given. FIG. 4 is a schematic diagram of a configuration test script according to an embodiment of the present application. As shown in fig. 4, the application flow includes speech recognition → semantic recognition → application output. The contents of the data template adopted by the voice recognition include a voice recognition model IP, recording data (i.e., input data of the voice recognition model), a voice recognition return result (i.e., actual output data of the voice recognition model), and a voice recognition expected result (i.e., expected output data of the voice recognition model); the content of the data template adopted by the semantic recognition comprises a semantic recognition model IP, a voice recognition return result (namely input data of the semantic recognition model), a semantic recognition return result (namely actual output data of the semantic recognition model) and a semantic recognition expected result (namely expected output data of the semantic recognition model); the content of the data template adopted by the application output comprises the actual output data of the application and the expected output data of the application.
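For illustration, the fig. 4 flow might be expressed with the data-template fields described above roughly as follows; the IP addresses and file names are placeholders, and the input of a later node is the output of the previous node, per the chaining rule stated earlier.

```python
# Assumed node records for the fig. 4 flow: speech recognition -> semantic
# recognition -> application output.
test_script_flow = [
    {"node": "speech_recognition", "ip": "10.0.0.1",
     "input": "recording.wav",
     "actual_output": "asr_actual.txt", "expected_output": "asr_expected.txt"},
    {"node": "semantic_recognition", "ip": "10.0.0.2",
     "input": "asr_actual.txt",          # output of the speech recognition node
     "actual_output": "nlp_actual.txt", "expected_output": "nlp_expected.txt"},
    {"node": "application_output",
     "actual_output": "app_actual.txt", "expected_output": "app_expected.txt"},
]
```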
Step 304, configuring the execution strategy of the test script.
Illustratively, the execution policy of the test script may include, but is not limited to, at least one of: execution mode (including manual execution, automatic execution), starting execution time point, execution cycle, execution times, and the like.
When the execution policy includes the start execution time point, the start execution time point is a time point at which the application to be tested is in an idle state. For example, the start execution time point may be set to two o'clock or three o'clock in the morning. This avoids the peak usage period of the application to be tested and reduces the load on the system.
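An execution policy record and an idle-time check might look roughly as follows; the field names, the 02:00 value and the minute-level comparison are assumptions for illustration.

```python
from datetime import datetime

# Assumed execution policy record. The 02:00 start time stands for a point at
# which the application to be tested is idle.
execution_policy = {
    "mode": "automatic",     # or "manual"
    "start_time": "02:00",   # start execution time point (idle period)
    "period_hours": 24,      # execution cycle
    "executions": 10,        # number of executions per cycle
}

def due_now(policy, now=None):
    """True when the current time matches the configured start time (minute granularity)."""
    now = now or datetime.now()
    return now.strftime("%H:%M") == policy["start_time"]
```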
And 305, executing the test script according to the execution strategy, and determining the running state of the at least one target AI model based on the test data generated in the process of executing the test script.
And automatically executing the test script according to the configured execution strategy, calling the AI model through a calling interface of the AI model, loading a target sample to be used, and acquiring and storing test data generated in the process of executing the test script.
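The execution loop might be sketched as follows; call_model is an assumed helper that wraps the standard calling interface and returns a success flag and the output, not an API defined by the patent.

```python
# Execute the AI model nodes of the configured flow, chaining outputs to inputs
# and collecting per-node test data for later analysis.
def execute_test_script(flow_nodes, sample_input, call_model):
    test_data = []
    data = sample_input
    for node in flow_nodes:
        success, output = call_model(node["ip"], data)
        test_data.append({
            "node": node["node"],
            "call_ok": success,
            "input": data,
            "actual_output": output,
            "expected_output": node.get("expected_output"),
        })
        if not success:
            break              # interface call failed; stop and keep what was recorded
        data = output          # a later model consumes the output of the earlier one
    return test_data
```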
In an alternative embodiment, the AI model may be automatically tested at the device level. In this manner, the operating state of the at least one target AI model is determined based on the test data of the target AI model generated during execution of the test script.
Illustratively, the process of determining the operation state of the at least one target AI model based on the test data of the target AI model generated during the execution of the test script may include the following steps a1 to a 2:
step a1, for each target AI model, obtaining interface call data of the current target AI model generated in the process of executing the test script.
And the interface calling data of the current target AI model is used for indicating whether the interface calling of the current target AI model is successful or not.
Step a2, in case that the interface call data of the current target AI model indicates that the interface call has failed, determining that the running state of the current target AI model is abnormal.
For any target AI model, if the interface call of the target AI model fails, the running state of the target AI model is abnormal.
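Steps a1 to a2 might be sketched as follows over the per-node test data collected above; the record fields are the assumed ones from the earlier sketch.

```python
def check_interface_calls(test_data):
    """Mark a model abnormal if its interface call failed (steps a1-a2)."""
    return {rec["node"]: ("abnormal" if not rec["call_ok"] else "ok") for rec in test_data}
```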
Illustratively, the process of determining the operation state of the at least one target AI model based on the test data of the target AI model generated during the execution of the test script may include the following steps B1 to B2:
and step B1, acquiring actual output data of the first executed target AI model generated in the process of executing the test script, and determining that the running state of the first executed target AI model is abnormal when an error between the actual output data and expected output data of the first executed target AI model exceeds a first preset error.
Since the first executed target AI model is not affected by other target AI models, the operating state of the first executed target AI model may be determined based on the error between the actual output data and the expected output data of the first executed target AI model.
The error between the actual output data and the expected output data of the first executed target AI model may be a difference between the actual output data and the expected output data of the first executed target AI model. In the case where the error corresponding to the first executed target AI model exceeds the first preset error, it may be determined that the operation state of the first executed target AI model is abnormal. The specific value of the first preset error may be set according to actual conditions, and this embodiment does not limit this.
And step B2, for each target AI model except the first executed target AI model, calculating a composite error based on an error between actual output data and expected output data of the current target AI model and an error between actual output data and expected output data of a target AI model executed before the current target AI model, which are generated during the execution of the test script, and determining that the operation state of the current target AI model is abnormal if the composite error exceeds a second preset error.
Since the input data of the target AI model executed later is the output data of the target AI model executed earlier, starting from the second executed target AI model, the current target AI model may be affected by the target AI model executed earlier. Therefore, starting from the second executed target AI model, a composite error is calculated based on the error between the actual output data and the expected output data of the current target AI model and the error between the actual output data and the expected output data of the target AI model (which may be one or more than one) executed before the current target AI model, and the operating state of the current target AI model is determined based on the composite error.
The error between the actual output data and the expected output data of the current target AI model may be a difference between the actual output data of the current target AI model and the expected output data of the current target AI model.
Illustratively, the composite error may be a result of performing a weighted calculation on an error between actual output data and expected output data of the current target AI model and an error between actual output data and expected output data of a target AI model executed before the current target AI model. In the weighting calculation process, the weight of the error corresponding to each target AI model can be set according to the actual situation.
And determining that the running state of the current target AI model is abnormal under the condition that the comprehensive error corresponding to the current target AI model exceeds a second preset error. The specific value of the second preset error may be set according to actual conditions, and this embodiment does not limit this.
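Steps B1 to B2 might be sketched as follows; the error metric err(), the thresholds and the default uniform weights are assumptions, since the patent leaves them to be set according to the actual situation.

```python
# err(actual, expected) stands in for whatever per-model error metric is used
# (e.g. a word error rate for speech recognition).
def detect_abnormal_models(test_data, err,
                           first_preset_error=0.10,
                           second_preset_error=0.15,
                           weights=None):
    abnormal = []
    errors = []
    for i, rec in enumerate(test_data):
        e = err(rec["actual_output"], rec["expected_output"])
        errors.append(e)
        if i == 0:
            # First executed model: judged on its own error only (step B1).
            if e > first_preset_error:
                abnormal.append(rec["node"])
        else:
            # Later models: weighted composite of the current model's error and
            # the errors of all previously executed models (step B2).
            w = weights or [1.0 / (i + 1)] * (i + 1)
            composite = sum(wj * ej for wj, ej in zip(w, errors))
            if composite > second_preset_error:
                abnormal.append(rec["node"])
    return abnormal
```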
In another alternative embodiment, the AI model may be automatically tested for service level. In this way, the operating state of the at least one target AI model is determined based on the test data of the application to be tested generated during the execution of the test script.
Illustratively, the process of determining the operation state of the at least one target AI model based on the test data of the application to be tested generated during the execution of the test script may include the following steps C1 to C2:
and step C1, acquiring the actual output data of the application to be tested generated in the process of executing the test script.
And step C2, determining that the target AI model with abnormal operation state exists in the at least one target AI model when the error between the actual output data and the expected output data of the application to be tested exceeds a third preset error.
In the case that the error between the actual output data and the expected output data of the application to be tested exceeds the third preset error, it may be determined that a target AI model with an abnormal running state exists among the at least one target AI model. As to which target AI model is abnormal, the test data of each target AI model may be obtained as described above and analyzed in the manner described above to make the determination.
The specific value of the third preset error may be set according to actual conditions, and this embodiment does not limit this.
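The service-level check of steps C1 to C2 might be sketched as follows; the error metric and threshold value are assumptions.

```python
def application_output_abnormal(app_actual, app_expected, err, third_preset_error=0.10):
    """True if some target AI model is presumed to be in an abnormal running state."""
    return err(app_actual, app_expected) > third_preset_error
```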
Step 306, generating an alarm in the presence of the target AI model with abnormal operating conditions.
The alarm information here may include an identifier of the application to be tested, an identifier of the target AI model whose running state is abnormal, the exception type, the alarm level, and the like.
Illustratively, the exception types may include, but are not limited to, interface call failure, a high model recognition error, a high application output error, and the like. The alarm levels include, but are not limited to, general alarms, important alarms, and the like. For example, an interface call failure generates an important alarm, a high model recognition error generates a general alarm, a high application output error generates an important alarm, and so on.
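A mapping from exception type to alarm level following the examples above might look as follows; the keys and record fields are assumptions.

```python
ALARM_LEVELS = {
    "interface_call_failure": "important",
    "high_model_recognition_error": "general",
    "high_application_output_error": "important",
}

def build_alarm(app_id, model_id, exception_type):
    """Assemble an alarm record for a model whose running state is abnormal."""
    return {
        "application": app_id,
        "model": model_id,
        "exception_type": exception_type,
        "alarm_level": ALARM_LEVELS.get(exception_type, "general"),
    }
```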
Step 307, determining the identification accuracy of the at least one target AI model based on the test data generated during the multiple executions of the test script.
For each target AI model, the identification accuracy of the target AI model can be calculated based on the test data of the target AI model generated in the process of executing the test script for multiple times. The accuracy rate may be a ratio of the number of times the recognition result is accurate to the total number of times of recognition.
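The accuracy statistic might be computed as follows over the results of multiple executions; the boolean-per-run representation is an assumption.

```python
def recognition_accuracy(results):
    """results: one boolean per execution, True when the model's output was accurate."""
    results = list(results)
    return sum(results) / len(results) if results else 0.0

# For example, over one execution cycle of five runs:
# recognition_accuracy([True, True, False, True, True])  -> 0.8
```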
And 308, generating an alarm under the condition that the identification accuracy of any one target AI model is less than the preset accuracy.
The alarm information here may include an identifier of the application to be tested, an identifier of the target AI model whose recognition accuracy is lower than the preset accuracy, the recognition accuracy of that target AI model, the alarm level, and the like. Illustratively, the lower the recognition accuracy of the target AI model, the higher the level of the generated alarm.
The following examples are given.
Taking an application scenario that an intelligent voice customer service robot inquires about a certain communication package service as an example, the main service operation flow is as follows:
(1) The user calls in through the call center.
(2) The user asks about a certain communication package; the speech recognition model is called to perform speech recognition on the user's speech and convert it into text.
(3) The semantic recognition model is called to perform semantic recognition on the converted text and determine that the user is asking about a certain communication package, and the semantic recognition model returns the Code (identification code) corresponding to the communication package.
(4) The service system is queried according to the Code to obtain the usage status of the user's package, and TTS (Text To Speech) is called to broadcast the result.
The service operation flow of this application uses two key AI models, namely the speech recognition model and the semantic recognition model, which serve as the key monitoring nodes of the flow, while the service quality of the TTS engine generally does not fluctuate and need not be used as a monitoring node.
The test procedure was as follows:
(1) Based on the service operation flow of the application, the target AI models to be called are determined to be the speech recognition model and the semantic recognition model.
(2) The speech recognition model is the first executed target AI model, and the target sample to be used by the speech recognition model is configured.
The target sample may be recorded data: recordings of different questions, different backgrounds and different voices selected from the accumulated recording training samples of user communication-package inquiries. The expected output data of the speech recognition model is the correct transcription result corresponding to the recorded data, and the expected output data of the semantic recognition model is the Code of the communication package.
(3) The test script is configured.
That is, the execution flow is orchestrated, including setting the execution order, the information of the calling interfaces, the information of the target sample to be loaded, the information of the actual output data, the information of the expected output data, and the like.
(4) The execution policy is configured.
For example, automatic execution at 2:00 a.m. may be configured.
(5) An alarm is generated when an interface call of a target AI model fails during execution of the test script, and an alarm is generated when the recognition error of a target AI model exceeds the preset error.
(6) After the test script has been executed for one cycle (i.e., multiple times), the recognition accuracy of each target AI model is counted, and an alarm is generated if the recognition accuracy is lower than the preset accuracy.
An automatic testing device independent of the AI models is constructed in the embodiments of the present application. It provides dynamic orchestration of the intelligent application flow and dynamic configuration of the data samples, and monitors the running state and recognition accuracy of each AI model in the intelligent application flow by periodically and automatically executing the test script. The test of the AI models supports determining inputs and outputs both according to the end-to-end application flow and according to each individual AI model.
The method and the device solve the problem that the running state and recognition accuracy of an AI model cannot be actively monitored; they can be applied to actively and automatically monitor AI models already deployed on the live network, reducing the workload of periodic manual assessment. The testing device is independent of the live-network AI models and the intelligent application systems, and no data acquisition module needs to be deployed inside those systems, which allows flexible arrangement and dynamic adjustment of the service test flow and better adapts to the evolution of business processes. The method supports multiple automatic and manual execution modes and can also be provided to operation and maintenance personnel: when a user complains, the actual output data and expected output data of the complaint scenario can be configured and a test script executed to collect the execution data of the key flow nodes, enabling quick localization of problems.
Referring to fig. 5, a block diagram of a test apparatus according to an embodiment of the present application is shown.
As shown in fig. 5, the test apparatus may include the following modules:
a determining module 501, configured to determine at least one target artificial intelligence AI model that needs to be called by an application to be tested;
a first configuration module 502, configured to configure a test script for performing a collaborative test on the at least one target AI model based on the service operation flow of the application to be tested;
a test module 503, configured to execute the test script, and determine an operation state of the at least one target AI model based on test data generated during the execution of the test script.
Referring to fig. 6, a block diagram of another testing apparatus according to an embodiment of the present application is shown.
As shown in fig. 6, the test apparatus may include the following modules:
a determining module 601, configured to determine at least one target artificial intelligence AI model that needs to be called by an application to be tested;
a first configuration module 602, configured to configure a test script for performing a collaborative test on the at least one target AI model based on the service operation flow of the application to be tested;
the test module 603 is configured to execute the test script, and determine an operating state of the at least one target AI model based on test data generated during the execution of the test script.
Optionally, the apparatus further comprises: a second configuration module 604, configured to configure an execution policy of the test script, where the execution policy includes at least one of: a start execution time point, an execution cycle and a number of executions; the test module 603 is specifically configured to execute the test script according to the execution policy.
Optionally, in a case that the execution policy includes the start execution time point, the start execution time point is a time point at which the application to be tested is in an idle state.
Optionally, the apparatus further comprises: a third configuration module 605, configured to configure a target sample that needs to be used by the at least one target AI model, where the target sample includes sample input data, expected output data of each target AI model, and expected output data of the application to be tested; the first configuration module 602 is specifically configured to configure, based on the service operation flow of the application to be tested, an execution sequence of each target AI model, information of a call interface, information of input data, information of actual output data, information of expected output data, and information of actual output data and information of expected output data of the application to be tested; the input data of the first executed target AI model is the sample input data, and the input data of the latter executed target AI model is the output data of the former executed target AI model.
Optionally, the third configuration module 605 is specifically configured to determine, based on the service operation flow of the application to be tested, the first executed target AI model among the at least one target AI model; configure a sample to be used by the first executed target AI model and use the sample as the target sample; the target sample traverses a plurality of business scenarios of the first executed target AI model.
Optionally, the test module 603 includes: a first testing unit, configured to determine an operating state of the at least one target AI model based on the test data of the target AI model generated during execution of the test script.
Optionally, the first test unit comprises: the first obtaining subunit is configured to obtain, for each target AI model, interface call data of a current target AI model generated in the process of executing the test script, where the interface call data is used to indicate whether interface call is successful; and the first determining subunit is used for determining that the running state of the current target AI model is abnormal under the condition that the interface calling data of the current target AI model indicates that the interface calling fails.
Optionally, the first test unit comprises: a second determining subunit, configured to obtain actual output data of the first executed target AI model generated during execution of the test script, and determine that an operation state of the first executed target AI model is abnormal when an error between the actual output data and expected output data of the first executed target AI model exceeds a first preset error; and a third determining subunit, configured to calculate, for each target AI model other than the first executed target AI model, a composite error based on an error between actual output data and expected output data of the current target AI model, and an error between actual output data and expected output data of a target AI model executed before the current target AI model, which are generated during execution of the test script, and determine that the operating state of the current target AI model is abnormal if the composite error exceeds a second preset error.
Optionally, the test module 603 includes: and the second testing unit is used for determining the running state of the at least one target AI model based on the test data of the application to be tested, which is generated in the process of executing the test script.
Optionally, the second test unit comprises: the second acquisition subunit is used for acquiring actual output data of the application to be tested, which is generated in the process of executing the test script; and the fourth determining subunit is used for determining that a target AI model with an abnormal operation state exists in the at least one target AI model when the error between the actual output data and the expected output data of the application to be tested exceeds a third preset error.
Optionally, the test module 603 is further configured to determine an identification accuracy of the at least one target AI model based on test data generated during multiple times of executing the test script.
Optionally, the apparatus further comprises: and the sample database 606 is used for storing samples.
Optionally, the apparatus further comprises: and the interface calling module 607 is used for packaging a standard calling interface of the AI model.
Optionally, the apparatus further comprises: the alarm module 608 is configured to generate an alarm when a target AI model with an abnormal operating state exists, and generate an alarm when the recognition accuracy of any one target AI model is less than a preset accuracy.
For the device embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, refer to the partial description of the method embodiment.
In an embodiment of the application, an electronic device is also provided. The electronic device may include one or more processors, and one or more computer-readable storage media having instructions, such as an application program, stored thereon. The instructions, when executed by the one or more processors, cause the processors to perform the testing method of any of the embodiments described above.
Referring to fig. 7, a schematic diagram of an electronic device structure according to an embodiment of the present application is shown. As shown in fig. 7, the electronic device includes a processor 701, a communication interface 702, a memory 703, and a communication bus 704. The processor 701, the communication interface 702, and the memory 703 complete communication with each other through the communication bus 704.
A memory 703 for storing a computer program.
The processor 701 is configured to implement the testing method according to any of the embodiments described above when executing the program stored in the memory 703.
The communication interface 702 is used for communication between the above-described electronic apparatus and other apparatuses.
The communication bus 704 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The above mentioned processors 701 may include, but are not limited to: a Central Processing Unit (CPU), a Network Processor (NP), a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, a discrete Gate or transistor logic device, a discrete hardware component, and so on.
The aforementioned memory 703 may include, but is not limited to: read Only Memory (ROM), Random Access Memory (RAM), Compact Disc Read Only Memory (CD-ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), hard disk, floppy disk, flash Memory, and the like.
In an embodiment of the present application, there is also provided a computer-readable storage medium having stored thereon a computer program executable by a processor of an electronic device; the computer program, when executed by the processor, causes the processor to perform the testing method according to any one of the above embodiments.
The embodiments in the present specification are related to each other, and all the embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "include", "including" or any other variations thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or terminal device including a series of elements includes not only those elements but also other elements not explicitly listed or inherent to such process, method, article, or terminal device. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM, RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
While the present embodiments have been described with reference to the accompanying drawings, it is to be understood that the invention is not limited to the precise embodiments described above, which are meant to be illustrative and not restrictive, and that various changes may be made therein by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.
Those of ordinary skill in the art will appreciate that the various illustrative units and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or as a combination of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a U disk, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.
The above description covers only specific embodiments of the present application, but the protection scope of the present application is not limited thereto; any change or substitution that can be easily conceived by a person skilled in the art within the technical scope disclosed in the present application shall fall within the protection scope of the present application. In view of the above, the description should not be taken as limiting the application.

Claims (14)

1. A method of testing, the method comprising:
determining at least one target artificial intelligence (AI) model to be called by the application to be tested;
configuring a test script for performing a collaborative test on the at least one target AI model based on the service operation flow of the application to be tested;
and executing the test script, and determining the running state of the at least one target AI model based on the test data generated in the process of executing the test script.
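For illustration only, the following Python sketch shows one possible realization of the flow in this claim; the names (run_collaborative_test, max_error, and the stand-in model callables) are assumptions and are not taken from the application.

```python
# A minimal, purely illustrative sketch of the claimed flow; all names are assumed.
from typing import Any, Callable, Dict, List

def run_collaborative_test(
    target_models: List[Callable[[Any], Any]],   # ordered call wrappers for the target AI models
    sample_input: Any,
    expected_outputs: List[Any],
    max_error: float,
) -> List[Dict[str, Any]]:
    """Execute the configured test script: call each target AI model in the order
    given by the service operation flow and record its running state."""
    states = []
    data = sample_input
    for model, expected in zip(target_models, expected_outputs):
        actual = model(data)                       # call the model through its interface
        normal = abs(actual - expected) <= max_error
        states.append({"actual": actual, "expected": expected, "normal": normal})
        data = actual                              # a later model consumes the earlier output
    return states

if __name__ == "__main__":
    # Two stand-in "models"; only the orchestration logic is exercised here.
    models = [lambda x: x * 2, lambda x: x + 1]
    print(run_collaborative_test(models, 3, [6, 7], max_error=0.0))
```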
2. The method of claim 1,
after configuring the test script for performing the collaborative test on the at least one target AI model, the method further includes: configuring an execution policy of the test script, the execution policy including at least one of: a start execution time point, an execution cycle, and a number of executions;
executing the test script comprises: and executing the test script according to the execution strategy.
3. The method of claim 2, wherein in the case that the execution policy includes the start execution time point, the start execution time point is a time point at which the application to be tested is in an idle state.
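As a hedged illustration of the execution policy in claims 2 and 3, the sketch below schedules a test script by start execution time point, execution cycle, and number of executions, and only runs it while the application reports an idle state; execute_with_policy, app_is_idle, and the other names are assumptions.

```python
# Illustrative scheduler for an execution policy; function and parameter names are assumed.
import time
from datetime import datetime

def execute_with_policy(test_script, start_at: datetime, period_s: float,
                        times: int, app_is_idle=lambda: True) -> None:
    """Run test_script `times` times, once per `period_s` seconds, starting no earlier
    than `start_at` and only while the application under test reports an idle state."""
    while datetime.now() < start_at:
        time.sleep(1)                  # wait for the configured start execution time point
    for _ in range(times):             # number of executions
        if app_is_idle():              # idle-state condition from the dependent claim
            test_script()
        time.sleep(period_s)           # execution cycle

if __name__ == "__main__":
    execute_with_policy(lambda: print("test script executed"),
                        start_at=datetime.now(), period_s=0.1, times=3)
```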
4. The method of claim 1,
before configuring a test script for performing a collaborative test on the at least one target AI model, the method further includes: configuring a target sample required to be used by the at least one target AI model; the target sample comprises sample input data, expected output data of each target AI model and expected output data of the application to be tested;
configuring a test script for performing a collaborative test on the at least one target AI model based on the service operation flow of the application to be tested includes: configuring, based on the service operation flow of the application to be tested, an execution sequence of each target AI model, information of a calling interface, information of input data, information of actual output data and information of expected output data of each target AI model, and information of actual output data and information of expected output data of the application to be tested;
the input data of the first executed target AI model is the sample input data, and the input data of the latter executed target AI model is the output data of the former executed target AI model.
5. The method of claim 4, wherein configuring the target samples needed to be used by the at least one target AI model comprises:
determining a first executed target AI model in the at least one target AI model based on the service operation process of the application to be tested;
configuring a sample needed to be used by the first executed target AI model, and taking the sample as the target sample; wherein the target sample covers a plurality of business scenarios of the first executed target AI model.
6. The method of claim 4, wherein determining the operational state of the at least one target AI model based on test data generated during execution of the test script comprises:
determining an operating state of the at least one target AI model based on test data of the target AI model generated during execution of the test script.
7. The method of claim 6, wherein determining the operational state of the at least one target AI model based on the test data for the target AI model generated during the execution of the test script comprises:
for each target AI model, acquiring interface calling data of the current target AI model generated in the process of executing the test script, wherein the interface calling data is used for indicating whether the interface calling is successful;
and determining that the running state of the current target AI model is abnormal under the condition that the interface calling data of the current target AI model indicates that the interface calling fails.
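One possible reading of the interface-call check in claim 7 is sketched below; the interface_call_data dictionary and its success flag are assumptions about how interface calling data might be represented.

```python
# Illustrative interface-call check; the shape of interface_call_data is assumed.
from typing import Dict

def interface_call_state(interface_call_data: Dict[str, object]) -> str:
    """Mark the current target AI model abnormal when its interface call failed."""
    if not interface_call_data.get("success", False):   # success flag recorded during the script run
        return "abnormal"
    return "normal"

if __name__ == "__main__":
    print(interface_call_state({"success": False, "status_code": 500}))  # abnormal
    print(interface_call_state({"success": True, "status_code": 200}))   # normal
```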
8. The method of claim 6, wherein determining the operational state of the at least one target AI model based on the test data for the target AI model generated during the execution of the test script comprises:
acquiring actual output data of the first executed target AI model generated in the process of executing the test script, and determining that the running state of the first executed target AI model is abnormal under the condition that the error between the actual output data and expected output data of the first executed target AI model exceeds a first preset error;
and for each target AI model except the first executed target AI model, calculating a comprehensive error based on the error between the actual output data and the expected output data of the current target AI model and the error between the actual output data and the expected output data of the target AI model executed before the current target AI model, both errors being generated in the process of executing the test script, and determining that the operation state of the current target AI model is abnormal in the case that the comprehensive error exceeds a second preset error.
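Claim 8 does not fix a formula for the comprehensive error; the sketch below assumes a simple absolute-error metric and subtracts the error inherited from earlier target AI models, purely as one illustrative combination.

```python
# Illustrative comprehensive-error check; the metric and the way upstream errors are
# combined (simple subtraction of inherited error) are assumptions, not fixed by the claim.
from typing import List

def model_states(errors: List[float], first_threshold: float,
                 second_threshold: float) -> List[str]:
    """errors[i] is |actual - expected| of the i-th executed target AI model."""
    states = []
    for i, err in enumerate(errors):
        if i == 0:
            states.append("abnormal" if err > first_threshold else "normal")
        else:
            comprehensive = err - sum(errors[:i])   # discount error inherited from earlier models
            states.append("abnormal" if comprehensive > second_threshold else "normal")
    return states

if __name__ == "__main__":
    # The second model's raw error is large, but most of it was inherited from the first model.
    print(model_states([0.30, 0.35], first_threshold=0.1, second_threshold=0.1))  # ['abnormal', 'normal']
```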
9. The method of claim 4, wherein determining the operational state of the at least one target AI model based on test data generated during execution of the test script comprises:
and determining the running state of the at least one target AI model based on the test data of the application to be tested, which is generated in the process of executing the test script.
10. The method of claim 9, wherein determining the operational state of the at least one target AI model based on test data of the application under test generated during execution of the test script comprises:
acquiring actual output data of the application to be tested, which is generated in the process of executing the test script;
and under the condition that the error between the actual output data and the expected output data of the application to be tested exceeds a third preset error, determining that a target AI model with an abnormal operation state exists in the at least one target AI model.
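A minimal sketch of the end-to-end check in claims 9 and 10, assuming a numeric application output and a simple absolute-error comparison; all names and thresholds are illustrative.

```python
# Illustrative end-to-end check on the application's output; names and the numeric
# comparison are assumptions.
def some_model_abnormal(actual_app_output: float, expected_app_output: float,
                        third_threshold: float) -> bool:
    """True when the application's actual output drifts beyond the third preset error,
    i.e. at least one target AI model is presumed to be in an abnormal state."""
    return abs(actual_app_output - expected_app_output) > third_threshold

if __name__ == "__main__":
    print(some_model_abnormal(7.4, 7.0, third_threshold=0.2))  # True
```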
11. The method of claim 1, further comprising:
and determining the identification accuracy of the at least one target AI model based on the test data generated in the process of executing the test script multiple times.
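The identification accuracy of claim 11 could, under the assumption of an exact-match comparison over repeated script executions, be computed as in the following sketch; the comparison rule is an assumption, since the claim leaves it open.

```python
# Illustrative accuracy statistic over repeated executions; exact-match comparison is assumed.
from typing import Any, List, Tuple

def identification_accuracy(runs: List[Tuple[Any, Any]]) -> float:
    """runs holds (actual, expected) pairs collected over multiple executions of the test script."""
    if not runs:
        return 0.0
    hits = sum(1 for actual, expected in runs if actual == expected)
    return hits / len(runs)

if __name__ == "__main__":
    print(identification_accuracy([("cat", "cat"), ("dog", "cat"), ("cat", "cat")]))  # 0.666...
```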
12. A test apparatus, the apparatus comprising:
the determining module is used for determining at least one target artificial intelligence (AI) model to be called by the application to be tested;
the first configuration module is used for configuring a test script for performing a collaborative test on the at least one target AI model based on the service operation process of the application to be tested;
and the test module is used for executing the test script and determining the running state of the at least one target AI model based on the test data generated in the process of executing the test script.
13. An electronic device, comprising:
one or more processors; and
one or more computer-readable storage media having instructions stored thereon;
the instructions, when executed by the one or more processors, cause the one or more processors to perform the testing method according to any one of claims 1 to 11.
14. A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, causes the processor to carry out the testing method of any one of claims 1 to 11.
CN202210467969.4A 2022-04-29 2022-04-29 Test method, test device, electronic equipment and storage medium Pending CN114840422A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210467969.4A CN114840422A (en) 2022-04-29 2022-04-29 Test method, test device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210467969.4A CN114840422A (en) 2022-04-29 2022-04-29 Test method, test device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114840422A true CN114840422A (en) 2022-08-02

Family

ID=82567264

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210467969.4A Pending CN114840422A (en) 2022-04-29 2022-04-29 Test method, test device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114840422A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024093298A1 (en) * 2022-10-31 2024-05-10 华为云计算技术有限公司 Test method and apparatus

Similar Documents

Publication Publication Date Title
CN111124919A (en) User interface testing method, device, equipment and storage medium
US20090196186A1 (en) Root cause problem detection in network traffic information
CN110457175B (en) Service data processing method and device, electronic equipment and medium
CN108304286A (en) A kind of system and method carrying out automatic test to transcoding server concurrency performance
CN114844768B (en) Information analysis method and device and electronic equipment
CN111224848B (en) Network quality testing method, device, equipment and storage medium
CN110765189A (en) Exception management method and system for Internet products
CN110909826A (en) Diagnosis monitoring method and device for energy equipment and electronic equipment
CN113392893A (en) Method, device, storage medium and computer program product for positioning service fault
CN114840422A (en) Test method, test device, electronic equipment and storage medium
CN113704117A (en) Algorithm testing system, method and device
CN111857103B (en) Vehicle diagnosis method, device, equipment and storage medium
CN117472767A (en) Software interface testing method, device, equipment and storage medium
CN111225114A (en) Dial testing method and device, computer equipment and storage medium
CN113676377B (en) Online user number evaluation method, device, equipment and medium based on big data
US20240289209A1 (en) Method and apparatus for detecting and explaining anomalies
CN113452533B (en) Charging self-inspection and self-healing method and device, computer equipment and storage medium
CN111506507B (en) Business service state detection method and device, electronic equipment and storage medium
CN110519102B (en) Server fault identification method and device and storage medium
CN108845932B (en) Unit testing method and device of network library, storage medium and terminal
CN110958259A (en) Detection method, device, equipment and storage medium of snort rule
CN116383068B (en) Quick test method, device and storage medium of C++ program interface
CN112068878B (en) Method, device and storage medium for detecting software development kit state
CN116132121B (en) Feature recognition performance analysis method
CN115426301B (en) Device detection method, device, equipment and storage medium based on self-generated message

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination