CN114443483A

CN114443483A - Test method and device of artificial intelligence system, electronic equipment and medium

Info

Publication number: CN114443483A
Application number: CN202210104967.9A
Authority: CN
Inventors: 孙帅; 李凡平; 石柱国
Original assignee: ISSA Technology Co Ltd
Current assignee: ISSA Technology Co Ltd
Priority date: 2022-01-28
Filing date: 2022-01-28
Publication date: 2022-05-06

Abstract

The application provides a test method and device of an artificial intelligence system, electronic equipment and a medium. After the model decision factors of at least two deep learning models are obtained, test data corresponding to those model decision factors and the corresponding real output results are obtained. For each deep learning model, the model is tested on its test data to obtain the test result it outputs and its current model index. Test state information of the model is determined based on the current model index and the historical model index of the model. The prediction behavior of the model is determined from a test heat map, generated from the test data input to the model and the test result it outputs, together with the corresponding real output result. A test report is then generated. The method can accurately locate the data model to be optimized in the AI system, so that the AI system can be optimized accurately.

Description

Test method and device of artificial intelligence system, electronic equipment and medium

Technical Field

The application relates to the technical field of model testing, in particular to a testing method and device of an artificial intelligence system, electronic equipment and a medium.

Background

With the development of science and technology, artificial intelligence (AI) has been applied in many areas of society, such as smart cities, smart finance, and smart homes. Deep learning is one of the most important directions in AI. Applications based on deep learning models are becoming more widespread and the technology itself more complex. As a result, quality problems in deep learning applications have become increasingly prominent, mainly in data quality, feature engineering, model effectiveness, and product functionality. According to a 2016 IBM estimate, poor data quality causes an economic cost of about $3.1 trillion per year. Quality assurance is therefore an essential link in deploying deep learning applications in business scenarios. Testing methods and quality assurance systems for traditional software and internet products are relatively mature, whereas testing AI systems is a different and newer direction.

An AI system test typically proceeds in two stages. First, under known and predicted conditions, it constructs a batch of data sets that were used for training and data sets that were not. It then computes, through the data model, whether the number of samples falling within these data sets lies within a known interval.

In industry, AI system testing is mainly end-to-end black-box testing, in which the operating mechanism of the data models inside the system cannot be observed. An AI system is built by nesting and combining deep learning models layer by layer. Consequently, a drop in the algorithmic performance of a particular data model cannot be accurately located, and the AI system cannot be optimized precisely.

Disclosure of Invention

An object of the embodiments of the present application is to provide a method, an apparatus, an electronic device, and a medium for testing an artificial intelligence system. They solve the above problems in the prior art by accurately locating the data model to be optimized in an AI system, thereby enabling the AI system to be optimized accurately.

In a first aspect, a method for testing an artificial intelligence system is provided, and the method may include:

performing code instrumentation on at least two deep learning models in an artificial intelligence system to be tested to obtain model decision factors of the at least two deep learning models; the model decision factors comprise execution conditions, input data forms and expected decision information;

obtaining test data corresponding to model decision factors of the at least two deep learning models and corresponding real output results;

for each deep learning model, testing the deep learning model based on its corresponding test data to obtain the test result output by the deep learning model and a current model index; the test result comprises output results that match the expected decision information of the deep learning model and output results that differ from it;

determining test state information of the deep learning model based on the current model index and a historical model index of the deep learning model; the historical model index is the model index of the version preceding the current version of the deep learning model;

determining the prediction behavior of the deep learning model according to a test heat map corresponding to the test result output by the deep learning model, and the real output result corresponding to the test data;

generating a test report, wherein the test report comprises the test state information, the test result and the corresponding prediction behaviors of the at least two deep learning models.

In one implementation, after generating the test report, the method further comprises:

sending the test report to an optimization terminal, so that the optimization terminal optimizes the corresponding deep learning model according to optimization instructions from optimization personnel.

In one implementation, performing code instrumentation on at least two deep learning models in the artificial intelligence system to be tested to obtain the model decision factors of the at least two deep learning models includes:

performing model scheduling analysis on the artificial intelligence system to be tested to obtain the scheduling paths of the deep learning models in the artificial intelligence system to be tested;

adding probe information before the input layer and after the output layer of each deep learning model according to the scheduling path, to obtain the model decision factors of the at least two deep learning models.

In an implementation manner, after obtaining the test data corresponding to the model decision factors of the at least two deep learning models, the method further includes:

determining a model execution sequence corresponding to the at least two deep learning models based on decision factors of the at least two deep learning models;

for each deep learning model, testing the deep learning model based on its corresponding test data to obtain the test result output by the deep learning model and the current model index includes:

testing the current deep learning model based on the test data corresponding to the current deep learning model according to the model execution sequence to obtain a test result output by the current deep learning model and a current model index; the current deep learning model is the first untested deep learning model in the at least two deep learning models according to the model execution sequence.

In one implementable manner, the test state information includes a test state;

determining test state information of the deep learning model based on the current model index and the historical model index of the deep learning model, including:

comparing, for each index type, the current model index with the historical model index;

if the current model index is larger than the historical model index, determining that the test state of the deep learning model is a first state, wherein the first state represents that the performance of the current version is higher than that of the previous version;

and if the current model index is smaller than the historical model index, determining that the test state of the deep learning model is a second state, wherein the second state represents that the performance of the current version is lower than that of the previous version.

In one implementable manner, the test state information further includes a state value for the test state; the state value characterizes a degree of gap between the performance of the current version and the performance of the previous version.

In one implementation, determining the prediction behavior of the deep learning model according to the test heat map corresponding to the test result output by the deep learning model and the real output result corresponding to the test data includes:

if the prediction result corresponding to the prediction region of the deep learning model in the test heat map differs from the corresponding real output result, determining that the prediction behavior of the deep learning model is an abnormal prediction behavior;

and if the prediction result corresponding to the prediction region of the deep learning model in the test heat map is the same as the corresponding real output result, determining that the prediction behavior of the deep learning model is a normal prediction behavior.

In a second aspect, there is provided a testing apparatus for an artificial intelligence system, the apparatus may include:

an obtaining unit, configured to perform code instrumentation on at least two deep learning models in an artificial intelligence system to be tested to obtain the model decision factors of the at least two deep learning models, the model decision factors comprising execution conditions, input data forms and expected decision information;

and further configured to obtain test data corresponding to the model decision factors of the at least two deep learning models and the corresponding real output results;

a testing unit, configured to test each deep learning model based on its corresponding test data to obtain the test result output by the deep learning model and its current model index; the test result comprises output results that match the expected decision information of the deep learning model and output results that differ from it;

a determination unit configured to determine test state information of the deep learning model based on the current model index and a historical model index of the deep learning model; the historical model index is a model index of a previous version of the current version of the deep learning model;

and a generating unit, configured to generate a test report comprising the test state information, the test results and the corresponding prediction behaviors of the at least two deep learning models.

In one implementable manner, the apparatus further comprises a sending unit;

the sending unit is configured to send the test report to the optimization terminal, so that the optimization terminal optimizes the corresponding deep learning model according to optimization instructions from optimization personnel.

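In an implementable manner, the obtaining unit is specifically configured to:

perform model scheduling analysis on the artificial intelligence system to be tested to obtain the scheduling paths of the deep learning models in the artificial intelligence system to be tested;

add probe information before the input layer and after the output layer of each deep learning model according to the scheduling path, to obtain the model decision factors of the at least two deep learning models.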

In an implementation manner, the determining unit is further configured to determine, based on decision factors of the at least two deep learning models, a model execution order corresponding to the at least two deep learning models;

the test unit is specifically configured to test the current deep learning model based on test data corresponding to the current deep learning model according to the model execution sequence, and obtain a test result and a current model index output by the current deep learning model; the current deep learning model is the first untested deep learning model in the at least two deep learning models according to the model execution sequence.

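In one implementable manner, the test state information includes a test state; the determining unit is specifically configured to:

compare, for each index type, the current model index with the historical model index;

if the current model index is greater than the historical model index, determine that the test state of the deep learning model is a first state, the first state indicating that the performance of the current version is higher than that of the previous version;

and if the current model index is less than the historical model index, determine that the test state of the deep learning model is a second state, the second state indicating that the performance of the current version is lower than that of the previous version.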

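In an implementable manner, the determining unit is further specifically configured to:

if the prediction result corresponding to the prediction region of the deep learning model in the test heat map differs from the corresponding real output result, determine that the prediction behavior of the deep learning model is an abnormal prediction behavior;

and if the prediction result corresponding to the prediction region of the deep learning model in the test heat map is the same as the corresponding real output result, determine that the prediction behavior of the deep learning model is a normal prediction behavior.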

In a third aspect, an electronic device is provided, which includes a processor, a communication interface, a memory and a communication bus, where the processor, the communication interface and the memory communicate with one another through the communication bus;

a memory for storing a computer program;

a processor, configured to perform the method steps of any implementation of the first aspect when executing the program stored in the memory.

In a fourth aspect, a computer-readable storage medium is provided, storing a computer program which, when executed by a processor, performs the method steps of any implementation of the first aspect.

The test method of the artificial intelligence system proceeds as follows.

Code instrumentation is performed on at least two deep learning models in the artificial intelligence system to be tested to obtain their model decision factors, which comprise execution conditions, input data forms and expected decision information. Test data corresponding to these model decision factors and the corresponding real output results are then obtained.

For each deep learning model, the corresponding test data is input to the model to test it, yielding the test result output by the model and a current model index. The test result comprises output results that match the expected decision information of the model and output results that differ from it.

Test state information of the deep learning model is determined based on its current model index and historical model index. The historical model index is the model index obtained on the same test data by the version preceding the current version.

The prediction behavior of the deep learning model is determined from a test heat map, generated from the test data input to the model and the test result it outputs, together with the real output result corresponding to the test data.

Finally, a test report is generated, comprising the test state information, the test results and the corresponding prediction behaviors of the at least two deep learning models.

Compared with the prior art, the method decomposes the AI system into its constituent deep learning models and tests each model independently. It can therefore accurately locate the data model to be optimized in the AI system and enable the AI system to be optimized accurately.

Drawings

To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings required in the embodiments are briefly described below. It should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope. Those skilled in the art can obtain other related drawings from these drawings without inventive effort.

Fig. 1 is a diagram of a system architecture to which the testing method of an artificial intelligence system according to an embodiment of the present application is applied;

Fig. 2 is a schematic flowchart of a testing method of an artificial intelligence system according to an embodiment of the present application;

Fig. 3 is a schematic diagram of a test heat map according to an embodiment of the present application;

Fig. 4 is a schematic structural diagram of a testing apparatus of an artificial intelligence system according to an embodiment of the present application;

Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without any creative effort belong to the protection scope of the present application.

The testing method of the artificial intelligence system provided by the embodiments of the present application can be applied to the system architecture shown in Fig. 1. As shown in Fig. 1, the system may include a test server and an optimization terminal communicatively connected to the test server.

The test server may be an application server or a cloud server. The optimization terminal may be any device capable of receiving operation instructions. Examples include:

- a mobile phone, smartphone, laptop, or digital broadcast receiver;
- user equipment (UE), such as a personal digital assistant (PDA) or a tablet computer (PAD);
- a handheld device, vehicle-mounted device, or wearable device;
- a computing device or other processing device connected to a wireless modem;
- a mobile station (MS) or mobile terminal.

The preferred embodiments of the present application will be described in conjunction with the drawings of the specification, it should be understood that the preferred embodiments described herein are only for illustrating and explaining the present application, and are not intended to limit the present application, and the embodiments and features of the embodiments in the present application may be combined with each other without conflict.

Fig. 2 is a schematic flowchart of a testing method of an artificial intelligence system according to an embodiment of the present disclosure. As shown in fig. 2, the method may include:

Step S210: code instrumentation is performed on at least two deep learning models in the artificial intelligence system to be tested to obtain the model decision factors of the at least two deep learning models.

In a specific implementation, model scheduling analysis is performed on the artificial intelligence system to be tested to obtain the scheduling paths of its deep learning models. Probe information is then added before the input layer and after the output layer of each deep learning model according to the scheduling path, to obtain the model decision factors of the at least two deep learning models. The model decision factors include execution conditions, input data form, and expected decision information.

Specifically, lexical analysis and syntactic analysis (i.e., model scheduling analysis) are first performed on the program code of the artificial intelligence system to be tested to obtain the model scheduling path. For example, the model scheduling path of the system may be: target vehicle detection, then license plate frame detection, then license plate information recognition. That is, the target vehicle is detected first, the license plate frame is then detected on the target vehicle, and finally the license plate information within the license plate frame is recognized.
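The scheduling analysis can be illustrated with Python's standard `ast` module. This is a minimal sketch under stated assumptions, not the method disclosed in this application. The pipeline source, the model function names (`detect_vehicle`, `detect_plate_frame`, `recognize_plate`) and the helper `scheduling_path` are all hypothetical. Ordering calls by their position in the source only matches the execution order for a straight-line pipeline without branches or loops.

```python
import ast

# Hypothetical source code of the system under test (license plate pipeline).
PIPELINE_SRC = """
def run(image):
    vehicle = detect_vehicle(image)
    frame = detect_plate_frame(vehicle)
    return recognize_plate(frame)
"""

# Hypothetical names of the functions that invoke deep learning models.
MODEL_CALLS = {"detect_vehicle", "detect_plate_frame", "recognize_plate"}


def scheduling_path(source: str, model_calls: set) -> list:
    """Return the model calls in source order (assumes a straight-line pipeline)."""
    tree = ast.parse(source)  # lexical + syntactic analysis
    found = []
    for node in ast.walk(tree):  # ast.walk is breadth-first, so sort afterwards
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in model_calls:
                found.append((node.lineno, node.col_offset, node.func.id))
    return [name for _, _, name in sorted(found)]


print(scheduling_path(PIPELINE_SRC, MODEL_CALLS))
# ['detect_vehicle', 'detect_plate_frame', 'recognize_plate']
```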

Then, following the model scheduling path, instrumentation (i.e., adding probe information) is performed at the decision point of each detection algorithm. Probe information is added to the program code blocks before the input and after the output of each model, to obtain the model decision factors of each model.
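One way to picture the probes is as a wrapper that records data immediately before a model's input and immediately after its output. The sketch below uses a Python decorator for this purpose. It is only an assumption about how such probes could look: `probe`, `describe`, `DECISION_FACTORS` and the stand-in `detect_vehicle` model are hypothetical. The sketch captures only the input and output data forms, not execution conditions or model parameters.

```python
import functools
from typing import Any, Callable, Dict, List

# Hypothetical store of the decision-factor records captured by the probes.
DECISION_FACTORS: Dict[str, List[Dict[str, Any]]] = {}


def describe(value: Any) -> Dict[str, Any]:
    """Summarize the form of a value: its type and, if sized, its length."""
    form: Dict[str, Any] = {"type": type(value).__name__}
    if hasattr(value, "__len__"):
        form["len"] = len(value)
    return form


def probe(name: str) -> Callable:
    """Instrument a model call with one probe before input and one after output."""
    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            record = {"input_form": [describe(a) for a in args]}  # probe before input
            result = fn(*args, **kwargs)
            record["output_form"] = describe(result)  # probe after output
            record["output"] = result
            DECISION_FACTORS.setdefault(name, []).append(record)
            return result
        return wrapper
    return decorator


@probe("detect_vehicle")
def detect_vehicle(image: list) -> tuple:
    # Hypothetical stand-in model: always returns a bounding box (x, y, w, h).
    return (10, 20, 100, 50)


detect_vehicle([0.0] * 16)
print(DECISION_FACTORS["detect_vehicle"][0]["output_form"])  # {'type': 'tuple', 'len': 4}
```

A decorator leaves the model code itself untouched, which keeps the instrumentation separable from the system under test.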

It should be noted that the model decision factor may further include information that can determine the model decision process, such as model parameters and a mean value of model input data, that is, information that affects the model processing process.

In this implementation, instrumentation decomposes the whole artificial intelligence system into at least two deep learning models. Each deep learning model can then be tested on its own to locate problems in the artificial intelligence system.

Step S220: test data corresponding to the model decision factors of the at least two deep learning models and the corresponding real output results are obtained.

Historical input data of each deep learning model and the corresponding real output results are collected according to the information in the model decision factors of that deep learning model. Alternatively, input data and the corresponding real output results are configured according to that information.

The collected historical input data or the configured input data serve as the test data of the corresponding deep learning model.
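Collecting test data according to the decision factors can be sketched as a filter over historical records. Only samples whose input matches the expected input data form are kept. This is an illustrative assumption, not the application's procedure. The `Sample` type, the `select_test_data` helper and the decision-factor keys `input_type` and `input_len` are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Sample:
    input_data: Any    # historical input data of the model
    real_output: Any   # corresponding real (ground-truth) output


def select_test_data(history: List[Sample], factor: Dict[str, Any]) -> List[Sample]:
    """Keep historical samples whose input matches the expected input data form."""
    selected = []
    for s in history:
        if type(s.input_data).__name__ != factor["input_type"]:
            continue  # wrong data type
        if "input_len" in factor and len(s.input_data) != factor["input_len"]:
            continue  # wrong size
        selected.append(s)
    return selected


history = [
    Sample([0.1] * 16, "car"),
    Sample([0.2] * 8, "car"),       # wrong length: rejected
    Sample("not an image", None),  # wrong type: rejected
]
factor = {"input_type": "list", "input_len": 16}  # hypothetical decision factor
print(len(select_test_data(history, factor)))  # 1
```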

Step S230: for each deep learning model, the deep learning model is tested based on its corresponding test data to obtain the test result output by the deep learning model and its current model index.

In a specific implementation, for each deep learning model, the input data in its test data is fed into the model to obtain the test result the model outputs. The test result may include output results that match the expected decision information of the deep learning model and output results that differ from it.

A current model index of the deep learning model is determined based on the test result output by the model and the real output results corresponding to the test data. The current model index may include at least one of accuracy, recall, and the mean average precision (mAP) of a multi-label image classification task; that is, it may include model indices of at least one index type.
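Two of the index types named above, accuracy and recall, can be computed directly from the test results and the real output results. The sketch below is a minimal illustration; the helper `model_indices` and the choice of a single positive class are assumptions. mAP is omitted because it requires ranked per-label confidence scores, which this example does not model.

```python
from typing import Dict, Hashable, Sequence


def model_indices(predicted: Sequence[Hashable], real: Sequence[Hashable],
                  positive: Hashable = 1) -> Dict[str, float]:
    """Accuracy over all samples and recall of the positive class."""
    if len(predicted) != len(real) or not real:
        raise ValueError("need equally sized, non-empty sequences")
    pairs = list(zip(predicted, real))
    correct = sum(p == r for p, r in pairs)
    true_pos = sum(p == positive and r == positive for p, r in pairs)
    actual_pos = sum(r == positive for r in real)
    return {
        "accuracy": correct / len(real),
        "recall": true_pos / actual_pos if actual_pos else 0.0,
    }


print(model_indices([1, 0, 1, 1], [1, 0, 0, 1]))  # {'accuracy': 0.75, 'recall': 1.0}
```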

It should be noted that the current model index of the deep learning model may also include performance indices observed on a single interface while the model is running, such as CPU usage, memory usage, response time, and transactions per second (TPS); this is not limited herein.
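Two of these performance indices, response time and TPS, can be observed by timing repeated calls to a model. This is a rough sketch using `time.perf_counter`; the helper `measure_performance` and the toy model are hypothetical. CPU and memory usage would need a separate monitoring tool and are not measured here.

```python
import time
from typing import Any, Callable, Dict, Sequence


def measure_performance(model: Callable[[Any], Any],
                        inputs: Sequence[Any]) -> Dict[str, float]:
    """Average response time (seconds) and throughput (TPS) over a batch of calls."""
    latencies = []
    start = time.perf_counter()
    for x in inputs:
        t0 = time.perf_counter()
        model(x)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    return {
        "avg_response_time_s": sum(latencies) / len(latencies),
        "tps": len(inputs) / elapsed if elapsed > 0 else float("inf"),
    }


stats = measure_performance(lambda x: sum(range(x)), [10_000] * 50)
print(sorted(stats))  # ['avg_response_time_s', 'tps']
```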

This implementation tests each deep learning model in the whole artificial intelligence system in parallel and obtains the model indices of each one, so that problematic deep learning models can be located from their model indices.

Step S240: the test state information of the deep learning model is determined based on its current model index and historical model index.

The historical model index is the model index obtained on the same test data by the version preceding the current version of the deep learning model. The test state information may include a test state.

In a specific implementation, for each index type, the current model index is compared with the historical model index:

- If the current model index is greater than the historical model index, the test state of the deep learning model is determined to be the first state, indicating that the current version performs better than the previous version.
- If the current model index is smaller than the historical model index, the test state is determined to be the second state, indicating that the current version performs worse than the previous version.

It should be noted that the test state information may include the test state of at least one index type of the deep learning model.

Further, to improve the accuracy of the model test, the test state information may also include a state value of the test state. The state value represents the degree of difference between the performance of the current version and that of the previous version. The optimization personnel can then judge, based on the state value, how much the current version needs to be optimized.

This embodiment compares the results of two consecutive versions of any deep learning model. That is, it tests the difference between the output results produced by the previous version and the current version for the same batch of input data.
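The comparison between versions can be expressed as a small per-index-type function. This sketch follows the first-state and second-state rule described above. It rests on three assumptions. First, the state value is taken to be the absolute gap between the two indices, since the text does not define its formula. Second, the equal case gets a placeholder state, since the text does not specify it. Third, a higher index is treated as better; for indices such as response time, the comparison would have to be inverted. `VersionState` and `state_info` are hypothetical names.

```python
from enum import Enum
from typing import Dict, Tuple


class VersionState(Enum):
    FIRST = "current version performs better than the previous version"
    SECOND = "current version performs worse than the previous version"
    EQUAL = "no change (case not specified in the source)"


def state_info(current: Dict[str, float],
               historical: Dict[str, float]) -> Dict[str, Tuple[VersionState, float]]:
    """Per index type: (test state, state value), assuming higher index = better."""
    info = {}
    for index_type, cur in current.items():
        hist = historical[index_type]
        if cur > hist:
            state = VersionState.FIRST
        elif cur < hist:
            state = VersionState.SECOND
        else:
            state = VersionState.EQUAL
        # Assumed state value: absolute gap between the two versions.
        info[index_type] = (state, round(abs(cur - hist), 6))
    return info


print(state_info({"accuracy": 0.92, "recall": 0.80},
                 {"accuracy": 0.90, "recall": 0.85}))
```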

Step S250: the prediction behavior of the deep learning model is determined according to a test heat map, generated from the test data input to the model and the test result it outputs, and the real output result corresponding to the test data.

In a specific implementation, visualization is the most typical method of interpretability analysis for data. A visualization tool marks the important parts of the data and relates the learning process to the original data, so that the learning process of deep learning can be understood intuitively.
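The text does not name a specific visualization tool. One widely used way to produce such a heat map is occlusion sensitivity, which is a swapped-in technique here and not necessarily the one the application uses. Each input cell is blanked in turn, and the resulting drop in the model's score is recorded as that cell's heat. The sketch uses plain Python lists. The `score` callable stands in for the model under test and is hypothetical.

```python
from typing import Callable, List

Grid = List[List[float]]


def occlusion_heat_map(image: Grid, score: Callable[[Grid], float]) -> Grid:
    """Heat of each cell = drop in the model's score when that cell is zeroed out."""
    base = score(image)
    heat = [[0.0] * len(row) for row in image]
    for i, row in enumerate(image):
        for j in range(len(row)):
            occluded = [r[:] for r in image]  # copy, then occlude one cell
            occluded[i][j] = 0.0
            heat[i][j] = base - score(occluded)
    return heat


# Toy "model" whose score depends only on the centre pixel.
image = [[1.0] * 3 for _ in range(3)]
heat = occlusion_heat_map(image, lambda img: img[1][1])
print(heat)  # [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
```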

For each deep learning model, interpretability analysis is performed on the test data input to the model and the test result it outputs to obtain a test heat map. The heat map describes the region where the deep learning model makes its decision on the input test data (also called the prediction region). Specifically, according to the test results output by the model, the input test data is divided into a set P of correctly predicted data and a set N of wrongly predicted data. Interpretability analysis is then performed on the test data in both sets to identify the regions in which the prediction behavior of the model is concentrated.
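The split into sets P and N, and the extraction of a prediction region from a heat map, can be sketched as follows. The rule that a prediction region consists of the cells whose heat reaches a fixed fraction of the peak is an assumption made for illustration. The function names and the 0.5 ratio are hypothetical.

```python
from typing import Any, List, Sequence, Set, Tuple

Cell = Tuple[int, int]


def split_p_n(samples: Sequence[Any], predictions: Sequence[Any],
              real_outputs: Sequence[Any]) -> Tuple[List[Any], List[Any]]:
    """Set P: correctly predicted samples; set N: wrongly predicted samples."""
    p_set, n_set = [], []
    for sample, pred, real in zip(samples, predictions, real_outputs):
        (p_set if pred == real else n_set).append(sample)
    return p_set, n_set


def prediction_region(heat: List[List[float]], ratio: float = 0.5) -> Set[Cell]:
    """Cells whose heat is at least `ratio` times the peak heat (assumed rule)."""
    peak = max(max(row) for row in heat)
    if peak <= 0:
        return set()
    return {(i, j) for i, row in enumerate(heat)
            for j, v in enumerate(row) if v >= ratio * peak}


print(split_p_n(["a", "b", "c"], ["cat", "dog", "cat"], ["cat", "cat", "cat"]))
# (['a', 'c'], ['b'])
print(sorted(prediction_region([[0.0, 0.2], [0.9, 1.0]])))  # [(1, 0), (1, 1)]
```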

If the prediction result corresponding to the prediction region of the deep learning model in the test heat map differs from the corresponding real output result, the prediction behavior of the deep learning model is determined to be an abnormal prediction behavior. If it is the same as the corresponding real output result, the prediction behavior is determined to be a normal prediction behavior.

Fig. 3 illustrates this with a picture as the input test data and an identified cat feature as the real output result of the deep learning model. Picture (a) is the input test data, and region 1 and region 2 marked in pictures (b) and (c) are prediction regions of the deep learning model. Since the deep learning model is a model for identifying cats, its prediction behavior in picture (b) is abnormal and its prediction behavior in picture (c) is normal. The dark areas within region 1 and region 2 are the areas on which the deep learning model places the most weight when predicting.
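The source does not say how "the same as the real output result" is decided for a region. One plausible assumption is to compare the prediction region with the region occupied by the real object, for example using intersection over union (IoU) and a threshold. The sketch below makes that assumption explicit. `iou`, `prediction_behavior` and the 0.5 threshold are hypothetical.

```python
from typing import Set, Tuple

Cell = Tuple[int, int]


def iou(a: Set[Cell], b: Set[Cell]) -> float:
    """Intersection over union of two sets of heat-map cells."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def prediction_behavior(pred_region: Set[Cell], real_region: Set[Cell],
                        min_iou: float = 0.5) -> str:
    """'normal' if the model attends to the real object, else 'abnormal'."""
    return "normal" if iou(pred_region, real_region) >= min_iou else "abnormal"


cat = {(1, 0), (1, 1)}  # hypothetical cells covering the cat
print(prediction_behavior({(1, 0), (1, 1)}, cat))  # normal
print(prediction_behavior({(0, 0)}, cat))  # abnormal
```

In the Fig. 3 example, a region such as region 1 that focuses away from the cat would fall below the threshold and be reported as abnormal.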

The above embodiment determines the prediction behavior of each deep learning model from the test heat map and the real output result. This further improves the accuracy of identifying the problematic model (i.e., the model that needs to be optimized).

Step S260: a test report is generated.

The test report may include test status information, test results, and corresponding predicted behavior for the at least two deep learning models.
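A test report with these contents could be serialized as JSON, one entry per deep learning model. This is only a sketch. The `ModelReport` fields, the `build_report` helper and the sample values are hypothetical and do not come from the application.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class ModelReport:
    model: str
    test_state: Dict[str, str]     # index type -> "first" / "second" state
    state_value: Dict[str, float]  # index type -> gap to the previous version
    matched: int                   # outputs matching the expected decision information
    mismatched: int                # outputs differing from it
    prediction_behavior: str       # "normal" or "abnormal"


def build_report(entries: List[ModelReport]) -> str:
    """Serialize per-model entries into one JSON test report."""
    return json.dumps({"models": [asdict(e) for e in entries]}, indent=2)


report = build_report([
    ModelReport("vehicle_detector", {"accuracy": "first"}, {"accuracy": 0.02},
                matched=95, mismatched=5, prediction_behavior="normal"),
])
print(report)
```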

Further, after the test report is generated, it can be sent to the optimization terminal.

After checking the test state information, the test results and the corresponding prediction behaviors, the optimization personnel can identify the problematic deep learning model and the specific problem to be solved for it. This determines the optimization direction, on which basis the version of that deep learning model is optimized. Specifically, the optimization terminal optimizes the corresponding deep learning model according to the optimization instructions received from the optimization personnel. The output of the optimized model should then be consistent with the result the algorithm is expected to achieve.

In a specific embodiment, each deep learning model in the entire artificial intelligence system can be tested in series according to a certain sequence, so as to obtain a test result and a model index output by each deep learning model.

In a specific implementation, the model execution order of the at least two deep learning models is first determined based on their decision factors. For example, suppose an artificial intelligence system X includes three models: model A, model B and model C. The data form of the output result expected by model A satisfies the data form of the input data of model C. Once the execution condition of model C is satisfied, the data form of the output result expected by model C satisfies the data form of the input data of model B. Therefore, the execution order of the three deep learning models is model A, model C, model B.
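Deriving the execution order from the decision factors amounts to a topological sort. Model X must run after model Y whenever Y's expected output form matches X's input form. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (3.9+). The decision-factor keys and form labels are hypothetical, and execution conditions are not modelled.

```python
from graphlib import TopologicalSorter
from typing import Dict, List


def execution_order(factors: Dict[str, Dict[str, str]]) -> List[str]:
    """Order models so each runs after the model whose output form it consumes."""
    graph = {name: set() for name in factors}
    for consumer, c in factors.items():
        for producer, p in factors.items():
            if producer != consumer and p["output_form"] == c["input_form"]:
                graph[consumer].add(producer)  # producer must run first
    return list(TopologicalSorter(graph).static_order())


factors = {  # hypothetical decision factors for system X
    "A": {"input_form": "image", "output_form": "vehicle_box"},
    "B": {"input_form": "plate_box", "output_form": "plate_text"},
    "C": {"input_form": "vehicle_box", "output_form": "plate_box"},
}
print(execution_order(factors))  # ['A', 'C', 'B']
```

`static_order` raises `graphlib.CycleError` if the forms imply a cycle, which would flag inconsistent decision factors.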

The current deep learning model is tested based on its test data according to the model execution order, and the test result output by the current deep learning model and its current model index are obtained. The current deep learning model is the first untested deep learning model among the at least two deep learning models in the model execution order. For the specific process of obtaining the test result and current model index of the current deep learning model, refer to step S230; details are not repeated here.

During serial testing, step S220 only needs to obtain the test data corresponding to the model decision factors of the first deep learning model in the model execution order.

In the above example, in the first test, the first untested model among model A, model C and model B in the model execution order is model A. Model A is therefore determined as the current deep learning model, and the test result and current model index output by model A are obtained.

In subsequent tests, the same rule is applied. In the second test, model A has already been tested, so the current deep learning model is chosen from models C and B. The first untested model among them in the model execution order is model C, so model C is determined as the current deep learning model, and the test result and current model index output by model C are obtained. In the third test, models A and C have already been tested. Model B is therefore determined as the current deep learning model, and the test result and current model index output by model B are obtained.
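The serial procedure can be sketched as a loop that always picks the first untested model in the execution order. Since only the first model's test data is obtained in step S220, this sketch assumes that each model's output serves as the next model's input. `serial_test` and the toy models are hypothetical, and the computation of model indices is left out.

```python
from typing import Any, Callable, Dict, List


def serial_test(order: List[str], models: Dict[str, Callable[[Any], Any]],
                first_input: Any) -> Dict[str, Any]:
    """Repeatedly test the first untested model in execution order."""
    tested: List[str] = []
    outputs: Dict[str, Any] = {}
    data = first_input  # only the first model needs its own test data
    while len(tested) < len(order):
        current = next(m for m in order if m not in tested)  # first untested model
        data = models[current](data)  # assumed: this output feeds the next model
        outputs[current] = data
        tested.append(current)
    return outputs


models = {"A": lambda x: x + 1, "C": lambda x: x * 2, "B": lambda x: x - 3}
print(serial_test(["A", "C", "B"], models, 5))  # {'A': 6, 'C': 12, 'B': 9}
```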

Then, for each deep learning model, the test state information is determined based on its current model index and historical model index. For the specific process, refer to step S240; details are not repeated here.

Compared with the prior art, the test method of the artificial intelligence system provided by the present application decomposes the AI system into its constituent deep learning models and tests each of them independently. It can therefore accurately locate the data model to be optimized in the AI system, enabling the AI system to be optimized accurately.

Corresponding to the above method, an embodiment of the present application further provides a testing apparatus of an artificial intelligence system. As shown in Fig. 4, the testing apparatus includes an obtaining unit 410, a testing unit 420, a determining unit 430 and a generating unit 440:

an obtaining unit 410, configured to perform code instrumentation on at least two deep learning models in the artificial intelligence system under test to obtain the model decision factors of the at least two deep learning models; the model decision factors comprise execution conditions, input data forms and expected decision information;

a testing unit 420, configured to test each deep learning model based on the test data corresponding to that model, and to obtain the test result and current model index output by the model; the test result comprises output results that match the expected decision information of the deep learning model and output results that differ from it;

a determining unit 430, configured to determine test state information of the deep learning model based on the current model index and a historical model index of the deep learning model; the historical model index is a model index of a previous version of the current version of the deep learning model;

a generating unit 440, configured to generate a test report, where the test report includes test status information, test results, and corresponding predicted behaviors of the at least two deep learning models.
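The cooperation of the four units can be illustrated with a minimal sketch. Everything in it is an assumption for illustration: `DecisionFactors`, `instrument`, `run_test`, `determine_state` and `generate_report` are hypothetical names, "code instrumentation" is approximated by wrapping a model's prediction function to record the form of each input, the current model index is taken to be accuracy against the expected decision information, and the heat-map-based prediction behavior is omitted.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class DecisionFactors:
    """Hypothetical record of one model's decision factors."""
    execution_condition: str                              # when the model is executed
    expected_decisions: List[Any]                         # expected output per test input
    input_forms: List[str] = field(default_factory=list)  # filled in by instrumentation


def instrument(model_fn: Callable[[Any], Any], factors: DecisionFactors) -> Callable[[Any], Any]:
    """Obtaining unit (sketch): wrap the model so every call records its input form."""
    def wrapped(x: Any) -> Any:
        shape = getattr(x, "shape", None)
        form = type(x).__name__ if shape is None else f"{type(x).__name__}{tuple(shape)}"
        factors.input_forms.append(form)
        return model_fn(x)
    return wrapped


def run_test(
    model_fn: Callable[[Any], Any], test_inputs: List[Any], factors: DecisionFactors
) -> Tuple[Dict[str, list], Dict[str, float]]:
    """Testing unit (sketch): split outputs by agreement with the expected decision."""
    same, different = [], []
    for x, expected in zip(test_inputs, factors.expected_decisions):
        out = model_fn(x)
        (same if out == expected else different).append((x, out, expected))
    total = len(same) + len(different)
    index = {"accuracy": len(same) / total if total else 0.0}
    return {"same": same, "different": different}, index


def determine_state(current: Dict[str, float], historical: Dict[str, float]) -> Dict[str, float]:
    """Determining unit (sketch): per-metric gap between current and previous version."""
    return {m: current[m] - historical[m] for m in current if m in historical}


def generate_report(per_model: Dict[str, Tuple[Dict[str, list], Dict[str, float]]]) -> str:
    """Generating unit (sketch): one line per model with result counts and state gaps."""
    lines = []
    for name, (result, state) in per_model.items():
        gaps = ", ".join(f"{m} {d:+.3f}" for m, d in sorted(state.items()))
        lines.append(
            f"{name}: same={len(result['same'])} different={len(result['different'])} [{gaps}]"
        )
    return "\n".join(lines)
```

In this sketch, a threshold model `lambda x: 1 if x > 0 else 0` tested on inputs `[3, -1, -2, 5]` with expected decisions `[1, 0, 1, 1]` produces three matching outputs and one differing output, giving an accuracy of 0.75.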

The functions of the functional units of the testing apparatus provided in the above embodiments of the present application can be implemented by the method steps described above; therefore, the specific working processes and beneficial effects of the units of the testing apparatus are not repeated here.

An embodiment of the present application further provides an electronic device. As shown in FIG. 5, the electronic device includes a processor 510, a communication interface 520, a memory 530 and a communication bus 540, where the processor 510, the communication interface 520 and the memory 530 communicate with one another through the communication bus 540.

A memory 530 for storing a computer program;

the processor 510, when executing the program stored in the memory 530, implements the following steps:

The aforementioned communication bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.

The communication interface is used for communication between the electronic equipment and other equipment.

The memory may include random access memory (RAM) or non-volatile memory (NVM), such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the processor.

The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP) and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

Since the implementation and beneficial effects of each component of the electronic device in the foregoing embodiment can be understood by referring to the steps of the embodiment shown in FIG. 2, the detailed working processes and beneficial effects of the electronic device provided in this embodiment of the present application are not repeated here.

In yet another embodiment provided by the present application, a computer-readable storage medium is further provided, in which instructions are stored, and when the instructions are executed on a computer, the instructions cause the computer to execute the method for testing an artificial intelligence system according to any one of the above embodiments.

In yet another embodiment provided by the present application, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the method for testing an artificial intelligence system as described in any of the above embodiments.

As will be appreciated by one of skill in the art, the embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

Embodiments of the present application are described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including the preferred embodiment and all changes and modifications that fall within the true scope of the embodiments of the present application.

It is apparent that those skilled in the art can make various changes and modifications to the embodiments of the present application without departing from the spirit and scope of the embodiments of the present application. Thus, if such modifications and variations of the embodiments of the present application fall within the scope of the claims of the embodiments of the present application and their equivalents, the embodiments of the present application are also intended to include such modifications and variations.

Claims

1. A method of testing an artificial intelligence system, the method comprising:

determining test state information of the deep learning model based on the current model index and a historical model index of the deep learning model; the historical model index is a model index corresponding to the same test data in the previous version of the current version of the deep learning model;

determining the prediction behavior of the deep learning model according to a test heat map generated from the test data input to the deep learning model and the output test result, and according to the real output result corresponding to the test data;

2. The method of claim 1, wherein after generating the test report, the method further comprises:

3. The method of claim 1, wherein code instrumentation is performed on at least two deep learning models in the artificial intelligence system under test to obtain model decision factors for the at least two deep learning models, comprising:

4. The method of claim 1 or 3, wherein after obtaining test data corresponding to model decision factors for the at least two deep learning models, the method further comprises:

5. The method of claim 1, wherein the test state information comprises a test state;

6. The method of claim 5, wherein the test state information further comprises a state value for the test state; the state value characterizes the degree of difference between the performance of the current version and the performance of the previous version.

7. The method of claim 1, wherein determining the prediction behavior of the deep learning model based on the test heat map corresponding to the test result output by the deep learning model and the real output result corresponding to the test data comprises:

8. An apparatus for testing an artificial intelligence system, the apparatus comprising:

9. An electronic device, characterized in that the electronic device comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;

a memory for storing a computer program;

a processor, configured to implement the method steps of any one of claims 1-7 when executing the program stored in the memory.

10. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, and the computer program, when executed by a processor, implements the method steps of any one of claims 1 to 7.