CN115810137B - Construction method of interactive artificial intelligence technical evaluation scheme - Google Patents

Construction method of interactive artificial intelligence technical evaluation scheme

Info

Publication number
CN115810137B
CN115810137B (application CN202310087037.1A)
Authority
CN
China
Prior art keywords
evaluation
scheme
data
library
constructing
Prior art date
Legal status
Active
Application number
CN202310087037.1A
Other languages
Chinese (zh)
Other versions
CN115810137A (en)
Inventor
丰强泽
齐红威
何鸿凌
肖永红
王大亮
Current Assignee
Hebei Shuyuntang Intelligent Technology Co ltd
Datatang Beijing Technology Co ltd
Original Assignee
Hebei Shuyuntang Intelligent Technology Co ltd
Datatang Beijing Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Hebei Shuyuntang Intelligent Technology Co ltd and Datatang Beijing Technology Co ltd
Priority to CN202310087037.1A
Publication of CN115810137A
Application granted
Publication of CN115810137B
Legal status: Active

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04: INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S: SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00: Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50: Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Abstract

The invention discloses a method for constructing an interactive artificial intelligence technical evaluation scheme, relating to the technical field of artificial intelligence evaluation. The method comprises the following steps: step S1: constructing a data layer, where the data layer comprises an evaluation database, an evaluation tool library, an evaluation standard library and a benchmark model library; step S2: constructing an encapsulation layer; step S3: constructing an execution layer. The invention rapidly constructs an artificial intelligence evaluation scheme from elements such as evaluation tools, evaluation data, evaluation standards and benchmark models, solving the problems that artificial intelligence evaluation experience and methods cannot be reused and that evaluation imposes a high technical threshold on users. It provides an interactive configuration environment in which users can rapidly assemble complex evaluation scheme flows through drag-and-drop visual configuration, so that new evaluation schemes can be quickly developed for different types of artificial intelligence tasks.

Description

Construction method of interactive artificial intelligence technical evaluation scheme
Technical Field
The invention relates to the technical field of artificial intelligence evaluation, in particular to a construction method of an interactive artificial intelligence technical evaluation scheme.
Background
Evaluation is an indispensable step in the artificial intelligence development process. Its purpose is to determine whether a product, system or platform behaves normally, how its indicators perform, and whether errors or vulnerabilities exist, which makes it critically important. In terms of test content, artificial intelligence evaluation covers technical indicators such as functionality, performance and security, as well as social indicators such as privacy and ethics. The later testing is performed and defects are discovered, the more costly the remedy becomes. An evaluation scheme is an evaluation flow built in advance for a given class of AI technology, usually by an experienced AI expert, so that it can be directly reused when individual users evaluate their own models, lowering the technical threshold for those users.
The Chinese patent with publication number CN113177208A discloses a method and device for automatically constructing evaluation operation templates, intended to improve the efficiency and accuracy of computer security evaluation. It addresses the problem that, in the prior art, manual verification and operation configuration of computer system security consume a great deal of time and effort and are often very inefficient, as well as the problems that experience and methods cannot be reused and verification efficiency is low. That patent objectively classifies, counts and analyzes test results by automatically constructing evaluation operation templates; when an evaluation template can make judgments automatically, or becomes able to do so after manual correction, it is added to the template library, thereby achieving automatic construction and detection. The evaluation scheme in that patent is code: a system is opened automatically through shell script commands or simulation software, and testing proceeds according to the program content; the content to be tested is the security level that a system of the type named in the national standard documents must guarantee, obtained by retrieving the national standard. Evaluation standards are cited in that evaluation scheme, but interactively constructing an evaluation flow from evaluation tools, evaluation data and benchmark models is not mentioned.
The Chinese patent with publication number CN115329326A discloses an artificial intelligence security evaluation method and system. The system comprises an interaction module, an upload resource management module, an evaluation module, a visualization module, and a repair and defense module; the output of the interaction module is connected to the input of the upload module, and the output of the upload module is connected to the input of the upload resource management module, so that under the combined action of these modules the security problems present in an AI application can be measured comprehensively. That invention performs fine-grained evaluation of five elements spanning AI application development to deployment, provides a foundation for improving the security of AI applications, specifies solutions for the problems found by evaluation, and protects AI applications in practice. However, while the proposed system can evaluate an AI application at fine granularity and formulate corresponding solutions for existing security problems, its evaluation scheme is singular and it cannot effectively evaluate an entire artificial intelligence scheme.
Disclosure of Invention
Aiming at the problems existing in the prior art, the invention provides a construction method of an interactive artificial intelligence technical evaluation scheme.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:
the construction method of the interactive artificial intelligence technical evaluation scheme comprises the following steps:
step S1: constructing a data layer, wherein the data layer comprises an evaluation database, an evaluation tool library, an evaluation standard library and a reference model library;
step S2: constructing a packaging layer;
step S3: and constructing an execution layer.
Based on the above technical solution, further, the evaluation tool library in step S1 provides the executable-program basis for evaluation and includes evaluation tools for various AI tasks. Each evaluation tool is an executable program that reads the raw data in the evaluation data and compares the inference result of the input model with the manual annotation result in the evaluation data.
Based on the above technical solution, further, the evaluation standard library in step S1 provides the methodological basis for evaluation. Each evaluation standard in the library comprises two parts, an evaluation standard document and an evaluation index list: the evaluation standard document is for the evaluating user to browse, and the evaluation index list is a structured list of indicators used to guide the execution and output of the evaluation tools.
Based on the above technical solution, further, the reference model library in step S1 includes a reference model and a user model.
Based on the above technical solution, further, the encapsulation layer in step S2 includes a data object encapsulation, an operation object encapsulation, and a control object encapsulation.
Based on the above technical solution, further, the process of constructing the execution layer in step S3 comprises the following steps:
step S31: editing scheme meta information;
step S32: automatically matching the object content;
step S33: creating a scheme flow;
step S34: performing matching checking between objects;
step S35: scheme similarity checking;
step S36: setting an evaluation model;
step S37: debugging and evaluating schemes;
step S38: and (5) warehousing an evaluation scheme.
Based on the above technical solution, further, the meta information in step S31 includes at least a scheme name, a scheme library type, an AI task type, an AI application scenario, and a scheme introduction.
Based on the above technical solution, further, the automatic matching of object content is computed from similarity calculation and historical co-occurrence rate. The similarity is calculated between each item of meta information of the evaluation scheme and the meta information of all evaluation data in the evaluation database, and the per-item similarities are weighted and averaged to obtain a final similarity. The historical co-occurrence rate is obtained by statistically analyzing how often each piece of evaluation data was referenced by previous historical evaluation schemes.
Based on the above technical solution, further, the process of creating the scheme flow in step S33 is as follows: in the graphical configuration environment, the user selects data objects, operation objects and control objects from the encapsulation layer and constructs the evaluation scheme by dragging, connecting and configuring them.
Based on the above technical solution, further, the matching check between objects includes a scenario consistency check of the evaluation scheme. The scenario consistency check computes semantic similarity over the values of the AI application scenario and AI task type fields in the meta information of the evaluation standards, evaluation data, evaluation tools and reference models. The evaluation standard, evaluation data, evaluation tool and reference model on the same branch of the evaluation scheme flow should have the same or similar application scenario; if the calculated similarity is lower than a set threshold, a scenario inconsistency error is output to warn the user that the evaluation scheme flow is wrong.
Based on the above technical solution, further, the scheme similarity check in step S35 automatically compares the current evaluation scheme with the library of previously constructed evaluation schemes and detects whether a similar evaluation scheme already exists; if so, the evaluating user is prompted that a new scheme does not need to be constructed.
Based on the above technical solution, further, the evaluation model setting process in step S36 is as follows: to verify the effect of the evaluation scheme design, a test input model is set for the debugging run, and whether the evaluation scheme design meets expectations is verified through the execution result of that input model.
Based on the above technical solution, further, the debugging process of the evaluation scheme in step S37 is as follows:
step 1: automatically translating the evaluation scheme into an executable test job and submitting the test job to the computing platform;
step 2: during execution of the evaluation scheme, monitoring and outputting in real time the execution progress, abnormality alerts and resource usage of every node in the flow;
step 3: when a problem is found during the run, suspending the job and resuming it after the configuration is modified, until the final run result meets the expected effect and the debugging of the evaluation scheme is complete.
Compared with the prior art, the invention has the following beneficial effects:
the invention provides an evaluation scheme construction method of an interactive artificial intelligence technology, which can construct corresponding evaluation schemes aiming at each type of AI technology and is used for directly selecting a proper evaluation scheme to generate when an evaluation user performs evaluation. The evaluation scheme is a process and consists of evaluation tools, evaluation data, evaluation standards and reference models, which are agreed to follow what evaluation standards, what evaluation data are used, what evaluation tools are used for execution, and evaluation results and which AI reference models are compared. In order to reduce the technical threshold used by users, an interactive configuration environment is provided, and the users are supported to quickly construct a complex evaluation scheme flow in a dragging configuration visual mode, so that new evaluation schemes are quickly developed for different types of artificial intelligence tasks. Meanwhile, the invention can quickly construct an artificial intelligent evaluation scheme based on elements such as an evaluation tool, evaluation data, an evaluation standard, a reference model and the like, thereby solving the problems that the artificial intelligent evaluation experience and method cannot be reused and the evaluation requirement technical threshold is high.
Drawings
FIG. 1 is a flow chart of a construction method of the present invention;
FIG. 2 is a schematic diagram of a combination of a data layer, a package layer, and an execution layer according to the present invention;
FIG. 3 is a flowchart of the evaluation scheme flow created in embodiment 1 of the present invention.
Detailed Description
In order to make the objects and technical solutions of the present invention more clear, the technical solutions of the present invention will be clearly and completely described below with reference to examples.
Example 1
As shown in fig. 1, a method for constructing an interactive artificial intelligence technical evaluation scheme includes the following steps:
step S1: constructing a data layer; the data layer comprises an evaluation database, an evaluation tool library, an evaluation standard library and a reference model library, as shown in fig. 2;
in particular, the profile database provides a data base for profiles, including various test set data, each profile being a data set. In order to facilitate one evaluation tool to read different evaluation data, the evaluation database defines a unified data standardization specification.
1. Standardization of metadata: metadata management labels each set of evaluation data in a standardized way, classifies and manages evaluation data by topic, establishes standards and specifications, and records the label information of each piece of evaluation data. The metadata of each piece of evaluation data includes the following fields, each with a unified writing specification, as shown in Table 1 below.
Table 1 Metadata field requirements of the evaluation data
[Table 1 is available only as an image in the original publication.]
2. Standardization of data storage structures:
in order to facilitate the evaluating tool to directly read different evaluating data under the same type, the data set storage structure of the evaluating data should be standardized. The data set of the evaluation data comprises a complete data packet and a sample data packet, wherein the complete data packet comprises two parts of original data and a manual labeling result. The sample data packet is a sample of the data set for presentation to the evaluating user for visual understanding of the details of the evaluation data.
The evaluation tool library provides the executable-program basis for evaluation and contains evaluation tools for various AI tasks; each evaluation tool is an executable program. An evaluation tool reads the raw data (RawData) in the evaluation data and compares the inference result of the input model with the manual annotation result (Answer) in the evaluation data. The output indicators of an evaluation tool must be consistent with the corresponding evaluation standard. As shown in Table 2, the metadata of each evaluation tool includes the following fields, each with a unified writing specification.
Table 2 metadata field requirements of the evaluation tool
[Table 2 is available only as an image in the original publication.]
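As a hedged illustration of the evaluation tool behaviour described above (reading RawData, comparing the input model's inference results with the Answer annotations, and emitting only the indicators defined by the evaluation standard), a tool might be implemented roughly as follows; the file layout, function names and the accuracy-only metric are assumptions, not the patented tools.

```python
import json
from pathlib import Path

def run_evaluation_tool(data_dir: str, model, index_list: list) -> dict:
    """Run the model on RawData, compare with the Answer annotations, and report allowed metrics."""
    raw = json.loads(Path(data_dir, "RawData.json").read_text())      # assumed {sample_id: input}
    answers = json.loads(Path(data_dir, "Answer.json").read_text())   # assumed {sample_id: label}
    correct = sum(1 for sample_id, sample in raw.items() if model(sample) == answers[sample_id])
    metrics = {"accuracy": correct / len(raw)} if raw else {"accuracy": 0.0}
    # Only emit the indicators defined in the evaluation standard's index list.
    return {name: value for name, value in metrics.items() if name in index_list}
```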
3. The evaluation standard library provides the methodological basis for evaluation and is usually derived from international, national and industry standards related to artificial intelligence. Each evaluation standard comprises two parts: the evaluation standard document, a document in pdf, Word or a similar format for the evaluating user to browse; and the evaluation index list, a structured list of indicators (for example, indicator name and indicator description) used to guide the execution and output of the evaluation tools, whose output must conform to the definitions in the index list. As shown in Table 3, the metadata of each evaluation standard includes the following fields, each with a unified writing specification.
Table 3 metadata field requirements for evaluation criteria
[Table 3 is available only as an image in the original publication.]
4. The benchmark model library contains benchmark models and user models. A benchmark model represents the technical benchmark of its field and is used for comparative evaluation against a user model, making it easy for users to understand their own technical strengths and weaknesses. As shown in Table 4, the metadata of each benchmark model includes the following fields, each with a unified writing specification.
Table 4 metadata field requirements of benchmark model
[Table 4 is available only as an image in the original publication.]
Step S2: constructing a packaging layer; the packaging layer comprises a data object packaging layer, an operation object packaging layer and a control object packaging layer;
Specifically, data object encapsulation wraps the evaluation database, evaluation tool library, evaluation standard library and benchmark model library into evaluation data objects, evaluation tool objects, evaluation standard objects and benchmark model objects in the graphical environment, providing graphical display, retrieval and selection functions for each of the four libraries.
Operation object encapsulation covers how the data objects are operated on, and is likewise an important element of the evaluation scheme flow. The operation objects include common operations such as selection, connection, running, moving and deletion. The selection operation includes selecting an evaluation data object, an evaluation tool object, an evaluation standard object or a benchmark model object. The connection operation includes serial and parallel connections between multiple data objects. The running operation includes running, suspending and terminating the whole flow. The move operation supports dragging and moving each data object drawn in the flow. The delete operation supports deleting data objects drawn in the flow; deletion only removes the object from the evaluation scheme flow being developed and does not physically delete it from the library.
An evaluation scheme is a flow composed of evaluation tools, evaluation data, benchmark models and evaluation standards; it specifies which evaluation standards are followed, which evaluation data are used, which evaluation tools execute the tests, and which AI benchmark models the evaluation results are compared against. For example, a face recognition accuracy evaluation scheme may follow the face recognition accuracy test requirements in the face recognition evaluation standard, use the industry-standard LFW evaluation data together with evaluation data under different occlusion conditions and for different ethnic groups and age ranges, run an evaluation tool that can compute accuracy, precision, recall, F-score and AUC, and compare the evaluation results with industry benchmark face recognition models such as FaceNet, VGGNet and GoogleNet. Various control nodes are also designed into the flow. The control objects in the control object encapsulation include: a start node, a branch node, an aggregation node, a loop node and an end node. The start node represents the beginning of the flow: the evaluation scheme flow starts executing at the start node. The branch node represents a fork in the flow, from which multiple branches of the evaluation scheme flow execute in parallel; for example, when comparing models, test execution runs on several models in parallel. The aggregation node represents the convergence of multiple branches; execution continues past the aggregation node only after all incoming branches have completed. The loop node represents an operation that must be executed repeatedly, and a loop termination condition can be configured on it. The end node represents the end of the flow: execution finishes when the evaluation scheme flow reaches the end node.
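A minimal sketch of how a scheme flow built from these control objects could be represented as data is shown below; the node kinds mirror the start, branch, aggregation, loop and end nodes described above, while the class and field names are illustrative assumptions rather than the invention's internal representation.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class FlowNode:
    node_id: str
    kind: str                             # "start", "data", "tool", "model", "branch", "aggregate", "loop", "end"
    ref: Optional[str] = None             # id of the encapsulated library object, if any
    loop_condition: Optional[str] = None  # only meaningful for "loop" nodes
    next_nodes: List[str] = field(default_factory=list)

@dataclass
class EvaluationScheme:
    meta: dict
    nodes: List[FlowNode]

# A branch node fans out to three benchmark models evaluated in parallel; an aggregation
# node joins the branches before the end node, mirroring the model-comparison example above.
scheme = EvaluationScheme(
    meta={"name": "face recognition evaluation scheme"},
    nodes=[
        FlowNode("n0", "start", next_nodes=["n1"]),
        FlowNode("n1", "branch", next_nodes=["m1", "m2", "m3"]),
        FlowNode("m1", "model", ref="FaceNet", next_nodes=["n2"]),
        FlowNode("m2", "model", ref="VGGNet", next_nodes=["n2"]),
        FlowNode("m3", "model", ref="GoogleNet", next_nodes=["n2"]),
        FlowNode("n2", "aggregate", next_nodes=["n3"]),
        FlowNode("n3", "end"),
    ],
)
```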
Step S3: and constructing an execution layer. The process specifically comprises the following steps:
step S31: editing scheme meta information; the meta information of the evaluation scheme is meta data for describing one evaluation scheme, so that a user can conveniently search and know the evaluation scheme. The meta information to be filled in at least comprises scheme names, such as 'face recognition evaluation scheme'; scheme library types, such as "image class"; AI task types, such as "face recognition"; AI application scenario, test: such as "identity verification"; the scheme introduces, for example, the method can realize various tests such as precision, performance, living body detection, attack resistance and the like of the face recognition technology, and compare the test result with the result of the industry reference model.
Step S32: automatically matching the object content;
specifically, according to each item of meta information filled in step S31, candidate evaluation data, candidate evaluation tools, candidate evaluation standards and candidate reference models which can be used for the current evaluation scheme are automatically matched from the evaluation database, the evaluation tool library, the evaluation standard library and the reference model library, so that the time for an evaluation user to search for a suitable object from the evaluation database, the evaluation tool library, the evaluation standard library and the reference model library can be greatly reduced. The object content automatic matching is calculated based on similarity calculation and historical co-occurrence rate.
The similarity calculation compares the meta information of the evaluation scheme with the meta information of every piece of evaluation data, every evaluation tool, every evaluation standard and every benchmark model in the corresponding libraries, specifically: the similarity between the scheme name and the evaluation data name or benchmark model name, the coincidence of applicable AI task types, and the similarity between the scheme description and the evaluation data description or benchmark model description. The final similarity is obtained by weighted averaging of these per-item similarities. The more similar the meta information is to that of a piece of evaluation data, tool, standard or benchmark model, the higher the final similarity and the higher the degree of matching.
The historical co-occurrence rate is obtained by statistically analyzing how often each piece of evaluation data, evaluation tool, evaluation standard or benchmark model was referenced by previous historical evaluation schemes. The specific process is: first retrieve the historical evaluation schemes similar to the current evaluation scheme, then examine the evaluation data, evaluation tools, evaluation standards and benchmark models used in those similar historical schemes; the ones used most often are taken as having a high co-occurrence rate. An evaluation tool is mainly embodied as test software or a test script, that is, a program that performs certain test functions and operates under the guidance of an evaluation standard. The evaluation data provides the data basis for evaluation and consists of various test set data, each of which is a data set. The evaluation standards provide the methodological guidelines for evaluation and are usually derived from international, national, industry and self-drafted standard documents related to artificial intelligence. The benchmark model library contains benchmark models and user models; a benchmark model represents the technical benchmark of its field and is used for comparative evaluation against a user model, so that users can understand their own technical strengths and weaknesses. One evaluation scheme comprises one or more evaluation tools, one or more pieces of evaluation data, one or more evaluation standards and one or more benchmark models, and the same evaluation tool, evaluation data, evaluation standard or benchmark model may be included in multiple evaluation schemes.
The final matching-degree ranking combines the two dimensions, the similarity calculation result and the historical co-occurrence rate, so that an accurate list of candidate data objects is produced for the current evaluation scheme. Specifically, candidates are first ranked by similarity and, likewise, ranked by historical co-occurrence rate; the rank positions of the same piece of evaluation data in the two dimensions are then weighted and averaged to compute the overall matching-degree ranking.
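The following sketch, under assumed field names and weights, illustrates the two-dimensional matching just described: a weighted average of per-field meta-information similarities, a historical co-occurrence count, and a final ranking obtained by averaging the rank positions from the two dimensions. The plain string-similarity function is a simple stand-in for whatever semantic similarity measure an implementation would actually use.

```python
from difflib import SequenceMatcher

def text_sim(a: str, b: str) -> float:
    """Crude string similarity in [0, 1]; a real system would likely use semantic embeddings."""
    return SequenceMatcher(None, a, b).ratio()

def meta_similarity(scheme_meta: dict, candidate_meta: dict, weights: dict) -> float:
    """Weighted average of per-field similarities between scheme and candidate metadata."""
    total = sum(weights.values())
    return sum(w * text_sim(scheme_meta.get(f, ""), candidate_meta.get(f, ""))
               for f, w in weights.items()) / total

def rank_candidates(scheme_meta, candidates, co_occurrence, weights):
    """Rank candidates by averaging their rank under similarity and under co-occurrence."""
    by_sim = sorted(candidates, key=lambda c: -meta_similarity(scheme_meta, c["meta"], weights))
    by_cooc = sorted(candidates, key=lambda c: -co_occurrence.get(c["id"], 0))
    sim_rank = {c["id"]: i for i, c in enumerate(by_sim)}
    cooc_rank = {c["id"]: i for i, c in enumerate(by_cooc)}
    return sorted(candidates, key=lambda c: 0.5 * sim_rank[c["id"]] + 0.5 * cooc_rank[c["id"]])

weights = {"name": 0.4, "ai_task_type": 0.3, "description": 0.3}   # assumed weights
scheme = {"name": "face recognition evaluation", "ai_task_type": "face recognition",
          "description": "accuracy and robustness tests for face recognition"}
candidates = [
    {"id": "lfw", "meta": {"name": "LFW face verification set", "ai_task_type": "face recognition",
                           "description": "public benchmark for face verification accuracy"}},
    {"id": "coco", "meta": {"name": "COCO detection set", "ai_task_type": "object detection",
                            "description": "general object detection benchmark"}},
]
co_occurrence = {"lfw": 12, "coco": 1}   # times each set was referenced by similar past schemes
print([c["id"] for c in rank_candidates(scheme, candidates, co_occurrence, weights)])
# -> ['lfw', 'coco']
```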
Step S33: creating a scheme flow;
Specifically, to improve the development efficiency of evaluation schemes, the various objects are combined with a graphical interface development program, a graphical interface runtime and a real-time database runtime environment to form a configurable, plug-in, distributed working interface for generating evaluation schemes.
To lower the usage threshold for evaluating users and realize a workflow-based, drag-and-drop way of generating evaluation schemes, the user selects data objects, operation objects and control objects from the encapsulation layer in the graphical configuration environment and constructs the evaluation scheme by dragging, connecting and configuring them, as shown for example in fig. 3, where the first face recognition model is the FaceNet face recognition model, the second face recognition model is the VGGNet face recognition model, the third face recognition model is the GoogleNet face recognition model, and the public data set is the LFW public data set.
Step S34: performing matching checking between objects;
specifically, before the evaluation scheme flow created in step S33 is run, in order to ensure the correctness of the created flow, a matching check between objects may be performed, and if no match is found, the evaluation user may be reminded in time to adjust the flow.
The matching check between objects automatically checks the consistency of the evaluation scheme in aspects such as scenario and interface. The scenario consistency check examines whether the application scenarios of the evaluation standards, evaluation data, evaluation tools and benchmark models referenced in the evaluation scheme match each other; the interface consistency check examines whether the input and output interfaces of the referenced evaluation data, evaluation tools and benchmark models match each other.
Specifically, the scenario consistency check computes semantic similarity over the values of the "AI application scenario" and "AI task type" fields in the meta information of the evaluation standards, evaluation data, evaluation tools and benchmark models, and a threshold is set in advance for comparison with that similarity. The evaluation standard, evaluation data, evaluation tool and benchmark model on the same branch of the evaluation scheme flow should normally have the same or similar application scenario; using the threshold as the reference, if the calculated similarity is too low, a scenario inconsistency error is output to warn the user that the evaluation scheme flow may be wrong.
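A hedged sketch of the scenario consistency check follows; the threshold value, the field names and the use of a plain string similarity in place of a semantic similarity model are all assumptions.

```python
from difflib import SequenceMatcher

def scenario_similarity(a: dict, b: dict) -> float:
    """Average string similarity of the two scenario-related metadata fields (assumed names)."""
    fields = ("ai_application_scenario", "ai_task_type")
    return sum(SequenceMatcher(None, a[f], b[f]).ratio() for f in fields) / len(fields)

def check_scene_consistency(branch_objects: list, threshold: float = 0.75) -> list:
    """Compare every object on one branch against the first one; return the mismatched names."""
    reference = branch_objects[0]
    return [obj["name"] for obj in branch_objects[1:]
            if scenario_similarity(reference, obj) < threshold]

branch = [
    {"name": "face recognition standard", "ai_application_scenario": "identity verification",
     "ai_task_type": "face recognition"},
    {"name": "LFW evaluation data", "ai_application_scenario": "identity verification",
     "ai_task_type": "face recognition"},
    {"name": "speech test set", "ai_application_scenario": "voice assistant",
     "ai_task_type": "speech recognition"},
]
print(check_scene_consistency(branch))   # the speech test set is flagged as inconsistent
```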
The interface consistency check is implemented by the following checks; an illustrative sketch follows the list.
1. Check whether the storage format of the evaluation data conforms to the data-reading interface format of the evaluation tool. For example, if a piece of evaluation data referenced in the current evaluation scheme annotates face detection positions with a bbox field, but the data-reading interface of the evaluation tool connected to it expects face regions in a region field, an interface inconsistency error is output.
2. Check whether the output indicators of the evaluation tool are a subset of the evaluation index list defined in the evaluation standard. For example, if the output indicator of an evaluation tool referenced in the current evaluation scheme flow is an mAP value, but the index list of the referenced evaluation standard contains only accuracy and recall, an output interface inconsistency error is output to warn the user that the evaluation scheme flow may be wrong.
3. Check whether the interface format of the benchmark model conforms to the calling format of the evaluation tool.
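The sketch below illustrates the three interface consistency checks just listed; the field names (annotation_format, output_metrics, index_list and so on) are assumptions rather than the patent's schema.

```python
def check_interfaces(eval_data, eval_tool, eval_standard, benchmark_model):
    """Return a list of interface inconsistency errors for one branch of the scheme flow."""
    errors = []
    # 1. Data storage format must match the tool's data-reading interface.
    if eval_data["annotation_format"] != eval_tool["expected_annotation_format"]:
        errors.append("interface mismatch: data annotated with "
                      f"'{eval_data['annotation_format']}' but the tool reads "
                      f"'{eval_tool['expected_annotation_format']}'")
    # 2. Tool output indicators must be a subset of the standard's index list.
    extra = set(eval_tool["output_metrics"]) - set(eval_standard["index_list"])
    if extra:
        errors.append(f"output interface mismatch: metrics {sorted(extra)} not in the standard")
    # 3. Model interface format must match the tool's calling convention.
    if benchmark_model["interface"] != eval_tool["model_interface"]:
        errors.append("model interface mismatch")
    return errors

# Example reproducing the bbox/region and mAP mismatches described above.
errors = check_interfaces(
    {"annotation_format": "bbox"},
    {"expected_annotation_format": "region", "output_metrics": ["mAP"], "model_interface": "http"},
    {"index_list": ["accuracy", "recall"]},
    {"interface": "http"},
)
print("\n".join(errors) or "all interfaces consistent")
```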
Step S35: scheme similarity checking;
specifically, the current evaluation scheme and the evaluation scheme library constructed in a history are automatically compared, and whether similar evaluation schemes exist before detection is detected. If so, prompting the evaluating user that new construction is not needed.
The similarity between two evaluation schemes is measured in two ways. One is based on the meta information of the schemes, calculating the similarity between their meta information items: since the scheme name, scheme library type, AI task type, AI application scenario and scheme introduction are all text strings, the similarity between meta information texts is calculated with a text edit distance method. The other is based on the flow design of the schemes, calculating the similarity between flows by comparing the overlap and similarity of the evaluation standards, evaluation tools, evaluation data and benchmark models in the two evaluation scheme flows.
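For illustration, the two similarity measures described above might be combined as in the following sketch; difflib's ratio is used here only as a simple stand-in for the text edit distance mentioned above, and the field names and weights are assumptions.

```python
from difflib import SequenceMatcher

def meta_text_similarity(meta_a: dict, meta_b: dict) -> float:
    """Average string similarity over the meta-information fields both schemes share."""
    fields = set(meta_a) & set(meta_b)
    if not fields:
        return 0.0
    return sum(SequenceMatcher(None, meta_a[f], meta_b[f]).ratio() for f in fields) / len(fields)

def flow_overlap(objects_a: set, objects_b: set) -> float:
    """Jaccard overlap of the standards, tools, data sets and models referenced by two flows."""
    union = objects_a | objects_b
    return len(objects_a & objects_b) / len(union) if union else 0.0

def scheme_similarity(scheme_a: dict, scheme_b: dict, w_meta: float = 0.5, w_flow: float = 0.5) -> float:
    """Combine meta-information similarity and flow overlap into one scheme similarity score."""
    return (w_meta * meta_text_similarity(scheme_a["meta"], scheme_b["meta"])
            + w_flow * flow_overlap(set(scheme_a["objects"]), set(scheme_b["objects"])))
```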
Step S36: setting an evaluation model;
Specifically, to verify the effect of the evaluation scheme design, a test input model can be set for the debugging run, and the execution result of this input model is used to verify whether the design of the evaluation scheme meets expectations. The test input model can be an existing benchmark model selected from the benchmark model library, or a model uploaded by the evaluating user. To verify the reliability of the evaluation scheme thoroughly, flexible configuration of the test input is supported, and the user can swap in different test inputs to verify the evaluation scheme several times.
Step S37: debugging the evaluation scheme;
Specifically, the evaluation scheme is executed, the configured test input model is tested, and the test result is returned so that a human can confirm whether the design of the evaluation scheme is correct. First, the evaluation scheme is automatically translated into an executable test job, and the test job is submitted to the computing platform. The test job is an executable script that, following the design of the evaluation scheme flow, calls the evaluation tools in the flow, tests the configured test input model with the evaluation data in the flow, and compares the result with the benchmark models in the flow. Second, while the evaluation scheme is executing, the execution progress, abnormality alerts and resource usage of every node in the flow are monitored and output in real time. Finally, when a problem is found during the run, the job can be suspended manually at any time and resumed after the configuration is modified, until the final run result meets the expected effect and the debugging of the evaluation scheme is complete. Configuration modifications include: replacing the test input, that is, changing the test input model to verify the reliability of the evaluation scheme more thoroughly; adding, deleting or moving flow nodes, which changes the execution flow of the evaluation scheme; and modifying the configuration of flow nodes, that is, if only the evaluation standard, evaluation data, benchmark model or evaluation tool needs to be replaced without changing the overall execution flow, the node of the evaluation scheme flow can be changed through attribute configuration without adding or deleting nodes.
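A minimal sketch of the first debugging steps, flattening the scheme flow into an executable test job and monitoring its execution with pause-on-failure, is given below; the node structure, the monitor callback and the job interface are illustrative assumptions and not the computing platform interface used by the invention.

```python
def translate_to_job(flow_nodes: list) -> list:
    """Turn the flow's node descriptions into an ordered list of executable job steps."""
    return [{"node": n["id"], "kind": n["kind"], "ref": n.get("ref")} for n in flow_nodes]

def run_job(steps: list, execute, monitor=print) -> bool:
    """Run each step, report progress in real time, and pause the job on the first failure."""
    for i, step in enumerate(steps, start=1):
        monitor(f"[{i}/{len(steps)}] running {step['kind']} node {step['node']}")
        try:
            execute(step)   # invoke the evaluation tool / model referenced by step["ref"]
        except Exception as exc:
            monitor(f"node {step['node']} failed: {exc}; job paused so the configuration "
                    "can be modified before resuming")
            return False
    return True

# Example: a three-node flow executed with a no-op step handler.
flow = [{"id": "n0", "kind": "start"},
        {"id": "n1", "kind": "tool", "ref": "face_accuracy_tool"},
        {"id": "n2", "kind": "end"}]
run_job(translate_to_job(flow), execute=lambda step: None)
```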
Step S38: and (5) warehousing an evaluation scheme.
Specifically, after step S37 is completed, the evaluation scheme flow and the scheme meta information are stored in the evaluation scheme library for use in future artificial intelligence evaluation tasks.
Finally, it should be noted that the above description is intended only to illustrate the technical solution of the present invention and not to limit its scope; those skilled in the art may make simple modifications or equivalent substitutions to the technical solution of the present invention without departing from its spirit and scope.

Claims (4)

1. The construction method of the interactive artificial intelligence technical evaluation scheme is characterized by comprising the following steps of:
step S1: constructing a data layer, wherein the data layer comprises an evaluation database, an evaluation tool library, an evaluation standard library and a reference model library; the evaluation tool library provides the executable-program basis for evaluation and comprises evaluation tools for various AI tasks, each evaluation tool being an executable program; an evaluation tool reads the raw data in the evaluation data and compares the inference result of the input model with the manual annotation result in the evaluation data;
wherein the metadata field requirements of the evaluation data are:
[table available only as an image in the original publication]
the metadata field requirements of the evaluation tool are:
[table available only as an image in the original publication]
the metadata field requirements of the evaluation criteria are:
[table available only as an image in the original publication]
the metadata field requirements of the reference model are:
[table available only as an image in the original publication]
step S2: constructing an encapsulation layer, wherein the encapsulation layer comprises a data object encapsulation, an operation object encapsulation and a control object encapsulation;
step S3: constructing an execution layer;
a process for building an execution layer, comprising the steps of:
step S31: editing scheme meta information;
step S32: automatically matching the object content;
the automatic matching of object content is computed from similarity calculation and historical co-occurrence rate, wherein the similarity is calculated between each item of meta information of the evaluation scheme and the meta information of all evaluation data in the evaluation database, and the per-item similarities are weighted and averaged to obtain a final similarity; the historical co-occurrence rate is obtained by statistically analyzing how often each piece of evaluation data was referenced by previous historical evaluation schemes;
step S33: creating a scheme flow;
step S34: performing matching checking between objects;
step S35: scheme similarity checking;
the scheme similarity check automatically compares the current evaluation scheme with the library of previously constructed evaluation schemes and detects whether a similar evaluation scheme already exists; if so, the evaluating user is prompted that a new scheme does not need to be constructed;
step S36: setting an evaluation model;
the evaluation model setting process is as follows: to verify the effect of the evaluation scheme design, a test input model is set for the debugging run, and whether the evaluation scheme design meets expectations is verified through the execution result of that input model;
step S37: debugging the evaluation scheme;
the debugging process of the evaluation scheme comprises the following steps:
step 1: automatically translating the evaluation scheme into an executable test job and submitting the test job to the computing platform;
step 2: during execution of the evaluation scheme, monitoring and outputting in real time the execution progress, abnormality alerts and resource usage of every node in the flow;
step 3: when a problem is found during the run, suspending the job and resuming it after the configuration is modified, until the final run result meets the expected effect and the debugging of the evaluation scheme is complete;
step S38: and (5) warehousing an evaluation scheme.
2. The method for constructing an interactive artificial intelligence technical evaluation scheme according to claim 1, wherein the evaluation standard library in step S1 provides the methodological basis for evaluation, and each evaluation standard in the library comprises two parts, an evaluation standard document and an evaluation index list, wherein the evaluation standard document is for the evaluating user to browse, and the evaluation index list is a structured list of indicators used to guide the execution and output of the evaluation tools.
3. The method for constructing an interactive artificial intelligence technical evaluation scheme according to claim 1, wherein the process of creating the scheme flow in step S33 is as follows: in the graphical configuration environment, the user selects data objects, operation objects and control objects from the encapsulation layer and constructs the evaluation scheme by dragging, connecting and configuring them.
4. The method for constructing an interactive artificial intelligence technical evaluation scheme according to claim 1, wherein the matching check between objects includes a scenario consistency check of the evaluation scheme, wherein the scenario consistency check computes semantic similarity over the values of the AI application scenario and AI task type fields in the meta information of the evaluation standards, evaluation data, evaluation tools and reference models; the evaluation standard, evaluation data, evaluation tool and reference model on the same branch of the evaluation scheme flow have the same or similar application scenario, and if the calculated similarity is lower than a set threshold, a scenario inconsistency error is output to warn the user that the evaluation scheme flow is wrong.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310087037.1A CN115810137B (en) 2023-02-09 2023-02-09 Construction method of interactive artificial intelligence technical evaluation scheme


Publications (2)

Publication Number Publication Date
CN115810137A CN115810137A (en) 2023-03-17
CN115810137B (en) 2023-06-02

Family

ID=85487784

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310087037.1A Active CN115810137B (en) 2023-02-09 2023-02-09 Construction method of interactive artificial intelligence technical evaluation scheme

Country Status (1)

Country Link
CN (1) CN115810137B (en)



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant