CN116126740B - Model in-loop test method and device, computer equipment and storage medium - Google Patents


Info

Publication number
CN116126740B
Authority
CN
China
Prior art keywords
model
test
result
test data
acquiring
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310409686.9A
Other languages
Chinese (zh)
Other versions
CN116126740A (en)
Inventor
陈绪伦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiaomi Automobile Technology Co Ltd
Original Assignee
Xiaomi Automobile Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiaomi Automobile Technology Co Ltd filed Critical Xiaomi Automobile Technology Co Ltd
Priority to CN202310409686.9A priority Critical patent/CN116126740B/en
Publication of CN116126740A publication Critical patent/CN116126740A/en
Application granted granted Critical
Publication of CN116126740B publication Critical patent/CN116126740B/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/36Preventing errors by testing or debugging software
    • G06F11/3668Software testing
    • G06F11/3672Test management
    • G06F11/3688Test management for test execution, e.g. scheduling of test suites
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/36Preventing errors by testing or debugging software
    • G06F11/3668Software testing
    • G06F11/3696Methods or tools to render software testable
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30Computing systems specially adapted for manufacturing

Abstract

The disclosure provides a model-in-loop test method and apparatus, a computer device, and a storage medium, and relates to the field of computer technology. The method comprises the following steps: acquiring a first model and an associated evaluation set from a model test queue, wherein the evaluation set comprises test data and labeling results; creating a first container and running the first model to process the test data to obtain a first processing result; acquiring an evaluation rule corresponding to the first model; and determining a test result of the first model according to the evaluation rule and a first difference between the first processing result and the labeling result. By decoupling the model-in-loop test process into several independent links and running each model test process in its own container, model tests can run in parallel, which shortens the duration of model-in-loop testing and improves test efficiency.

Description

Model in-loop test method and device, computer equipment and storage medium
Technical Field
The disclosure relates to the technical field of autonomous driving and testing, and in particular to a model-in-loop testing method and apparatus, a computer device, and a storage medium.
Background
With the increasing intelligence and connectivity of automobiles, vehicles equipped with autonomous driving technology have become a trend. Before autonomous driving is deployed at commercial scale, sufficiently thorough functional and safety testing must be performed, and an agile closed test loop is key to the evolution of an autonomous driving system. Depending on the testing approach, tests can generally be categorized as Model-in-Loop (MIL), Software-in-Loop (SIL), Hardware-in-Loop (HIL), Vehicle-in-Loop (VIL), and so on. Model-in-loop testing is particularly important: it is derived from functional requirements and is used in the initial stage of functional design to verify whether an algorithm or function meets expectations, continuously iterating and optimizing the system design according to the evaluation results.
However, current model-in-loop testing is heavily coupled, which results in long test durations and low efficiency.
Disclosure of Invention
The present disclosure aims to solve, at least to some extent, one of the technical problems in the related art.
An embodiment of a first aspect of the present disclosure provides a model in-loop testing method, including:
acquiring a first model and an associated evaluation set from a model test queue, wherein the evaluation set comprises test data and labeling results;
creating a first container, and running a first model to process the test data to obtain a first processing result;
acquiring an evaluation rule corresponding to the first model;
and determining a test result of the first model according to the evaluation rule and the first difference between the first processing result and the labeling result.
Embodiments of a second aspect of the present disclosure provide a model-in-loop testing apparatus, including:
the first acquisition module is used for acquiring a first model and an associated evaluation set from the model test queue, wherein the evaluation set comprises test data and labeling results;
the second acquisition module is used for creating a first container, running a first model to process the test data and acquiring a first processing result;
the third acquisition module is used for acquiring an evaluation rule corresponding to the first model;
the determining module is used for determining a test result of the first model according to the evaluating rule and the first difference between the first processing result and the labeling result.
Embodiments of a third aspect of the present disclosure provide a computer device, comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the model-in-loop test method provided by the embodiments of the first aspect of the present disclosure.
An embodiment of a fourth aspect of the present disclosure proposes a computer readable storage medium storing a computer program which, when executed by a processor, implements a model-in-loop test method as proposed by an embodiment of the first aspect of the present disclosure.
Embodiments of a fifth aspect of the present disclosure propose a computer program product comprising a computer program which, when executed by a processor, implements a model-in-loop test method as proposed by embodiments of the first aspect of the present disclosure.
The model-in-loop test method and apparatus, computer device, and storage medium provided by the present disclosure have the following beneficial effects:
in the embodiments of the present disclosure, a first model and an associated evaluation set are first obtained from a model test queue; a first container is then created, and the first model is run to process the test data to obtain a first processing result; finally, an evaluation rule corresponding to the first model is obtained, and the test result of the first model is determined according to the evaluation rule and a first difference between the first processing result and the labeling result. By decoupling the model-in-loop test process into several independent links and running each model test process in its own container, model tests can run in parallel, which shortens the duration of model-in-loop testing and improves test efficiency.
Additional aspects and advantages of the disclosure will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the disclosure.
Drawings
The foregoing and/or additional aspects and advantages of the present disclosure will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a schematic flow chart of a model-in-loop test method according to an embodiment of the disclosure;
FIG. 2 is a flow chart of a model-in-loop test method according to an embodiment of the disclosure;
FIG. 3 is a flow chart of a model-in-loop test method according to an embodiment of the disclosure;
FIG. 4 is a schematic diagram of a model-in-loop testing apparatus according to an embodiment of the present disclosure;
fig. 5 illustrates a block diagram of an exemplary computer device suitable for use in implementing embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are exemplary and intended for the purpose of explaining the present disclosure and are not to be construed as limiting the present disclosure.
It should be noted that the model-in-loop test method provided by the present disclosure may be used to perform in-loop testing on any model. For convenience of explanation, the following embodiments take a model from the autonomous driving field as the model under test and describe the model-in-loop test method provided in the present disclosure in detail.
Model-in-loop test methods, apparatuses, electronic devices, and storage media of embodiments of the present disclosure are described below with reference to the accompanying drawings.
Fig. 1 is a flow chart of a model in-loop test method according to an embodiment of the disclosure.
As shown in fig. 1, the model-in-loop test method may include the steps of:
step 101, a first model and an associated evaluation set are obtained from a model test queue, wherein the evaluation set comprises test data and labeling results.
The model test queue comprises all test tasks acquired by the test system.
The evaluation set is a data set used for testing the first model, and may be screened from a scene library by the user; alternatively, the test system may screen test data and the corresponding labeling results from the scene library based on information about the first model, such as its type and function.
The test data in the scene library can be any type of data such as picture data, voice data, point cloud data and the like acquired by the vehicle in the driving process. The labeling result may be a result of manually identifying the test data, or may be a result of labeling the test data by the labeling system, which is not limited in this disclosure.
For example, the test data may be image or voice data returned by the autopilot system, or may be related image or point cloud data collected by an onboard sensor during travel.
In this method, the test system acquires the first model and the associated evaluation set from the model test queue, which provides the conditions for decoupling the generation of model test tasks from their execution.
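As an illustration of step 101, the test task and its evaluation set might be represented as follows. This is a minimal Python sketch; the class and field names are hypothetical and not taken from the patent:

```python
from dataclasses import dataclass
from typing import Any, List

@dataclass
class EvaluationSample:
    test_data: Any        # e.g. an image, audio clip, or point cloud
    labeling_result: Any  # the ground-truth annotation for the data
    scene_tag: str = ""   # optional scene label, e.g. "daytime"

@dataclass
class TestTask:
    model_id: str
    evaluation_set: List[EvaluationSample]

task = TestTask("model-001",
                [EvaluationSample([0.1, 0.2], "car", "daytime")])
print(task.evaluation_set[0].labeling_result)  # → car
```

A queue of such `TestTask` objects would then play the role of the model test queue described above.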
Step 102, a first container is created, and a first model is operated to process the test data, so as to obtain a first processing result.
In the present disclosure, the test system runs the test process of the first model in a dedicated first container, so that it is isolated from the test processes of other models; the test system can likewise run the test processes of different models, or of the same model under different batches of test data, by creating different containers. This improves both the safety of model testing and its speed and efficiency.
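The container-based isolation of step 102 can be sketched with worker processes standing in for containers. This substitution is an assumption made purely for illustration; a real deployment would use a container runtime, which the patent does not specify:

```python
from multiprocessing import Pool

def run_model(task):
    # Each worker process is isolated from the others, mirroring the
    # one-container-per-test-process design described above.
    model_name, test_data = task
    return [f"{model_name}:{x}" for x in test_data]

if __name__ == "__main__":
    # Two model test processes running in parallel, one per worker.
    tasks = [("model-A", [1, 2]), ("model-B", [3])]
    with Pool(processes=2) as pool:
        results = pool.map(run_model, tasks)
    print(results[0])  # → ['model-A:1', 'model-A:2']
```

Because each test runs in its own isolated worker, a failure in one model's test cannot corrupt another's, which is the safety property the paragraph above describes.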
And step 103, acquiring an evaluation rule corresponding to the first model.
Wherein the evaluation rule is one or more criteria for describing whether the model passes the test.
For example, the evaluation rule for an image-recognition model may be: if the image recognition accuracy reaches 90%, the model passes the test.
In some possible implementations, the evaluation rule may be obtained from the test request corresponding to the first model; that is, the user submits the corresponding evaluation rule together with the test request for the first model.
In some possible implementations, the evaluation rule may also be an evaluation rule corresponding to the type of the first model, which is obtained from a preset evaluation rule base.
The evaluation rule base comprises different types of models and corresponding evaluation rules.
Optionally, the evaluation rule base may be created by a user, or may be automatically generated by the test system based on historical model test results, which is not limited in this disclosure.
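The rule lookup of step 103 can be sketched as a simple priority scheme: a rule submitted with the test request takes precedence, and otherwise the preset rule base supplies one. The rule-base contents below are hypothetical examples, not values from the patent:

```python
# Hypothetical preset evaluation rule base: model type → evaluation rule.
RULE_BASE = {
    "image_recognition": {"metric": "accuracy", "threshold": 0.90},
    "speech_recognition": {"metric": "accuracy", "threshold": 0.85},
}

def get_evaluation_rule(model_type, request_rule=None):
    # A rule submitted with the test request takes priority; otherwise
    # fall back to the rule preset for this model type.
    if request_rule is not None:
        return request_rule
    return RULE_BASE[model_type]

print(get_evaluation_rule("image_recognition"))
# → {'metric': 'accuracy', 'threshold': 0.9}
```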
Step 104, determining a test result of the first model according to the evaluation rule and the first difference between the first processing result and the labeling result.
The first difference is used for describing whether the first processing result corresponding to each test data is matched with the labeling result, and the value of the first difference may be 0 (unmatched) or 1 (the matching degree is greater than the matching threshold), or may be any value between 0 and 1, which is not limited in the disclosure. The test results are used to describe whether the first model passes the test.
For example, suppose the first model is an image recognition model and the associated evaluation rule is: the test is passed if the accuracy is greater than 95%, and failed otherwise. After comparing the first processing result for each piece of test data with its labeling result, the test system determines that the first difference is 1 for 97% of the test data (that is, the first processing result is accurate for 97% of the test data), and can therefore determine that the test result of the first model is "passed".
In some possible implementations, when the evaluation set further includes scene tags for the test data, the test system may also obtain the scene tag corresponding to the test data and determine the test result of the first model under that scene tag based on the test data in the evaluation set.
The scene tag is a tag describing the acquisition scene or application scene of the test data. For example, if the scene tag corresponding to the test data in the evaluation set is "daytime", the test result determined by the test system based on that evaluation set reflects the model's performance in the "daytime" scene and cannot be used to measure its performance in the "night" scene.
In the embodiments of the present disclosure, a first model and an associated evaluation set are first obtained from a model test queue; a first container is then created, and the first model is run to process the test data to obtain a first processing result; an evaluation rule corresponding to the first model is then obtained, and the test result of the first model is determined according to the evaluation rule and a first difference between the first processing result and the labeling result. By decoupling the model-in-loop test process into several independent links and running each model test process in its own container, model tests can run in parallel, which shortens the duration of model-in-loop testing and improves test efficiency.
Fig. 2 is a flow chart of a model-in-loop test method according to an embodiment of the disclosure.
As shown in fig. 2, the model-in-loop test method may include the steps of:
in step 201, a model test request is received, where the test request includes a first model and a test data type associated with the first model.
The test data type refers to the type of data that can be processed by the first model, and the test data type can be obtained from a preset model library.
The model library comprises a first model and a type label associated with the first model. For example, the first model in the model library is an image recognition model, and the type label associated with the first model is image recognition; the first model in the model library is a "speech recognition model", and the type label associated therewith is "speech recognition".
Alternatively, the model library may be created by the user, or may be automatically generated by the test system based on historical models, which is not limited by the present disclosure.
Alternatively, the test data type may be entered by the user, which is not limited by the present disclosure.
Step 202, obtaining test data and labeling results corresponding to the test data types from a preset sample library.
The sample library may include test data and labeling results corresponding to various data types, and after the test system receives the test request, the test system may first obtain the test data and labeling results corresponding to the test data types from the sample library based on the data types in the test request, so as to test the first model.
In one possible implementation form, the sample library further includes a scene tag corresponding to each test data. Therefore, when the test system adds the model and the test data into the test queue, the scene labels can be associated to the queue, so that different test results correspond to different scene labels. When the test report is generated, a scene tag corresponding to the test result can be generated.
For example, when the test system determines from the model test request that the test data type is "image recognition", it may screen out test data of that type (such as images returned by the autonomous driving system or images collected by on-board sensors during driving), the labeling results corresponding to that test data (i.e., the results of manually identifying the scenes in the images, or of the labeling system identifying them), and the scene tag corresponding to each piece of test data; for instance, if a piece of test data was collected in the daytime, its scene tag is "daytime".
Alternatively, the sample library may be created by the user, or may be automatically generated by the test system based on historical model tests, which is not limited by the present disclosure.
And 203, placing the first model, the test data and the labeling result into a model test queue.
In this method, after the test system receives the test request, it places the first model, the test data, and the labeling result into the model test queue, from which the test process then reads and executes test tasks. The model test process is thereby decoupled into independent test-task generation and test-task execution processes, which provides the conditions for concurrent model testing and improves test efficiency.
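The decoupling described above — a producer enqueues test tasks while independent workers dequeue and execute them — can be sketched with a standard in-memory queue. A production system would likely use a distributed queue, which the patent does not specify:

```python
import queue

test_queue = queue.Queue()

def submit_test_request(model_id, test_data, labels):
    # Task generation: the request handler only enqueues the task,
    # without waiting for any test to execute.
    test_queue.put({"model": model_id, "data": test_data, "labels": labels})

def execute_next_task():
    # Task execution: an independent worker reads from the queue.
    task = test_queue.get()
    return f"tested {task['model']} on {len(task['data'])} samples"

submit_test_request("model-001", [1, 2], ["a", "b"])
print(execute_next_task())  # → tested model-001 on 2 samples
```

Because submission returns as soon as the task is enqueued, many requests can be accepted while earlier tests are still running in their containers.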
Step 204, a first model and an associated evaluation set are obtained from a model test queue, wherein the evaluation set comprises test data and labeling results.
In step 205, a first container is created, and a first model is run to process the test data, and a first processing result is obtained.
And 206, acquiring an evaluation rule corresponding to the first model.
Step 207, determining a test result of the first model according to the evaluation rule and the first difference between the first processing result and the labeling result.
The specific implementation manner of steps 204 to 207 may refer to the detailed descriptions in other embodiments of the disclosure, and will not be described herein in detail.
Optionally, the test system may further obtain a second model associated with the first model from a preset model library, and run the test process of the second model in a dedicated second container to obtain a second processing result; a test comparison result between the first model and the second model is then generated according to the first difference and a second difference between the second processing result and the labeling result.
For example, suppose the first model is an "image recognition" model. The test system screens out a second model whose type label is "image recognition" from the model library and tests it on the same test data as the first model to obtain the second difference. After the processing results of the two models on the same test data are matched against the labeling results, the first difference indicates that the accuracy of the first model is 97% (i.e. the first model's first processing results are accurate for 97% of the test data), while the second difference indicates that the accuracy of the second model is 85% (i.e. the second model's processing results are accurate for 85% of the test data). From this it can be concluded that the test result of the first model is better than that of the second model, i.e. the first model performs better than the second model.
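The comparison above reduces to computing each model's accuracy against the shared labeling results. A sketch under the assumption that exact match defines a correct result:

```python
def accuracy(results, labels):
    # Fraction of processing results that match the labeling results.
    return sum(1 for r, l in zip(results, labels) if r == l) / len(labels)

def compare_models(first_results, second_results, labels):
    # Test comparison result derived from the first and second differences.
    a1, a2 = accuracy(first_results, labels), accuracy(second_results, labels)
    if a1 == a2:
        return "models perform equally"
    return "first model better" if a1 > a2 else "second model better"

labels = ["car", "bus", "bike", "car"]
print(compare_models(["car", "bus", "bike", "car"],   # 100% accurate
                     ["car", "bus", "car", "car"],    # 75% accurate
                     labels))                         # → first model better
```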
It should be noted that, the presentation manner of the test comparison result between the first model and the second model may be set by the user when submitting the test request, or may also be determined by the test system based on a preset test report template, which is not limited in this disclosure.
In this method, the test system runs each model's test process in a dedicated container, so that the test processes of different models are isolated from one another and can run simultaneously, achieving parallel model testing. Different containers can execute different test tasks in parallel, decoupling the model test process from the test-task process and reducing the time needed for model testing. This improves both the safety of model testing and its speed and efficiency.
In the embodiment of the disclosure, a model test request is received first, and a first model and a test data type associated with the first model are obtained from the model test request; secondly, test data and labeling results corresponding to the test data types are obtained from a preset sample library; placing a first model, test data and a labeling result into a model test queue, then obtaining the first model and an associated evaluation set from the model test queue, then creating a first container, operating the first model to process the test data, obtaining a first processing result, then obtaining an evaluation rule corresponding to the first model, and determining a test result of the first model according to the evaluation rule and a first difference between the first processing result and the labeling result. Therefore, the model in-loop test process is decoupled into a plurality of independent links such as task creation, task execution and the like, and different model test processes are operated by creating independent containers, so that parallel operation of model tests can be realized, the duration of the model in-loop test is reduced, and the test efficiency is improved.
Fig. 3 is a flow chart of a model-in-loop test method according to an embodiment of the disclosure.
As shown in fig. 3, the model-in-loop test method may include the steps of:
step 301, a first model and an associated evaluation set are obtained from a model test queue, wherein the evaluation set comprises test data and labeling results.
Step 302, a first container is created, and a first model is run to process test data, and a first processing result is obtained.
And 303, acquiring an evaluation rule corresponding to the first model.
Step 304, determining a test result of the first model according to the evaluation rule and the first difference between the first processing result and the labeling result.
The specific implementation manner of steps 301 to 304 may refer to the detailed descriptions in other embodiments of the disclosure, and will not be described in detail herein.
In step 305, a test report generation request is received, where the report generation request includes an identifier of the first model and a target format of the test report.
The identifier of the model under test is any identifier that can uniquely determine the first model. For example, it may be the identifier of the algorithm file corresponding to the first model, or an identifier allocated by the test system after the user submits the first model to it, which is not limited in this disclosure.
The target format of the test report indicates the style, layout, and the like of the test report. It should be noted that the style of the test report may be selected by the user on the page that triggers the test report generation request, or may be automatically determined by the test system based on the identifier or type of the first model, which is not limited in this disclosure.
And step 306, calling a report generation service corresponding to the target format, and processing the test result of the first model to generate a test report corresponding to the first model.
Wherein the report generation service is an algorithm or application that can be used to generate the test report.
Different report generating services can be configured in the test system to generate test reports with different formats or different display styles, so that the test system can select the corresponding report generating service according to the target format after receiving the test report generating request.
Alternatively, the test report may be generated by a report tool service that is self-made by the test system, by a business intelligence (Business Intelligence, BI) tool service, or by other services.
BI tools are tools for collecting, managing, and analyzing business data so that decision makers at all levels of an enterprise can gain knowledge or insight and make better-informed business decisions. They can generate ad hoc analysis reports, exploratory analysis reports, early-warning reports, and so on.
Among these, ad hoc analysis is a targeted analysis that solves a specific problem: the ability to analyze data to quickly answer a single immediate question. Exploratory analysis lets users perform multidimensional analysis in chart or table form through drag-and-drop operations on basic data or self-service data sets, meeting the varied needs of different users. An early-warning report monitors changes in data values and raises an alarm when a value meets a configured condition.
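Dispatching to a report generation service by target format, as described above, can be sketched as a simple format-to-service mapping. The formats and rendering below are illustrative assumptions, not part of the patent:

```python
# Hypothetical registry mapping each target format to its
# report generation service (here: simple rendering functions).
REPORT_SERVICES = {
    "html": lambda model_id, result: f"<h1>{model_id}</h1><p>Result: {result}</p>",
    "text": lambda model_id, result: f"{model_id}: {result}",
}

def generate_report(target_format, model_id, test_result):
    # Select the service registered for the requested target format
    # and let it render the model's test result.
    service = REPORT_SERVICES[target_format]
    return service(model_id, test_result)

print(generate_report("text", "model-001", "passed"))  # → model-001: passed
```

Registering a new format (for example, a BI-tool-backed service) only means adding an entry to the registry; callers are unchanged.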
It can be appreciated that in the embodiment of the disclosure, after the model test is completed, the test result can be displayed in the form of a test report according to the need, so that references and bases are provided for the developer to update and iterate the model.
In the embodiments of the present disclosure, a first model and an associated evaluation set are first obtained from a model test queue; a first container is then created, and the first model is run to process the test data to obtain a first processing result; an evaluation rule corresponding to the first model is then obtained, and the test result of the first model is determined according to the evaluation rule and a first difference between the first processing result and the labeling result. Finally, a test report generation request is received, the corresponding report generation service is called, and the test result of the first model is processed to generate the test report corresponding to the first model. In this way, model tests run in parallel, the duration of model-in-loop testing is reduced, test efficiency is improved, and test reports in different formats can be generated as needed, providing a basis for development iterations of the model.
In order to implement the above embodiments, the present disclosure further proposes a model-in-loop testing apparatus.
Fig. 4 is a schematic structural diagram of a model in-loop testing device according to an embodiment of the disclosure.
As shown in fig. 4, the model-in-loop testing apparatus 400 may include:
the first obtaining module 410 is configured to obtain a first model and an associated evaluation set from a model test queue, where the evaluation set includes test data and labeling results.
The second obtaining module 420 is configured to create a first container, run a first model to process the test data, and obtain a first processing result.
And a third obtaining module 430, configured to obtain an evaluation rule corresponding to the first model.
The determining module 440 is configured to determine a test result of the first model according to the evaluation rule and the first difference between the first processing result and the labeling result.
Optionally, the model-in-loop testing apparatus 400 further includes:
a receiving module (not shown in the figure) for receiving a model test request, wherein the test request includes a first model and a test data type associated with the first model;
the first obtaining module 410 is further configured to obtain test data and a labeling result corresponding to the test data type from a preset sample library;
and the processing module (not shown in the figure) is used for placing the first model, the test data and the labeling result into a model test queue.
Optionally, the first obtaining module 410 is further configured to:
acquiring the evaluation rule from the test request corresponding to the first model; or
and acquiring an evaluation rule corresponding to the type of the first model from a preset evaluation rule base.
Optionally, the second obtaining module 420 is further configured to:
acquiring a second model associated with the first model from a preset model library;
creating a second container, and operating a second model to process the test data to obtain a second processing result;
optionally, the third obtaining module 430 is further configured to:
and acquiring a scene label corresponding to the test data.
Optionally, the determining module 440 is further configured to:
determining a type label corresponding to the first model, and storing the first model and the type label in a preset model library in association;
determining a test result of the first model under the scene label;
and generating a test comparison result between the first model and the second model according to the first difference and the second difference between the second processing result and the labeling result.
Optionally, the above receiving module is further configured to:
a test report generation request is received, wherein the report generation request comprises an identification of a first model and a target format of a test report.
Optionally, the processing module is further configured to:
and calling a report generating service corresponding to the target format, and processing the test result of the first model to generate a test report corresponding to the first model.
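Dispatching a report-generating service by target format might look like the following sketch. The format names, generator functions, and result fields are assumptions for illustration.

```python
def render_html(result: dict) -> str:
    # Hypothetical HTML report body.
    return f"<h1>Test report: {result['model_id']}</h1><p>pass={result['passed']}</p>"

def render_markdown(result: dict) -> str:
    # Hypothetical Markdown report body.
    return f"# Test report: {result['model_id']}\n\npass={result['passed']}\n"

# Registry mapping each target format to its report-generating service.
REPORT_SERVICES = {"html": render_html, "markdown": render_markdown}

def generate_report(report_request: dict, test_result: dict) -> str:
    fmt = report_request["target_format"]
    try:
        service = REPORT_SERVICES[fmt]
    except KeyError:
        raise ValueError(f"unsupported report format: {fmt}")
    return service(test_result)
```

A registry keyed by format keeps the report services decoupled from the test flow, so new formats can be added without touching the processing module.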
The functions and specific implementation principles of the foregoing modules in the embodiments of the present disclosure may refer to the foregoing method embodiments, and are not repeated herein.
The model in-loop test device of the embodiment of the disclosure first acquires a first model and its associated evaluation set from a model test queue, then creates a first container and runs the first model to process the test data to obtain a first processing result, and finally acquires the evaluation rule corresponding to the first model and determines the test result of the first model according to the evaluation rule and the first difference between the first processing result and the labeling result. The model in-loop test process is thereby decoupled into several independent links, and each model test process runs in its own container, so that model tests can run in parallel, shortening the duration of the model in-loop test and improving test efficiency.
To implement the foregoing embodiments, the present disclosure further proposes an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the program, it implements the model in-loop test method proposed in the foregoing embodiments of the present disclosure.
In order to implement the foregoing embodiments, the present disclosure further proposes a computer-readable storage medium storing a computer program, which when executed by a processor, implements a model-in-loop test method as proposed in the foregoing embodiments of the present disclosure.
To implement the foregoing embodiments, the present disclosure also proposes a computer program product comprising a computer program which, when executed by a processor, implements the model in-loop test method proposed in the foregoing embodiments of the present disclosure.
Fig. 5 illustrates a block diagram of an exemplary computer device suitable for use in implementing embodiments of the present disclosure. The computer device 12 shown in fig. 5 is merely an example and should not be construed as limiting the functionality and scope of use of the disclosed embodiments.
As shown in FIG. 5, the computer device 12 is in the form of a general purpose computing device. Components of computer device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that connects the various system components (including the system memory 28 and the processing unit 16).
Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor, and a local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include the Industry Standard Architecture (hereinafter ISA) bus, the Micro Channel Architecture (hereinafter MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (hereinafter VESA) local bus, and the Peripheral Component Interconnect (hereinafter PCI) bus.
Computer device 12 typically includes a variety of computer system readable media. Such media can be any available media that is accessible by computer device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
Memory 28 may include computer system readable media in the form of volatile memory, such as random access memory (Random Access Memory; hereinafter: RAM) 30 and/or cache memory 32. The computer device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from or write to non-removable, nonvolatile magnetic media (not shown in FIG. 5, commonly referred to as a "hard disk drive"). Although not shown in fig. 5, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk"), and an optical disk drive for reading from or writing to a removable non-volatile optical disk (e.g., a compact disk read only memory (Compact Disc Read Only Memory; hereinafter CD-ROM), digital versatile read only optical disk (Digital Video Disc Read Only Memory; hereinafter DVD-ROM), or other optical media) may be provided. In such cases, each drive may be coupled to bus 18 through one or more data medium interfaces. Memory 28 may include at least one program product having a set (e.g., at least one) of program modules configured to carry out the functions of the various embodiments of the disclosure.
A program/utility 40 having a set (at least one) of program modules 42 may be stored in, for example, memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment. Program modules 42 generally perform the functions and/or methods in the embodiments described in this disclosure.
The computer device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), one or more devices that enable a user to interact with the computer device 12, and/or any devices (e.g., network card, modem, etc.) that enable the computer device 12 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 22. Moreover, the computer device 12 may also communicate with one or more networks such as a local area network (Local Area Network; hereinafter LAN), a wide area network (Wide Area Network; hereinafter WAN) and/or a public network such as the Internet via the network adapter 20. As shown, network adapter 20 communicates with other modules of computer device 12 via bus 18. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with computer device 12, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
The processing unit 16 executes various functional applications and data processing by running programs stored in the system memory 28, for example, implementing the methods mentioned in the foregoing embodiments.
According to the technical solution of the present disclosure, a first model and its associated evaluation set are first acquired from a model test queue; a first container is then created, and the first model is run to process the test data and obtain a first processing result; the evaluation rule corresponding to the first model is then acquired, and the test result of the first model is determined according to the evaluation rule and the first difference between the first processing result and the labeling result. The model in-loop test process is thereby decoupled into several independent links, such as task creation and task execution, and different model test processes run in independent containers, so that model tests can run in parallel, shortening the duration of the model in-loop test and improving test efficiency.
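The end-to-end flow described above can be sketched as follows. In this toy version, each "container" is simulated by a worker thread that pulls a task from the test queue, runs its model over the test data, and scores the outputs against the labeling results with an evaluation rule. Real containers (e.g. Docker) and real models are replaced by plain callables; all names, fields, and the difference metric are illustrative assumptions.

```python
import queue
import threading

def run_in_loop_tests(tasks, n_workers=4):
    # Model test queue holding all test tasks acquired by the test system.
    task_q = queue.Queue()
    for t in tasks:
        task_q.put(t)
    results = {}
    lock = threading.Lock()

    def worker():
        # Each worker stands in for one independent container.
        while True:
            try:
                task = task_q.get_nowait()
            except queue.Empty:
                return
            model, data, labels, rule = (task["model"], task["data"],
                                         task["labels"], task["rule"])
            # "Run the model in its container" over the test data.
            outputs = [model(x) for x in data]
            # First difference between processing results and labeling results.
            diff = sum(abs(o - y) for o, y in zip(outputs, labels))
            with lock:
                results[task["name"]] = {"diff": diff,
                                         "passed": diff <= rule["max_diff"]}

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results
```

Because the queue decouples task creation from task execution, adding workers (containers) scales the number of model tests running in parallel without changing the task format.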
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present disclosure. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined and combined by those skilled in the art without contradiction.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description of the present disclosure, the meaning of "a plurality" is at least two, such as two, three, etc., unless explicitly specified otherwise.
Any process or method description in a flowchart, or otherwise described herein, may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing specific logical functions or steps of the process. The scope of the preferred embodiments of the present disclosure includes additional implementations in which functions may be executed out of the order shown or discussed, including substantially concurrently or in reverse order depending on the functionality involved, as would be understood by those skilled in the art of the embodiments of the present disclosure.
The logic and/or steps represented in the flowcharts or otherwise described herein, for example an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, a processor-containing system, or another system that can fetch instructions from the instruction execution system, apparatus, or device and execute them.
In the context of this document, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium may even be paper or other suitable medium upon which the program is printed, as the program may be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory.
It should be understood that portions of the present disclosure may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, they may be implemented using any one or a combination of the following techniques well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGA), field-programmable gate arrays (FPGA), and the like.
Those of ordinary skill in the art will appreciate that all or part of the steps carried out in the method of the above-described embodiments may be implemented by a program to instruct related hardware, and the program may be stored in a computer readable storage medium, where the program when executed includes one or a combination of the steps of the method embodiments.
Furthermore, each functional unit in the embodiments of the present disclosure may be integrated in one processing module, or each unit may exist alone physically, or two or more units may be integrated in one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules may also be stored in a computer readable storage medium if implemented as software functional modules and sold or used as a stand-alone product.
The above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, or the like. Although embodiments of the present disclosure have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the present disclosure, and that variations, modifications, alternatives, and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the present disclosure.

Claims (7)

1. A method for in-loop testing of a model, comprising:
acquiring a first model and an associated evaluation set from a model test queue, wherein the evaluation set comprises test data and labeling results, and the model test queue comprises all test tasks acquired by a test system;
creating a first container, running the first model to process the test data, and obtaining a first processing result, wherein different containers run test processes of different models or run test processes of the same model under different batches of test data;
acquiring an evaluation rule corresponding to the first model;
determining a test result of the first model according to the evaluation rule and a first difference between the first processing result and the labeling result;
before the first model and the associated evaluation set are obtained from the model test queue, the method further comprises the following steps:
receiving a model test request, wherein the test request comprises the first model and a test data type associated with the first model;
acquiring test data and labeling results corresponding to the test data types from a preset sample library;
placing the first model, the test data and the labeling result into the model test queue;
wherein the method further comprises:
acquiring a second model associated with the first model from a preset model library;
creating a second container, and operating the second model to process the test data to obtain a second processing result;
generating a test comparison result between the first model and the second model according to the first difference and the second difference between the second processing result and the labeling result;
receiving a test report generation request, wherein the report generation request comprises an identifier of the first model and a target format of a test report;
and calling a report generating service corresponding to the target format, and processing the test result of the first model to generate a test report corresponding to the first model.
2. The method of claim 1, wherein the obtaining the evaluation rule corresponding to the first model comprises:
acquiring the evaluation rule from the test request corresponding to the first model; or,
and acquiring an evaluation rule corresponding to the type of the first model from a preset evaluation rule base.
3. The method as recited in claim 2, further comprising:
determining a type label corresponding to the first model;
and associating the first model with the type tag and storing the first model and the type tag into the preset model library.
4. A method according to any one of claims 1-3, wherein said determining test results of said first model comprises:
acquiring a scene tag corresponding to the test data;
and determining a test result of the first model under the scene label.
5. A model in-loop test apparatus, comprising:
the first acquisition module is used for acquiring a first model and an associated evaluation set from a model test queue, wherein the evaluation set comprises test data and labeling results, and the model test queue comprises all test tasks acquired by a test system;
the second acquisition module is used for creating a first container, running the first model to process the test data and acquiring a first processing result, wherein different containers run test processes of different models or run test processes of the same model under different batches of test data;
the third acquisition module is used for acquiring an evaluation rule corresponding to the first model;
the determining module is used for determining a test result of the first model according to the evaluation rule and a first difference between the first processing result and the labeling result;
the receiving module is used for receiving a model test request, wherein the test request comprises a first model and a test data type associated with the first model;
the first acquisition module is further used for acquiring test data and labeling results corresponding to the test data types from a preset sample library;
the processing module is used for placing the first model, the test data and the labeling result into a model test queue;
the second obtaining module is further configured to: acquiring a second model associated with the first model from a preset model library; creating a second container, and operating a second model to process the test data to obtain a second processing result;
the determining module is further configured to: generating a test comparison result between the first model and the second model according to the first difference and the second difference between the second processing result and the labeling result;
the receiving module is further configured to: receiving a test report generation request, wherein the report generation request comprises an identification of a first model and a target format of a test report;
the processing module is further configured to: and calling a report generating service corresponding to the target format, and processing the test result of the first model to generate a test report corresponding to the first model.
6. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the model-in-loop test method of any of claims 1-4 when the program is executed.
7. A computer readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the model-in-loop test method according to any one of claims 1-4.
CN202310409686.9A 2023-04-18 2023-04-18 Model in-loop test method and device, computer equipment and storage medium Active CN116126740B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310409686.9A CN116126740B (en) 2023-04-18 2023-04-18 Model in-loop test method and device, computer equipment and storage medium


Publications (2)

Publication Number Publication Date
CN116126740A CN116126740A (en) 2023-05-16
CN116126740B (en) 2023-08-04

Family

ID=86308463

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310409686.9A Active CN116126740B (en) 2023-04-18 2023-04-18 Model in-loop test method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116126740B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115729830A (en) * 2022-11-28 2023-03-03 重庆长安汽车股份有限公司 Simulation test method, device, equipment and medium for model in loop

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA3146217C (en) * 2021-04-12 2023-07-11 Qa Consultants Inc. System and method for integration testing
DE202022106107U1 (en) * 2022-10-31 2022-11-10 Mohan Sellappa Gounder System for testing level 3 automated driving systems (ADS)
CN115766861A (en) * 2022-11-14 2023-03-07 苏州挚途科技有限公司 Automatic driving data closed loop system, method and medium for vehicle software upgrade
CN115934572A (en) * 2023-01-03 2023-04-07 重庆长安汽车股份有限公司 Vehicle control module testing method and system, computer equipment and storage medium
CN115879323B (en) * 2023-02-02 2023-05-23 西安深信科创信息技术有限公司 Automatic driving simulation test method, electronic equipment and computer readable storage medium


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Towards automatic model-in-the-loop testing of electronic vehicle information centers; Domenico Amalfitano et al.; Proceedings of the 2014 International Workshop on Long-term Industrial Collaboration on Software Engineering; pp. 9-12 *
Research on a lane keeping assist model-in-the-loop scenario verification system; Ma Hongwei; China Master's Theses Full-text Database, Engineering Science and Technology II; Chapter 5 *

Also Published As

Publication number Publication date
CN116126740A (en) 2023-05-16

Similar Documents

Publication Publication Date Title
CN108062377A (en) The foundation of label picture collection, definite method, apparatus, equipment and the medium of label
CN110647886A (en) Interest point marking method and device, computer equipment and storage medium
CN112906823B (en) Target object recognition model training method, recognition method and recognition device
CN110084230B (en) Image-based vehicle body direction detection method and device
CN114419397A (en) Data set construction method and device based on data cleaning and data generation
US20190146757A1 (en) Hardware device based software generation
CN114913197B (en) Vehicle track prediction method and device, electronic equipment and storage medium
CN111143517B (en) Human selection label prediction method, device, equipment and storage medium
KR20230057646A (en) Multi-level transition region-based domain adaptive object detection apparatus and method
CN111124863A (en) Intelligent equipment performance testing method and device and intelligent equipment
CN116126740B (en) Model in-loop test method and device, computer equipment and storage medium
CN115830399B (en) Classification model training method, device, equipment, storage medium and program product
CN113590484B (en) Algorithm model service testing method, system, equipment and storage medium
CN115631374A (en) Control operation method, control detection model training method, device and equipment
CN111738454B (en) Target detection method, device, storage medium and equipment
US20210312323A1 (en) Generating performance predictions with uncertainty intervals
CN109583511B (en) Speed fusion method and device
CN114639056A (en) Live content identification method and device, computer equipment and storage medium
CN114490590A (en) Data warehouse quality evaluation method and device, electronic equipment and storage medium
CN110647826B (en) Method and device for acquiring commodity training picture, computer equipment and storage medium
US20200184216A1 (en) Machine continuous learning method of neural network object classifier and related monitoring camera apparatus
CN111045849A (en) Method, device, server and storage medium for identifying reason of checking abnormality
CN116204670B (en) Management method and system of vehicle target detection data and electronic equipment
CN115345321B (en) Data augmentation method, data augmentation device, electronic device, and storage medium
CN110716992B (en) Method and device for recommending name of point of interest

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant