CN115422094B - Algorithm automatic testing method, central dispatching equipment and readable storage medium

Info

Publication number
CN115422094B
CN115422094B (application CN202211379312.9A)
Authority
CN
China
Prior art keywords
test
subtask
result
service layer
algorithm
Prior art date
Legal status: Active
Application number
CN202211379312.9A
Other languages
Chinese (zh)
Other versions
CN115422094A (en)
Inventor
任思宇
侯国飞
王盟
吴立
殷俊
Current Assignee
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202211379312.9A
Publication of CN115422094A
Application granted
Publication of CN115422094B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 Error detection; Error correction; Monitoring
    • G06F 11/36 Preventing errors by testing or debugging software
    • G06F 11/3604 Software analysis for verifying properties of programs
    • G06F 11/3668 Software testing
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F 9/4806 Task transfer initiation or dispatching
    • G06F 9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F 9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 Allocation of resources to service a request
    • G06F 9/5027 Allocation of resources to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F 9/5038 Allocation of resources considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • G06F 9/5061 Partitioning or combining of resources
    • G06F 9/5066 Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs

Abstract

The application discloses an algorithm automatic testing method, a central dispatching device and a computer-readable storage medium. The algorithm testing method comprises: in response to a test instruction from a user, a communication layer sends the received test instruction to a service layer; the service layer acquires test data from a data layer based on the test instruction; the service layer divides the test task indicated by the test instruction into at least one subtask, selects from the at least one subtask a target subtask meeting a condition, and sends the test data and the target subtask to a device cluster, so that a target device in the device cluster tests the algorithm program indicated by the target subtask using the test data; and the service layer receives the test result returned by the target device and generates a test report based on the test result. By means of the method, the technical problem of low algorithm-testing efficiency can be solved.

Description

Algorithm automatic testing method, central dispatching equipment and readable storage medium
Technical Field
The present application relates to the field of testing, and in particular, to an algorithm automated testing method, a central scheduling device, and a computer-readable storage medium.
Background
Deep learning algorithms are receiving wide attention and application in more and more fields. However, applying algorithm results to real life still involves considerable manual operation and lacks systematic management. During structural improvement, parameter optimization, and the like, each iteration of the algorithm requires a large amount of training and testing on various data sets, and this work consumes a large amount of server computing capacity, seriously affecting the execution efficiency of testing tasks.
Disclosure of Invention
The application mainly aims to provide an algorithm automatic testing method, a central scheduling device, and a computer-readable storage medium that can solve the technical problem of low algorithm-testing efficiency.
In order to solve the above technical problem, the first technical solution adopted by the present application is to provide an automated testing method for algorithms. The method comprises: in response to a test instruction from a user, a communication layer sends the received test instruction to a service layer; the service layer acquires test data from a data layer based on the test instruction; the service layer divides the test task indicated by the test instruction into at least one subtask, selects from the at least one subtask a target subtask meeting a condition, and sends the test data and the target subtask to a device cluster, so that a target device in the device cluster tests the algorithm program indicated by the target subtask using the test data; and the service layer receives the test result returned by the target device and generates a test report based on the test result.
In order to solve the above technical problem, the second technical solution adopted by the present application is: a dispatch center apparatus is provided. The dispatch center device includes a memory and a processor coupled to the memory. The memory is used for storing program data which can be executed by the processor for implementing the method as described in the first aspect.
In order to solve the above technical problem, the third technical solution adopted by the present application is: a computer-readable storage medium is provided. The computer-readable storage medium is used for storing program data, which can be executed by a computer to implement the method as described in the first aspect.
The beneficial effects of this application are: the application is applied to a dispatch center device, and the dispatch center device comprises a communication layer, a data layer and a service layer. The communication layer, in response to a received test instruction, sends the test instruction to the service layer for execution; the service layer obtains test data according to the test instruction and divides the test task corresponding to the test instruction to obtain at least one subtask. In the process of executing the subtasks, a target subtask meeting a condition is selected from the subtasks, and the target subtask and the corresponding test data are sent to the device cluster; the device cluster completes the test of the algorithm program corresponding to the target subtask based on the test data. The service layer only needs to receive the test result from the device cluster to generate a corresponding test report. Because the service layer hands the target subtasks that occupy a large amount of performance and consume a large amount of time over to the device cluster for execution, the load pressure of the local server is reduced, the task execution efficiency of the local server is greatly improved, and more algorithm test tasks can be executed.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings required to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the description below are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic structural diagram of an embodiment of a dispatch center device of the present application;
FIG. 2 is a schematic structural diagram of an embodiment of a service layer of a dispatch center device according to the present application;
FIG. 3 is a detailed diagram of a service layer of the dispatch center device according to the present application;
FIG. 4 is a schematic flowchart of an embodiment of task scheduling in the test task flow scheduling center;
FIG. 5 is a schematic diagram of a work process schedule;
FIG. 6 is a schematic diagram of a test flow framework;
FIG. 7 is a schematic flow chart diagram of a first embodiment of the automated testing method for algorithms of the present application;
FIG. 8 is a schematic flow chart diagram of a second embodiment of the automated testing method of the algorithm of the present application;
FIG. 9 is a schematic flow chart of a third embodiment of the automated testing method for algorithms of the present application;
FIG. 10 is a schematic flow chart diagram of a fourth embodiment of the automated testing method for algorithms of the present application;
FIG. 11 is a schematic view of a test report;
FIG. 12 is yet another schematic illustration of a test report;
FIG. 13 is yet another schematic illustration of a test report;
FIG. 14 is a schematic structural diagram of an embodiment of a central scheduling apparatus of the present application;
FIG. 15 is a schematic structural diagram of an embodiment of a computer-readable storage medium of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first", "second", etc. in this application are used to distinguish between different objects and not to describe a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein may be combined with other embodiments.
Before introducing the algorithm automation test method of the present application, a dispatch center device applying the method is introduced.
Referring to fig. 1, fig. 1 is a schematic structural diagram of an embodiment of a scheduling center device according to the present application. The dispatch center device includes a communication layer 10, a service layer 20, and a data layer 30. The communication layer 10 is used for transmitting the received command to the service layer 20 for processing. The service layer 20 is used for acquiring the test data from the data layer 30 according to the instruction of the communication layer 10, testing the specified algorithm through the test data, and generating a test report according to the test result. The specified algorithm may be an algorithm code directly input before the start of the test, or may be an algorithm template stored in advance, and various different algorithms can be obtained by configuring information such as corresponding parameters or code segments. And the data layer is used for storing test data of a plurality of algorithms and further storing information such as algorithm templates and the like which are stored in advance. The specified algorithm may be determined based on the received instructions.
In an embodiment, the dispatch center device further includes a presentation layer. The presentation layer is used for displaying an interface, providing an operable, visual interface for a user, receiving related instruction information input by the user, transmitting the instruction information to the service layer through the communication layer, receiving a test report from the service layer, and displaying the test report for the user. In one case, the presentation layer can be a web interface. The interface can be constructed with HTML, HTML5, VGA, Vue, etc. HTML is the hypertext markup language; hypertext is text that may contain pictures, links, and even non-text elements such as music and programs. HTML is a standard language for making web pages, which eliminates barriers to information exchange between different computers. HTML5 is the latest standard, released in 2014. VGA is a computer display standard using analog signals, also known as the video graphics array display standard. Vue is a progressive JavaScript framework for building user interfaces.
In one embodiment, the communication layer may further include Nginx. Nginx is a high-performance HTTP and reverse-proxy web server that also provides IMAP/POP3/SMTP services. In the communication layer, Nginx acts as a load-balancing server, distributing requested connections from the presentation layer, i.e., the web, to the service layer with the fewest connections. Nginx occupies little memory and has strong concurrency capability, performing better in this respect than web servers of the same type.
In one embodiment, the data layer may include a database management system such as MySQL and a database such as MongoDB. MySQL is a relational database management system that saves data in different tables rather than storing all data in one repository, thereby increasing access speed and system flexibility. MongoDB is a database based on distributed file storage that sits between relational and non-relational databases; among non-relational databases it is the most feature-rich and the most similar to a relational database. It stores data in BSON format, an extension of JSON. Its query language is very powerful and resembles an object-oriented query language; it can implement almost all of the query functions of a relational database table and supports the building of indexes. With its high performance, easy deployment, ease of use, and convenient storage, it suits a variety of real scenarios.
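As an illustration of how such a data layer might store annotated test data in MongoDB, the following is a minimal Python sketch using the pymongo driver. The connection address, database and collection names, and document fields are assumptions made for the example, not details from the patent.

```python
from pymongo import MongoClient

# Connect to the data layer's MongoDB instance (address is an assumption).
client = MongoClient("mongodb://localhost:27017")
datasets = client["test_platform"]["datasets"]

# Store one annotated test sample as a BSON document (field names are illustrative).
datasets.insert_one({
    "dataset": "face_set_01",
    "path": "/data/face_set_01/img_0001.jpg",
    "label": {"index_name": "person_0001", "bbox": [120, 80, 64, 64]},
})

# The service layer can later fetch all samples of a data set by name.
samples = list(datasets.find({"dataset": "face_set_01"}))
```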
Referring to fig. 2, fig. 2 is a schematic structural diagram of an embodiment of a service layer of a scheduling center device according to the present application.
The service layer comprises a test task flow management module 21, a test flow message service module 22, a test task flow running module 23 and a test task flow scheduling center 24. The test task flow management module 21 is configured to divide the test task of the algorithm indicated by an instruction into a plurality of subtasks. The test flow message service module 22 is configured to generate message queues for the subtasks to complete the message-passing work for the subtasks. The test task flow running module 23 is configured to create processes for executing the subtasks and to monitor the message queues so as to execute the corresponding subtasks. The test task flow scheduling center 24 is used for scheduling the subtasks according to their execution status.
The test task flow scheduling center 24 controls the test task flow running module 23 through the test task flow management module 21 and the test flow message service module 22 to control the execution of the algorithm test task. Referring to fig. 3, fig. 3 is a specific schematic diagram of the service layer of the scheduling center device. After the user submits an algorithm automated-testing task, at the service layer the test task is first divided into a plurality of subtasks, such as a first subtask and a second subtask, in the test task flow management module 21. Subtask message queues matching the number of subtasks are defined in the test flow message service module 22, each responsible for the message-passing work of the corresponding subtask. The test task flow running module 23 creates a corresponding process queue responsible for running the corresponding subtask. The number of processes in a process queue may be one or more. A process queue monitors the corresponding message queue, and when a subtask to be executed is present in the message queue, the task is taken out and executed. In an embodiment, when the tested algorithm is a face recognition algorithm, the subtasks divided by the test task flow management module 21 include four subtasks: data preprocessing, test algorithm running, result comparison, and test report generation, executed in that order. Four message queues are defined in the test flow message service module 22 for passing the messages of the corresponding subtasks. Four groups of processes are created in the test task flow running module 23 to execute the corresponding subtasks, where the number of processes in each group is 48.
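A minimal sketch of this queue-per-subtask-type layout, using Python's standard multiprocessing module. The four subtask type names and the worker count of 48 come from the embodiment above; the function and variable names are illustrative assumptions.

```python
import multiprocessing as mp

SUBTASK_TYPES = ["preprocess", "run_algorithm", "compare", "report"]

def worker(queue, result_queue):
    # Each process blocks on its subtask message queue and executes tasks as they arrive.
    while True:
        task = queue.get()
        if task is None:  # sentinel: shut down
            break
        result_queue.put({"task_id": task["id"], "status": "done"})

def build_runtime(processes_per_type=48):
    # One message queue per subtask type, one process group per queue (48 in the embodiment).
    queues = {t: mp.Queue() for t in SUBTASK_TYPES}
    results = mp.Queue()
    groups = {
        t: [mp.Process(target=worker, args=(q, results), daemon=True)
            for _ in range(processes_per_type)]
        for t, q in queues.items()
    }
    for procs in groups.values():
        for p in procs:
            p.start()
    return queues, results, groups
```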
Referring to fig. 4, fig. 4 is a flowchart illustrating an embodiment of task scheduling in the test task flow scheduling center. After each test task is divided into subtasks, the subtasks are dispatched by the test task flow scheduling center. A subtask has two task states: runnable and waiting to run. The initial state of the subtask that executes first is runnable, and the initial state of the other subtasks is waiting to run. The test task flow scheduling center sends subtasks whose state is runnable to the corresponding message queues; for example, the first subtask is sent to the first subtask message queue, and a first-subtask execution process in the test task flow running module monitors that queue, takes out the task information, and executes it. After the first subtask finishes, the test task flow running module notifies the test task flow scheduling center of its completion and returns the corresponding execution result. The test task flow scheduling center then sets the state of the second subtask to runnable and sends it to the second subtask message queue, and the second-subtask execution process continues the work. After the result of the second subtask is returned to the test task flow scheduling center, the next subtask is executed, and so on, until all subtasks have been executed. When any subtask fails, the test task flow scheduling center stops the execution of subsequent subtasks and sends error information to alert the user to the execution error.
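Continuing the sketch above, the dispatch behaviour of fig. 4 can be outlined as a small sequential state machine: each subtask is switched to runnable in turn, placed on its message queue, and the flow stops on the first failure. Class and function names are assumptions.

```python
from enum import Enum

class State(Enum):
    RUNNABLE = "runnable"
    WAITING = "waiting_to_run"

def schedule(subtasks, queues, results):
    """Dispatch subtasks in order; stop and report on the first failure."""
    # Initial states as in the text: first subtask runnable, the rest waiting.
    for i, task in enumerate(subtasks):
        task["state"] = State.RUNNABLE if i == 0 else State.WAITING

    for task in subtasks:
        task["state"] = State.RUNNABLE
        queues[task["type"]].put(task)   # send to its subtask message queue
        outcome = results.get()          # wait for the running module's report
        if outcome["status"] != "done":
            # On any failure, stop the remaining subtasks and notify the user.
            raise RuntimeError(f"subtask {task['id']} failed: {outcome}")
```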
In an embodiment, the service layer further includes a device management service module configured to send algorithm-running tasks to the device cluster for execution. For subtasks that run the algorithm under test, such as the test-algorithm-running subtask, the algorithm is not executed directly on the local machine; instead, the corresponding running task is handed to the device cluster. The executing process notifies the device cluster of the algorithm-running information by network communication. The information sent to the device cluster includes the stored data set, the test algorithm program code, and the relevant parameter information for running the test algorithm; after receiving the information, the cluster control node of the device cluster distributes the algorithm-running task to the corresponding device. After the run finishes, the device cluster control node notifies the executing process of the algorithm-running result, and the execution of the subtask is finished. In this module, the status of the devices running the algorithm and of the server can be checked.
The device cluster has multiple device types, and each device type can correspond to multiple devices. In the cluster control node of the device cluster, each device type corresponds to one work process, and the cluster control node is responsible for scheduling algorithm-running tasks and distributing different algorithm-running tasks to different work processes. Each work process maintains a task list and a device linked list. As shown in FIG. 5, FIG. 5 is a schematic diagram of work process scheduling. The work process schedules the tasks to be run onto the idle devices in the device linked list.
The task list managed by the work process adopts an EDF scheduling strategy. EDF (Earliest Deadline First) is a preemptible, dynamic scheduling algorithm. The EDF algorithm maintains a real-time task ready queue in the system, with the task whose deadline is earliest queued at the front. When the scheduler selects a task, it always selects the first task in the ready queue and allocates a processor to it for running. Devices of the same type are linked linearly in series using a doubly circular linked list. The device management service module controls the device cluster to select available devices for algorithm running by polling. When selecting a device, the work process checks by polling whether the device can continue to execute tasks. Polling does not always start from the first device: the work process records the device node selected by the previous scheduling, and each poll starts searching from that node's successor. When the work process finds that a device is unavailable, the device is deleted from the device linked list. When a new device is added, it is appended to the linked list and a new device node is configured. After an algorithm task finishes on a device (successfully or not), the device is restored to the available state, and the work process returns the execution result of the algorithm task to the cluster control node.
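A compact sketch of one work process under the stated scheme: the task list is a min-heap ordered by deadline (EDF), and device selection polls a circular structure starting from the node chosen by the previous scheduling. A Python list stands in for the doubly circular linked list; all names are illustrative.

```python
import heapq
import itertools

class WorkProcess:
    """Sketch of one work process: EDF task list plus circular device polling."""
    def __init__(self, devices):
        self.ready = []                   # min-heap ordered by deadline (EDF)
        self.counter = itertools.count()  # tie-breaker for equal deadlines
        self.devices = devices            # stand-in for the doubly circular linked list
        self.last_index = -1              # device node chosen by the previous scheduling

    def submit(self, deadline, task):
        heapq.heappush(self.ready, (deadline, next(self.counter), task))

    def next_device(self):
        # Start polling from the successor of the previously selected node.
        n = len(self.devices)
        for step in range(1, n + 1):
            i = (self.last_index + step) % n
            if self.devices[i]["idle"]:
                self.last_index = i
                return self.devices[i]
        return None                       # no idle device at the moment

    def dispatch(self):
        # Always pick the task with the earliest deadline, then an idle device.
        if self.ready and (dev := self.next_device()):
            _, _, task = heapq.heappop(self.ready)
            dev["idle"] = False
            return task, dev
        return None
```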
In one embodiment, the service layer further comprises a data management service module for providing multiple ways of uploading the test data set. It also comprises a task configuration management module, which can perform task-related configuration, including comparison configuration of algorithm results, report configuration for statistical indexes, and the like; the comparison configuration is the scheme for comparing the test algorithm's test results with the standard results, and the report configuration provides diversified statistical choices.
Referring to fig. 6, fig. 6 is a schematic diagram of the test flow framework. The test data set is preprocessed to obtain a standard result, the test algorithm is run to obtain a test result, and the standard result is compared with the test result to obtain a test report for the test algorithm. The detail check provides an interface for data visualization that allows viewing each picture of a picture data set or each frame of a video data set.
In the above process, each subtask can be configured with a corresponding script plug-in, and the test task flow running module configures the corresponding subtask according to the script plug-in, which facilitates data processing and comparison. Through script plug-in 1, the data format of the obtained standard result can be unified and standardized, so that the framework can be compatible with various data set formats. Through script plug-in 2, the algorithm can be configured so that the framework is able to run test algorithms from different platforms, e.g. PyTorch, TensorFlow, Caffe, etc. Through script plug-in 3, the comparison scheme between the test result and the standard result can be set flexibly. Script plug-in 4 can configure multidimensional screening of data sets, such as whether category classification is correct, whether target detection matches, whether attribute identification is correct, and so on; by comparing the labeling with the algorithm output, the reasons why the algorithm performs poorly on a test data set, such as wrong data set labeling or many interference sources in the data set, can be located quickly. This can help algorithm personnel clean and effectively expand the test and training data sets. Script plug-in 5 can configure the test algorithm report, customize the test report style, and calculate specified statistical indexes, such as setting an IoU value and computing statistics by category, by attribute, by event, and so on; the levels of statistical indexes such as recall rate, detection rate, accuracy rate, missed-detection rate, and false-detection rate reflect whether the algorithm is too sensitive to certain categories and attributes, whether generalization capability is insufficient, whether overfitting occurs, and so on, and can give algorithm personnel a direction for optimization.
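One plausible way to wire up such script plug-ins is a small hook registry, sketched below. The hook names mirror the roles of plug-ins 1 and 3 in the text; the decorator, field names, and logic are illustrative assumptions rather than the patent's implementation.

```python
from typing import Callable, Dict

# Hook points mirroring the script plug-in roles described in the text.
HOOKS: Dict[str, Callable] = {}

def register(hook_name):
    def decorator(fn):
        HOOKS[hook_name] = fn
        return fn
    return decorator

@register("normalize_labels")   # plug-in 1: unify data set annotation formats
def normalize_labels(raw):
    # Map one assumed raw annotation shape onto a standardized one.
    return {"index_name": raw.get("name"), "bbox": raw.get("box")}

@register("compare")            # plug-in 3: test result vs. standard result
def compare(test_result, standard_result):
    return test_result["index_name"] == standard_result["index_name"]
```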
In the embodiment, the plug-in scripts can meet the verification requirements of different types of algorithm effects; combined further with the detail-checking function, problems found during analysis can be located quickly, greatly improving the efficiency of algorithm optimization and data cleaning.
In addition, in the embodiment, the function of configuring each flow and report can be realized, the test platform has universality, different algorithms can be tested, different test reports can be configured, and the workload of testers is greatly reduced.
Referring to fig. 7, fig. 7 is a schematic flowchart of a first embodiment of the automated testing method for algorithms of the present application. The automated testing method is applied to a dispatch center device comprising a communication layer, a data layer, and a service layer, which may be any one, or a combination, of the dispatch center devices described above. The method comprises the following steps:
s11: and responding to the test instruction of the user, and sending the received test instruction to the service layer by the communication layer.
And the communication layer sends the test instruction for the algorithm input by the user to the service layer. The test instructions include instructions to determine an algorithm to be tested and also instructions to determine a test data set for testing. The instruction for determining the algorithm to be tested may include an algorithm code directly input by a user, or may be information such as selecting a prestored algorithm sample and adding a corresponding configuration parameter or configuration field. The instructions to determine the test data set for testing may include information such as the storage location, data attributes, etc. of the test data set.
S12: the service layer obtains test data from the data layer based on the test instructions.
The service layer can acquire the test data for the test algorithm from the data layer according to the information related to the test data in the test instruction.
In one embodiment, the labeling format of the test data can be uniformly standardized through the first script plug-in in the data layer. Through unified standardization, the dispatch center device can be compatible with various test data set formats. When the standard result is later compared with the test result, the unified, standardized labels make the comparison simpler, more convenient, and clearer.
S13: the service layer divides the test task indicated by the test instruction into at least one subtask, selects a target subtask meeting the condition from the at least one subtask, and sends the test data and the target subtask to the device cluster, so that the target device in the device cluster tests the algorithm program indicated by the target subtask according to the test data.
The service layer divides the test task of the target algorithm into at least one subtask for execution; for subtasks meeting a condition, the service layer takes each such subtask as a target subtask and hands its execution to the device cluster for processing. The condition may be that the performance requirement of the subtask exceeds a preset threshold, or that the subtask is some kind of complex-operation task, and so on. For such tasks, local processing could affect the operation of the local server; to reduce this influence and improve the efficiency of algorithm testing, such tasks are sent to the device cluster for execution, thereby reducing the computing burden of the local server.
S14: and the service layer receives the test result returned by the target equipment and generates a test report based on the test result.
After the device cluster completes the execution of the target subtask, a test result is returned to the service layer. And the service layer generates a corresponding algorithm test report based on the test result.
In this embodiment, the method is applied to a scheduling center device, and the scheduling center device includes a communication layer, a data layer, and a service layer. The communication layer, in response to a received test instruction, sends the test instruction to the service layer for execution; the service layer obtains test data according to the test instruction and divides the test task corresponding to the test instruction to obtain at least one subtask. In the process of executing the subtasks, a target subtask meeting a condition is selected from the subtasks, and the target subtask and the corresponding test data are sent to the device cluster; the device cluster completes the test of the algorithm program corresponding to the target subtask based on the test data. The service layer only needs to receive the test result from the device cluster to generate a corresponding test report. Because the service layer hands the target subtasks that occupy a large amount of performance and consume a large amount of time over to the device cluster for execution, the load pressure of the local server is reduced, the task execution efficiency of the local server is greatly improved, and more algorithm test tasks can be executed.
Referring to fig. 8, fig. 8 is a flowchart illustrating a second embodiment of the algorithm automated testing method of the present application. The method is a further extension of step S13, and includes the steps of:
s21: the service layer determines a test task indicated by the test instruction according to the test instruction of the communication layer, divides the test task into a plurality of subtasks according to a plurality of subtask types, and puts the subtasks in a running state into corresponding message queues.
The service layer divides the test task corresponding to the test instruction into a plurality of subtasks by subtask type for execution. The subtask types may include: data set preprocessing, test algorithm program running, result comparison, and test report generation. The initial state of data set preprocessing is runnable, and the initial state of test algorithm program running, result comparison, and test report generation is waiting to run. In this embodiment, the execution order of the subtasks is data set preprocessing, test algorithm program running, result comparison, test report generation. Data set preprocessing is the first subtask to execute, so its initial state is runnable and it can start directly; the other subtasks execute later and can start only after the preceding subtask finishes, so their initial state is waiting to run, and a task's state changes to runnable, and execution begins, once the preceding subtask has finished. The step of placing the subtasks in the runnable state into the corresponding message queues includes: after the current subtask finishes, the service layer switches the state of the next subtask in the subtask order to runnable and puts the runnable subtask into the corresponding message queue. The execution of the subtasks can proceed with reference to FIG. 4 and the associated description.
S22: and the service layer sends the target subtasks in the message queue and the test data thereof to the equipment cluster.
The message queues are established by the service layer according to the plurality of subtask types; the task types can include data set preprocessing, test algorithm program running, result comparison, test report generation, and the like. A message queue is used for storing subtasks indicated by the test instruction, and the subtasks stored in a message queue are in the runnable state. The service layer selects a target subtask from the subtasks according to task type; the selected target subtask is sent by the service layer to the device cluster for processing, while non-target subtasks are grabbed from the message queue by task execution processes in the service layer and then executed. The task types of target subtasks can be types that place large demands on data processing capacity, have high complexity, and consume much processing time; for such tasks, local processing could affect the operation of the local server, so to reduce that influence and improve the efficiency of algorithm testing, such tasks are sent to the device cluster for execution, reducing the computing burden of the local server. In this embodiment, the service layer divides the algorithm test task into subtasks such as data set preprocessing, test algorithm program running, result comparison, and test report generation; test algorithm program running demands more system performance than the other subtasks, so it is taken as the target subtask. When the subtask type of the target subtask is test algorithm program running, the service layer sends the target subtask in the message queue and its test data to the device cluster.
Referring to fig. 9, fig. 9 is a flowchart illustrating a third embodiment of the algorithm automated testing method of the present application. The method is a further extension of step S11, and includes the steps of:
s31: the communication layer obtains the connection number of each service layer.
S32: and the communication layer sends the test instruction to the service layer of which the connection number is less than the preset connection number threshold.
The communication layer can send the test instruction input by the user to a service layer; the communication layer can be connected with a plurality of service layers, and the obtained test instructions can be distributed to different service layers for execution. When distributing a test instruction, the communication layer first obtains the connection number of each service layer, which corresponds to the number of tasks that service layer is currently executing. After the connection numbers are obtained, each is compared with a preset connection-number threshold, and a service layer below the threshold is determined to be a low-load service layer that can continue to take on new test tasks. Therefore, the communication layer sends the new test instruction to a service layer whose connection number is less than the preset connection-number threshold. In one embodiment, this function may be implemented by the Nginx included in the communication layer.
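Steps S31-S32 amount to a least-connections choice with a load cap. A minimal sketch under that reading follows; the threshold value and the data shape are assumptions.

```python
def pick_service_layer(service_layers, max_connections=100):
    """Choose a service layer for a new test instruction (S31-S32).

    service_layers: list of dicts like {"name": ..., "connections": int};
    max_connections stands in for the preset connection-number threshold.
    """
    eligible = [s for s in service_layers if s["connections"] < max_connections]
    # Among eligible layers, prefer the one with the fewest current connections,
    # mirroring the Nginx least-connections balancing mentioned earlier.
    return min(eligible, key=lambda s: s["connections"]) if eligible else None
```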
Referring to fig. 10, fig. 10 is a flowchart illustrating a fourth embodiment of the automated testing method for algorithm of the present application. The method is a further extension of step S14, and includes the steps of:
s41: and the service layer compares the received target subtask test result with the corresponding standard result to generate a test report.
And after sending the target subtask to the equipment cluster, the service layer receives a test result from the equipment cluster after executing the target subtask, compares the test result with a standard result obtained in advance to judge whether the test result is accurate, and generates a test report corresponding to the algorithm test task.
The report configuration service for the test report may be configured by the service layer through the second script plug-in to provide a plurality of customized test report styles. The report configuration service includes testing report style and/or statistical indexes, such as setting IOU value, counting according to category, counting according to attribute, counting according to event, and the like, and the statistical indexes include recall rate, detection rate, accuracy rate, missed detection rate, false detection rate, and the like. The statistical indexes can reflect whether the algorithm is too sensitive in certain categories and attributes, whether generalization capability is not enough, whether overfitting occurs and the like, and can provide optimization directions for algorithm personnel.
The service layer compares the target subtask test result with the corresponding standard result, wherein the comparison configuration and comparison logic are configured by the service layer through the third script plug-in, so that a variety of flexible comparison schemes are provided. The comparison logic is described in more detail below.
In an embodiment, the service layer further configures the test environment of the algorithm program through a fourth script plug-in to run test algorithm programs of different platforms, for example, test algorithms of platforms such as PyTorch, TensorFlow, Caffe, and the like.
In one embodiment, the algorithm program of the target subtask is a face recognition algorithm, and the task category of the target subtask, i.e., the task category tested by the algorithm, includes at least one of a static picture stream, a dynamic picture stream, a fuzzy video stream, and a precise video stream.
The service layer compares the received target subtask test result with the corresponding standard result in the following ways.
When the task type is a static picture stream, the service layer acquires a preset number of test results with the highest matching degree corresponding to each detection picture; if the standard result comprises any one of the test results, determining that the recognition result of the face recognition algorithm on the detected picture based on the standard result is correct;
when the face recognition algorithm to be tested recognizes the preset test data set, the algorithm outputs the index name of the corresponding face data of each detected face in the face database, and arranges the index names in the order of high similarity to low similarity. For example, for a detected picture, after a face recognition algorithm to be tested performs face recognition on the detected picture, 10 matching results are obtained, and the similarity of the 10 results is sequentially reduced according to the sequence of 1-10. If the face data which is consistent with the index name in the standard result exists in the 10 results, the recognition is considered to be correct. And for the condition that a plurality of faces exist in one detected picture, only the face with the largest width is selected for detection and matching.
In the subsequent algorithm test report, the relevant parameters include the total number sent for identification, the actual total number of people, the total number of identifications, the number of correct identifications, the number of false alarms, the recall rate, the false alarm rate, the correct-minus-wrong number (the number of correct identifications minus the number of false alarms), and the like. The total number sent for identification is the number of faces in the test data set used; the actual total number of people is the number of faces corresponding to the standard result; the total number of identifications is the number of faces obtained after the test algorithm deduplicates the output recognition results whose similarity exceeds the similarity threshold; and the number of correct identifications is the number of faces correctly recognized by the test algorithm. The number of false alarms is the number of faces incorrectly identified by the test algorithm. The 10 matching results obtained can be classified into top1 to top10. The total topN identification number is the number of faces with identification results in the algorithm output, and a topN identification is correct if the index name of a result within the topN outputs is consistent with the index name in the standard result. For example, the total top1 identification number is the number of results whose similarity is greater than a preset threshold among all top1 results, and a top1 identification is correct if the index name of the top1 result is consistent with the index name in the standard result. The recall rate is the number of correct identifications divided by the actual total number of people, and the false alarm rate is the number of false alarms divided by the total number sent for identification. For the results corresponding to different topN values, different similarity thresholds can be set for further analysis and statistics. For the results output at each topN, the recall rate can be counted by a preset attribute value; for example, with gender as the attribute value, the results are divided into male and female; with age group as the attribute value, into ages 10-20, 20-30, and so on. Referring to fig. 11, fig. 11 is a schematic diagram of a test report.
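A simplified sketch of the static-picture-stream check and the two rates defined above. The data shapes are assumptions, and every wrong top-n result is counted as a false alarm here, which simplifies the deduplication and similarity-threshold handling described in the text.

```python
def evaluate_static_stream(outputs, standards, n=1):
    """Top-n check for the static picture stream (shapes are illustrative).

    outputs[i]  : list of index names for picture i, similarity descending
                  (top1 .. top10)
    standards[i]: the correct index name for detected picture i
    """
    correct = sum(1 for out, std in zip(outputs, standards) if std in out[:n])
    false_alarms = len(outputs) - correct   # simplification: every miss is a false alarm
    recall = correct / len(standards)               # correct / actual total of people
    false_alarm_rate = false_alarms / len(outputs)  # false alarms / total sent
    return {"correct": correct, "recall": recall,
            "false_alarm_rate": false_alarm_rate}
```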
When the task type is a dynamic picture stream, the service layer acquires all test results corresponding to the detection picture sequence, wherein the test results comprise a test result with the highest matching degree corresponding to each detection picture in the detection picture sequence; if the standard result comprises any one of the test results, determining that the recognition result of the face recognition algorithm on the detected picture sequence based on the standard result is correct;
when the task type is a dynamic picture stream, the detection of the test data set is performed in units of picture sequences. In the process of algorithm identification and detection, the algorithm gives all the faces identified for the picture sequence, then all the obtained tested face results are matched with the standard results, and if the index name of any one result in the tested face results is matched with the index name of the standard result, the identification is considered to be correct. For example, the first group of the road people pictures and the standard result ABC are detected as a group, the first group of the road people pictures and the standard result D are detected as a group, the second group of the road people pictures and the standard result ABC are detected as a group, and the second group of the road people pictures and the standard result D are detected as a group. The task type is suitable for detecting whether the target portrait exists in a certain portrait range, such as detecting whether the target portrait exists in a passerby.
In the subsequent algorithm test report, the relevant parameters include the total number sent for identification, the actual total number of people, the total number of identifications, the number of correct identifications, the number of false alarms, the recall rate, the false alarm rate, the correct-minus-wrong number (the number of correct identifications minus the number of false alarms), and the like. The total number sent for identification is the number of faces in the test data set used; the actual total number of people is the number of faces corresponding to the standard result; the total number of identifications is the number of faces obtained after the test algorithm deduplicates the output recognition results whose similarity exceeds the similarity threshold; and the number of correct identifications is the number of faces correctly recognized by the test algorithm. The number of false alarms is the number of faces incorrectly identified by the test algorithm. The recall rate is the number of correct identifications divided by the actual total number of people, and the false alarm rate is the number of false alarms divided by the total number sent for identification. In the statistics for this task category, only the output result with the highest similarity is taken for each face, i.e., only the top1 result is output; the total identification number takes only the top1 outputs, which are then deduplicated by index name to obtain the final identification number. In the display of the test report, statistics can be shown by setting a similarity threshold or a false alarm rate threshold. Referring to fig. 12, fig. 12 is a further schematic diagram of a test report. After a false alarm rate is set, the corresponding similarity score can be calculated from it. First, the number of false alarms is calculated: the total number of identifications multiplied by the false alarm rate is rounded to obtain an integer value n, and the similarity of the result just before the (n+1)-th false alarm is taken as the similarity score corresponding to the false alarm rate.
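The false-alarm-rate-to-similarity-score computation at the end of the paragraph can be sketched as follows; the input shape (results sorted by descending similarity, each flagged as a false alarm or not) is an assumption.

```python
def similarity_at_false_alarm_rate(results, total_identifications, rate):
    """Similarity score corresponding to a target false alarm rate.

    results: list of (similarity, is_false_alarm) tuples, sorted by
             similarity in descending order.
    """
    n = round(total_identifications * rate)  # allowed number of false alarms
    seen = 0
    prev_similarity = 1.0                    # assumed ceiling before any result
    for similarity, is_false_alarm in results:
        if is_false_alarm:
            seen += 1
            if seen == n + 1:
                # Score is the similarity of the result just before this false alarm.
                return prev_similarity
        prev_similarity = similarity
    return results[-1][0] if results else 0.0  # fewer than n+1 false alarms occurred
```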
When the task type is the fuzzy video stream, the service layer acquires all test results corresponding to the detection video, wherein the test results comprise the test result with the highest matching degree corresponding to each frame of image of the detection video; if the standard result comprises any one of the test results, determining that the recognition result of the face recognition algorithm on the detection video based on the standard result is correct;
when the task type is the fuzzy video stream, the detection of the test data set is performed by taking the video as a whole. In the process of algorithm identification and detection, the algorithm gives all the faces identified by the video, then all the obtained tested face results are matched with the standard results, and if any one result in the tested face results is matched with the standard result index name, the identification is considered to be correct. The comparison of the test face results does not match information such as video frame number positions and the like.
In the statistics for this task category, the relevant parameters include the total number sent for identification, the actual total number of people, the total number of identifications, the number of correct identifications, the number of false alarms, the recall rate, the false alarm rate, the correct-minus-wrong number (the number of correct identifications minus the number of false alarms), and the like. The total number sent for identification is the number of faces in the test data set used; the actual total number of people is the number of faces corresponding to the standard result; the total number of identifications is the number of faces obtained after the test algorithm deduplicates the output recognition results whose similarity exceeds the similarity threshold; and the number of correct identifications is the number of faces correctly recognized by the test algorithm. The number of false alarms is the number of faces incorrectly identified by the test algorithm. The recall rate is the number of correct identifications divided by the actual total number of people, and the false alarm rate is the number of false alarms divided by the total number sent for identification. For the output result of a face, only the one with the highest similarity is taken, i.e., only top1 is output; the total identification number takes only the top1 outputs, which are then deduplicated by index name to obtain the final identification number. In the display of the test report, statistics can be shown by setting a similarity threshold or a false alarm rate threshold, and the optimal result can further be presented. The optimal result is the similarity threshold at which the number of correct identifications minus the number of wrong identifications is largest; when several similarity thresholds reach the optimum, the threshold with the higher similarity is taken. Taking the precision of the similarity threshold as one percent, the similarity in the obtained results is decreased in one-percent steps starting from one hundred percent, observing whether the corresponding value of (correct identifications minus wrong identifications) starts to decrease continuously (e.g., three consecutive decreases); if so, the result corresponding to the similarity threshold at which the decrease begins is taken as the optimal result. The threshold precision can be configured according to actual conditions. The similarity threshold may be set in the range of sixty-seven percent to one hundred percent. Referring to fig. 13, fig. 13 is another schematic diagram of a test report.
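A sketch of the optimal-threshold sweep described above, under the assumption that a `score_at(t)` helper (hypothetical, implemented elsewhere) returns correct-minus-wrong identifications at integer threshold t. The sweep walks down in one-percent steps and stops after three consecutive decreases, keeping the higher threshold on ties.

```python
def optimal_threshold(score_at, lo=67, hi=100, patience=3):
    """Sweep the similarity threshold downward in 1% steps (the text's precision).

    score_at(t): hypothetical helper returning (correct - wrong identifications)
                 at integer threshold t percent.
    """
    best_t, best_score = hi, score_at(hi)
    drops, last = 0, best_score
    for t in range(hi - 1, lo - 1, -1):
        s = score_at(t)
        drops = drops + 1 if s < last else 0
        if s > best_score:    # strict '>' keeps the higher threshold on ties
            best_t, best_score = t, s
        if drops >= patience: # e.g., three consecutive decreases: stop the sweep
            break
        last = s
    return best_t
```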
When the task type is the accurate video stream, the service layer obtains all test results corresponding to the detection video, wherein the test results comprise the test result with the highest matching degree corresponding to each frame of image of the detection video; if the standard result comprises the test result, and the test result is matched with the standard result in attribute, determining that the recognition result of the face recognition algorithm on the detection video based on the standard result is correct; the attribute matching comprises matching of an image frame number corresponding to the test result with an image frame number of the corresponding standard result, matching of position information of the test result in the image frame with position information of the corresponding standard result, and matching of an index name of the test result with an index name of the corresponding standard result.
And when the task type is the accurate video stream, the detection object of the face recognition algorithm to be detected is the whole video. The output face test result corresponds to the video image frame number of the video where the detected face is located and also includes the position information in the image frame. And when the output test result is consistent with the index name of the standard result and the corresponding image frame number is consistent with the position information in the image frame, judging that the identification is correct.
In the statistics for this task category, the relevant parameters include the total number sent for identification, the actual total number of people, the total number of identifications, the number of correct identifications, the number of false alarms, the recall rate, the false alarm rate, the correct-minus-wrong number (the number of correct identifications minus the number of false alarms), and the like. The total number sent for identification is the number of faces in the test data set used; the actual total number of people is the number of faces corresponding to the standard result; the total number of identifications is the number of faces obtained after the test algorithm deduplicates the output recognition results whose similarity exceeds the similarity threshold; and the number of correct identifications is the number of faces correctly recognized by the test algorithm. The number of false alarms is the number of faces incorrectly identified by the test algorithm. The recall rate is the number of correct identifications divided by the actual total number of people, and the false alarm rate is the number of false alarms divided by the total number sent for identification. For the output result of a face, only the one with the highest similarity is taken, i.e., only top1 is output; the total identification number takes only the top1 outputs, which are then deduplicated by index name to obtain the final identification number. When the test report results are displayed, the relevant parameters such as the total number sent for identification, the actual total number of people, the total number of identifications, the number of correct identifications, the number of false alarms, the recall rate, the false alarm rate, and the correct-minus-wrong number can be shown, and the test results can be displayed by false alarm rate, similarity threshold, and the like. The optimal result can further be presented: it is the similarity threshold at which the number of correct identifications minus the number of wrong identifications is largest; when several similarity thresholds reach the optimum, the threshold with the higher similarity is taken. Taking the precision of the similarity threshold as one percent, the similarity is decreased in one-percent steps starting from one hundred percent, observing whether (correct identifications minus wrong identifications) starts to decrease continuously (e.g., three consecutive decreases); if so, the result corresponding to the similarity threshold at which the decrease begins is the optimal result. The threshold precision can be configured according to actual conditions. The similarity threshold may range from sixty-seven percent to one hundred percent.
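A sketch of the attribute matching for the precise video stream: frame number, in-frame position, and index name must all agree. The patent does not state how positions are compared, so an IoU test with an assumed threshold is used here, echoing the IoU setting mentioned for report configuration.

```python
def iou(a, b):
    # Boxes as (x, y, w, h); standard intersection-over-union.
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

def precise_match(test, standard, iou_threshold=0.5):
    """Attribute matching for the precise video stream (sketch).

    The IoU criterion and its 0.5 threshold are assumptions; the patent only
    says the positions must match. Dict keys are illustrative.
    """
    return (test["frame"] == standard["frame"]
            and test["index_name"] == standard["index_name"]
            and iou(test["bbox"], standard["bbox"]) >= iou_threshold)
```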
Fig. 14 is a schematic structural diagram of an embodiment of the dispatch center device of the present application.

The dispatch center device includes a processor 110 and a memory 120, and the processor 110 is coupled to the memory 120.

The processor 110 controls the operation of the dispatch center device, and the processor 110 may also be referred to as a Central Processing Unit (CPU). The processor 110 may be an integrated circuit chip with signal processing capability. The processor 110 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 120 stores instructions and program data required for the processor 110 to operate.
The processor 110 is configured to execute the instructions to implement the method provided by any embodiment of the algorithm automatic testing method described above, or by any non-conflicting combination of those embodiments.
Fig. 15 is a schematic structural diagram of an embodiment of a computer-readable storage medium of the present application.
An embodiment of the computer-readable storage medium of the present application includes a memory 210, and the memory 210 stores program data which, when executed, implements the method provided by any embodiment of the algorithm automatic testing method of the present application, or by any non-conflicting combination of those embodiments.
The memory 210 may include any medium capable of storing program instructions, such as a USB flash disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk; it may also be a server that stores the program instructions and that can either send the stored program instructions to other devices for execution or execute them itself.
To sum up, the present application is applied to a dispatch center device, and the dispatch center device comprises a communication layer, a data layer and a service layer. The communication layer, in response to a received test instruction, forwards the instruction to the service layer for execution; the service layer obtains test data according to the test instruction and divides the test task corresponding to the test instruction into at least one subtask. During execution, the service layer selects from the subtasks a target subtask that meets the conditions and sends it, together with the corresponding test data, to the device cluster, and the device cluster completes the test of the algorithm program corresponding to the target subtask based on the test data. The service layer then only needs to receive the test results from the device cluster to generate the corresponding test report. Because the service layer hands the target subtasks that occupy a large amount of performance and consume a large amount of time over to the device cluster for execution, the load pressure on the local server is reduced, the task execution efficiency of the local server is greatly improved, and more algorithm test tasks can be executed. Through plug-in scripts and configuration functions, the automatic test system can standardize test data, test code, and output results, so that it has high universality and can be used with various data sets and test algorithms.
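As a purely illustrative aid, the following sketch mirrors the flow summarized above. Every class, method, and field name here is an assumption made for the sketch; the application does not prescribe this API.

```python
# Hypothetical end-to-end sketch: the communication layer hands a test
# instruction to the service layer, which offloads the heavy subtask to
# the device cluster and builds the report from the returned results.

class DataLayer:
    def get_test_data(self, instruction):
        return instruction.get("dataset", [])     # stand-in data lookup


class DeviceCluster:
    def run(self, subtask, test_data):
        # Stand-in for a cluster device testing the algorithm program
        # indicated by the target subtask against the test data.
        return {"subtask": subtask["type"], "passed": True}


class ServiceLayer:
    def __init__(self, data_layer, cluster):
        self.data_layer = data_layer
        self.cluster = cluster

    def execute(self, instruction):
        test_data = self.data_layer.get_test_data(instruction)
        # Divide the test task into subtasks, here one per subtask type.
        subtasks = [{"type": t} for t in
                    ("preprocess_dataset", "run_algorithm",
                     "compare_results", "generate_report")]
        # Offload only the performance-heavy target subtask.
        results = [self.cluster.run(s, test_data) for s in subtasks
                   if s["type"] == "run_algorithm"]
        return {"report": results}                # service layer's report


# The communication layer would invoke execute() on a received instruction.
report = ServiceLayer(DataLayer(), DeviceCluster()).execute(
    {"dataset": ["img_001.jpg", "img_002.jpg"]})
```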
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other manners. For example, the above-described device embodiments are merely illustrative, and for example, the division of the modules or units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each of the units may exist alone physically, or two or more units may be integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
If the integrated unit described above is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above description is only an embodiment of the present application and is not intended to limit the scope of the present application. Any equivalent structure or equivalent process transformation made using the contents of the specification and the accompanying drawings, applied directly or indirectly in other related technical fields, is likewise included within the scope of protection of the present application.

Claims (9)

1. An algorithm automatic testing method, characterized in that it is applied to a dispatch center device, wherein the dispatch center device comprises a communication layer, a data layer and a service layer; the algorithm automatic testing method comprises the following steps:
in response to a test instruction of a user, the communication layer sends the received test instruction to the service layer;
the service layer acquires test data from the data layer based on the test instruction;
the service layer divides the test task indicated by the test instruction into at least one subtask, selects a target subtask meeting conditions from the at least one subtask, and sends the test data and the target subtask to a device cluster, so that a target device in the device cluster tests an algorithm program indicated by the target subtask according to the test data;
the service layer receives the test result returned by the target device and generates a test report based on the test result;
wherein the task category of the target subtask includes at least one of a static picture stream, a dynamic picture stream, a fuzzy video stream and a precise video stream; the algorithm program of the target subtask is a face recognition algorithm;
wherein the step of the service layer comparing the received target subtask test result with the corresponding standard result comprises:
when the task category is the static picture stream, the service layer acquires a preset number of test results with the highest matching degree for each detection picture; if the standard result comprises any one of the test results, it is determined that the recognition result of the face recognition algorithm on the detection picture, judged against the standard result, is correct;
when the task category is the dynamic picture stream, the service layer acquires all test results corresponding to the detection picture sequence, the test results comprising the test result with the highest matching degree for each detection picture in the sequence; if the standard result comprises any one of the test results, it is determined that the recognition result of the face recognition algorithm on the detection picture sequence, judged against the standard result, is correct;
when the task category is the fuzzy video stream, the service layer acquires all test results corresponding to the detection video, the test results comprising the test result with the highest matching degree for each frame of image of the detection video; if the standard result comprises any one of the test results, it is determined that the recognition result of the face recognition algorithm on the detection video, judged against the standard result, is correct;
when the task category is the precise video stream, the service layer acquires all test results corresponding to the detection video, the test results comprising the test result with the highest matching degree for each frame of image of the detection video; if the standard result comprises a test result and that test result matches the standard result in attributes, it is determined that the recognition result of the face recognition algorithm on the detection video, judged against the standard result, is correct; the attribute matching comprises matching of the image frame number of the test result with the image frame number of the corresponding standard result, matching of the position information of the test result within the image frame with the position information of the corresponding standard result, and matching of the index name of the test result with the index name of the corresponding standard result.
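For illustration only (this is not claim language), here is a minimal sketch of the "standard result comprises any one of the test results" check used for the first three task categories above; the function and value names are assumptions.

```python
# Hypothetical sketch of the containment check for the static picture
# stream, dynamic picture stream, and fuzzy video stream categories.

def recognition_correct(test_results, standard_labels):
    """Correct if the standard result includes any one of the returned
    test results (e.g. the top-N highest-matching index names)."""
    return any(r in standard_labels for r in test_results)


# Top-3 matches for one detection picture against the standard label.
print(recognition_correct(["alice", "bob", "carol"], {"bob"}))  # True
```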
2. The algorithm automatic testing method of claim 1, wherein
before the service layer acquires the test data from the data layer based on the test instruction, the algorithm automatic testing method further comprises:
the data layer standardizes the annotation format of the test data in a unified way through a first script plug-in, so that the dispatch center device is compatible with multiple data set formats.
3. The algorithm automatic testing method of claim 1, wherein
the step of the service layer dividing the test task indicated by the test instruction into at least one subtask, selecting a target subtask meeting conditions from the at least one subtask, and sending the test data and the target subtask to the device cluster comprises:
the service layer determines the test task indicated by the test instruction according to the test instruction from the communication layer, divides the test task into a plurality of subtasks according to a plurality of subtask types, and puts the subtasks in a runnable state into the corresponding message queues;
the service layer sends the target subtask in the message queue and its test data to the device cluster;
wherein the service layer creates the corresponding message queues according to the plurality of subtask types, and the message queues are used for storing the subtasks indicated by the test instruction;
the subtask types include: data set preprocessing, test algorithm program running, result comparison and test report generation; the initial state of the data set preprocessing is runnable, and the initial states of the test algorithm program running, the result comparison and the test report generation are waiting to run;
the step of putting the subtasks in the runnable state into the corresponding message queues comprises:
after the current subtask is executed, the service layer switches the state of the next subtask in the subtask sequence to the runnable state, and puts the subtask in the runnable state into the corresponding message queue.
4. The algorithm automatic testing method of claim 3, wherein
the step of the service layer sending the target subtask in the message queue and its test data to the device cluster comprises:
when the subtask type of the target subtask is test algorithm program running, the service layer sends the target subtask in the message queue and its test data to the device cluster.
5. The algorithm automatic testing method of claim 1, wherein
the step of the communication layer sending the received test instruction to the service layer comprises:
the communication layer acquires the connection number of each service layer;
the communication layer sends the test instruction to a service layer whose connection number is smaller than a preset connection number threshold.
6. The algorithm automatic testing method of claim 1, wherein
the step of generating a test report based on the test result comprises:
the service layer compares the received target subtask test result with the corresponding standard result to generate the test report; the report configuration service of the test report is configured by the service layer through a second script plug-in so as to provide a plurality of customized test report styles, the report configuration comprising the test report style and/or statistical indexes;
the service layer configures, through a third script plug-in, the comparison configuration and comparison logic for comparing the target subtask test result with the corresponding standard result, so as to provide multiple flexible comparison schemes.
7. The algorithm automatic testing method of claim 1 or 6, wherein
the algorithm automatic testing method further comprises:
the service layer configures the test environment of the algorithm program through a fourth script plug-in, so as to run test algorithm programs of different platforms.
8. A dispatch center device, wherein the dispatch center device comprises a memory and a processor coupled to the memory;
the memory is configured to store program data, and the processor is configured to execute the program data to implement the algorithm automatic testing method of any one of claims 1 to 7.
9. A computer-readable storage medium, wherein the computer-readable storage medium is configured to store program data which, when executed by a computer, implements the algorithm automatic testing method of any one of claims 1 to 7.
CN202211379312.9A 2022-11-04 2022-11-04 Algorithm automatic testing method, central dispatching equipment and readable storage medium Active CN115422094B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211379312.9A CN115422094B (en) 2022-11-04 2022-11-04 Algorithm automatic testing method, central dispatching equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211379312.9A CN115422094B (en) 2022-11-04 2022-11-04 Algorithm automatic testing method, central dispatching equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN115422094A CN115422094A (en) 2022-12-02
CN115422094B true CN115422094B (en) 2023-02-28

Family

ID=84208069

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211379312.9A Active CN115422094B (en) 2022-11-04 2022-11-04 Algorithm automatic testing method, central dispatching equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN115422094B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116719627B (en) * 2023-08-09 2024-01-19 腾讯科技(深圳)有限公司 Data processing method, device, equipment and storage medium

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102622294B (en) * 2011-01-28 2014-12-10 国际商业机器公司 Method and method for generating test cases for different test types
CN109359517A (en) * 2018-08-31 2019-02-19 深圳市商汤科技有限公司 Image-recognizing method and device, electronic equipment, storage medium, program product
CN110378324B (en) * 2019-07-15 2023-01-03 易诚高科(大连)科技有限公司 Quality dimension-based face recognition algorithm evaluation method
CN110502444B (en) * 2019-08-28 2023-08-18 北京达佳互联信息技术有限公司 Testing method and testing device for image processing algorithm
US20210397545A1 (en) * 2020-06-17 2021-12-23 International Business Machines Corporation Method and System for Crowdsourced Proactive Testing of Log Classification Models
CN112131107A (en) * 2020-09-16 2020-12-25 北京海益同展信息科技有限公司 Test task execution method and device, electronic equipment and storage medium
CN113377665A (en) * 2021-06-25 2021-09-10 北京百度网讯科技有限公司 Container technology-based testing method and device, electronic equipment and storage medium
CN113672500B (en) * 2021-07-27 2024-05-07 浙江大华技术股份有限公司 Deep learning algorithm testing method and device, electronic device and storage medium
CN115269439A (en) * 2022-08-29 2022-11-01 中国建设银行股份有限公司 Method, device, equipment and readable storage medium for testing data processing system

Also Published As

Publication number Publication date
CN115422094A (en) 2022-12-02

Similar Documents

Publication Publication Date Title
CN109726103B (en) Test report generation method, device, equipment and storage medium
US11190476B2 (en) Systems and methods for displaying labels in a clustering in-box environment
CN109325213B (en) Method and device for labeling data
CN109656782A (en) Visual scheduling monitoring method, device and server
WO2020226770A1 (en) Providing performance views associated with performance of a machine learning system
CN115422094B (en) Algorithm automatic testing method, central dispatching equipment and readable storage medium
EP3955174A2 (en) Method, apparatus and storage medium for training a deep learning framework
CN111563014A (en) Interface service performance test method, device, equipment and storage medium
CN113190372B (en) Multi-source data fault processing method and device, electronic equipment and storage medium
CN111291096B (en) Data set construction method, device, storage medium and abnormal index detection method
CN111382228A (en) Method and apparatus for outputting information
CN115237724A (en) Data monitoring method, device, equipment and storage medium based on artificial intelligence
CN109800124B (en) CPU utilization monitoring method and device, electronic equipment and storage medium
CN113190427A (en) Caton monitoring method and device, electronic equipment and storage medium
CN110262955B (en) Application performance monitoring tool based on pinpoint
CN110232013B (en) Test method, test device, controller and medium
CN111476349A (en) Model testing method and server
CN115187131A (en) Task execution state display method, device, equipment and storage medium
CN114928603A (en) Client software upgrading method and device, electronic equipment and medium
CN115061895A (en) Business process arranging method and device, electronic equipment and storage medium
CN115237706A (en) Buried point data processing method and device, electronic equipment and storage medium
CN113595814A (en) Message delay detection method and device, electronic equipment and storage medium
CN111861021A (en) Business risk prediction method, device, equipment and computer readable storage medium
US9965131B1 (en) System and processes to capture, edit, and publish problem solving techniques
CN114637564B (en) Data visualization method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant