CN114546798A

CN114546798A - Method and device for evaluating performance of terminal equipment, electronic equipment and storage medium

Info

Publication number: CN114546798A
Application number: CN202210095692.7A
Authority: CN
Inventors: 全振宇; 韩银和
Original assignee: Institute of Computing Technology of CAS
Current assignee: Institute of Computing Technology of CAS
Priority date: 2022-01-26
Filing date: 2022-01-26
Publication date: 2022-05-27

Abstract

The invention provides a method and a device for evaluating the performance of terminal equipment, electronic equipment and a storage medium, wherein the method comprises the following steps: receiving an instruction of a user to start an AI performance evaluation process; controlling and executing each AI performance single test, and determining the score of each AI performance single test; and determining the AI performance comprehensive score of the terminal equipment based on the scores of all the AI performance single tests, and evaluating the performance of the terminal equipment. The method evaluates the AI performance of the terminal equipment through a plurality of AI performance single tests, can test the actual performance of the equipment when processing different AI applications, and evaluates the AI performance of the terminal equipment more comprehensively.

Description

Method and device for evaluating performance of terminal equipment, electronic equipment and storage medium

Technical Field

The invention relates to the technical field of computer application, in particular to a method and a device for evaluating the performance of terminal equipment, electronic equipment and a storage medium.

Background

Conventional AI performance testing methods for terminal equipment generally test only the average delay and accuracy of the equipment when it processes AI applications such as image classification and target identification, and evaluate the AI performance of the equipment from those results. Such methods cover only a small part of the AI applications that a terminal device may run. AI technology has many other application fields, so current methods cannot measure the comprehensive performance of the terminal device across other AI applications. In addition, the AI model loading performance of the terminal device directly affects the user experience of AI applications, yet current methods cannot measure it. Finally, when scoring the AI performance of the terminal device, current methods rely mainly on two figures, the average processing delay and the accuracy, and ignore the computational complexity and the parameter count of the AI neural network model, so a user cannot accurately know the AI performance of the terminal device.

Disclosure of Invention

Aiming at the defects of the prior art, the invention provides a method and a device for evaluating the performance of terminal equipment, electronic equipment and a storage medium, wherein the AI performance of the terminal equipment is evaluated through a plurality of AI performance single tests, the actual performance of the equipment when processing different AI applications can be tested, and the AI performance of the terminal equipment can be evaluated more comprehensively.

In order to achieve the above object, an aspect of the present invention provides a method for evaluating performance of a terminal device, including:

receiving an instruction of a user to start an AI performance evaluation process;

controlling and executing each AI performance single test, and determining the score of each AI performance single test;

and determining the AI performance comprehensive score of the terminal equipment based on the scores of all the AI performance single tests, and evaluating the performance of the terminal equipment.

Optionally, the execution process of each AI performance single test includes:

loading a test data set for the AI performance single test;

loading the AI neural network model corresponding to the AI performance single test into a memory from a storage system of the terminal equipment, completing the initialization process of the model, and counting the loading time of the model;

processing a plurality of test data in a test data set by using the loaded model to obtain test output result information of each test data and average processing delay data of the terminal equipment;

comparing the test output result information of each test data with the true value of the test data, and calculating the accuracy test result of the terminal equipment according to the accuracy calculation formula corresponding to the AI performance single test;

and calculating the score of the AI performance single test obtained by the terminal equipment in the AI performance single test according to the AI model loading time, the average processing delay data of the terminal equipment and the accuracy test result of the terminal equipment.

Optionally, the score of each AI performance single test is calculated as follows:

S_i = P_i + T_i * w_i

where S_i is the performance score of the terminal device in the i-th AI performance single test, P_i is the data processing performance score of the terminal device in the i-th AI performance single test, T_i is the model loading performance score of the terminal device in the i-th AI performance single test, and w_i is the scoring weight of the i-th AI performance single test;

the calculation formula of the data processing performance score is as follows:

where P_i is the data processing performance score of the terminal device in the i-th AI performance single test, MAC_i is the number of multiply-accumulate operations contained in the AI neural network model used in the i-th AI performance single test, and L_i is the average delay with which the terminal device processes one unit of data in the i-th AI performance single test;

the calculation formula of the model loading performance score is as follows:

where T_i is the model loading performance score of the terminal device in the i-th AI performance single test, M_i is the parameter count of the AI neural network model used in the i-th AI performance single test, namely the amount of weight parameter data in the model, and s_i is the time the terminal device takes to complete the loading and initialization of the AI neural network model in the i-th AI performance single test.

Optionally, the AI performance single test includes any one of a face recognition test, a speech keyword recognition test, an image classification test, an object recognition test, a super-resolution test, a human body posture recognition test, and a semantic segmentation test.

Optionally, when the AI performance single test includes a human body posture recognition test, an executing process of the human body posture recognition AI performance single test includes:

and operating the human body posture recognition AI performance single test, recognizing key points of the human body as posture recognition characteristic points, determining the accuracy of the key points according to the similarity of the key points, and evaluating the AI human body posture recognition capability of the terminal equipment.

Optionally, the calculation formula of the keypoint similarity OKS is as follows:

where p denotes a person in the truth value of the test data, and p^i denotes the i-th keypoint of the p-th person;

the Euclidean distance between the i-th keypoint of the p-th person and the i-th keypoint in the output result is computed from (x'_i, y'_i), the detected position coordinates of the i-th keypoint in the test output result information, and the position coordinates of the i-th keypoint of the p-th person in the truth value;

v_pi denotes the visibility of the i-th keypoint of the p-th person;

S_p denotes the scale factor of the p-th person, calculated from w and h, the width and height of the p-th person's detection box;

σ_i denotes the normalization factor of the keypoint with id i, obtained as the standard deviation between the manual annotations of all truth-value keypoints in the test data set and the real values;

δ(·) equals 1 if its condition is true and 0 if its condition is false, and is used to determine whether a keypoint is a point that has been annotated in the truth value.
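
The quantities defined above can be combined as in the following minimal sketch. It is not the patent's own implementation: it assumes the widely used COCO-style form of OKS, in which each annotated keypoint contributes exp(-d^2 / (2 * S_p^2 * σ_i^2)) and the sum is divided by the number of annotated keypoints. It further assumes S_p = sqrt(w * h) and that a keypoint counts as annotated when v_pi > 0. The function name `oks` and its parameter names are hypothetical.

```python
import math


def oks(pred, truth, visibility, sigmas, box_w, box_h):
    """COCO-style keypoint similarity for one person (assumed form).

    pred, truth: lists of (x, y) keypoint coordinates in the same order.
    visibility:  v_pi per keypoint; v > 0 is treated as annotated (assumption).
    sigmas:      per-keypoint normalization factors sigma_i.
    box_w/box_h: width and height of the person's detection box.
    """
    scale_sq = box_w * box_h  # S_p^2, assuming S_p = sqrt(w * h)
    total = 0.0
    annotated = 0
    for (px, py), (tx, ty), v, sigma in zip(pred, truth, visibility, sigmas):
        if v <= 0:
            continue  # delta(...) = 0: keypoint not annotated in the truth value
        d_sq = (px - tx) ** 2 + (py - ty) ** 2  # squared Euclidean distance
        total += math.exp(-d_sq / (2.0 * scale_sq * sigma ** 2))
        annotated += 1
    return total / annotated if annotated else 0.0
```

A perfect prediction yields 1.0, and the value decays toward 0 as detected keypoints move away from their annotated positions.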

Optionally, when the AI performance single test includes a voice keyword recognition test, an execution process of the voice keyword recognition AI performance single test includes:

running the voice keyword recognition AI performance single test, and evaluating the performance level of the terminal equipment in recognizing voice keywords through the speech recognition accuracy, including:

the calculation formula of the speech recognition accuracy is as follows:

where Accuracy_0 is the speech recognition accuracy, n_0 is the number of samples for which an error is identified, and t_0 is the number of all test data;

and, taking the speech recognition accuracy as a reference, scoring the data processing performance of the voice keyword recognition AI performance single test, and calculating the score of the terminal equipment in the voice keyword recognition AI performance single test in combination with the model loading performance score.

Optionally, when the AI performance single test includes a face recognition test, the executing process of the face recognition AI performance single test includes:

running the face recognition AI performance single test, and evaluating the performance of the terminal equipment when processing an AI face recognition task through the face recognition accuracy, including:

the calculation formula of the face recognition accuracy is as follows:

where Accuracy_1 is the face recognition accuracy, n_1 is the number of samples for which an error is identified, and t_1 is the number of all test data;

and, taking the face recognition accuracy as a reference, scoring the data processing performance of the face recognition AI performance single test, and calculating the score of the terminal equipment in the face recognition AI performance single test in combination with the model loading performance score.

Optionally, when the AI performance single test includes a semantic segmentation test, the semantic segmentation AI performance single test execution process includes:

running the semantic segmentation AI performance single test, and evaluating the performance of the terminal equipment when processing an AI semantic segmentation task through the mean intersection over union (MIoU) index;

and, taking the MIoU as a reference, scoring the data processing performance of the semantic segmentation AI performance single test, and calculating the score of the terminal equipment in the semantic segmentation AI performance single test in combination with the model loading performance score.

Optionally, evaluating the performance of the terminal equipment when processing the AI semantic segmentation task through the MIoU index includes:

loading a test data set for the semantic segmentation AI performance single test, constructing an output matrix of the true-value categories and predicted-value categories of the test data set, and generating the confusion matrix of the semantic segmentation AI performance single test by comparing the value of each pixel in the output matrix with the value of the corresponding pixel in the truth-value matrix;

after the confusion matrix is obtained, the intersection over union (IoU) is calculated as:

IoU_i = V_i / (R_i + C_i - V_i)

where IoU_i is the IoU value of class i, V_i is the value in the i-th row and i-th column of the confusion matrix, R_i is the sum of all elements in the i-th row of the confusion matrix, and C_i is the sum of all elements in the i-th column of the confusion matrix;

the average of the IoU values over all classes of the test data set of the semantic segmentation AI performance single test is taken as the MIoU.
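
The confusion-matrix procedure can be illustrated with a minimal sketch. From the definitions of V_i, R_i and C_i, the per-class value is IoU_i = V_i / (R_i + C_i - V_i), since the row sum plus the column sum counts the diagonal entry twice. The sketch assumes that rows index the true class and columns the predicted class, that pixels are given as flat lists of integer class ids, and that classes absent from both truth and prediction are skipped when averaging. The function names are hypothetical.

```python
def confusion_matrix(truth, pred, num_classes):
    """Count pixels by (true class, predicted class); rows = truth (assumption)."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(truth, pred):
        cm[t][p] += 1
    return cm


def mean_iou(cm):
    """Average IoU_i = V_i / (R_i + C_i - V_i) over the classes that occur."""
    n = len(cm)
    ious = []
    for i in range(n):
        v = cm[i][i]                          # V_i: diagonal entry
        r = sum(cm[i])                        # R_i: sum of row i
        c = sum(cm[k][i] for k in range(n))   # C_i: sum of column i
        union = r + c - v
        if union > 0:                         # skip classes that never occur
            ious.append(v / union)
    return sum(ious) / len(ious) if ious else 0.0


# Four pixels, two classes: one class-0 pixel is mislabeled as class 1.
cm = confusion_matrix([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
score = mean_iou(cm)
```

Here IoU_0 = 1 / (2 + 1 - 1) = 0.5 and IoU_1 = 2 / (2 + 3 - 2) = 2/3, so the mean is 7/12.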

Optionally, in a case that the AI performance single test includes an image classification test, an image classification AI performance single test execution process includes:

running the image classification AI performance single test, and evaluating the performance of the terminal equipment when processing an AI image classification task through the image classification accuracy, including:

loading a test data set, namely a test picture set, for the image classification AI performance single test;

after a test picture is processed, a plurality of predicted categories and their probabilities are output; the top-ranked categories by probability for each test picture (as many as the test requires) are compared with the truth value of the picture, and if any one of them is consistent with the truth value, the picture is counted as correctly classified; once all test data have been processed, the number of correctly classified pictures is counted, and the image classification accuracy is calculated as:

Accuracy = p / n

where p is the number of correctly identified pictures, and n is the number of all test pictures;

and with the image classification accuracy as a reference, scoring the data processing performance of the image classification AI performance single test, and calculating the score of the terminal equipment in the image classification AI performance single test by combining with the model loading performance score.
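
The comparison of the top-ranked predicted categories with the truth value amounts to a top-k accuracy, p / n. The following is a minimal sketch; the function name `top_k_accuracy`, the list-based inputs and the default k = 5 are assumptions made for illustration, since the text only says that the number of categories compared depends on the requirements of the test.

```python
def top_k_accuracy(probabilities, truths, k=5):
    """Return p / n, where a picture is correct if its true class is among
    the k classes with the highest predicted probability.

    probabilities: one list of per-class probabilities per test picture.
    truths:        the true class index of each test picture.
    """
    if not truths:
        raise ValueError("at least one test picture is required")
    correct = 0
    for probs, truth in zip(probabilities, truths):
        # Class indices sorted from most to least probable.
        ranked = sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)
        if truth in ranked[:k]:
            correct += 1
    return correct / len(truths)


probs = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
labels = [1, 1, 0]
top1 = top_k_accuracy(probs, labels, k=1)
top2 = top_k_accuracy(probs, labels, k=2)
```

With these three pictures, only the first is correct at k = 1, and the first two are correct at k = 2.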

Optionally, when the AI performance single test includes a super-resolution test, the super-resolution AI performance single test execution process includes:

running the super-resolution AI performance single test, and evaluating the AI super-resolution calculation performance of the terminal equipment according to the peak signal-to-noise ratio index;

and scoring the data processing performance of the super-resolution AI performance single test by taking the peak signal-to-noise ratio as a reference, and calculating the score of the terminal equipment in the super-resolution AI performance single test by combining with the model loading performance score.

Optionally, a test data set, namely a test picture set, for the super-resolution AI performance single test is loaded;

before testing, the original test data are converted into low-resolution data by down-sampling; during testing, the low-resolution data are used as input; after testing, the high-resolution image output by the model is compared with the original image to obtain the peak signal-to-noise ratio (PSNR), which is calculated as:

PSNR = 10 \cdot \log_{10}(MAX_I^2 / MSE)

where MAX_I is the maximum value that a single pixel in the picture can take;

MSE is the mean square error, which reflects the degree of difference between an estimator and the quantity being estimated, and is calculated as:

MSE = \frac{1}{mn} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} [I(i,j) - K(i,j)]^2

where I(i,j) and K(i,j) are the pixel values in the i-th row and j-th column of the original test picture and the output picture respectively, and m and n are the horizontal and vertical resolutions.
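
A minimal sketch of the PSNR computation follows, using the standard definitions PSNR = 10 * log10(MAX_I^2 / MSE), with MSE the mean of the squared pixel differences. It assumes single-channel pictures stored as nested lists of equal size and 8-bit pixels (MAX_I = 255); the function names are hypothetical, and identical pictures are reported as infinite PSNR.

```python
import math


def mse(original, output):
    """Mean squared difference between two equally sized pictures."""
    rows = len(original)
    cols = len(original[0])
    total = sum(
        (original[i][j] - output[i][j]) ** 2
        for i in range(rows)
        for j in range(cols)
    )
    return total / (rows * cols)


def psnr(original, output, max_value=255):
    """Peak signal-to-noise ratio in dB; max_value is MAX_I (8-bit assumed)."""
    err = mse(original, output)
    if err == 0:
        return math.inf  # identical pictures
    return 10.0 * math.log10(max_value ** 2 / err)


reference = [[0, 255], [255, 0]]
restored = [[10, 245], [245, 10]]  # every pixel off by 10, so MSE = 100
quality = psnr(reference, restored)
```

A higher PSNR means the model output is closer to the original high-resolution picture.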

Optionally, in a case that the AI performance single test includes an object identification test, the executing process of the object identification AI performance single test includes:

running the object identification AI performance single test, and evaluating the AI object identification performance of the terminal equipment through the mean average precision (mAP);

and scoring the data processing performance of the object identification AI performance single test by taking the mAP as a reference, and calculating the score of the terminal equipment in the object identification AI performance single test in combination with the model loading performance score.

Optionally, the determining an AI performance composite score of the terminal device based on the scores of all the AI performance singles includes:

and weighting and averaging the scores of the single AI performance tests, and calculating the comprehensive AI performance score of the terminal equipment.

The invention also provides a performance evaluation device of the terminal equipment, which includes:

the receiving module is used for receiving an instruction of a user so as to start an AI performance evaluation process;

the control module is used for controlling and executing each AI performance single test and determining the score of each AI performance single test;

and the evaluation module is used for determining the AI performance comprehensive score of the terminal equipment based on the scores of all the AI performance single tests and evaluating the performance of the terminal equipment.

Another aspect of the present invention further provides a storage medium for storing a computer program for executing the performance evaluation method of the terminal device.

The invention also provides an electronic device, which includes a memory, a processor and a computer program stored in the memory and executable on the processor, and the processor implements the performance evaluation method of the terminal device when executing the computer program.

According to the scheme, the invention has the advantages that:

the performance evaluation method of the terminal equipment evaluates the AI performance of the terminal equipment with multiple AI performance single tests: it determines the score of each AI performance single test, determines the AI performance comprehensive score of the terminal equipment from the scores of all the AI performance single tests, and evaluates the performance of the terminal equipment accordingly. The method can test the actual performance of the equipment when processing different AI applications and evaluates the AI performance of the terminal equipment more comprehensively, thereby better helping a user select terminal equipment with suitable AI performance according to their own requirements.

Drawings

FIG. 1 is a schematic diagram of a system architecture of a terminal performance evaluation method;

FIG. 2 is a flow chart of a single AI performance test in an embodiment of the present application;

FIG. 3 is a block diagram of a performance evaluation apparatus of a terminal device;

FIG. 4 is a block diagram of a partial structure of a terminal device provided in an embodiment of the present application;

wherein:

100-a terminal device;

400-performance evaluation device;

410-a receiving module;

420-a control module;

430-load module;

440-a calculation module;

450-a statistics module;

460-an evaluation module;

500-a terminal device;

501-IO components;

502-a processor;

503-a controller;

504-AI calculation unit;

505-a storage system;

506-a memory;

507-memory;

508-a display;

509-a bus;

510-other external devices.

Detailed Description

In order to make the aforementioned features and effects of the present invention more comprehensible, embodiments accompanied with figures are described in detail below.

As described above, conventional AI performance testing methods for terminal devices generally test only the performance of the device when processing a few AI applications, such as image classification and target identification. The test indexes are generally average delay data and accuracy data, and an AI performance score is given as the test conclusion to describe the AI performance level of the terminal device.

However, because the prior art evaluates only a few kinds of AI applications such as image classification and target identification, other mainstream AI applications are left out of AI performance testing. In addition, the AI model loading performance of the terminal device directly affects the user's experience of AI applications, yet current AI performance testing methods cannot measure it.

To solve the above technical problem, an embodiment of the present invention provides a performance evaluation method for a terminal device. The method includes seven different AI performance single tests: a face recognition test, a voice keyword recognition test, an image classification test, an object recognition test, a super-resolution test, a human body posture recognition test and a semantic segmentation test, each of which uses AI neural network models of different computational complexity. The embodiment also provides corresponding AI performance evaluation indexes for the terminal device, covering AI model loading performance and AI data processing performance. The embodiment further provides a method for calculating the comprehensive performance score of the terminal device, which combines the parameter count and the computational complexity of the AI model when calculating the score of each AI performance single test, so that the share of each single test score in the comprehensive AI performance score is more balanced and the score describes the performance of the terminal device in each single test more accurately.

Next, a method for evaluating the performance of the terminal will be described, referring to fig. 1, where fig. 1 is a schematic diagram of a system architecture of the method for evaluating the performance of the terminal device according to the embodiment of the present application. The system architecture includes a terminal device 100, and an operation evaluation program may be installed on the terminal device 100, so that a user may evaluate performance of the AI application running on the terminal device 100 through the evaluation program. The user opens the evaluation program on the terminal device 100, and sends a test starting instruction to the terminal device 100, and the terminal device 100 starts the AI performance evaluation flow after receiving the evaluation instruction of the user.

In the AI performance evaluation process, the terminal device 100 sequentially executes the seven AI performance single tests included in the AI performance evaluation method. During each AI performance single test, it displays the test result information generated in the evaluation, collects the performance evaluation data of the test, and calculates the AI performance score of the test. After all seven AI performance single tests are completed, the terminal device 100 calculates its comprehensive AI performance score from the AI performance scores of all the single tests and displays the performance evaluation result to the user. From this result the user can learn the performance level of the terminal device in each of the seven AI performance single tests, which helps the user select a terminal device according to their own requirements.

A performance evaluation method of a terminal device, wherein the terminal device 100 executes the following processes, including:

S1, receiving an instruction of a user to start an AI performance evaluation process;

S2, controlling and executing each AI performance single test, and determining the score of each AI performance single test;

S3, determining the AI performance comprehensive score of the terminal equipment based on the scores of all the AI performance single tests, and evaluating the performance of the terminal equipment.

In a specific implementation, the score of each AI performance single test may be weighted and averaged to calculate a comprehensive AI performance score of the terminal device, where the calculation formula of the comprehensive AI performance score is:

where n is the number of AI performance single tests, and S_i is the performance score of the i-th AI performance single test.
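
As an illustration of step S3, the following minimal sketch combines the single-test scores into one comprehensive score. It assumes the comprehensive score is the plain arithmetic mean of the n scores S_i, on the reading that the weights w_i have already been applied inside each S_i; this form is an assumption, and the function name `composite_score` is hypothetical.

```python
def composite_score(single_scores):
    """Arithmetic mean of the single-test scores S_i (assumed form of S3)."""
    if not single_scores:
        raise ValueError("at least one single-test score is required")
    return sum(single_scores) / len(single_scores)


# Hypothetical scores from three single tests.
overall = composite_score([80.0, 90.0, 100.0])
```

If a different aggregation were intended, for example a mean with separate per-test weights, only the body of this function would change.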

The performance evaluation method for the terminal device provided in this embodiment evaluates the AI performance of the terminal device with multiple AI performance single tests: it determines the score of each AI performance single test, determines the AI performance comprehensive score of the terminal device from the scores of all the AI performance single tests, and evaluates the performance of the terminal device accordingly. The method can test the actual performance of the device when processing different AI applications and evaluates the AI performance of the terminal device more comprehensively, thereby better helping a user select a terminal device with suitable AI performance according to their own requirements.

Referring to FIG. 2, which is a flowchart of an AI performance single test in the embodiment of the present application, each AI performance single test is executed as follows:

S201, loading a test data set for the AI performance single test;

the test data set contains test data and true values of the data, and the test data sets used by different AI performance single tests are different. The test data in the test data set may be pictures and voices, and the true value of the data may be actual information included in each test data, such as the type and position of an object in the picture, the content included in the voice data, and the like. Before each AI performance single test is started, a corresponding test data set is loaded into a memory of the terminal device, when the test is started, test data of each unit is required to be respectively input into an AI neural network model for processing, after each AI performance single test is finished, a true value of the used data is compared with output data of the AI model, and an accuracy value when the device processes the AI model is calculated according to an accuracy calculation formula of different AI performance single tests.

S202, loading the AI neural network model corresponding to the AI performance single test into a memory from a storage system of the terminal equipment, completing the initialization process of the model, and counting the loading time of the model.

The time required to load an AI neural network model differs considerably between devices, mainly because of differences in hardware configuration and in software implementation between terminal devices. The loading time of the AI neural network model is the time the terminal device needs to load the network structure data, weight data, configuration data and other related data of the model into memory and complete the initialization process. The longer the loading time, the worse the user experience. The prior art can generally evaluate only the AI data processing performance and the AI application accuracy of the device, without considering the loading performance of the AI neural network model. Therefore, in this embodiment, the loading performance of the AI neural network model is used as one of the evaluation indexes when the AI performance of the terminal device is evaluated, so that the AI performance of the device is evaluated more accurately.

The AI neural network model is composed of neural network structure information and neural network weight data. Each AI neural network model consists of multiple layers of neural network operators (neural network layers), which can be understood as the specific algorithm functions required for AI calculation, such as convolution operators, fully connected operators and pooling operators. The neural network structure information specifies the number, order and configuration of the neural network operators in an AI neural network model. The neural network weight data are the parameter data corresponding to each layer of operators in the model, and are determined in advance through a neural network training process. With the structure information and the weight data, AI neural network calculation can be carried out on the terminal device. Because the number of operators, the amount of calculation and the number of parameters differ between AI neural network models, the amount of weight data also differs between models. Before the test, the AI model is stored in the storage system of the terminal device; when the AI performance test is performed, the AI model data must be loaded into memory to shorten the time needed to read the AI model data. The terminal device must also complete the AI model initialization, that is, the configuration and warm-up of its computing and storage resources according to the network structure of the AI model, in preparation for the subsequent step S203.

It should be noted that, in a specific implementation, terminal devices of different models and kinds differ in storage access performance and may also differ in the software methods or interfaces they use for S202, so different terminal devices need different amounts of time to load the same AI neural network model. The longer the AI model takes to load, the longer the user waits, so the loading time directly affects the user's experience of AI applications on the terminal device. To reflect this difference between terminal devices in the AI performance score, this embodiment records the delay of the terminal device in completing S202, uses it as the basis for judgment, and calculates the model loading performance score of the terminal device in combination with the parameter count of the AI neural network model.
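
The measurement of the S202 delay can be sketched as follows. This is a minimal illustration only: `timed_load` and `fake_loader` are hypothetical names, and `fake_loader` stands in for whatever call the device's inference framework uses to read the model from storage and initialize it.

```python
import time


def timed_load(load_fn):
    """Run a model-loading callable and return (model, elapsed seconds).

    The elapsed time covers everything load_fn does, so load_fn should
    include both reading the model from storage and initializing it.
    """
    start = time.perf_counter()
    model = load_fn()
    elapsed = time.perf_counter() - start
    return model, elapsed


def fake_loader():
    """Stand-in loader (assumption): builds a dummy weight list in memory."""
    return {"weights": [0.0] * 100_000}


model, load_seconds = timed_load(fake_loader)
```

`time.perf_counter` is used because it is a monotonic, high-resolution clock suited to measuring short intervals.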

S203, processing a plurality of test data in the test data set by using the loaded AI neural network model, and obtaining test output result information of each test data and average processing delay data of the terminal equipment;

In a specific implementation, once the test data set and the AI neural network model are ready, the AI performance single test can start. The test data are input into the AI neural network algorithm and calculated together with the corresponding AI neural network weight data; this process calls the processor of the terminal device to complete the calculation. In this embodiment, timing starts when the first test data is processed and stops when the output result of the last test data is output, which gives the total time the hardware takes to process all test data. The average processing delay of the terminal device in the current AI performance single test is then calculated from the number of test data and recorded. The average processing delay is calculated by formula (2):

$$L_{avg} = \frac{1}{n}\sum_{i=1}^{n} l_i \qquad (2)$$

In formula (2), L_avg is the average processing delay, n is the total number of test data, and l_i is the delay of the terminal device in processing the i-th test data.
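The timing procedure described above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: `run_single_test` and the callable `model` are hypothetical names, and the sketch assumes the model is an ordinary function that maps one test datum to one output.

```python
import time


def run_single_test(model, test_data):
    """Process every test datum once and return (outputs, average delay in seconds).

    `model` is a hypothetical callable standing in for the loaded AI neural
    network model; it is an assumption of this sketch, not part of the source.
    """
    outputs = []
    start = time.perf_counter()          # timing starts with the first test datum
    for item in test_data:
        outputs.append(model(item))      # record the output of each test datum
    total = time.perf_counter() - start  # timing stops after the last output
    average_delay = total / len(test_data)  # total time divided by n
    return outputs, average_delay


if __name__ == "__main__":
    results, l_avg = run_single_test(lambda x: x * 2, [1, 2, 3])
    print(results, l_avg)
```

Because only the total time is measured, the average equals the mean of the per-item delays as long as the loop overhead is negligible compared with the model computation.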

During the test, the output result data obtained after the terminal device processes each test data is recorded separately, that is, the prediction result obtained after the AI neural network model calculates each unit of data. The prediction result data differs between AI performance single tests: it may be the category, position and the like of an object in a picture, or the information contained in voice data.

S204, comparing the test output result information of each test data with the true value of the test data, and calculating the accuracy test result of the terminal equipment according to the accuracy calculation formula corresponding to the AI performance single test.

In a specific implementation, because the test contents and application fields of different AI performance single tests are not the same, the accuracy calculation formulas of the different AI performance single tests also differ.

S205, calculating the score of the AI performance single item test obtained by the terminal equipment in the AI performance single item test according to the AI model loading time, the average processing delay data of the terminal equipment and the accuracy test result of the terminal equipment.

In a specific implementation, the score of the AI performance single test is calculated by formula (3):

$$S_i = P_i + T_i \cdot w_i \qquad (3)$$

In formula (3), S_i is the performance score of the terminal device in the i-th AI performance single test, P_i is the data processing performance score of the terminal device in the i-th AI performance single test, T_i is the model loading performance score of the terminal device in the i-th AI performance single test, and w_i is the score proportion weight of the i-th AI performance single test.

The formula for calculating the data processing performance score is shown in formula (4).

In formula (4), P_i is the data processing performance score of the terminal device in the i-th AI performance single test, and MAC_i is the number of Multiply-Accumulate (MAC) operations contained in the AI neural network model used in the i-th AI performance single test. The computational complexity of current mainstream AI neural network models generally depends on the number of multiply-accumulate operations: the greater the number of MAC operations, the more complex the computation of the model and the longer the time required to complete it. L_i is the average delay with which the terminal device processes each unit of data in the i-th AI performance single test. Formula (4) reflects the average time the terminal device needs per unit of MAC operation: the higher the computing power of the terminal device, the lower the time consumption and the higher the data processing performance score.

The formula for calculating the model loading performance score is shown in formula (5).

In formula (5), T_i is the model loading performance score of the terminal device in the i-th AI performance single test, and M_i is the model parameter quantity of the AI neural network model in the i-th AI performance single test, that is, the amount of parameter data of the model weights, in MB. s_i is the time the terminal device spends loading the AI neural network model and completing the model initialization process in the i-th AI performance single test. Formula (5) reflects the average time the terminal device spends loading each MB of parameter data: the stronger the model loading performance of the terminal device, the shorter the time spent and the higher the calculated score.

The AI performance single test score reflects the overall performance of the terminal equipment when it actually completes an AI task, so that the user can intuitively compare the AI performance levels of different terminal equipment. The data processing performance score and the model loading performance score have different data units, and the two scores reflect the performance of the terminal device in two different dimensions, so the total AI performance score cannot be obtained by simply adding them. Instead, one of the two scores is multiplied by a score proportion weight, which brings the two scores into the same dimension so that the result can be used to evaluate the overall AI performance of the terminal device. The score proportion weight determines the share of each of the two scores in the total score. It takes into account both the scale of the two scores on mainstream terminal equipment and the degree to which the AI model loading process and the AI calculation process affect the user experience.

Table 1. Score proportion weight of each AI performance single test in the present application

Serial number	AI performance single test	Score proportion weight
1	Face recognition test	13.8
2	Speech keyword recognition test	0.46
3	Image classification test	0.56
4	Object recognition test	0.90
5	Super-resolution test	25.86
6	Human body posture recognition test	1.38
7	Semantic segmentation test	3.57
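As an illustration of how formula (3) combines the two partial scores with the weights of Table 1, a minimal sketch follows. The dictionary keys and the function name `single_test_score` are hypothetical, and the two partial scores are taken as inputs because this sketch does not assume any particular form for formulas (4) and (5).

```python
# Score proportion weights w_i from Table 1 (the key names are illustrative).
SCORE_WEIGHTS = {
    "face_recognition": 13.8,
    "speech_keyword_recognition": 0.46,
    "image_classification": 0.56,
    "object_recognition": 0.90,
    "super_resolution": 25.86,
    "human_posture_recognition": 1.38,
    "semantic_segmentation": 3.57,
}


def single_test_score(data_score, load_score, test_name):
    """Formula (3): S_i = P_i + T_i * w_i.

    data_score is P_i, load_score is T_i; both are computed elsewhere.
    """
    return data_score + load_score * SCORE_WEIGHTS[test_name]


if __name__ == "__main__":
    # Made-up partial scores, used only to show the call.
    print(single_test_score(100.0, 2.0, "image_classification"))
```

Only the model loading score is scaled, which matches the statement that the weight is applied on the basis of one of the two performance scores.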

Seven kinds of AI performance single tests used in the evaluation method will be described below.

First, when the AI performance single test includes an object recognition test, the execution process of the object recognition AI performance single test includes:

running the AI performance single test, and evaluating the AI object recognition computing performance of the terminal equipment through the mAP accuracy; and scoring the data processing performance of the object recognition AI performance single test by taking the mAP accuracy as a reference, and calculating the score of the terminal equipment in the object recognition AI performance single test by combining the model loading performance score. The method specifically comprises the following steps:

The object recognition AI performance single test is run; the terminal equipment displays the output result of each unit of data to the user and records the test output results, and after the test is finished the performance score of the terminal equipment in the test is calculated. Object recognition technology helps the terminal device find specific objects (such as people, animals and plants) in an image, and can also provide reference data for the camera to adjust its photographing parameters automatically, so that the user gets a better photographing experience. The aim of the test is to evaluate the AI object recognition performance of the terminal equipment. The AI neural network model used in the test is MobileNet SSD, which can identify the 90 object classes of the COCO data set, find all target objects in an image, determine their classes and positions, and output recognition accuracy information. The test uses mAP (mean Average Precision), i.e., the mean of the AP values, to evaluate the accuracy of the terminal device when processing the model, scores the data processing performance of the terminal device when executing the AI object recognition task with this data as a reference, and calculates the score of the terminal device in the AI performance single test by combining the model loading performance score.

Next, the calculation method of the mAP accuracy is described. First, the indexes involved in the calculation of the mAP are explained.

IoU (Intersection over Union) is a standard for measuring the accuracy of detecting the corresponding objects in a specific data set. IoU is the overlap ratio between a candidate box output by the AI model and a labeled box (ground truth box) in the true value, i.e., the ratio of their intersection to their union. The higher the overlap, the higher the value; in the ideal case of complete overlap, IoU is 1.

True Positives (TP): the number of instances correctly classified as positive, i.e., instances that are actually positive and are classified as positive by the classifier.

False Positives (FP): the number of instances wrongly classified as positive, i.e., instances that are actually negative but are classified as positive by the classifier.

False Negatives (FN): the number of instances wrongly classified as negative, i.e., instances that are actually positive but are classified as negative by the classifier.

After a picture is processed, the target detection algorithm outputs confidence scores for all supported sample types. The confidence score describes the probability that a sample is a positive sample; for example, sample A is considered positive with a probability of 99% and sample B with a probability of 1%. The detection output is divided by choosing a suitable threshold, for example 50%: outputs with a probability above 50% are regarded as positive samples and those below 50% as negative samples, which gives a group of positive samples under the 50% threshold. On the basis of this group of positive samples, an IoU threshold of 0.5 is then set; positive samples above the threshold are regarded as TP and the other positive samples as FP. Finally, FN is obtained by subtracting TP from the number of real positive samples in the test sample.

Precision is calculated as the number of actual positive samples among the predicted positive samples divided by the number of all predicted positive samples, i.e., Precision = TP/(TP + FP). Recall is calculated as the number of actual positive samples among the predicted positive samples divided by the number of all actual positive samples, i.e., Recall = TP/(TP + FN). Generally, the higher the Recall, the lower the Precision. AP is the Average Precision; mAP is the mean AP value obtained over a plurality of verification sets and is used as the index for measuring detection precision in this single test. The P-R curve is a two-dimensional curve with Precision on the vertical axis and Recall on the horizontal axis, drawn from the Precision and Recall obtained at different confidence thresholds. Because Precision tends to fall as Recall rises, Recall reaches 1 at the threshold corresponding to the positive sample with the lowest probability score; at that point the Precision, obtained by dividing the number of positive samples by the number of all samples at or above the threshold, is at its lowest. The area enclosed by the P-R curve is the AP value; generally, the better the classifier, the higher the AP value. In target detection, a P-R curve can be drawn for each class from its Precision and Recall; AP is the area under that curve, and mAP is the mean of the APs of all classes.
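The quantities described above can be sketched in a few lines. This is an illustrative example under stated assumptions, not the evaluation code of the embodiment: boxes are assumed to be `(x_min, y_min, x_max, y_max)` tuples, each detection is assumed to be already labeled as TP or FP by the IoU threshold, and the area under the P-R curve is approximated by a plain rectangular sum without the interpolation some benchmarks apply. All function names are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    ix = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    iy = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = ix * iy
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(tp, fp, fn):
    """Precision = TP/(TP+FP), Recall = TP/(TP+FN)."""
    return tp / (tp + fp), tp / (tp + fn)


def average_precision(detections, num_ground_truth):
    """Area under the P-R curve for one class.

    detections: list of (confidence, is_true_positive) pairs.
    Lowering the confidence threshold one detection at a time traces the curve.
    """
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    ap = 0.0
    previous_recall = 0.0
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        precision = tp / (tp + fp)
        recall = tp / num_ground_truth
        ap += precision * (recall - previous_recall)  # rectangle under the curve
        previous_recall = recall
    return ap


def mean_average_precision(ap_per_class):
    """mAP: the mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)
```

For instance, two unit-offset 2x2 boxes overlap in a 1x1 square, so their IoU is 1/(4 + 4 - 1), which falls below the 0.5 threshold and would count the detection as FP.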

Second, when the AI performance single test includes a human body posture recognition test, the execution process of the human body posture recognition AI performance single test includes:

running the human body posture recognition AI performance single test, recognizing key points of the human body as posture recognition feature points, determining the accuracy of the key points according to the keypoint similarity, and evaluating the AI human body posture recognition capability of the terminal equipment. The method specifically comprises the following steps:

The human body posture recognition AI performance single test is run; the terminal equipment displays the output result of each unit of data to the user and records the test output results, and after the test is finished the performance score of the terminal equipment in the test is calculated. Human body posture recognition technology finds the posture of a person in an image or video by recognizing the positions of key parts of the human body, and can help a terminal device quickly recognize human-body-related information in VR (virtual reality) or AR (augmented reality) applications. This single test mainly evaluates the AI human body posture recognition capability of the terminal equipment. The 50 test data used in the test come from the COCO Keypoints data set. The PoseNet model used can recognize 17 key parts of the human body, including the nose, eyes, ears, shoulders, elbows and legs; after the calculation is finished, it outputs the number, position and accuracy of the corresponding body parts. The test combines the keypoint similarity index OKS (Object Keypoint Similarity) to calculate an mAP (mean Average Precision), uses the mAP value to evaluate the accuracy of the terminal equipment when processing the model, scores the data processing performance of the terminal equipment when executing AI human body posture recognition with this value as a reference, and calculates the score of the terminal equipment in the AI performance single test by combining the model loading performance score. Next, the method of calculating the keypoint similarity (OKS) is described.

The keypoint similarity OKS describes the accuracy of the key points in the output result of the human body posture recognition task. It is calculated by formula (6):

$$OKS_p = \frac{\sum_i \exp\left(-\frac{d_{pi}^2}{2 S_p^2 \sigma_i^2}\right)\,\delta(v_{pi} = 1)}{\sum_i \delta(v_{pi} = 1)} \qquad (6)$$

Here p denotes a person in the true value, and p^i denotes the i-th key point of the p-th person.

d_pi is the Euclidean distance between the i-th key point of the p-th person in the true value and the i-th key point in the output result.

The Euclidean distance is calculated by formula (7):

$$d_{pi} = \sqrt{(x'_i - x_{pi})^2 + (y'_i - y_{pi})^2} \qquad (7)$$

In formula (7), (x'_i, y'_i) are the position coordinates of the i-th key point detected in the output result, and (x_pi, y_pi) are the position coordinates of the i-th key point of the p-th person in the true value.

v_pi represents the visibility of the i-th key point of the p-th person; a visibility of 1 means that the key point is not occluded and is annotated. δ(v_pi = 1) takes the value 1 when this condition holds and 0 otherwise.

S_p is the scale factor of the p-th person, whose value is the square root of the area of the p-th person's detection box:

$$S_p = \sqrt{wh}$$

where w and h are the width and height of the p-th person's detection box, respectively.

σ_i is the normalization factor of the key point with id i. It is the standard deviation between the true value of this key point and its manual annotations over all sample sets; the larger σ_i is, the harder this type of key point is to annotate.

OKS is used to judge the similarity of the key points of a recognized person. If a test picture originally contains M targets (each target being a person) and the model outputs detection results for N targets, the key points of each of the M true values are compared with those of each of the N results output by the model, which finally gives a similarity matrix with M rows and N columns. Position (i, j) in the matrix is the OKS between the i-th person in the true value and the j-th person predicted by the algorithm. The maximum value of row i is taken as the OKS similarity value of the i-th person.

From the OKS similarity matrix described above, the OKS scores of all targets (the targets appearing in the true value) of a given image are known. The test set contains a plurality of images, each with a plurality of targets, and AP (Average Precision) is used as the index for measuring the accuracy of the AI model over all test pictures. A threshold t is given when calculating the AP. If the current OKS is greater than t, the key points of that person are considered successfully and correctly detected; if it is less than t, the detection is considered a failure, i.e., a missed or false detection. Therefore, over all OKS values, the number greater than t is counted and its ratio to the total number of OKS values is calculated. For example, if there are 100 OKS results in total and 30 of them are greater than the threshold t, the AP value is 30/100 = 0.3. The mAP (mean Average Precision) is the mean of the AP values: different thresholds t are given, the AP is calculated under each threshold, and the mean of all AP values is taken.
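The OKS-based evaluation described above can be sketched as follows. This is a hedged illustration, assuming the widely used COCO-style form of OKS (an exponential of the squared distance normalized by the box area and the per-keypoint factor, averaged over the key points whose visibility is 1). The function names, argument layout and threshold values are hypothetical and not taken from the source.

```python
import math


def oks(truth_points, predicted_points, visibility, box_w, box_h, sigmas):
    """Keypoint similarity between one true person and one predicted person.

    truth_points / predicted_points: lists of (x, y) per key point.
    visibility: list of flags; only key points with visibility 1 are counted.
    box_w, box_h: width and height of the person's box (S_p^2 = w * h).
    sigmas: per-keypoint normalization factors.
    """
    scale_sq = box_w * box_h
    total = 0.0
    counted = 0
    for (xt, yt), (xp, yp), v, sigma in zip(
        truth_points, predicted_points, visibility, sigmas
    ):
        if v != 1:
            continue
        dist_sq = (xp - xt) ** 2 + (yp - yt) ** 2  # squared Euclidean distance
        total += math.exp(-dist_sq / (2.0 * scale_sq * sigma ** 2))
        counted += 1
    return total / counted if counted else 0.0


def best_oks_per_person(similarity_matrix):
    """Row-wise maximum of the M x N OKS matrix: one value per true person."""
    return [max(row) for row in similarity_matrix]


def ap_at_threshold(oks_values, t):
    """Share of OKS values greater than the threshold t."""
    return sum(1 for value in oks_values if value > t) / len(oks_values)


def mean_ap(oks_values, thresholds):
    """Mean of the AP values computed under several thresholds."""
    return sum(ap_at_threshold(oks_values, t) for t in thresholds) / len(thresholds)
```

With 100 OKS values of which 30 exceed the threshold, `ap_at_threshold` returns 0.3, which reproduces the worked example in the text.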

Third, when the AI performance single test includes a voice keyword recognition test, the execution process of the voice keyword recognition AI performance single test includes:

running the voice keyword recognition AI performance single test, evaluating the performance level of the terminal equipment in recognizing voice keywords through the voice recognition accuracy, scoring the data processing performance of the voice keyword recognition AI performance single test by taking the voice recognition accuracy as a reference, and calculating the score of the terminal equipment in the voice keyword recognition AI performance single test by combining the model loading performance score. The method specifically comprises the following steps:

The voice keyword recognition AI performance single test is run; the terminal equipment displays the output result of each unit of data to the user and records the test output results, and after the test is finished the performance score of the terminal equipment in the test is calculated. Many terminal devices are now equipped with an intelligent voice assistant, which usually has to be activated by the user speaking a corresponding keyword, and the recognition of such voice keywords is usually implemented with AI technology. The test mainly evaluates the performance level of the terminal equipment in recognizing voice keywords. The AI model used in the test can recognize 10 different basic spoken words, such as 'yes', 'no', 'up', 'down', 'left' and 'right'. The input test data is an audio clip with a duration of 1 s; the AI model analyzes the audio information and outputs the word contained in the audio. After the test is completed, the program calculates the voice recognition accuracy of the test according to formula (8).

$$Accuracy = \frac{t - n}{t} \qquad (8)$$

In formula (8), Accuracy is the voice recognition accuracy, n is the number of incorrectly recognized samples, and t is the number of all test data. The method uses the accuracy data as a reference to score the data processing performance of the terminal equipment when it executes the AI voice recognition task, and calculates the score of the terminal equipment in the AI performance single test by combining the model loading performance score.
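With n incorrectly recognized samples out of t test data, the accuracy is read here as (t - n)/t. A minimal sketch of that count follows; the function name `recognition_accuracy` and the sample words are illustrative only.

```python
def recognition_accuracy(predictions, truths):
    """Accuracy = (t - n) / t, where n counts the wrongly recognized samples."""
    t = len(truths)
    n = sum(1 for predicted, true in zip(predictions, truths) if predicted != true)
    return (t - n) / t


if __name__ == "__main__":
    predicted_words = ["yes", "no", "up", "down"]
    true_words = ["yes", "no", "up", "left"]
    print(recognition_accuracy(predicted_words, true_words))
```

The same function applies to any single test whose accuracy is defined by the numbers of erroneous and total samples, such as the face recognition test.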

Fourth, when the AI performance single test includes a face recognition test, the execution process of the face recognition AI performance single test includes:

running the face recognition AI performance single test, and evaluating the performance of the terminal equipment when processing an AI face recognition task through the face recognition accuracy; and scoring the data processing performance of the face recognition AI performance single test by taking the face recognition accuracy as a reference, and calculating the score of the terminal equipment in the face recognition AI performance single test by combining the model loading performance score. The method specifically comprises the following steps:

The face recognition AI performance single test is run; the terminal equipment displays the output result of each unit of data to the user and records the test output results, and after the test is finished the performance score of the terminal equipment in the test is calculated. AI face recognition is a biometric application that extracts and recognizes the feature information of a face through an AI algorithm, and this technology has become the unlocking method of a large number of terminal devices. The test evaluates the performance of the terminal equipment when processing an AI face recognition task. It is based on an ssd_mobilenet model, which can recognize and classify the faces of 62 celebrities in the LFW data set; after processing is finished, the AI model outputs the face position, the celebrity name and recognition accuracy information. After the test is finished, the program calculates the face recognition accuracy of the test according to formula (9).

$$Accuracy = \frac{t - n}{t} \qquad (9)$$

In formula (9), Accuracy is the face recognition accuracy, n is the number of incorrectly recognized samples, and t is the number of all test data. The method uses the accuracy data as a reference to score the data processing performance of the terminal equipment when it executes the AI face recognition task, and calculates the score of the terminal equipment in the AI performance single test by combining the model loading performance score.

Fifth, when the AI performance single test includes a semantic segmentation test, the execution process of the semantic segmentation AI performance single test includes:

scoring the data processing performance of the semantic segmentation AI performance single test by taking the mean intersection over union (MIoU) as a reference, and calculating the score of the terminal equipment in the semantic segmentation AI performance single test by combining the model loading performance score. The method specifically comprises the following steps:

The semantic segmentation AI performance single test is run; the terminal equipment displays the output result of each unit of data to the user and records the test output results, and after the test is finished the performance score of the terminal equipment in the test is calculated. Semantic segmentation technology maps each pixel in an image to a category, such as "person", "object" or "background", and in VR and AR applications it can help the terminal device identify the foreground or background area of a picture so that part of the image content can be segmented and occluded. The test is implemented with a DeepLab model, which can identify the 21 classes of the VOC data set, including "background"; after the calculation is completed, the model outputs the class to which each pixel of the original picture belongs. After the test is completed, the program uses MIoU (Mean Intersection over Union) to evaluate the accuracy of the terminal device when processing the model, scores the data processing performance of the terminal device when executing the AI semantic segmentation task with this data as a reference, and calculates the score of the terminal device in the AI performance single test by combining the model loading performance score. Next, the calculation method of MIoU is described.

For the semantic segmentation algorithm used in the test, the resolution of the input data is 257x257 and 21 different classes can be detected. The output data is a matrix of size 257x257 in which each element is the class to which the corresponding pixel belongs. MIoU is an index for evaluating the accuracy of a semantic segmentation algorithm, and its calculation relies on a confusion matrix. After a picture is processed, the method creates a 21x21 matrix whose rows (ordinate) represent the classes in the true value and whose columns (abscissa) represent the predicted classes. The confusion matrix of the AI performance single test is accumulated by comparing the value of each pixel in the output matrix with the value of the same pixel in the true value matrix. For example, if the true value of a pixel is the 4th class and the output result assigns the pixel to the 3rd class, the value in the 4th row and 3rd column of the confusion matrix is increased by 1. The values on the diagonal of the confusion matrix therefore represent the number of correctly predicted pixels of each class. Once the confusion matrix has been obtained, the IoU (Intersection over Union) of each class is calculated by formula (10).

$$IoU_i = \frac{V_i}{R_i + C_i - V_i} \qquad (10)$$

In formula (10), IoU_i is the IoU value of the i-th class, V_i is the value in the i-th row and i-th column of the confusion matrix, R_i is the sum of all elements in the i-th row of the confusion matrix, and C_i is the sum of all elements in the i-th column of the confusion matrix. In this test method there are 21 classes in total, and the mean of the 21 IoU values is the MIoU. The method uses the MIoU accuracy data as a reference to score the data processing performance of the terminal equipment when it executes the AI semantic segmentation task.
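The confusion-matrix procedure can be sketched as follows. This is an illustrative example with hypothetical function names; it reads the per-class IoU as the diagonal element divided by (row sum + column sum - diagonal element), and it skips classes that appear in neither the true value nor the prediction, which is an assumption of this sketch made to avoid division by zero (the text simply averages over all 21 classes).

```python
def confusion_matrix(truth, prediction, num_classes):
    """Rows index the true class, columns index the predicted class."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for truth_row, prediction_row in zip(truth, prediction):
        for true_class, predicted_class in zip(truth_row, prediction_row):
            matrix[true_class][predicted_class] += 1
    return matrix


def mean_iou(matrix):
    """Mean of the per-class IoU values derived from a confusion matrix."""
    size = len(matrix)
    ious = []
    for i in range(size):
        v = matrix[i][i]                            # correctly predicted pixels
        r = sum(matrix[i])                          # all pixels truly of class i
        c = sum(matrix[j][i] for j in range(size))  # all pixels predicted as class i
        union = r + c - v
        if union > 0:                               # skip classes that never occur
            ious.append(v / union)
    return sum(ious) / len(ious)


if __name__ == "__main__":
    truth = [[0, 0], [1, 1]]        # tiny 2x2 label map with two classes
    prediction = [[0, 1], [1, 1]]
    print(mean_iou(confusion_matrix(truth, prediction, 2)))
```

In the tiny example, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.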

Sixth, when the AI performance single test includes an image classification test, the execution process of the image classification AI performance single test includes:

running the image classification AI performance single test, and evaluating the performance of the terminal equipment when processing an AI image classification task through the image classification accuracy;

and scoring the data processing performance of the image classification AI performance single test by taking the image classification accuracy as a reference, and calculating the score of the terminal equipment in the image classification AI performance single test by combining the model loading performance score. The method specifically comprises the following steps:

The image classification AI performance single test is run; the terminal equipment displays the output result of each unit of data to the user and records the test output results, and after the test is finished the performance score of the terminal equipment in the test is calculated. Image classification technology helps the terminal equipment find people, objects or places in images, so that it can optimize the images in a targeted manner. The test mainly evaluates the performance of the terminal equipment when executing an AI image classification task. The MobileNet V2 model used in the test can classify pictures into the 1000 classes of the ImageNet data set; 50 pictures from ImageNet are used as test data, and after the model processes the test data it outputs the classification information of the image content. The test uses the TOP5 accuracy to evaluate the accuracy of the terminal device when processing the model, and with this data as a reference scores the data processing performance of the terminal device when it performs the AI image classification task. Next, the method of calculating the TOP5 accuracy is described. After processing a picture, the image classification algorithm outputs a plurality of predicted classes and their corresponding probabilities. When calculating the TOP5 accuracy, the five classes with the highest probability values output for each test picture are compared with the true value of the picture; if one of them matches the true value, the picture is considered correctly classified. After all test data have been processed, the number of correctly identified pictures is counted, and the TOP5 accuracy is calculated by formula (11).

$$TOP5 = \frac{p}{n} \qquad (11)$$

In formula (11), p is the number of correctly identified pictures and n is the number of all test pictures. The method uses the accuracy data as a reference to score the data processing performance of the terminal equipment when it executes the AI image classification task, and calculates the score of the terminal equipment in the AI performance single test by combining the model loading performance score.
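The TOP5 check described above can be sketched as follows, assuming the model output for each picture is a list of per-class probabilities indexed by class id. The function name `top5_accuracy` and the toy data are illustrative, not part of the source.

```python
def top5_accuracy(probabilities, labels):
    """Share of pictures whose true class is among the five most probable classes.

    probabilities: one list of per-class probabilities per picture.
    labels: the true class index of each picture.
    """
    correct = 0
    for class_probs, label in zip(probabilities, labels):
        ranked = sorted(
            range(len(class_probs)), key=lambda c: class_probs[c], reverse=True
        )
        if label in ranked[:5]:   # true class is one of the five best guesses
            correct += 1
    return correct / len(labels)  # p / n


if __name__ == "__main__":
    scores = [0.05, 0.10, 0.15, 0.20, 0.20, 0.30]   # six classes, toy values
    print(top5_accuracy([scores, scores], [0, 5]))
```

In the toy call, class 0 has the lowest probability of the six classes and therefore misses the top five, while class 5 ranks first, so one of the two pictures counts as correct.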

Seventh, when the AI performance single test includes a super-resolution test, the execution process of the super-resolution AI performance single test includes:

scoring the data processing performance of the super-resolution AI performance single test by taking the peak signal-to-noise ratio as a reference, and calculating the score of the terminal equipment in the super-resolution AI performance single test by combining the model loading performance score. The method specifically comprises the following steps:

The super-resolution AI performance single test is run; the terminal equipment displays the output result of each unit of data to the user and records the test output results, and after the test is finished the performance score of the terminal equipment in the test is calculated. Super-resolution technology enables the terminal equipment to reconstruct a low-resolution image into a corresponding high-resolution image and effectively enhances the image quality of photos. The test mainly evaluates the AI super-resolution computing performance of the terminal equipment. The AI neural network model used in the test is ESRGAN, which generates the details of the image and can produce a super-resolution image with 4 times as many pixels from a low-resolution image. The 20 test data in the test come from the DIV2K data set. Before the test, the original data are converted into low-resolution data by down-sampling, and the low-resolution data are used as the test input. After the test, the high-resolution image output by the model is compared with the original image and the PSNR (Peak Signal-to-Noise Ratio) is calculated. With this data as a reference, the data processing performance of the terminal equipment when executing the AI super-resolution task is scored, and the score of the terminal equipment in the AI performance single test is calculated by combining the model loading performance score.

PSNR is calculated by formula (12):

$$PSNR = 10 \cdot \log_{10}\left(\frac{MAX_I^2}{MSE}\right) \qquad (12)$$

In formula (12), MAX_I is the maximum value a single pixel in the picture can take. MSE is the mean square error, which reflects the degree of difference between the estimator and the estimated quantity. MSE is calculated by formula (13):

$$MSE = \frac{1}{mn}\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\left[I(i,j) - K(i,j)\right]^2 \qquad (13)$$

In formula (13), I(i, j) and K(i, j) are the pixel values of the original picture and the output picture in the i-th row and j-th column, respectively, and m and n are the numbers of pixel rows and columns of the picture.

The PSNR is calculated separately for the R, G and B color channels of the image, and the three values are averaged. If the average is greater than 40, the accuracy is 1; otherwise, the accuracy is PSNR/40.
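The PSNR-based accuracy can be sketched as follows. This is a minimal illustration, assuming 8-bit channels (so the maximum pixel value is 255), images given as three 2-D lists for the R, G and B channels, and the usual definition PSNR = 10·log10(MAX² / MSE). The function names are hypothetical, and identical channels are treated as having infinite PSNR.

```python
import math


def mse(original, output):
    """Mean square error between two equally sized 2-D pixel arrays."""
    rows, cols = len(original), len(original[0])
    total = sum(
        (original[i][j] - output[i][j]) ** 2
        for i in range(rows)
        for j in range(cols)
    )
    return total / (rows * cols)


def psnr(original, output, max_value=255):
    """Peak signal-to-noise ratio of one channel; max_value=255 assumes 8-bit data."""
    error = mse(original, output)
    if error == 0:
        return math.inf           # identical channels: no noise at all
    return 10 * math.log10(max_value ** 2 / error)


def psnr_accuracy(original_rgb, output_rgb, max_value=255):
    """Average the per-channel PSNR, then map it to an accuracy in [0, 1]."""
    values = [psnr(o, k, max_value) for o, k in zip(original_rgb, output_rgb)]
    mean_psnr = sum(values) / len(values)
    return 1.0 if mean_psnr > 40 else mean_psnr / 40
```

Capping the accuracy at 1 means that any reconstruction whose averaged PSNR exceeds 40 is treated as fully accurate, so further gains above that level do not change the score.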

In summary, the performance evaluation method for terminal equipment provided in the embodiment of the present invention includes seven different AI performance single tests: a face recognition test, a voice keyword recognition test, an image classification test, an object recognition test, a super-resolution test, a human body posture recognition test and a semantic segmentation test. Each single test uses an AI neural network model of different computational complexity and a different test data set, and the method finally combines the results of the seven AI performance single tests to comprehensively evaluate the AI performance of the terminal equipment. The method can test the actual performance of the terminal equipment when processing these different AI applications and evaluate its AI performance more comprehensively, thereby better helping a user select terminal equipment with AI performance suited to the user's own requirements.

The embodiment of the invention also provides an AI performance evaluation index for terminal equipment, which covers two aspects: AI model loading performance and AI data processing performance. The AI model loading performance evaluates the performance of the terminal equipment when loading the AI model, and the AI data processing performance evaluates the performance of the terminal equipment when processing test data with the AI neural network model. When the score is calculated, the AI model loading performance score and the AI data processing performance score are combined to obtain the score of the single test. Because the index covers the whole process of running an AI application, it reflects more comprehensively the performance level of the terminal equipment throughout the processing of an AI application.

In addition, the comprehensive performance score of the invention is the average of the seven AI performance single test scores. Each AI performance single test score consists mainly of an AI model loading performance score and an AI data processing performance score. The AI model loading performance score is calculated mainly from the AI model loading delay data in combination with the parameter quantity of the AI model. The AI data processing performance score is calculated mainly from the AI model computation delay data and the AI model accuracy data in combination with the MAC (multiply-accumulate) operation count of the AI model. When an AI performance single test score is calculated, a score proportion weight is designed for each single test to describe the degree of influence of the AI model loading performance and the AI data processing performance on the overall AI performance of the terminal equipment. This score calculation method describes the AI performance of the terminal equipment from the two aspects of AI model loading performance and AI data processing performance. Because it takes the parameter quantity and the computational complexity of the AI model into account, the proportion of each single test score in the comprehensive AI performance score is more balanced, and the performance of the terminal equipment in each single test is described more accurately.
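The way the scores combine can be sketched as below, assuming the single test formula S_i = P_i + T_i * w_i stated in the claims and a plain arithmetic mean for the comprehensive score. The function names and the numeric values are hypothetical placeholders; the sketch does not compute P_i or T_i themselves.

```python
def single_test_score(processing_score, loading_score, weight):
    """S_i = P_i + T_i * w_i for one AI performance single test."""
    return processing_score + loading_score * weight


def composite_score(single_scores):
    """Comprehensive score: arithmetic mean of the single test scores."""
    if not single_scores:
        raise ValueError("at least one single test score is required")
    return sum(single_scores) / len(single_scores)


if __name__ == "__main__":
    # Placeholder (P_i, T_i, w_i) triples, chosen only for illustration.
    results = [(80.0, 50.0, 0.5), (60.0, 40.0, 0.25)]
    scores = [single_test_score(p, t, w) for p, t, w in results]
    print(scores, composite_score(scores))
```

Keeping the weight as a per-test argument mirrors the text, which assigns each single test its own score proportion weight.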

Compared with the prior art, the evaluation method has more and richer test contents. It uses a plurality of AI performance single tests to evaluate the AI performance of the terminal equipment, so that the actual performance of the equipment when processing different AI applications can be tested and the AI performance of the terminal equipment can be evaluated more comprehensively, thereby better helping a user select terminal equipment with AI performance suited to the user's own requirements. The performance evaluation index provided by the invention evaluates the AI performance of the terminal equipment from the two aspects of AI model loading performance and AI data processing performance; it covers the whole process of running an AI application and therefore reflects more comprehensively the performance level of the equipment throughout the processing of an AI application. The performance evaluation score calculation method provided by the invention likewise describes the AI performance of the terminal equipment from these two aspects, and combines the parameter quantity and the computational complexity of the AI model when calculating each AI performance single test score, so that the proportion of each single test score in the comprehensive AI performance score is more balanced and the performance of the terminal equipment in each single test is described more accurately.

Based on the foregoing method embodiment, an embodiment of the present invention provides a performance evaluation apparatus 400 for a terminal device, as shown in fig. 3, the apparatus may include:

a receiving module 410, configured to receive an instruction of a user to start an AI performance evaluation process;

a control module 420, configured to control the execution of each AI performance single test and determine the score of each AI performance single test; specifically, for the different AI performance single test contents, the control module controls the loading of the AI neural network model, controls the computation of the AI neural network model, and collects the computation results of the single test, and after all the AI performance single tests are completed, collects and summarizes the test output results and the scores of all the AI performance single tests;

a loading module 430, configured to prepare the test data required by each AI performance single test, including loading the test data set and the AI neural network model required by each test and completing the initialization process of the AI neural network model;

a calculation module 440, configured to complete the AI calculation process of each AI performance single test, including inputting each unit of test data into the AI neural network model, completing the calculation process of the corresponding AI algorithm for the different AI test contents, and outputting the calculation result for each unit of data;

a statistical module 450, configured to collect the test data of each AI performance single test, including the time required for loading the AI neural network model, the time required for processing all the test data, and the AI calculation result for each unit of data, and to calculate the AI performance single test score according to the collected contents;

and an evaluation module 460, configured to determine the AI performance comprehensive score by averaging the scores of the AI performance single tests, and to show all test results to the user, including the test output results of all the AI performance single tests, the scores of the AI performance single tests, and the AI performance comprehensive score.
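How the loading, calculation and statistical modules might cooperate for one single test can be sketched as below. This is only an illustration under assumptions: the names `SingleTestRecord` and `run_single_test` are hypothetical, the model is represented by any callable, and timing uses `time.perf_counter` from the Python standard library. It records the raw measurements but does not compute the scores.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence


@dataclass
class SingleTestRecord:
    """Measurements gathered for one AI performance single test."""
    name: str
    load_seconds: float          # model loading plus initialization time
    mean_latency_seconds: float  # average processing delay per unit of data
    outputs: List[Any]           # calculation result for each unit of data


def run_single_test(
    name: str,
    load_model: Callable[[], Callable[[Any], Any]],
    samples: Sequence[Any],
) -> SingleTestRecord:
    if not samples:
        raise ValueError("the test data set must not be empty")

    # Loading step: time the model loading and initialization.
    start = time.perf_counter()
    model = load_model()
    load_seconds = time.perf_counter() - start

    # Calculation step: feed each unit of test data to the model.
    outputs = []
    start = time.perf_counter()
    for sample in samples:
        outputs.append(model(sample))
    total_seconds = time.perf_counter() - start

    # Statistics step: summarize the timings and keep the per-unit outputs.
    return SingleTestRecord(
        name, load_seconds, total_seconds / len(samples), outputs
    )


if __name__ == "__main__":
    # Stand-in "model" that doubles its input, used only for demonstration.
    record = run_single_test("demo", lambda: (lambda x: x * 2), [1, 2, 3])
    print(record)
```

A record of this kind holds the inputs the statistical module needs (loading time, per-unit delay, per-unit outputs) before any score is derived from them.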

The present application also provides a computer-readable non-volatile storage medium, where the computer-readable storage medium is used to store codes and data of a program, and the codes and data are used to implement the AI performance evaluation method described in the foregoing embodiments.

In order to implement the embodiment of the present invention, the present application further provides a computer program product or a computer program, which may include one or more computer instructions. The computer instructions are stored in a readable non-volatile storage medium; the computer reads the computer instructions from the non-volatile storage medium and executes them through a processor to implement the AI performance evaluation method according to the embodiment of the present application. The code of the computer program may be written in one or more computer languages, which may be object-oriented programming languages such as Java and C++, or procedural programming languages such as C.

An embodiment of the present invention further provides a terminal device, configured to implement the AI performance evaluation method, and fig. 4 is a block diagram illustrating a partial structure of the terminal device according to the embodiment of the present application.

Referring to fig. 4, the terminal device 500 includes the following components: an IO component 501, a processor 502, a storage system 505, a display 508 and a bus 509. The IO component 501 is used for connecting to and communicating with external devices; the storage system 505 is used for storing the relevant data and the modules of the computer program of the embodiment of the present application; the bus 509 is used for connecting all the components and realizing communication among them; the display 508 is used for displaying the test results to the user; and the processor 502 is used for controlling and executing the computer program of the embodiment of the present application. When the processor executes the computer program, the AI performance evaluation method is implemented.

The storage system 505 is used for storing the relevant data and the modules of the computer program of the embodiment of the present application, and consists of a storage 506 and a memory 507. The storage 506 can persistently store, read and write the data, application programs and other files required for running the computer program; it may include non-volatile memory such as an SSD, an HDD or a flash storage device, and may further include optical storage devices and magnetic disk storage devices. The memory 507 is short-term memory with a faster read/write speed than the storage 506, and is used for quickly accessing the data, programs and related files required during execution of the computer program. The data in the memory 507 are not retained when the terminal device is shut down or powered off. The memory 507 may include volatile memory, for example dynamic random access memory.

The IO component 501 is used to implement connection and communication between the terminal device 500 and other external devices 510. The external devices 510 may be a keyboard, a mouse, a pointing device or the like, or may be any device capable of communicating with the terminal device 500.

The display 508 is configured to show the user the intermediate results generated while the AI performance evaluation program is running and, after the AI performance test is finished, to show the user all the test results, including the test output results of all the AI performance single tests, the scores of the AI performance single tests, and the AI performance comprehensive score.

The bus 509 is used to connect all the components in the terminal device and to realize a communication function between the components, and the bus 509 may be one or more structures such as a memory bus, an IO bus, and a processor bus.

The processor 502 is configured to execute the computer program in the present embodiment, and includes a controller 503 and an AI calculation unit 504. The controller 503 is mainly configured to control all the components of the terminal device, which may include controlling the storage system 505 to load the computer program and related data, controlling the communication between the IO component 501 and other external devices, controlling the AI calculation unit 504 to complete the calculation of the AI algorithm, and controlling the display 508 to show the output results of the test program. The AI calculation unit 504 is used for completing and accelerating the calculation process of the AI algorithm in the computer program, and may be a central processing unit (CPU), a graphics processing unit (GPU), a field programmable gate array (FPGA), a neural network processing unit (NPU), a machine learning unit (MLU), an application specific integrated circuit (ASIC), a digital signal processor (DSP) or another programmable computing device, or a combination of one or more of the above devices. By running the computer program in the storage system 505, the processor 502 executes all the AI performance single tests, the data processing and the result display in the program, thereby implementing the AI performance evaluation method described herein.

While the present embodiments have been described with reference to the accompanying drawings, it is to be understood that the invention is not limited to the precise embodiments described above, which are meant to be illustrative and not restrictive, and that various changes may be made therein by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A performance evaluation method of terminal equipment is characterized by comprising the following steps:

2. The method of claim 1, wherein the execution of each AI performance single test comprises:

loading a test data set for the AI performance single test;

3. The method of claim 2,

the score calculation formula of the AI performance single item test is as follows:

S_i = P_i + T_i * w_i

wherein S_i is the performance score of the terminal device in the ith AI performance single test, P_i is the data processing performance score of the terminal device in the ith AI performance single test, T_i is the model loading performance score of the terminal device in the ith AI performance single test, and w_i is the score proportion weight of the ith AI performance single test;

the calculation formula of the data processing performance score is as follows:

wherein P_i is the data processing performance score of the terminal device in the ith AI performance single test; MAC_i is the number of multiply-accumulate operations contained in the AI neural network model in the ith AI performance single test; and l_i is the average processing delay per unit of data of the terminal device in the ith AI performance single test;

the model loading performance score calculation formula is as follows:

wherein T_i is the model loading performance score of the terminal device in the ith AI performance single test; M_i is the parameter quantity of the AI neural network model in the ith AI performance single test, namely the data quantity of the model weight parameters in the AI neural network model; and s_i is the time spent by the terminal device in completing the loading of the AI neural network model and the initialization process of the model in the ith AI performance single test.

4. The method of claim 3,

the AI performance single test comprises any one of a face recognition test, a voice keyword recognition test, an image classification test, an object recognition test, a super-resolution test, a human body posture recognition test and a semantic segmentation test.

5. The method of claim 4,

under the condition that the AI performance single test comprises a human body posture identification test, the executing process of the human body posture identification AI performance single test comprises the following steps:

6. The method of claim 5,

the calculation formula of the key point similarity OKS is as follows:

wherein (x'_i, y'_i) are the position coordinates of the detection result for the ith key point in the test output result information,

is the ith key point position coordinate of the p person in the truth value;

v_pi represents the visibility of the ith key point of the pth person;

S_p represents the scale factor of the pth person, which is calculated as

δ(·) equals 1 if the condition is true and 0 otherwise, and is used to determine whether a given key point is a point already labeled in the truth value.

7. The method of claim 4,

under the condition that the AI performance single test comprises a voice keyword recognition test, the executing process of the voice keyword recognition AI performance single test comprises the following steps:

the calculation formula of the speech recognition accuracy rate is as follows:

wherein Accuracy₀ is the speech recognition accuracy, n₀ is the number of incorrectly recognized samples, and t₀ is the total number of test data items;

and taking the voice recognition accuracy as a reference, scoring the data processing performance of the AI performance single test for voice keyword recognition, and calculating the score of the terminal equipment in the AI performance single test for voice keyword recognition by combining with the model loading performance score.

8. The method of claim 4,

under the condition that the AI performance single test comprises a face recognition test, the executing process of the face recognition AI performance single test comprises the following steps:

the calculation formula of the face recognition accuracy rate is as follows:

wherein Accuracy₁ is the face recognition accuracy, n₁ is the number of incorrectly recognized samples, and t₁ is the total number of test data items;

9. The method of claim 4,

in the case that the AI performance single test comprises a semantic segmentation test, the execution process of the semantic segmentation AI performance single test comprises:

10. The method according to claim 9, wherein the evaluating the performance of the terminal device in processing the AI semantic segmentation task through the mean intersection over union (mIoU) index comprises:

after the confusion matrix is obtained, the intersection over union (IoU) is calculated as:

$$IoU_i = \frac{V_i}{R_i + C_i - V_i}$$

wherein IoU_i is the IoU value of the ith class, V_i is the value in the ith row and ith column of the confusion matrix, R_i is the sum of all elements in the ith row of the confusion matrix, and C_i is the sum of all elements in the ith column of the confusion matrix;

11. The method of claim 4,

in the case that the AI performance single test comprises an image classification test, the execution process of the image classification AI performance single test comprises:

12. The method of claim 4, wherein in the case that the AI performance single test comprises a super-resolution test, the execution process of the super-resolution AI performance single test comprises:

13. The method of claim 12,

loading a test data set, namely a test picture set, for the super-resolution AI performance single test;

14. The method of claim 4,

in the case that the AI performance single test comprises an object recognition test, the execution process of the object recognition AI performance single test comprises:

15. The method according to any one of claims 1-14, wherein the determining an AI performance comprehensive score of the terminal device based on the scores of all the AI performance single tests comprises:

16. A performance evaluation apparatus of a terminal device, characterized in that the apparatus adopts the performance evaluation method of the terminal device according to any one of claims 1 to 15, and the apparatus comprises:

17. A storage medium characterized by storing a computer program for executing a method of evaluating performance of a terminal device according to any one of claims 1-15.

18. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method for evaluating the performance of a terminal device according to any one of claims 1 to 15 when executing the computer program.