CN115271085A - Machine learning model verification method and device - Google Patents
- Publication number: CN115271085A
- Application number: CN202110478436.1A
- Authority: CN (China)
- Prior art keywords: machine learning, learning model, test, tensor, difference
- Legal status: Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Abstract
The disclosure relates to a method and a device for verifying a machine learning model, and relates to the field of computer technologies. The verification method comprises the following steps: processing each test set data by using an original machine learning model and outputting a reference tensor corresponding to each test set data; processing each test set data by using a tested machine learning model and outputting a test tensor corresponding to each test set data, wherein the tested machine learning model is obtained by compressing the original machine learning model; calculating, for each test set data, the difference between its corresponding reference tensor and its corresponding test tensor; and verifying the degree of similarity between the tested machine learning model and the original machine learning model according to the differences.
Description
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method and an apparatus for verifying a machine learning model, a method and an apparatus for testing computational power of a terminal, and a device and a non-volatile computer-readable storage medium for testing computational power of a terminal.
Background
Deploying AI (Artificial Intelligence) applications on mobile terminals faces two main constraints: first, the machine learning models used in such applications, especially deep learning models, have large numbers of parameters, which is a burden for the mobile terminal; second, the computing power of the mobile terminal is relatively weak and cannot accommodate machine learning models with high computational complexity.
In order to enable the mobile terminal to better carry AI applications, acceleration can be achieved by compressing the machine learning model, so as to ensure the operating efficiency of the machine learning model on the terminal side.
However, the compression process may cause undue acceleration, resulting in reduced performance of the compressed machine learning model. Therefore, it is necessary to check the degree of similarity between the compressed machine learning model and the original machine learning model to detect an improper acceleration.
In the related art, the accuracy rate can be calculated according to the output of the compressed machine learning model and the corresponding label thereof, and the compressed machine learning model is verified according to the accuracy rate.
Disclosure of Invention
The inventors of the present disclosure found that the following problem exists in the above-described related art: some output changes due to model changes caused by improper acceleration cannot be sensitively detected, resulting in a poor verification effect.
In view of this, the present disclosure provides a verification technical scheme for a machine learning model, which can improve the verification effect.
According to some embodiments of the present disclosure, there is provided a verification method of a machine learning model, including: processing each test set data by using an original machine learning model and outputting a reference tensor corresponding to each test set data; processing each test set data by using a tested machine learning model and outputting a test tensor corresponding to each test set data, wherein the tested machine learning model is obtained by compressing the original machine learning model; calculating, for each test set data, the difference between its corresponding reference tensor and its corresponding test tensor; and verifying the degree of similarity between the tested machine learning model and the original machine learning model according to the differences.
In some embodiments, verifying the degree of similarity of the machine learning model under test to the original machine learning model based on the difference comprises: and checking the similarity according to whether the difference between any reference tensor and the test tensor with the same number is smaller than the difference between the reference tensor and the test tensor with other numbers and/or whether the difference between any test tensor and the reference tensor with the same number is smaller than the difference between the test tensor and the reference tensor with other numbers.
In some embodiments, the verification method further comprises: setting a number for each test set data, wherein the test set data, the reference tensor and the test tensor which have corresponding relations have the same number; according to the difference, the step of verifying the similarity degree of the tested machine learning model and the original machine learning model comprises the following steps: constructing a difference matrix according to each difference, wherein each row number of the difference matrix is the number of each reference tensor, each column number of the difference matrix is the number of each test tensor, and each element is the difference between the reference tensor corresponding to the row number and the test tensor corresponding to the column number; and checking the similarity degree according to the diagonal elements of the difference matrix.
In some embodiments, checking the similarity measure according to diagonal elements of the difference matrix comprises: and checking the similarity degree according to whether each diagonal element in the difference matrix is an extreme value element of the row and/or the column where the diagonal element is located.
In some embodiments, checking the degree of similarity according to whether each diagonal element in the difference matrix is an extremum element of its row and/or column comprises: calculating the number of diagonal elements of the difference matrix that are extreme value elements; and checking the degree of similarity based on the result of comparing the ratio of this number to the number of all diagonal elements with a first threshold.
In some embodiments, checking the similarity measure according to diagonal elements of the difference matrix comprises: inputting the difference matrix into a classifier model, and classifying each element in the difference matrix as a similar element or a non-similar element of the corresponding reference tensor and test tensor; labeling a diagonal element classified as a similar element as a positive example; labeling an off-diagonal element classified as a similar element as a negative example; and checking the similarity according to the labeling results of the positive and negative examples.
In some embodiments, checking the similarity degree according to the labeling result of the positive example and the negative example comprises: determining at least one of the accuracy or the recall rate of the machine learning model according to the labeling result; and checking the similarity degree according to at least one of the accuracy rate or the recall rate.
In some embodiments, verifying the degree of similarity based on at least one of accuracy or recall includes: determining a check parameter according to the product of the accuracy and the recall rate and the sum of the accuracy and the recall rate, wherein the check parameter is positively correlated with the product and negatively correlated with the sum; and checking the similarity according to the comparison result of the checking parameter and the second threshold value.
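One check parameter satisfying this condition is, for example, the harmonic mean of the accuracy rate and the recall rate (the F1 value used in the detailed description below); a sketch of the relation:

```latex
% F1 is positively correlated with the product P*R and negatively correlated with the sum P+R
F_1 = \frac{2 \cdot P \cdot R}{P + R}, \qquad F_1 \in [0, 1]
```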
In some embodiments, the compression process includes at least one of a model quantization process, a model clipping process, and a transfer learning process.
In some embodiments, the verification method further comprises: deploying the tested machine learning model on the terminal, and performing a computing power test on the terminal.
According to further embodiments of the present disclosure, there is provided a computing power test method of a terminal, including: deploying the tested machine learning model on the terminal and performing a computing power test on the terminal, wherein the tested machine learning model is obtained by compressing an original machine learning model, and the similarity degree of the tested machine learning model is verified in the following manner: processing each test set data by using the original machine learning model and outputting a reference tensor corresponding to each test set data; processing each test set data by using the tested machine learning model and outputting a test tensor corresponding to each test set data; calculating, for each test set data, the difference between its corresponding reference tensor and its corresponding test tensor; and verifying the degree of similarity between the tested machine learning model and the original machine learning model according to the differences.
According to still other embodiments of the present disclosure, there is provided a verification apparatus for a machine learning model, including: the processing unit is used for processing the data of each test set by using the original machine learning model, outputting a reference tensor corresponding to the data of each test set, processing the data of each test set by using the tested machine learning model, and outputting a test tensor corresponding to the data of each test set, wherein the tested machine learning model is obtained by compressing the original machine learning model; the calculating unit is used for calculating the difference between the corresponding reference tensor and the corresponding test tensor according to each test set data; and the checking unit is used for checking the similarity degree of the tested machine learning model and the original machine learning model according to the difference.
According to still further embodiments of the present disclosure, there is provided a computing power testing apparatus of a terminal, including: a testing unit configured to deploy the tested machine learning model on the terminal and perform a computing power test on the terminal, wherein the tested machine learning model is obtained by compressing an original machine learning model, and the similarity degree of the tested machine learning model is verified in the following manner: processing each test set data by using the original machine learning model and outputting a reference tensor corresponding to each test set data; processing each test set data by using the tested machine learning model and outputting a test tensor corresponding to each test set data; calculating, for each test set data, the difference between its corresponding reference tensor and its corresponding test tensor; and verifying the degree of similarity between the tested machine learning model and the original machine learning model according to the differences.
According to still further embodiments of the present disclosure, there is provided a verification apparatus of a machine learning model, including: a memory; and a processor coupled to the memory, the processor being configured to perform the verification method of a machine learning model in any of the above embodiments based on instructions stored in the memory.
According to still further embodiments of the present disclosure, there is provided a computing power testing apparatus of a terminal, including: a memory; and a processor coupled to the memory, the processor being configured to perform the computing power test method of the terminal in any of the above embodiments based on instructions stored in the memory.
According to still further embodiments of the present disclosure, there is provided a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a verification method of a machine learning model or a power test method of a terminal in any of the above embodiments.
In the above embodiments, the degree of similarity of the compressed machine learning model is checked according to the processing differences of the machine learning models before and after compression on the same test set data. Thus, output changes caused by the changes that the compression processing makes to the machine learning model can be sensitively detected, and the verification effect can be improved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description, serve to explain the principles of the disclosure.
The present disclosure may be more clearly understood from the following detailed description, taken with reference to the accompanying drawings, in which:
FIG. 1 illustrates a flow diagram of some embodiments of a verification method of a machine learning model of the present disclosure;
FIG. 2 illustrates a schematic diagram of some embodiments of a disparity matrix of the present disclosure;
FIG. 3 illustrates a schematic diagram of some embodiments of a verification method of a machine learning model of the present disclosure;
FIG. 4 illustrates a schematic diagram of some embodiments of a verification device of the machine learning model of the present disclosure;
FIG. 5 illustrates a block diagram of some embodiments of a verification apparatus of a machine learning model or a computing power testing apparatus of a terminal of the present disclosure;
FIG. 6 illustrates a block diagram of further embodiments of a verification apparatus of a machine learning model or a computing power testing apparatus of a terminal of the present disclosure.
Detailed Description
Various exemplary embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of parts and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.
Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail, but are intended to be part of the specification where appropriate.
In all examples shown and discussed herein, any particular value should be construed as exemplary only and not as limiting. Thus, other examples of the exemplary embodiments may have different values.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
As mentioned above, in order to enable the mobile terminal to better carry AI applications, acceleration may be achieved by compressing the machine learning model. In the end-side deep learning capability test, the output accuracy of the deep learning model can be used to verify the output.
However, calculating the accuracy essentially compares the label corresponding to the highest-probability index in the label prediction probability tensor output by the classification model with the original label, without analyzing the data distribution in the output tensor. Therefore, with accuracy as the verification index, there is a risk that some abnormal output values or changes in the output distribution are masked.
In some embodiments, the compression process may include a quantization process, a clipping process, transfer learning, and the like.
For example, the parameters and/or activation values of the trained machine learning model may be quantized to represent the weights of the machine learning model or tensor data flowing through the model with a small number of bits (e.g., FP16, INT8, etc.).
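As an illustrative sketch only (not the specific quantization tool used in this disclosure), symmetric per-tensor INT8 quantization of a weight tensor can be expressed as follows; the function names are hypothetical:

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor INT8 quantization: map float weights into [-127, 127]."""
    scale = float(np.max(np.abs(weights))) / 127.0  # one scale for the whole tensor
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float weights."""
    return q.astype(np.float32) * scale

# Quantization changes each weight only slightly, which is why the outputs of the
# original and quantized models are expected to stay close for the same input.
w = np.random.randn(3, 3).astype(np.float32)
q, s = quantize_int8(w)
print(np.max(np.abs(w - dequantize(q, s))))  # small reconstruction error
```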
The quantization process may be applied to a machine learning framework supported by the mobile terminal, and when the machine learning model is subjected to the quantization process, the machine learning model is converted into a model format supported by the framework.
For example, unnecessary parameters in the machine learning model can be removed through the clipping process, and then the performance of the machine learning model can be recovered in a retraining manner.
For example, useful information in a complex machine learning model can be transferred into a simpler machine learning model through transfer learning.
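Knowledge distillation is one common way to carry out such a transfer; the following sketch (PyTorch assumed, with an assumed temperature and loss weighting) only illustrates the idea and is not the specific method of this disclosure:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, temperature=4.0, alpha=0.7):
    """Blend a soft loss against the complex (teacher) model's outputs with a hard
    loss against the true labels, so the simpler (student) model absorbs the
    useful information of the teacher."""
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * (temperature ** 2)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```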
In some embodiments, in order to test the mobile terminal's capability to carry deep learning models, a corresponding deep learning capability evaluation model needs to be set.
For example, a deep learning model with an original data format of FP64 or FP32 can be quantized into a model format (including a data format and a model file format, for example) supported by an end-side platform through a machine learning framework supported by a tested end. Different evaluation models can be set for different test scenes and test tasks (such as picture classification, target detection, AI calculation capacity and the like) so as to test the corresponding deep learning capacity.
In some embodiments, in the AI computation power test, in order to test the highest computation power that can be achieved by the mobile terminal as a whole as much as possible, a VGG16_ notop model, a VGG19_ notop model, or the like may be used as an evaluation model.
For example, VGG16_notop is a model obtained by removing all FC (Fully Connected) layers and the Softmax layer from the VGG16 (Visual Geometry Group) picture classification model.
VGG16_notop is a fully convolutional model responsible for input feature extraction; the removed part is the feature fusion and classification structure. Therefore, the corresponding testing task is ImageNet picture feature extraction, and the evaluation model VGG16_notop abstracts an input picture into a feature tensor with dimensions (7, 7, 512).
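One possible way to obtain such an evaluation model, assuming a TensorFlow/Keras environment (the disclosure itself does not prescribe a specific framework):

```python
# Minimal sketch: VGG16 without its fully connected and Softmax layers outputs a
# (7, 7, 512) feature tensor for a 224x224 RGB input picture.
from tensorflow.keras.applications import VGG16

# weights=None keeps the sketch offline; pretrained ImageNet weights (or randomized
# parameters, as discussed below for computing power tests) can be used instead.
vgg16_notop = VGG16(include_top=False, weights=None, input_shape=(224, 224, 3))
print(vgg16_notop.output_shape)  # (None, 7, 7, 512)
```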
In order to make the evaluation model more fit to the computation capability of the end side, the evaluation model (i.e. the original machine learning model) needs to be compressed to obtain the machine learning model to be tested. Therefore, after the evaluation model is selected, the evaluation model can be subjected to model compression and format conversion by using a model conversion tool provided by a chip manufacturer before being loaded to the end side.
In some embodiments, although the VGG16_notop model can better express the computing power of the terminal under test as the evaluation model, its output is not a classification result, which brings certain difficulty to verifying the similarity of the model before and after compression. If the VGG16 picture classification model were adopted instead, the classification accuracy could be calculated according to the labels output by the model, so as to detect the integrity of the model during inference on the mobile terminal, that is, to detect whether the terminal-side software and hardware perform improper acceleration.
For example, improper acceleration may include: the end-side reduces the amount of input data that needs to be processed, i.e., the machine learning model does not process every test set data within the test set.
In this case, during the actual evaluation process, the output result for the unprocessed test set data is obtained by directly copying the output result of test set data that was processed. This is also a form of improper acceleration, and it leads to abnormal outputs for the unprocessed test set data.
For example, improper acceleration may include: the hardware acceleration omits effective multiplication and addition operations in the original machine learning model, or excessively modifies parameters in the original machine learning model to obtain the tested machine learning model.
As a result, the output of the tested machine learning model is distorted, and the data processing quality, i.e., the model performance, cannot be guaranteed. If the difference between the output of the tested machine learning model (the quantized model) and the output of the original machine learning model is too large, it indicates that the distribution of the output data produced by the tested machine learning model has changed greatly, i.e., the tested machine learning model has been changed to a greater degree during processing.
In some embodiments, the picture labels may be obtained by connecting a corresponding feature fusion structure, classification structure, and the like after the VGG16_notop model, so as to implement the similarity check on the VGG16_notop model before and after compression. However, this method is only suitable for classification models and its universality is limited.
In some embodiments, in the computing power test task, if the model structure is reasonably designed, an evaluation model with randomized parameters can be used to represent the AI computing power of the mobile terminal. However, such an evaluation model has no physical significance and its output has no physical meaning, so accuracy as a verification index is invalid in such a scenario.
Further, if the degree of similarity is evaluated only by the accuracy of the output results of the machine learning model, some output changes due to improper acceleration cannot be sensitively detected. This is because obtaining the accuracy essentially compares the label corresponding to the one-dimensional index with the highest probability in the label prediction probability tensor output by the classification model with the original label, without analyzing or comparing the data distribution in the output tensor. If only the accuracy is used as the verification index, there is a risk that some abnormal output values or changes in the output distribution are masked.
In view of the above technical problems, since the quantization operation has little influence on the output value and the distribution of the model, the present disclosure compares the similarity or difference between the output tensor of the original machine learning model and the output tensor of the machine learning model under test to perform the similarity check based on the same test set.
The technical solution of the present disclosure can be applied to computing power test scenarios. Similar solutions can also be adopted in other scenarios to verify the degree to which a machine learning model is modified before and after processes such as activation value/parameter changes, structure adjustment, and format conversion.
For example, for a picture classification model, the probability tensor output by the Softmax layer can be extracted as a comparison object; for the target detection model, the feature tensor output by the end point of the feature extraction structure can be extracted as a comparison object.
For example, the technical solution of the present disclosure can be realized by the following embodiments.
Fig. 1 illustrates a flow diagram of some embodiments of a verification method of a machine learning model of the present disclosure.
As shown in fig. 1, in step 110, processing each test set data by using the original machine learning model, and outputting a reference tensor corresponding to each test set data; and processing the data of each test set by using the machine learning model to be tested, and outputting a test tensor corresponding to the data of each test set.
In some embodiments, a number is set for each test set data, and the test set data, the reference tensor, and the test tensor having the correspondence have the same number. For example, the test set data within the test set may be numbered, such as 1, 2, ..., N, to form ordered test set data. Each test set data may be different.
In some embodiments, the original deep learning model may be run on a trusted device (e.g., a personal computer) based on the ordered test set data, resulting in an ordered set of reference tensors. Each output reference tensor has the same number as its corresponding input test set data.
In some embodiments, the tested deep learning model may be run on the tested terminal based on the ordered test set, resulting in an ordered set of test tensors. Each output test tensor has the same number as its corresponding input test set data. For example, the number may be a serial number, and each output test tensor is numbered in the order of its output.
In some embodiments, the machine learning model under test is obtained by compressing the original machine learning model. For example, the compression process includes at least one of a model quantization process, a model clipping process, and a transfer learning process.
In step 120, for each test set data, the difference between its corresponding reference tensor and its corresponding test tensor is computed.
In step 130, the similarity between the tested machine learning model and the original machine learning model is verified according to the difference.
In some embodiments, the similarity is verified based on whether the difference between any reference tensor and the test tensor with the same number is smaller than the differences between that reference tensor and the test tensors with other numbers, or based on whether the difference between any test tensor and the reference tensor with the same number is smaller than the differences between that test tensor and the reference tensors with other numbers.
In some embodiments, the degree of similarity is verified based on whether the difference between any reference tensor and the same numbered test tensor is less than the difference between the reference tensor and the other numbered test tensors, and whether the difference between any test tensor and the same numbered reference tensor is less than the difference between the test tensor and the other numbered reference tensors.
In some embodiments, a difference matrix is constructed from the differences; and checking the similarity degree according to diagonal elements of the difference matrix. Numbering rows of the difference matrix as the numbering of each reference tensor, and numbering columns as the numbering of each test tensor; each element of the difference matrix is the difference between the reference tensor corresponding to its row number and the test tensor corresponding to its column number.
For example, the disparity matrix may be constructed by the embodiment in fig. 2.
Fig. 2 illustrates a schematic diagram of some embodiments of a disparity matrix of the present disclosure.
As shown in FIG. 2, the difference between each test tensor q_i (1 ≤ i ≤ N) and each reference tensor f_j (1 ≤ j ≤ N) is computed. The obtained difference values constitute a difference matrix of size N×N. The difference calculation function diff(f_j, q_i) emphasizes the numerical differences between the tensors.
For example, the difference may be calculated using the L2-norm (Euclidean distance), or using SSIM (Structural Similarity) for a two-dimensional output tensor, or the like.
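A minimal sketch of this difference matrix construction, using the L2-norm variant; the random tensors only stand in for the outputs of the original model (on a trusted device) and of the tested model (on the terminal) for the same numbered inputs:

```python
import numpy as np

def build_difference_matrix(reference_tensors, test_tensors):
    """Element (j, i) is diff(f_j, q_i); here diff is the L2-norm (Euclidean distance)."""
    n = len(reference_tensors)
    d = np.zeros((n, n))
    for j, f in enumerate(reference_tensors):
        for i, q in enumerate(test_tensors):
            d[j, i] = np.linalg.norm(f.ravel() - q.ravel())
    return d

rng = np.random.default_rng(0)
reference_tensors = [rng.normal(size=(7, 7, 512)) for _ in range(5)]              # f_1..f_N
test_tensors = [f + 0.01 * rng.normal(size=f.shape) for f in reference_tensors]  # q_1..q_N
D = build_difference_matrix(reference_tensors, test_tensors)
print(np.argmin(D, axis=1))  # for close models the row minima fall on the diagonal
```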
The difference values of two output tensors of the same number are arranged at diagonal positions, i.e., diagonal elements, of the difference matrix. In the case where parameters and structures (which may be referred to as arithmetic operations when running the model) between the reference machine learning model and the machine learning model under test are close, the difference values between the reference tensor and the test tensor of the same number should be smaller.
In this case, these smaller disparity values should be arranged on the diagonal as the very similar points (i.e., extreme value elements) of each row and each column in the disparity matrix.
In the first embodiment, the reference machine learning model is VGG16_notop, and the tested machine learning model obtained through quantization runs on the mobile terminal. The resulting differences between the test tensors and the reference tensors constitute a difference matrix.
For example, in the case of constructing the disparity matrix in terms of euclidean distance or SSIM, the number of diagonal points and the number of very similar points are both 1000.
The closer the parameters and structures of the reference machine learning model and the tested machine learning model are, the more diagonal points of the difference matrix are the extreme values of their rows and columns, i.e., extremely similar points. This feature may be referred to as the diagonal feature.
In the second embodiment, with fixed parameters, the last convolutional layer of VGG16_notop is cut off to obtain the reference machine learning model, while the tested machine learning model and its operating environment are unchanged.
In this case, the structural change between the reference machine learning model and the measured machine learning model is large, and the number of the obtained extremely similar points of the rows and columns in the difference matrix is small, that is, the diagonal features disappear.
In a third embodiment, when the reference machine learning model and the tested machine learning model have different parameters and structures but are trained based on the same training set, the diagonal features of the difference matrix still exist. However, in this case, the number of diagonal points as the extremely similar points of the rows and columns in the disparity matrix is smaller than that in the first embodiment.
In some embodiments, the VGG19_notop model has three more convolutional layers than the VGG16_notop model. When both are trained on the ImageNet dataset, the features that the two models extract from the same picture have a certain similarity.
When the reference machine learning model is the VGG19_notop model and the tested machine learning model is obtained by quantizing the VGG16_notop model, the number of diagonal points is 989 and the number of extremely similar points is 1000 when the difference matrix is constructed with the Euclidean distance; when the difference matrix is constructed with SSIM, the number of diagonal points is 811 and the number of extremely similar points is 1000.
In some embodiments, the similarity degree is checked according to whether each diagonal element in the disparity matrix is at least one of the extremum element in the row where the diagonal element is located or the extremum element in the column where the diagonal element is located. For example, when the element in the disparity matrix is a disparity indicator (such as euclidean distance) between tensors, the extreme value element is the minimum value in one row or one column; when the elements in the disparity matrix are similarity indicators (such as structural similarity) between tensors, the extreme value element is the maximum value in one row or one column.
For example, the number of diagonal elements of the difference matrix that are extreme value elements is calculated, and the degree of similarity is checked based on the result of comparing the ratio of this number to the number of all diagonal elements with the first threshold.
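A sketch of this check for a distance-type difference matrix (row/column minima are the extremely similar points); the threshold value alpha is application-dependent, as noted below:

```python
import numpy as np

def diagonal_ratio_check(d: np.ndarray, alpha: float, distance_like: bool = True) -> bool:
    """Pass when the share of diagonal elements that are the extreme value of both
    their row and their column reaches the first threshold alpha."""
    row_extreme = np.argmin(d, axis=1) if distance_like else np.argmax(d, axis=1)
    col_extreme = np.argmin(d, axis=0) if distance_like else np.argmax(d, axis=0)
    idx = np.arange(d.shape[0])
    hits = np.sum((row_extreme == idx) & (col_extreme == idx))
    return hits / d.shape[0] >= alpha

# e.g. alpha = 1.0 requires every diagonal element to be the extreme of its row and
# column, as used later in the computing power test example.
```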
In some embodiments, whether the extremely similar point of each row in the difference matrix is located at the diagonal position is determined as follows.
If diff(f_j, q_i) is a difference metric function, such as the L2-norm, each diagonal point must be the minimum value of the row j in which it is located, that is: diff(f_j, q_j) = min{ diff(f_j, q_i) : 1 ≤ i ≤ N }.
If diff(f_j, q_i) is a similarity metric function, such as SSIM, each diagonal point must be the maximum value of the row j in which it is located, that is: diff(f_j, q_j) = max{ diff(f_j, q_i) : 1 ≤ i ≤ N }.
When the proportion of diagonal points in the difference matrix that are the extremely similar points of their row and column, relative to the total number of diagonal points, is smaller than a first threshold α, the tested machine learning model is judged to have been tampered with, or to have been tampered with in the tested running environment; the verification fails and the subsequent steps are not required. For example, α ∈ [0, 1], and its value is determined according to the actual application scenario.
In some embodiments, the difference matrix is input into a classifier model, and each element in the difference matrix is classified as a similar element or a non-similar element of the corresponding reference tensor and test tensor; a diagonal element classified as a similar element is labeled as a positive example; an off-diagonal element classified as a similar element is labeled as a negative example; and the similarity is checked according to the labeling results of the positive and negative examples.
For example, the classifier model may be an SVM (Support Vector Machine) model, a logistic regression model, or the like, or may be an ROC (Receiver Operating Characteristic) curve model trained using a smaller training set. The ROC curve model can be used to determine the optimal threshold for classification.
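A sketch of deriving a classification threshold from an ROC curve, assuming scikit-learn; maximizing the TPR−FPR gap (Youden's J) is one common criterion and is only an assumption here:

```python
import numpy as np
from sklearn.metrics import roc_curve

def optimal_threshold(scores: np.ndarray, labels: np.ndarray) -> float:
    """Return the score threshold that maximizes TPR - FPR on the ROC curve."""
    fpr, tpr, thresholds = roc_curve(labels, scores)
    return float(thresholds[np.argmax(tpr - fpr)])

# scores could be, e.g., negated difference values (smaller difference -> more similar);
# labels are 1 for same-numbered (diagonal) pairs and 0 for all other pairs.
```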
For example, a similar element is at least one of: an extreme value element among the differences between its corresponding reference tensor and all the test tensors, or an extreme value element among the differences between its corresponding test tensor and all the reference tensors.
In some embodiments, a binary classifier may be used to distinguish whether each element in the difference matrix is a strong similarity point (i.e., a similar element). For example, a strong similarity point may be an extreme value element whose difference value is smaller than a preset threshold.
For example, the training data required by the strong-similarity-point classifier is obtained through a plurality of trusted test models and trusted test environments. The training data may be all the values in such difference matrices, with the points on the diagonal labeled as positive examples (strong similarity points) and the remaining points labeled as negative examples. The binary classifier can be an SVM model or the like.
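A sketch of assembling the classifier's training data from a difference matrix produced by a trusted model and environment (scikit-learn SVM assumed as one possible binary classifier):

```python
import numpy as np
from sklearn.svm import SVC

def matrix_to_samples(d: np.ndarray):
    """Each matrix element is one sample: diagonal elements are labeled as positive
    examples (strong similarity points) and off-diagonal elements as negative examples."""
    n = d.shape[0]
    x = d.reshape(-1, 1)                          # the difference value is the feature
    y = np.eye(n, dtype=bool).reshape(-1).astype(int)
    return x, y

# A trusted difference matrix: small values on the diagonal, larger values elsewhere.
rng = np.random.default_rng(0)
d_trusted = rng.uniform(1.0, 2.0, size=(50, 50))
np.fill_diagonal(d_trusted, rng.uniform(0.0, 0.2, size=50))

x, y = matrix_to_samples(d_trusted)   # matrices from several trusted runs can be stacked
clf = SVC(kernel="rbf").fit(x, y)
```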
In some embodiments, at least one of an accuracy or a recall of the machine learning model is determined based on the annotation result; and checking the similarity degree according to at least one of the accuracy rate or the recall rate.
For example, determining a verification parameter according to the product of the accuracy and the recall rate and the sum of the accuracy and the recall rate; and checking the similarity according to the comparison result of the checking parameter and the second threshold value. The check parameter is positively correlated with the product and negatively correlated with the sum.
In some embodiments, a harmonic mean in the strong similarity point classification task may be calculated as a check parameter.
For example, the proportion of diagonal elements among all elements determined by the binary classifier to be strong similarity points is calculated to obtain the accuracy rate P; the proportion of diagonal elements determined by the classifier to be strong similarity points among all diagonal elements is calculated to obtain the recall rate R; and the harmonic mean F1 ∈ [0, 1] of the two is taken as the check parameter: F1 = 2 × P × R / (P + R).
the higher the F1 is, the higher the credibility of the tested machine learning model and the tested environment is, and the smaller the degree of the original machine learning model is changed. If the resulting F1 is less than a second threshold μ ∈ (0, 1), the check fails. For example, the second threshold may be determined from a P-R curve.
Judging, through the binary classifier, whether two output tensors q_i and f_i with the same number are significantly similar is a more refined characterization of the numerical and distributional differences between tensors. In this way, the differences between the two compared machine learning models in processing the same input data can be better reflected.
In some embodiments, the AI computing power of a certain brand of mobile phone is evaluated using the Euclidean-distance difference matrix.
For example, in the case of the first embodiment described above, the accuracy is 99.40%, the recall rate is 99.40%, and F1 is 0.994. The second threshold may be 0.958334.
For example, in the case of the third embodiment, the accuracy is 94.66%, the recall rate is 17.72%, and F1 is 0.299.
In some embodiments, the tested machine learning model is deployed on the terminal, and the terminal is computationally tested. For example, the tested machine learning model passing the verification can be deployed on the terminal. The image data, voice data, text data, etc. can also be processed using a machine learning model under test deployed on the terminal.
In the embodiment, the adopted verification method is more universal, is suitable for various deep learning model structures, and is not limited by the actual function and the physical meaning of the output of the model. Even if a parameter randomization model is adopted, the method can still be used for output verification, so that the method can adapt to more scenes.
Moreover, the output difference caused by the change of model parameters, activation values, structures or formats can be reflected more sensitively.
In the above embodiment, in order to reflect the difference between models, the check is converted into the distribution problem of the extreme similar points and the classification problem of the strong similar points in the matrix by using the diagonal features possessed by the difference matrix between similar model outputs. In addition, F1 of the classification result is used as the representation of the model difference, the verification can be performed from multiple angles such as accuracy and recall rate, and the verification accuracy is improved.
Fig. 3 illustrates a schematic diagram of some embodiments of a verification method of a machine learning model of the present disclosure.
As shown in FIG. 3, test set data is first constructed. For example, a number may be set for each test set data, and the test set data, the reference tensor, and the test tensor having the correspondence have the same number. For example, the test set data within the test set may be numbered, such as 1, 2, ..., N, to form ordered test set data. Each test set data may be different.
Then, carrying out quantitative processing on the reference machine learning model to obtain a tested machine learning model; the test set data may be processed using a reference machine learning model and a machine learning model under test.
The original deep learning model may be run on a trusted device (e.g., a personal computer) based on the ordered test set data to obtain an ordered set of reference tensors. Each output reference tensor is numbered the same as its corresponding input test set data.
And running the tested deep learning model on the tested terminal based on the ordered test set to obtain the ordered test tensor set. Each output test tensor is the same number as its corresponding input test set data. For example, the number may be a serial number, and each test tensor outputted is numbered in the order of its output.
The difference between each test tensor q_i (1 ≤ i ≤ N) and each reference tensor f_j (1 ≤ j ≤ N) is then calculated. The obtained difference values constitute a difference matrix of size N×N. The difference calculation function diff(f_j, q_i) emphasizes the numerical differences between the tensors. For example, the difference may be calculated using the L2-norm, or using SSIM for a two-dimensional output tensor, or the like.
The difference values of the two output tensors of the same number are arranged at diagonal positions, i.e., diagonal elements, of the difference matrix. In the case where parameters and structures (which may be referred to as arithmetic operations when running the model) between the reference machine learning model and the machine learning model under test are close, the difference values between the reference tensor and the test tensor of the same number should be smaller.
In this case, these smaller difference values should be arranged on the diagonal as the extremely similar points (i.e., extreme value elements) of each row and each column in the difference matrix. The degree of similarity can thus be judged from these diagonal elements.
In the first embodiment, the reference machine learning model is VGG16_notop, and the quantized tested machine learning model runs on the mobile terminal. The resulting differences between each test tensor and each reference tensor constitute a difference matrix.
For example, in the case of constructing the disparity matrix in terms of euclidean distance or SSIM, the number of diagonal points and the number of very similar points are both 1000.
The closer the parameters and structures of the reference machine learning model and the tested machine learning model are, the more diagonal points of the difference matrix are the extreme values of their rows and columns, i.e., extremely similar points. This feature may be referred to as the diagonal feature.
In the second embodiment, with fixed parameters, the last convolutional layer of VGG16_notop is cut off to obtain the reference machine learning model, while the tested machine learning model and its operating environment are unchanged.
In this case, the structural change between the reference machine learning model and the measured machine learning model is large, and the number of the obtained row and column extremely similar points in the difference matrix is small, that is, the diagonal features disappear.
In the third embodiment, when the reference machine learning model and the tested machine learning model have different parameters and structures but are trained based on the same training set, the diagonal features of the difference matrix still exist. However, in this case, the number of diagonal points that are the most similar points of the rows and columns in the disparity matrix is smaller than that in the first embodiment.
In some embodiments, the VGG19_notop model has three more convolutional layers than the VGG16_notop model. When both are trained on the ImageNet dataset, the features that the two models extract from the same picture have a certain similarity.
When the reference machine learning model is the VGG19_notop model and the tested machine learning model is obtained by quantizing the VGG16_notop model, the number of diagonal points is 989 and the number of extremely similar points is 1000 when the difference matrix is constructed with the Euclidean distance; when the difference matrix is constructed with SSIM, the number of diagonal points is 811 and the number of extremely similar points is 1000.
For example, the number of diagonal elements of the difference matrix that are extreme value elements is calculated, and the degree of similarity is checked based on the result of comparing the ratio of this number to the number of all diagonal elements with the first threshold.
In some embodiments, whether the extremely similar point of each row in the difference matrix is located at the diagonal position is determined as follows.
If diff(f_j, q_i) is a difference metric function, such as the L2-norm, each diagonal point must be the minimum value of the row j in which it is located, that is: diff(f_j, q_j) = min{ diff(f_j, q_i) : 1 ≤ i ≤ N }.
If diff(f_j, q_i) is a similarity metric function, such as SSIM, each diagonal point must be the maximum value of the row j in which it is located, that is: diff(f_j, q_j) = max{ diff(f_j, q_i) : 1 ≤ i ≤ N }.
When the proportion of diagonal points in the difference matrix that are the extremely similar points of their row and column, relative to the total number of diagonal points, is smaller than a first threshold α, the tested machine learning model is judged to have been tampered with, or to have been tampered with in the tested running environment; the verification fails and the subsequent steps are not required. For example, α ∈ [0, 1], and its value is determined according to the actual application scenario.
In some embodiments, it may be determined whether the diagonal element is an extremum element of the row in which the diagonal element is located; if yes, determining similar elements in the difference matrix according to a preset threshold; if not, the verification is not passed.
In some embodiments, the number K of rows in the difference matrix whose extremely similar point is a diagonal point can be counted, and whether the ratio of K to the total number N of diagonal points is greater than or equal to the first threshold is then judged. If so, the similar elements in the difference matrix are determined according to a preset threshold; if not, the verification fails. For example, a strong similarity point may be an extreme value element whose difference value is smaller than the preset threshold.
In some embodiments, a binary classifier may be used to distinguish whether each element in the difference matrix is a strong similarity point (i.e., a similar element).
For example, the training data required by the strong-similarity-point classifier is obtained through a plurality of trusted test models and trusted test environments. The training data may be all the values in such difference matrices, with the points on the diagonal labeled as positive examples (strong similarity points) and the remaining points labeled as negative examples. The binary classifier can be an SVM or the like.
In some embodiments, a harmonic mean in the strong similarity point classification task may be calculated as a check parameter.
For example, the proportion of diagonal elements among all elements determined by the binary classifier to be strong similarity points is calculated to obtain the accuracy rate P; the proportion of diagonal elements determined by the classifier to be strong similarity points among all diagonal elements is calculated to obtain the recall rate R; and the harmonic mean F1 ∈ [0, 1] of the two is taken as the check parameter: F1 = 2 × P × R / (P + R).
the higher the F1 is, the higher the credibility of the tested machine learning model and the tested environment is, and the smaller the degree of the change of the original machine learning model is. If the resulting F1 is less than a second threshold μ ∈ (0, 1), the check fails. For example, the second threshold may be determined from a P-R curve.
Judging, through the binary classifier, whether two output tensors q_i and f_i with the same number are significantly similar is a more refined characterization of the numerical and distributional differences between tensors. In this way, the differences between the two compared machine learning models in processing the same input data can be better reflected.
In some embodiments, the AI computing power of a certain brand of mobile phone is evaluated using the Euclidean-distance difference matrix.
For example, in the case of the first embodiment described above, the accuracy is 99.40%, the recall rate is 99.40%, and F1 is 0.994. The second threshold may be 0.958334.
For example, in the case of the third embodiment described above, the accuracy is 94.66%, the recall rate is 17.72%, and F1 is 0.299.
In some embodiments, model output similarity verification may be performed in the AI computing power test of a chip.
For example, in the computing power test, it is necessary to check whether the terminal under test employs improper acceleration means in pursuit of high computing power.
The test set can be 1000 different self-selected ImageNet pictures. An AI computing power estimate of the terminal under test is obtained from the inference time and the number of multiply-add operations used by the VGG16_notop model when the terminal processes the test set. The feature tensors output by the model can be used to check the degree of similarity between the models.
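A sketch of the computing power estimate implied here; the multiply-add count and timing below are hypothetical placeholder figures, and the exact formula used by a given benchmark may differ:

```python
def estimate_tops(macs_per_picture: float, num_pictures: int, inference_seconds: float) -> float:
    """Rough AI computing power estimate in TOPS: total operations (each multiply-add
    counted as two operations) divided by the total inference time."""
    total_ops = 2.0 * macs_per_picture * num_pictures
    return total_ops / inference_seconds / 1e12

# Hypothetical values for illustration only: per-picture multiply-add count of the
# evaluation model and the measured time to infer the 1000-picture test set.
print(estimate_tops(macs_per_picture=8e9, num_pictures=1000, inference_seconds=4.0))
```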
In checking whether the resulting difference matrix has the diagonal feature, the α value may be set to 1, i.e., every point on the diagonal must be the extremely similar point of the row and column in which it is located. After the original machine learning model is quantized, the changes in the output test tensors and their distribution are relatively small.
Moreover, this setting can also effectively counter the problem of the terminal under test intentionally skipping a small number of input pictures to accelerate inference. For example, if only one picture is skipped, the model accuracy drops by at most 0.1%, which is easily misattributed to the quantization algorithm.
For example, in the verification process of the computing power test, the VGG16_notop model with format FP64 is quantized into the VGG16_notop test model with format INT8 by the machine learning framework supported by the terminal under test.
If diff(f_j, q_i) is a difference metric function, the difference matrix must satisfy that every point on the diagonal is the minimum value of the row or column in which it is located; if diff(f_j, q_i) is a similarity metric function, the difference matrix must satisfy that every point on the diagonal is the maximum value of the row or column in which it is located.
In some embodiments, the verification method in any of the above embodiments may also be applied to model security verification. For example, it is checked whether parameters, activation values, structures, and the like of the model are tampered with maliciously.
In the above embodiment, for the "black box" property of the deep learning model, a verification method before and after model compression is proposed, and the difference degree between models can be judged according to the difference between the output tensor values and the distribution of the models.
The verification method utilizes diagonal features possessed by a difference matrix between similar model outputs to judge whether the deep learning model is tampered.
And converting the check into a distribution problem of the extreme similar points in the matrix and a classification problem of the strong similar points, and taking F1 of a classification result as the representation of the model difference.
Fig. 4 illustrates a schematic diagram of some embodiments of a verification device of a machine learning model of the present disclosure.
As shown in fig. 4, the verification device 4 of the machine learning model includes a processing unit 41, a calculating unit 42, and a verifying unit 43.
The processing unit 41 processes each test set data by using the original machine learning model, and outputs a reference tensor corresponding to each test set data; and processing the data of each test set by using the machine learning model to be tested, and outputting a test tensor corresponding to the data of each test set. The tested machine learning model is obtained by compressing the original machine learning model.
The calculation unit 42 calculates, for each test set data, the difference between its corresponding reference tensor and its corresponding test tensor.
The verification unit 43 verifies the similarity between the machine learning model under test and the original machine learning model according to the difference.
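As a rough sketch of the data flow through units 41, 42, and 43 (reusing the `difference_matrix` helper from the earlier sketch; `mse_diff` is a hypothetical choice of difference measurement function, not one mandated by the disclosure):

```python
import numpy as np

def mse_diff(f_j, q_i):
    """Hypothetical difference measurement: mean squared difference of two tensors."""
    return float(np.mean((np.asarray(f_j) - np.asarray(q_i)) ** 2))

def run_verification(original_model, tested_model, test_set):
    # Processing unit 41: same-numbered reference tensors and test tensors.
    reference_tensors = [original_model(x) for x in test_set]
    test_tensors = [tested_model(x) for x in test_set]

    # Calculation unit 42: difference of each corresponding pair, arranged
    # into the difference matrix (rows: reference numbers, columns: test numbers).
    D = difference_matrix(reference_tensors, test_tensors, mse_diff)

    # Verification unit 43 then inspects the diagonal of D, for example with
    # the ratio/threshold check sketched further below.
    return D
```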
In some embodiments, the checking unit 43 checks the degree of similarity according to whether the difference between any one of the reference tensors and the same-numbered test tensor is smaller than the difference between the reference tensor and the other-numbered test tensors, and/or whether the difference between any one of the test tensors and the same-numbered reference tensor is smaller than the difference between the test tensor and the other-numbered reference tensors.
In some embodiments, the processing unit 41 sets a number for each test set data, and the test set data, the reference tensor, and the test tensor having the correspondence have the same number; the checking unit 43 constructs a difference matrix according to each difference, wherein each row number of the difference matrix is the number of each reference tensor, each column number is the number of each test tensor, and each element is the difference between the reference tensor corresponding to the row number and the test tensor corresponding to the column number; and checking the similarity degree according to the diagonal elements of the difference matrix.
In some embodiments, the checking unit 43 checks the similarity degree according to whether each diagonal element in the disparity matrix is an extremum element in the row and/or the column where the diagonal element is located.
In some embodiments, the checking unit 43 calculates the number of diagonal elements of the difference matrix that are extremum elements, and checks the degree of similarity according to the result of comparing the ratio of this number to the total number of diagonal elements with the first threshold.
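A minimal sketch of this ratio check, reusing `diagonal_is_extremum` from the earlier sketch; the default first threshold of 0.95 is only an assumed placeholder, not a value given in the disclosure:

```python
def check_by_diagonal_ratio(D, first_threshold=0.95, use_similarity=False):
    """Pass when the fraction of diagonal elements that are extrema of both
    their row and column is at least the first threshold."""
    n = D.shape[0]
    hits = sum(diagonal_is_extremum(D, k, use_similarity) for k in range(n))
    return (hits / n) >= first_threshold
```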
In some embodiments, the checking unit 43 inputs the difference matrix into a classifier model and classifies each element in the difference matrix as a similar element or a non-similar element of the corresponding reference tensor and test tensor; a diagonal element classified as a similar element is labeled as a positive example; an off-diagonal element classified as a similar element is labeled as a negative example; and the degree of similarity is checked according to the labeling results of the positive and negative examples.
In some embodiments, the verification unit 43 determines at least one of the accuracy or the recall of the machine learning model according to the labeling results, and checks the degree of similarity according to at least one of the accuracy or the recall.
In some embodiments, the checking unit 43 determines a check parameter according to the product of the accuracy and the recall and the sum of the accuracy and the recall, where the check parameter is positively correlated with the product and negatively correlated with the sum, and checks the degree of similarity according to the result of comparing the check parameter with the second threshold.
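A hedged sketch of this classifier-based check; `similar` is assumed to be the boolean output of the classifier model over the difference matrix, and the F1-style formula and the second threshold of 0.9 are assumptions consistent with the stated correlations rather than values fixed by the disclosure:

```python
import numpy as np

def check_by_classifier(similar, second_threshold=0.9):
    """similar[j, i] is True when the classifier marks the pair
    (reference tensor j, test tensor i) as similar elements."""
    n = similar.shape[0]
    diag = np.eye(n, dtype=bool)

    tp = np.count_nonzero(similar & diag)    # positive examples: diagonal elements classified similar
    fp = np.count_nonzero(similar & ~diag)   # negative examples: off-diagonal elements classified similar
    fn = np.count_nonzero(~similar & diag)   # diagonal elements classified non-similar

    accuracy = tp / (tp + fp) if tp + fp else 0.0   # "accuracy" in the disclosure (precision-like)
    recall = tp / (tp + fn) if tp + fn else 0.0

    # Check parameter: positively correlated with the product and negatively
    # correlated with the sum of accuracy and recall (an F1-style score).
    denom = accuracy + recall
    check_param = 2 * accuracy * recall / denom if denom else 0.0
    return check_param >= second_threshold
```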
In some embodiments, the compression processing includes at least one of model quantization processing, model clipping processing, or transfer learning processing.
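For example, a quantized tested model could be produced from the original model with an off-the-shelf toolchain; the sketch below uses TensorFlow Lite post-training quantization purely as an illustration and is not the specific framework or format conversion described in the disclosure:

```python
import tensorflow as tf

# Original machine learning model: VGG16 without the top layers (VGG16_notop).
original = tf.keras.applications.VGG16(include_top=False, weights="imagenet")

# Compression processing (model quantization): post-training weight quantization.
converter = tf.lite.TFLiteConverter.from_keras_model(original)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tested_model_bytes = converter.convert()

# The resulting quantized model is the "tested machine learning model" to be
# deployed on the terminal and checked for similarity against the original.
with open("vgg16_notop_quant.tflite", "wb") as f:
    f.write(tested_model_bytes)
```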
In some embodiments, the verification apparatus further includes a testing unit configured to deploy the tested machine learning model on a terminal and perform a computing power test on the terminal.
In some embodiments, a computing power testing apparatus of a terminal includes a testing unit configured to deploy the tested machine learning model on the terminal and perform a computing power test on the terminal. The tested machine learning model is obtained by compressing the original machine learning model.
For example, the tested machine learning model is checked for similarity in the following way: each test set data is processed using the original machine learning model, and a reference tensor corresponding to each test set data is output; each test set data is processed using the tested machine learning model, and a test tensor corresponding to each test set data is output, the tested machine learning model being obtained by compressing the original machine learning model; for each test set data, the difference between its corresponding reference tensor and its corresponding test tensor is calculated; and the degree of similarity between the tested machine learning model and the original machine learning model is checked according to the differences.
Fig. 5 illustrates a block diagram of some embodiments of the verification apparatus of a machine learning model or the computing power testing apparatus of a terminal of the present disclosure.
As shown in fig. 5, in some embodiments, the verification device 5 of the machine learning model includes: a memory 51 and a processor 52 coupled to the memory 51, the processor 52 being configured to execute a method of verifying a machine learning model in any one of the embodiments of the present disclosure based on instructions stored in the memory 51.
In some embodiments, the computing power testing device 5 includes: a memory 51 and a processor 52 coupled to the memory 51, the processor 52 being configured to execute the computing power testing method in any of the embodiments of the present disclosure based on instructions stored in the memory 51.
The memory 51 may include, for example, a system memory, a fixed nonvolatile storage medium, and the like. The system memory stores, for example, an operating system, an application program, a Boot Loader, a database, and other programs.
FIG. 6 illustrates a block diagram of further embodiments of the verification apparatus of a machine learning model or the computing power testing apparatus of a terminal of the present disclosure.
As shown in fig. 6, in some embodiments, the verification device 6 of the machine learning model includes: a memory 610 and a processor 620 coupled to the memory 610, the processor 620 being configured to perform a method of verifying a machine learning model in any of the above embodiments based on instructions stored in the memory 610.
In some embodiments, the computing power testing device 6 includes: a memory 610 and a processor 620 coupled to the memory 610, the processor 620 being configured to perform the computing power testing method in any of the above embodiments based on instructions stored in the memory 610.
The memory 610 may include, for example, system memory, fixed non-volatile storage media, and the like. The system memory stores, for example, an operating system, an application program, a Boot Loader, and other programs.
The verification device or the computing power testing device may further include an input/output interface 630, a network interface 640, a storage interface 650, and the like. These interfaces 630, 640, 650, the memory 610, and the processor 620 may be connected, for example, via a bus 660. The input/output interface 630 provides a connection interface for input/output devices such as a display, a mouse, a keyboard, a touch screen, a microphone, and a speaker. The network interface 640 provides a connection interface for various networking devices. The storage interface 650 provides a connection interface for external storage devices such as an SD card and a USB flash drive.
As will be appreciated by one skilled in the art, embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable non-transitory storage media including, but not limited to, disk storage, CD-ROM, optical storage, and the like, having computer-usable program code embodied therein.
So far, the verification method of the machine learning model, the verification apparatus of the machine learning model, the computing power test method of the terminal, the computing power test apparatus of the terminal, and the nonvolatile computer readable storage medium according to the present disclosure have been described in detail. Some details that are well known in the art have not been described in order to avoid obscuring the concepts of the present disclosure. It will be fully apparent to those skilled in the art from the foregoing description how to practice the presently disclosed embodiments.
The method and system of the present disclosure may be implemented in a number of ways. For example, the methods and systems of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware. The above-described order for the steps of the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above unless specifically stated otherwise. Further, in some embodiments, the present disclosure may also be embodied as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
Although some specific embodiments of the present disclosure have been described in detail by way of example, it should be understood by those skilled in the art that the above examples are for illustration only and are not intended to limit the scope of the present disclosure. It will be appreciated by those skilled in the art that modifications may be made to the above embodiments without departing from the scope and spirit of the present disclosure. The scope of the present disclosure is defined by the appended claims.
Claims (16)
1. A method of validating a machine learning model, comprising:
processing each test set data by using an original machine learning model, outputting a reference tensor corresponding to each test set data, processing each test set data by using a tested machine learning model, and outputting a test tensor corresponding to each test set data, wherein the tested machine learning model is obtained by compressing the original machine learning model;
calculating the difference between the corresponding reference tensor and the corresponding test tensor according to each test set data;
and according to the difference, checking the similarity degree of the tested machine learning model and the original machine learning model.
2. The verification method of claim 1, wherein the checking the similarity degree of the tested machine learning model and the original machine learning model according to the difference comprises:
and checking the similarity according to whether the difference between any reference tensor and the same-numbered test tensor is smaller than the difference between the reference tensor and the other-numbered test tensors and/or whether the difference between any test tensor and the same-numbered reference tensor is smaller than the difference between the test tensor and the other-numbered reference tensors.
3. The verification method of claim 1, further comprising:
setting a number for each test set data, wherein the test set data, the reference tensor and the test tensor which have corresponding relations have the same number;
wherein the checking the similarity degree of the tested machine learning model and the original machine learning model according to the difference comprises:
constructing a difference matrix according to the differences, wherein the serial numbers of the rows of the difference matrix are the serial numbers of the reference tensors, the serial numbers of the columns of the difference matrix are the serial numbers of the test tensors, and each element is the difference between the reference tensor corresponding to the row serial number and the test tensor corresponding to the column serial number;
and checking the similarity degree according to diagonal elements of the difference matrix.
4. The verification method according to claim 3, wherein the checking the similarity degree according to diagonal elements of the difference matrix comprises:
and checking the similarity degree according to whether each diagonal element in the difference matrix is an extreme value element of the line and/or the column where the diagonal element is located.
5. The verification method according to claim 4, wherein the checking the similarity degree according to whether each diagonal element in the difference matrix is an extreme value element of the row and/or the column in which it is located comprises:
calculating the number of each diagonal element of the difference matrix as an extreme value element;
the degree of similarity is checked based on the comparison of the ratio of the number to the number of all diagonal elements with a first threshold.
6. The verification method of claim 4, wherein the checking the similarity degree according to diagonal elements of the difference matrix comprises:
inputting the difference matrix into a classifier model, and classifying each element in the difference matrix into similar elements or non-similar elements of corresponding reference tensor and test tensor;
labeling a diagonal element as a positive example in the case where the diagonal element is classified as a similar element;
labeling an off-diagonal element as a negative example in the case where the off-diagonal element is classified as a similar element;
and checking the similarity degree according to the labeling results of the positive example and the negative example.
7. The verification method according to claim 6, wherein the verifying the similarity degree according to the labeling results of positive and negative examples comprises:
determining at least one of accuracy or recall of the machine learning model according to the labeling result;
and checking the similarity degree according to at least one item of the accuracy rate or the recall rate.
8. The verification method according to claim 7, wherein the checking the similarity degree according to at least one of the accuracy or the recall rate comprises:
determining a check parameter according to the product of the accuracy and the recall rate and the sum of the accuracy and the recall rate, wherein the check parameter is positively correlated with the product and negatively correlated with the sum;
and checking the similarity degree according to the comparison result of the checking parameter and a second threshold value.
9. The verification method according to any one of claims 1-8,
the compression processing comprises at least one of model quantization processing, model clipping processing and transfer learning processing.
10. The verification method of any of claims 1-8, further comprising:
and deploying the tested machine learning model on a terminal, and carrying out calculation force test on the terminal.
11. A computing power testing method of a terminal comprises the following steps:
deploying a tested machine learning model on a terminal, and performing computing power test on the terminal, wherein the tested machine learning model is obtained by compressing the original machine learning model, and the tested machine learning model performs similarity degree verification in the following way:
processing each test set data by using an original machine learning model, outputting a reference tensor corresponding to each test set data, processing each test set data by using the tested machine learning model, and outputting a test tensor corresponding to each test set data;
calculating the difference between the corresponding reference tensor and the corresponding test tensor according to each test set data;
and according to the difference, checking the similarity degree of the tested machine learning model and the original machine learning model.
12. A verification apparatus for a machine learning model, comprising:
the processing unit is used for processing each test set data by using an original machine learning model, outputting a reference tensor corresponding to each test set data, processing each test set data by using a tested machine learning model, and outputting a test tensor corresponding to each test set data, wherein the tested machine learning model is obtained by compressing the original machine learning model;
the calculating unit is used for calculating the difference between the corresponding reference tensor and the corresponding test tensor according to each test set data;
and the verifying unit is used for verifying the similarity between the tested machine learning model and the original machine learning model according to the difference.
13. A computing power testing apparatus of a terminal, comprising:
the testing unit is used for deploying a tested machine learning model on a terminal and performing computing power testing on the terminal, the tested machine learning model is obtained by compressing the original machine learning model, and the tested machine learning model performs similarity degree verification in the following mode:
processing each test set data by using an original machine learning model, outputting a reference tensor corresponding to each test set data, processing each test set data by using the tested machine learning model, and outputting a test tensor corresponding to each test set data, wherein the tested machine learning model is obtained by compressing the original machine learning model;
calculating the difference between the corresponding reference tensor and the corresponding test tensor according to each test set data;
and according to the difference, checking the similarity degree of the tested machine learning model and the original machine learning model.
14. A verification apparatus for a machine learning model, comprising:
a memory; and
a processor coupled to the memory, the processor configured to perform the method of verifying a machine learning model of any of claims 1-10 based on instructions stored in the memory.
15. A computing power testing apparatus of a terminal, comprising:
a memory; and
a processor coupled to the memory, the processor configured to perform the method of computing power testing of the terminal of claim 11 based on instructions stored in the memory.
16. A non-transitory computer-readable storage medium on which is stored a computer program that, when executed by a processor, implements a method of verifying a machine learning model according to any one of claims 1 to 10 or a method of computationally testing a terminal according to claim 11.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110478436.1A CN115271085A (en) | 2021-04-30 | 2021-04-30 | Machine learning model verification method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110478436.1A CN115271085A (en) | 2021-04-30 | 2021-04-30 | Machine learning model verification method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115271085A true CN115271085A (en) | 2022-11-01 |
Family
ID=83745496
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110478436.1A Pending CN115271085A (en) | 2021-04-30 | 2021-04-30 | Machine learning model verification method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115271085A (en) |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10410292B2 (en) | Method, system, apparatus, and storage medium for realizing antifraud in insurance claim based on consistency of multiple images | |
WO2019051941A1 (en) | Method, apparatus and device for identifying vehicle type, and computer-readable storage medium | |
US20230033052A1 (en) | Method, apparatus, device, and storage medium for training image processing model | |
EP2657884B1 (en) | Identifying multimedia objects based on multimedia fingerprint | |
US20120294535A1 (en) | Face detection method and apparatus | |
CN110765860A (en) | Tumble determination method, tumble determination device, computer apparatus, and storage medium | |
CN111222548A (en) | Similar image detection method, device, equipment and storage medium | |
CN115797670A (en) | Bucket wheel performance monitoring method and system based on convolutional neural network | |
US20190258935A1 (en) | Computer-readable recording medium, learning method, and learning apparatus | |
CN115512387A (en) | Construction site safety helmet wearing detection method based on improved YOLOV5 model | |
CN114419313A (en) | Image identification method and image identification system | |
CN116383814A (en) | Neural network model back door detection method and system | |
CN114817933A (en) | Method and device for evaluating robustness of business prediction model and computing equipment | |
CN113537145A (en) | Method, device and storage medium for rapidly solving false detection and missed detection in target detection | |
CN116167336B (en) | Sensor data processing method based on cloud computing, cloud server and medium | |
CN112287905A (en) | Vehicle damage identification method, device, equipment and storage medium | |
KR20220074290A (en) | Performance improving method for device using yolo algorithm | |
CN112037174A (en) | Chromosome abnormality detection method, device, equipment and computer readable storage medium | |
CN115271085A (en) | Machine learning model verification method and device | |
CN111368128A (en) | Target picture identification method and device and computer readable storage medium | |
CN115827496A (en) | Code abnormality detection method and device, electronic equipment and storage medium | |
CN116935089A (en) | Information processing apparatus | |
CN115374517A (en) | Testing method and device for wiring software, electronic equipment and storage medium | |
TWI647586B (en) | Behavior inference model building apparatus and behavior inference model building method thereof | |
CN113837173A (en) | Target object detection method and device, computer equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||