WO2021051917A1 - Evaluation method, system, and device for artificial intelligence (AI) models - Google Patents

Evaluation method, system, and device for artificial intelligence (AI) models

Info

Publication number
WO2021051917A1
Authority
WO
WIPO (PCT)
Prior art keywords
evaluation data
model
evaluation
data
data set
Prior art date
Application number
PCT/CN2020/097651
Other languages
English (en)
French (fr)
Inventor
陈轶
李鹏飞
李亿
白小龙
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN201911425487.7A external-priority patent/CN112508044A/zh
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Priority to EP20865555.5A priority Critical patent/EP4024297A4/en
Publication of WO2021051917A1 publication Critical patent/WO2021051917A1/zh
Priority to US17/696,040 priority patent/US20220207397A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G06N5/022 Knowledge engineering; Knowledge acquisition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/776 Validation; Performance evaluation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/778 Active pattern-learning, e.g. online learning of image or video features
    • G06V10/7784 Active pattern-learning, e.g. online learning of image or video features based on feedback from supervisors
    • G06V10/7792 Active pattern-learning, e.g. online learning of image or video features based on feedback from supervisors the supervisor being an automated module, e.g. "intelligent oracle"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions

Definitions

  • This application relates to the field of artificial intelligence (AI), and in particular to an AI model evaluation method, system and equipment.
  • AI models applied to different scenarios are continuously trained, for example: AI models trained for image classification, AI models trained for object recognition, etc.
  • A trained AI model may have problems. For example, an AI model trained for image classification may have low classification accuracy for all or some of its input images. Therefore, the trained AI model needs to be evaluated.
  • This application discloses an AI model evaluation method, system and equipment, which are used to more effectively evaluate the AI model.
  • A first aspect discloses an AI model evaluation method.
  • In the method, a computing device obtains an AI model and an evaluation data set that includes a plurality of labeled evaluation data, and classifies the evaluation data in the evaluation data set according to a data feature to obtain an evaluation data subset. The evaluation data subset is a subset of the evaluation data set, and the values of the data feature of all evaluation data in the subset satisfy a condition.
  • The computing device then determines the AI model's inference results for the evaluation data in the evaluation data subset, compares the inference result of each evaluation datum in the subset with that datum's label, and calculates from the comparison results the accuracy of the AI model's inference on the evaluation data subset. This yields the AI model's evaluation result for data whose data-feature values satisfy the condition.
  • the above method can obtain the evaluation result of the AI model on the data of a specific classification, and the evaluation result can be used to better guide the further optimization of the AI model.
  • the label of each evaluation data mentioned above is used to indicate the true result corresponding to the evaluation data.
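  • The subset evaluation described in the first aspect can be sketched as follows. This is a minimal illustration, not the application's implementation: the `EvalItem` structure, the `subset_accuracy` helper, the toy model, and the brightness feature are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass
class EvalItem:
    """One labeled evaluation datum (hypothetical structure)."""
    data: dict   # raw payload plus any precomputed feature values
    label: str   # true result corresponding to this datum


def subset_accuracy(
    model: Callable[[dict], str],
    items: Sequence[EvalItem],
    feature: Callable[[dict], float],
    condition: Callable[[float], bool],
) -> Tuple[float, int]:
    """Return (accuracy, subset size) for items whose feature value meets the condition."""
    subset = [it for it in items if condition(feature(it.data))]
    if not subset:
        return float("nan"), 0
    correct = sum(1 for it in subset if model(it.data) == it.label)
    return correct / len(subset), len(subset)


# Toy stand-ins: a "model" that is only right on bright images.
items = [
    EvalItem({"brightness": 0.9, "truth": "cat"}, "cat"),
    EvalItem({"brightness": 0.8, "truth": "dog"}, "dog"),
    EvalItem({"brightness": 0.2, "truth": "cat"}, "cat"),
    EvalItem({"brightness": 0.1, "truth": "dog"}, "dog"),
]
toy_model = lambda d: d["truth"] if d["brightness"] > 0.5 else "unknown"
brightness = lambda d: d["brightness"]

dark_acc, dark_n = subset_accuracy(toy_model, items, brightness, lambda v: v < 0.5)
bright_acc, bright_n = subset_accuracy(toy_model, items, brightness, lambda v: v >= 0.5)
```

  • In this toy data the dark subset scores lower than the bright subset, which is the kind of per-classification result the method is meant to surface.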
  • the computing device may generate optimization suggestions for the AI model.
  • The optimization suggestion may include training the AI model with new data whose data-feature values satisfy the condition. Because the suggestion is derived from the evaluation results obtained in this application, it is more specific, so the AI model can be optimized effectively and the inference capability of the optimized AI model can be improved. This avoids the poor optimization results that arise when technical personnel optimize the AI model based on experience alone.
  • The computing device can generate an evaluation report that includes the evaluation results and/or the optimization suggestions, and send the evaluation report to the user's device or system, so that the user can learn from the report how the AI model performs on data of a specific classification and can optimize the AI model based on the report.
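  • One possible shape for such an evaluation report is sketched below. The `build_report` function, the JSON layout, and the `margin` threshold for deciding when a subset is weak are assumptions made for illustration; the application does not specify how suggestions are triggered.

```python
import json
from typing import Dict, List


def build_report(global_acc: float, subset_acc: Dict[str, float], margin: float = 0.1) -> dict:
    """Assemble evaluation results plus suggestions for weak subsets.

    `margin` is a hypothetical threshold, not taken from the application.
    """
    suggestions: List[str] = []
    for condition, acc in sorted(subset_acc.items()):
        # Flag subsets that fall clearly below the accuracy on the whole set.
        if acc < global_acc - margin:
            suggestions.append(f"Train the AI model with new data where {condition}.")
    return {
        "evaluation_results": {"global_accuracy": global_acc, "subsets": subset_acc},
        "optimization_suggestions": suggestions,
    }


report = build_report(
    global_acc=0.90,
    subset_acc={"brightness < 0.3": 0.62, "brightness >= 0.3": 0.95},
)
report_json = json.dumps(report, indent=2)  # serialized form that could be sent to the user
```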
  • the computing device can obtain performance data.
  • The performance data can represent the performance of the hardware that executes the inference while the AI model infers the evaluation data, and/or the usage of the operators included in the AI model during that inference. From the performance data, users can understand the AI model's impact on the hardware and how the operators in the AI model are used, and can optimize the AI model accordingly.
  • The performance data may include one or more of the following: central processing unit (CPU) utilization, graphics processing unit (GPU) utilization, memory usage, video memory usage, operator usage duration, and operator usage count.
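  • The operator-related part of the performance data (usage duration and usage count) can be collected with a thin wrapper around each operator, as in the sketch below. The `OperatorProfiler` class and the toy `relu` and `scale` operators are hypothetical; CPU, GPU, and memory utilization would need platform-specific tooling that is not shown here.

```python
import time
from collections import defaultdict
from typing import Callable, Dict


class OperatorProfiler:
    """Records how many times each operator runs and its cumulative wall time."""

    def __init__(self) -> None:
        self.calls: Dict[str, int] = defaultdict(int)
        self.seconds: Dict[str, float] = defaultdict(float)

    def wrap(self, name: str, op: Callable) -> Callable:
        def timed(*args, **kwargs):
            start = time.perf_counter()
            try:
                return op(*args, **kwargs)
            finally:
                # Update counters even if the operator raises.
                self.calls[name] += 1
                self.seconds[name] += time.perf_counter() - start
        return timed


profiler = OperatorProfiler()
relu = profiler.wrap("relu", lambda xs: [max(0.0, x) for x in xs])
scale = profiler.wrap("scale", lambda xs: [2.0 * x for x in xs])

# "Inference" over three evaluation samples using the two wrapped operators.
for sample in ([-1.0, 2.0], [3.0, -4.0], [0.5, 0.5]):
    out = scale(relu(sample))
```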
  • There may be multiple data features, in which case the foregoing condition includes multiple sub-conditions, and the multiple data features are in one-to-one correspondence with the multiple sub-conditions.
  • When the computing device classifies the evaluation data in the evaluation data set according to the data features to obtain the evaluation data subset, it can classify the evaluation data according to the multiple data features. In the resulting subset, each of the multiple data-feature values of every evaluation datum satisfies the corresponding sub-condition in the condition.
  • the above method classifies the evaluation data set according to multiple data characteristics, and can obtain the evaluation result of the AI model on the data of a specific classification, and the evaluation result can be better used to guide the further optimization direction of the AI model.
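  • The one-to-one pairing of data features and sub-conditions can be expressed as a mapping from feature name to predicate, as in this sketch. The `select_subset` helper, the feature names, and the thresholds are hypothetical examples.

```python
from typing import Callable, Dict, List

# Each data feature maps to exactly one sub-condition (one-to-one correspondence).
SubConditions = Dict[str, Callable[[float], bool]]


def select_subset(records: List[dict], sub_conditions: SubConditions) -> List[dict]:
    """Keep records whose every listed feature value satisfies its own sub-condition."""
    return [
        r for r in records
        if all(check(r[name]) for name, check in sub_conditions.items())
    ]


records = [
    {"id": 1, "brightness": 0.2, "aspect_ratio": 0.4},
    {"id": 2, "brightness": 0.2, "aspect_ratio": 1.2},
    {"id": 3, "brightness": 0.8, "aspect_ratio": 0.4},
]
conditions = {
    "brightness": lambda v: v < 0.5,            # sub-condition for the first feature
    "aspect_ratio": lambda v: 0.0 <= v <= 0.5,  # sub-condition for the second feature
}
subset_ids = [r["id"] for r in select_subset(records, conditions)]
```

  • Only the record that satisfies both sub-conditions is selected; the accuracy on that subset would then be computed in the same way as for a single feature.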
  • The computing device can also determine the AI model's inference results for the evaluation data in the whole evaluation data set, compare those results with the labels of the evaluation data in the evaluation data set, and calculate from the comparison the accuracy of the AI model's inference on the evaluation data set, which gives the AI model's evaluation result on global data. This directly shows the AI model's overall inference capability on global data.
  • the evaluation data in the evaluation data set may be images or audio.
  • A second aspect discloses an AI model evaluation system, the system including:
  • an input/output (I/O) module, configured to obtain an AI model and an evaluation data set, where the evaluation data set includes a plurality of labeled evaluation data, and the label of each evaluation datum indicates the true result corresponding to that datum;
  • a data analysis module, configured to classify the evaluation data in the evaluation data set according to a data feature to obtain an evaluation data subset, where the evaluation data subset is a subset of the evaluation data set, and the values of the data feature of all evaluation data in the subset satisfy a condition;
  • a reasoning module, configured to determine the AI model's inference results for the evaluation data in the evaluation data subset;
  • the data analysis module is further configured to compare the inference result of each evaluation datum in the evaluation data subset with that datum's label, and to calculate from the comparison results the accuracy of the AI model's inference on the evaluation data subset, so as to obtain the AI model's evaluation result for data whose data-feature values satisfy the condition.
  • the system further includes:
  • the diagnosis module is configured to generate an optimization suggestion for the AI model, and the optimization suggestion includes: training the AI model with new data whose values of the data feature satisfy the condition.
  • the diagnostic module is also used to generate an evaluation report, the evaluation report including the evaluation result and/or the optimization suggestion;
  • the I/O module is also used to send the evaluation report.
  • the system further includes:
  • a performance monitoring module, configured to obtain performance data, where the performance data indicates the performance of the hardware that executes the inference while the AI model infers the evaluation data, or the usage of the operators included in the AI model while the AI model infers the evaluation data.
  • the performance data includes one or more of the following: central processing unit (CPU) usage, graphics processing unit (GPU) usage, memory usage, video memory usage, operator usage duration, and operator usage count.
  • the reasoning module is further configured to determine the reasoning result of the AI model on the evaluation data in the evaluation data set;
  • the system also includes:
  • the model analysis module is used to calculate the accuracy of the AI model's reasoning on the evaluation data set based on the comparison result of the inference result of the evaluation data in the evaluation data set and the label of the evaluation data in the evaluation data set, so as to obtain the evaluation result of the AI model on the global data.
  • there are multiple data features, the condition includes multiple sub-conditions, and the multiple data features are in one-to-one correspondence with the multiple sub-conditions;
  • the data analysis module is specifically configured to classify the evaluation data in the evaluation data set according to the multiple data features to obtain the evaluation data subset, where each of the multiple data-feature values of every evaluation datum in the evaluation data subset satisfies the corresponding sub-condition in the condition.
  • the evaluation data in the evaluation data set is an image or audio.
  • A third aspect discloses a computing device. The computing device includes a memory and a processor. The memory is used to store a set of computer instructions, and the processor executes the set of computer instructions stored in the memory, so that the computing device executes the method disclosed in the first aspect or any one of the possible implementations of the first aspect.
  • A fourth aspect discloses a computer-readable storage medium that stores computer program code. When the computer program code is executed by a computing device, the computing device executes the method disclosed in the first aspect or any one of the possible implementations of the first aspect.
  • the storage medium includes, but is not limited to, volatile memory, such as random access memory, non-volatile memory, such as flash memory, hard disk drive (HDD), and solid state drive (SSD).
  • a fifth aspect discloses a computer program product.
  • the computer program product includes computer program code.
  • When the computer program code is executed by a computing device, the computing device executes the method disclosed in the first aspect or any possible implementation of the first aspect.
  • the computer program product may be a software installation package.
  • the computer program product may be downloaded and executed on a computing device.
  • the sixth aspect discloses an AI model evaluation method.
  • In the method, a computing device can obtain an AI model and an evaluation data set that includes multiple labeled evaluation data, use the AI model to infer on the evaluation data in the evaluation data set to obtain performance data, and generate optimization suggestions for the AI model based on the performance data.
  • the above method provides more specific optimization suggestions for the AI model based on the performance data obtained by the evaluation method of the present application, which avoids the problem of poor optimization effect caused by technicians only optimizing the AI model based on experience.
  • Performance data is used to indicate the performance of the hardware performing the inference process during the AI model inferring the evaluation data, or the usage of the operators included in the AI model during the AI model inferring the evaluation data.
  • the optimization suggestions may include adjusting the structure of the AI model, or performing optimization training on the operators of the AI model.
  • the computing device can generate an evaluation report that includes the performance data and/or the optimization suggestions, and send the evaluation report, so that the user can learn from the evaluation report about the AI model's inference capability with respect to data features and can optimize the AI model based on the evaluation report.
  • the usage of the operators included in the AI model while the AI model infers the evaluation data includes the usage duration of the operators of the AI model and the usage count of the operators of the AI model.
  • the performance of the hardware performing the inference process includes one or more of: CPU usage, GPU usage, memory usage, and video memory usage.
  • The computing device can also determine the AI model's inference results for the evaluation data in the evaluation data set, compare those results with the labels of the evaluation data in the evaluation data set, and calculate from the comparison the accuracy of the AI model's inference on the evaluation data set, which gives the AI model's evaluation result on global data. This directly shows the AI model's overall inference capability on global data.
  • the evaluation data in the evaluation data set may be images or audio.
  • a seventh aspect discloses an AI model evaluation system, the system includes:
  • I/O module configured to obtain the AI model and evaluation data set, the evaluation data set includes a plurality of evaluation data carrying tags, and the tag of each evaluation data is used to represent the true result corresponding to the evaluation data;
  • the reasoning module is configured to use the AI model to reason about the evaluation data in the evaluation data set;
  • the performance monitoring module is used to obtain performance data, and the performance data is used to indicate the performance of the hardware performing the inference process while the AI model infers the evaluation data, or the usage of the operators included in the AI model while the AI model infers the evaluation data;
  • the diagnosis module is configured to generate optimization suggestions for the AI model based on the performance data, the optimization suggestions including: adjusting the structure of the AI model, or performing optimization training for the operators of the AI model.
  • the diagnostic module is also used to generate an evaluation report, the evaluation report including the performance data and/or the optimization suggestion;
  • the I/O module is also used to send the evaluation report.
  • the usage of the operators included in the AI model while the AI model infers the evaluation data includes the usage duration of the operators of the AI model and the usage count of the operators of the AI model.
  • the performance of the hardware performing the inference process includes one or more of: CPU usage, GPU usage, memory usage, and video memory usage.
  • the reasoning module is also used to determine the reasoning result of the AI model on the evaluation data in the evaluation data set;
  • the system also includes:
  • the model analysis module is used to calculate the accuracy of the AI model's reasoning on the evaluation data set based on the comparison result of the inference result of the evaluation data in the evaluation data set and the label of the evaluation data in the evaluation data set, so as to obtain the evaluation result of the AI model on the global data.
  • the evaluation data in the evaluation data set is an image or audio.
  • An eighth aspect discloses a computing device.
  • the computing device includes a memory and a processor.
  • the memory is used to store a set of computer instructions; the processor executes the set of computer instructions stored in the memory, so that the computing device executes the method disclosed in the sixth aspect or any one of the possible implementation manners of the sixth aspect.
  • a ninth aspect discloses a computer-readable storage medium, the computer-readable storage medium stores computer program code, and when the computer program code is executed by a computing device, the computing device executes the aforementioned sixth aspect or the sixth aspect The method disclosed in any one of the possible implementations.
  • the storage medium includes, but is not limited to, volatile memory, such as random access memory, non-volatile memory, such as flash memory, hard disk drive (HDD), and solid state drive (SSD).
  • a tenth aspect discloses a computer program product.
  • the computer program product includes computer program code.
  • When the computer program code is executed by a computing device, the computing device executes the method disclosed in the sixth aspect or any possible implementation of the sixth aspect.
  • the computer program product may be a software installation package.
  • the computer program product may be downloaded and executed on a computing device.
  • FIG. 1 is a schematic diagram of a system architecture 100 disclosed in an embodiment of the present application.
  • FIG. 2 is a schematic diagram of another system architecture 200 disclosed in an embodiment of the present application.
  • FIG. 3 is a schematic diagram of deployment of an evaluation system disclosed in an embodiment of the present application.
  • FIG. 4 is a schematic diagram of deployment of another evaluation system disclosed in an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of an evaluation system disclosed in an embodiment of the present application.
  • FIG. 6 is a schematic flowchart of an AI model evaluation method disclosed in an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a task creation interface disclosed in an embodiment of the present application.
  • FIG. 8 is a schematic flowchart of another AI model evaluation method disclosed in an embodiment of the present application.
  • FIG. 9 is a distribution diagram of the brightness of the label frame of the microorganism detection disclosed in the embodiment of the present application.
  • FIG. 10 is a distribution diagram of the proportion of the area of the label frame of the microbial detection in the image in the embodiment of the present application.
  • FIG. 11 is a schematic diagram of mAP before and after retraining a model corresponding to microbial cells disclosed in an embodiment of the present application.
  • FIG. 12 is a curve of F1 value versus confidence threshold for an AI model used for safety helmet detection disclosed in an embodiment of the present application.
  • FIG. 13 is a P-R curve of an AI model for safety helmet detection disclosed in an embodiment of the present application.
  • FIG. 14 is a schematic structural diagram of another evaluation system 1500 disclosed in an embodiment of the present application.
  • FIG. 15 is a schematic structural diagram of yet another evaluation system 1600 disclosed in an embodiment of the present application.
  • FIG. 16 is a schematic structural diagram of a computing device disclosed in an embodiment of this application.
  • FIG. 17 is a schematic structural diagram of another computing device disclosed in an embodiment of the application.
  • AI has received extensive attention from academia and industry, and in many application fields it has performed beyond the level of ordinary humans.
  • For example, applying AI technology in the field of machine vision makes machine vision more accurate than humans.
  • AI technology also has better applications in fields such as natural language processing and recommendation systems.
  • Machine learning is a core means to realize AI.
  • the computer constructs an AI model based on the existing data for the technical problem to be solved, and then uses the AI model to reason about unknown data and obtain the reasoning result.
  • With this method, the computer appears to have learned a certain ability (such as cognitive, discrimination, or classification ability) as a human would, so the method is called machine learning.
  • An AI model (for example, a neural network model) is a type of mathematical algorithm model that uses machine learning ideas to solve practical problems.
  • The AI model includes a large number of parameters and calculation formulas (or calculation rules). The values of the parameters can be obtained by training the AI model on a data set. For example, a parameter in the AI model may be the weight of a calculation formula or factor in the AI model.
  • The AI model can be divided into multiple layers or multiple nodes. Each layer or node includes one type of calculation rule and one or more parameters (used to represent a certain mapping, relationship, or transformation). The calculation rule and the one or more parameters used by each layer or node of the AI model are called an operator.
  • An AI model can include a large number of operators.
  • For example, in a neural network, an operator can be a single layer, such as a convolutional layer, a pooling layer, or a fully connected layer.
  • the convolutional layer is used for feature extraction.
  • the pooling layer is used for downsampling.
  • the fully connected layer is used for feature extraction or classification.
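  • To make the notion of an operator concrete, the sketch below implements one-dimensional versions of the three layer types in plain Python. Each function is the calculation rule, and the kernel, weights, and bias are its parameters. The function names and the toy values are illustrative only; real frameworks provide optimized multi-dimensional equivalents.

```python
from typing import List


def conv1d(x: List[float], kernel: List[float]) -> List[float]:
    """Valid 1-D cross-correlation: slides the kernel (the parameters) over x."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k)) for i in range(len(x) - k + 1)]


def max_pool1d(x: List[float], size: int) -> List[float]:
    """Non-overlapping max pooling: downsamples by keeping each window's maximum."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]


def fully_connected(x: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """Dense layer: every output is a weighted sum of all inputs plus a bias."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


signal = [1.0, 3.0, 2.0, 5.0, 4.0]
features = conv1d(signal, kernel=[1.0, -1.0])   # feature extraction
pooled = max_pool1d(features, size=2)           # downsampling
scores = fully_connected(pooled, weights=[[1.0, 1.0], [0.5, -0.5]], bias=[0.0, 1.0])
```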
  • Examples of AI models include deep convolutional neural networks, residual networks (ResNet), Visual Geometry Group (VGG) networks, Inception networks, faster region-based convolutional neural networks (Faster R-CNN), single shot multibox detector (SSD) networks, you only look once (YOLO) networks, and so on.
  • the AI platform is a system that provides services such as training, evaluation, and optimization of AI models to users such as individuals or enterprises.
  • the AI platform can receive user needs and data through the interface, and train and optimize various AI models that meet user needs.
  • the performance of the AI model can be evaluated for the user, and the AI model can be continuously optimized for the user based on the evaluation result.
  • When evaluating an AI model, the AI platform uses the AI model to infer on the evaluation data set to obtain inference results, and then determines the accuracy of the AI model's inference results on the evaluation data set from the inference results and the labels of the evaluation data in the evaluation data set.
  • The accuracy indicates how close the AI model's inference results for the evaluation data in the evaluation data set are to the true results of that evaluation data.
  • The accuracy can be measured by many indicators, such as accuracy rate and recall rate.
  • With this approach, R&D personnel can obtain only the accuracy value of the AI model for the entire evaluation data set and cannot obtain more specific information, such as the impact of data features on the AI model's inference results. The evaluation result is therefore relatively general and cannot provide more information for further optimizing the AI model.
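  • As a small worked example of two such indicators, the sketch below computes the accuracy rate and the recall rate for one class by comparing inference results with labels. The `accuracy_and_recall` helper and the helmet data are hypothetical and only illustrate the standard definitions.

```python
from typing import Sequence, Tuple


def accuracy_and_recall(
    predictions: Sequence[str], labels: Sequence[str], positive: str
) -> Tuple[float, float]:
    """Compare inference results with labels; recall is computed for one positive class."""
    pairs = list(zip(predictions, labels))
    correct = sum(1 for p, y in pairs if p == y)
    true_pos = sum(1 for p, y in pairs if p == positive and y == positive)
    false_neg = sum(1 for p, y in pairs if p != positive and y == positive)
    accuracy = correct / len(pairs) if pairs else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return accuracy, recall


preds = ["helmet", "helmet", "none", "none", "helmet"]
truth = ["helmet", "none", "helmet", "none", "helmet"]
acc, rec = accuracy_and_recall(preds, truth, positive="helmet")
```

  • Here three of five results match their labels, and two of the three true "helmet" samples are found, so a single global number hides which kind of sample the model misses.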
  • Reasoning is the process of using AI models to predict the evaluation data in the evaluation data set.
  • the reasoning may be to use an AI model to recognize the name of the person corresponding to the face in the image in the evaluation data set.
  • the AI model can be invoked by reasoning code to reason about the evaluation data in the evaluation data set.
  • the reasoning code may include calling code for calling the AI model to reason about the evaluation data in the evaluation data set.
  • the inference code may also include preprocessing code for preprocessing the evaluation data in the evaluation data set, and then use the calling code to call the AI model to perform inference on the evaluation data in the preprocessed evaluation data set.
  • the inference code can also include post-processing code, which is used to perform further statistical analysis and other processing on the inference result.
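  • The three parts of the inference code described above (preprocessing code, calling code, and post-processing code) can be sketched as follows. All function names and the toy text model are hypothetical stand-ins, not the application's code.

```python
from collections import Counter
from typing import Callable, Dict, List


def preprocess(raw: str) -> str:
    """Preprocessing code: normalise each evaluation datum before inference."""
    return raw.strip().lower()


def run_inference(model: Callable[[str], str], dataset: List[str]) -> List[str]:
    """Calling code: invoke the AI model on every preprocessed datum."""
    return [model(preprocess(item)) for item in dataset]


def postprocess(results: List[str]) -> Dict[str, int]:
    """Post-processing code: simple statistics over the inference results."""
    return dict(Counter(results))


# Hypothetical stand-in for a trained AI model.
toy_model = lambda text: "question" if text.endswith("?") else "statement"

summary = postprocess(run_inference(toy_model, ["  Is it on? ", "It is on.", "WHY?"]))
```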
  • A data feature is an abstraction of the characteristics of the data itself and is used to express those characteristics.
  • For images, the data features may be the aspect ratio of the image, the colorfulness of the image, the resolution of the image, the blur degree of the image, the brightness of the image, the saturation of the image, and the like.
  • Different data corresponds to different data feature values under the same data feature, and multiple data can be classified according to the data feature, and the data under each category is data with similar data features.
  • For example, images of different sizes have different aspect ratios, so a group of images can be divided into three categories according to the value of the aspect ratio: images whose aspect ratio is in [0, 0.5], 5 in total; images whose aspect ratio is in (0.5, 1], 3 in total; and images whose aspect ratio is in (1, 1.5), 2 in total.
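  • The aspect-ratio grouping in this example can be reproduced with a short binning function. The sketch assumes the aspect ratio is width divided by height, which the text does not state, and the image sizes are invented so that the counts come out as 5, 3, and 2.

```python
from collections import Counter
from typing import List, Tuple


def aspect_ratio_bin(width: int, height: int) -> str:
    """Assign an image to one of the three aspect-ratio intervals (assumes width/height)."""
    ratio = width / height
    if 0.0 <= ratio <= 0.5:
        return "[0, 0.5]"
    if 0.5 < ratio <= 1.0:
        return "(0.5, 1]"
    if 1.0 < ratio < 1.5:
        return "(1, 1.5)"
    return "other"


# Hypothetical (width, height) pairs chosen to reproduce the 5 / 3 / 2 split.
sizes: List[Tuple[int, int]] = [
    (100, 400), (100, 300), (200, 500), (100, 200), (150, 400),  # ratio in [0, 0.5]
    (300, 400), (200, 300), (400, 400),                          # ratio in (0.5, 1]
    (500, 400), (560, 400),                                      # ratio in (1, 1.5)
]
counts = Counter(aspect_ratio_bin(w, h) for w, h in sizes)
```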
  • the embodiments of the present application disclose an AI model evaluation method, system, and equipment.
  • the method can obtain the evaluation result of the AI model on specific classified data, so that the evaluation result can be more effectively used to guide the further optimization of the AI model.
  • FIG. 1 is a schematic diagram of a system architecture 100 disclosed in an embodiment of the present application.
  • the system architecture 100 may include a training system 11, an evaluation system 12, and a terminal device 13.
  • the training system 11 and the evaluation system 12 can provide users with AI model training and evaluation services through an AI platform.
  • the training system 11 is configured to receive the training data set sent by the user through the terminal device 13, train the initial AI model according to the training data set, and send the trained AI model to the evaluation system 12.
  • the training system 11 is also used to receive the task type input or selected by the user through the terminal device 13 on the AI platform, and determine the initial AI model according to the task type.
  • the training system 11 is also used to send the received task type to the evaluation system 12.
  • the training system 11 is also used to receive the initial AI model uploaded by the user through the terminal device 13.
  • the evaluation system 12 is used to receive the AI model from the training system 11, receive the evaluation data set uploaded by the user through the terminal device 13, use the AI model to infer on the evaluation data set to obtain the inference results, generate, according to the evaluation data set and the inference results, an evaluation report that includes evaluation results and/or optimization suggestions for the AI model, and send the evaluation report to the terminal device 13.
  • the evaluation system 12 is also used to receive task types from the training system 11.
  • the evaluation system 12 is also used to receive the task type input or selected by the user on the AI platform through the terminal device 13.
  • the terminal device 13 is configured to send data and information to the training system 11 and the evaluation system 12 according to the user's operation, or to receive information sent by the training system 11 or the evaluation system 12.
  • FIG. 2 is a schematic diagram of another system architecture 200 disclosed in an embodiment of the present application.
  • the system architecture 200 may include a terminal device 21 and an evaluation system 22.
  • the terminal device 21 is used to send the trained AI model, the evaluation data set, and the inference code to the evaluation system 22 according to the user's operation.
  • the evaluation system 22 is used to receive the trained AI model, the evaluation data set, and the inference code from the terminal device 21, call the AI model through the inference code to perform inference on the evaluation data in the evaluation data set to obtain the inference results, generate, according to the evaluation data set and the inference results, an evaluation report that includes evaluation results and optimization suggestions for the AI model, and send the evaluation report to the terminal device 21.
  • the evaluation system 22 is also used to receive the task type sent by the user through the terminal device 21.
  • the evaluation method of the AI model provided in the present application is executed by an evaluation system.
  • the evaluation system may be the above-mentioned evaluation system 12 or the above-mentioned evaluation system 22.
  • FIG. 3 is a schematic diagram of deployment of an evaluation system disclosed in an embodiment of the present application.
  • the evaluation system can be deployed in a cloud environment.
  • the cloud environment is an entity that uses basic resources to provide cloud services to users in the cloud computing mode.
  • the cloud environment includes cloud data centers and cloud service platforms.
  • Cloud data centers include a large number of basic resources (including computing resources, storage resources, and network resources) owned by cloud service providers.
  • The computing resources included in a cloud data center can be a large number of computing devices (for example, servers).
  • the evaluation system can be independently deployed on a server or virtual machine in a cloud data center.
  • The evaluation system can also be deployed in a distributed manner on multiple servers in a cloud data center.
  • The evaluation system is abstracted by the cloud service provider into an evaluation cloud service provided to the user on the cloud service platform. After the user purchases the cloud service on the cloud service platform (the fee can be pre-charged and then settled according to the final resource usage), the cloud environment uses the evaluation system deployed in the cloud data center to provide the user with the evaluation cloud service.
  • the functions provided by the evaluation system can also be abstracted into a cloud service together with the functions provided by other systems.
  • For example, the cloud service provider abstracts the AI model evaluation function provided by the evaluation system and the initial AI model training function provided by the training system together into an AI platform cloud service.
  • the evaluation system can also be deployed in an edge environment.
  • the edge environment refers to a data center or a collection of edge computing devices that are close to users.
  • the edge environment includes one or more edge computing devices.
  • The evaluation system can be independently deployed on one edge computing device, or it can be deployed in a distributed manner on multiple edge servers, on multiple edge stations with computing power, or on a combination of edge servers and edge stations with computing power.
  • the evaluation system can also be deployed in other environments, such as terminal computing equipment clusters.
  • the evaluation system can be a software system that runs on computing devices such as servers.
  • The evaluation system can also be a back-end system of the AI platform. On the AI platform it can appear as an AI model evaluation service whose back end is provided by the evaluation system.
  • FIG. 4 is a schematic diagram of deployment of another evaluation system disclosed in an embodiment of the present application.
  • the evaluation system provided by the present application can also be deployed in different environments in a distributed manner.
  • the evaluation system provided by this application can be logically divided into multiple parts, and each part has a different function.
  • The parts of the evaluation system can be deployed across any two or all three of the terminal computing device, the edge environment, and the cloud environment.
  • Terminal computing devices include: terminal servers, smart phones, notebook computers, tablet computers, personal desktop computers, smart cameras, etc.
  • the edge environment is an environment that includes a collection of edge computing devices that are closer to the terminal computing device.
  • the edge computing devices include: edge servers, edge small stations with computing power, and so on.
  • the AI platform includes a training system and an evaluation system.
  • the training system and the evaluation system can be deployed in the same environment, such as a cloud environment, an edge environment, and so on.
  • the training system and the evaluation system can also be deployed in different environments, for example, the training system is deployed in a cloud environment, and the evaluation system is deployed in an edge environment.
  • the training system and the evaluation system can be deployed independently or distributed.
  • FIG. 5 is a schematic structural diagram of an evaluation system 500 disclosed in an embodiment of the present application.
  • The evaluation system 500 may include an input/output (I/O) module 501, a data set storage module 502, an inference module 503, a performance monitoring module 504, a model analysis module 505, a data analysis module 506, a diagnosis module 507, and a result storage module 508.
  • The evaluation system 500 may include all or part of the above-mentioned modules. The function of each module in the evaluation system 500 is described below.
  • The I/O module 501 is configured to receive the AI model sent from the training system or the terminal device, receive the evaluation data set and inference code uploaded by the user through the terminal device, and send the evaluation report to the terminal device.
  • the I/O module 501 is also used to receive the task type uploaded by the user through the terminal device.
  • the data set storage module 502 is used to store the received evaluation data set.
  • the inference module 503 is configured to use the AI model to perform inference on the evaluation data set stored in the data set storage module 502 or the received evaluation data set.
  • The performance monitoring module 504 is used to monitor, during the inference performed by the inference module 503, the usage information of hardware resources, the usage time of the operators included in the AI model, and the usage count of those operators.
  • The usage count of an operator is the number of times the operator is used during inference by the inference module 503.
  • The usage time of an operator is the total time and/or average time that the operator takes during inference by the inference module 503.
  • the model analysis module 505 is configured to calculate the accuracy of the inference result of the AI model on the evaluation data in the evaluation data set according to the inference result of the inference module 503 and the evaluation data in the evaluation data set.
  • The data analysis module 506 is used to calculate the value of each evaluation data in the evaluation data set under one or more data features, classify the evaluation data in the evaluation data set according to the values of the data features to obtain at least one evaluation data subset, and calculate, according to the inference result of the inference module 503 and the labels of the evaluation data in each evaluation data subset, the accuracy of the AI model's inference result for the evaluation data in each evaluation data subset.
  • the diagnosis module 507 is configured to generate an evaluation report according to any one or more of the monitoring result of the performance monitoring module 504, the analysis result of the model analysis module 505, and the analysis result of the data analysis module 506.
  • the result storage module 508 is used to store the monitoring results of the performance monitoring module 504, the analysis results of the model analysis module 505, the analysis results of the data analysis module 506, and the diagnosis results of the diagnosis module 507.
  • The evaluation system provided in the embodiments of this application can provide users with a service for evaluating AI models. The evaluation system can analyze in depth how different data features affect the AI model, among other analysis results, and can further provide users with optimization suggestions for the AI model.
  • FIG. 6 is a schematic flowchart of an AI model evaluation method disclosed in an embodiment of the present application.
  • The evaluation method of the AI model is applied to the evaluation system. Since the evaluation system is deployed on computing devices, either independently or in a distributed manner, the evaluation method of the AI model is applied to a computing device; that is, the evaluation method of the AI model of this application can be performed by a processor in the computing device executing computer instructions stored in a memory.
  • the evaluation method of the AI model may include the following steps.
  • the AI model is a trained model.
  • the AI model can be sent by the training system or uploaded by the user through the terminal device.
  • the evaluation data set may include multiple evaluation data and labels of the multiple evaluation data, and each evaluation data corresponds to one or more labels, and the label is used to represent the true result corresponding to the evaluation data.
  • These multiple evaluation data are of the same type and can be images, videos, audios, texts, etc.
  • For different task types of the AI model, the type of the evaluation data in the evaluation data set may be different or the same.
  • For example, when the task type concerns images, the evaluation data in the evaluation data set is all images, and when the task type is speech recognition, the evaluation data in the evaluation data set is audio.
  • the label is used to indicate the actual result corresponding to the evaluation data.
  • For different task types, the form of the label is also different.
  • For example, when the task type is to identify the type of a target, the label of the evaluation data is the true type of the target.
  • When the task type is to detect a target in an image, the label can be the detection frame corresponding to the target in the evaluation image. The detection frame can be rectangular, circular, a straight line, or another shape, which is not limited here.
  • In other words, the label is a value with a specific meaning that is associated with the labeled evaluation data, and this value can indicate the type, location, or other attributes of the labeled evaluation data.
  • For example, when the evaluation data is audio, the label may indicate the type of the audio, such as pop music or classical music.
  • each evaluation data in the multiple evaluation data may correspond to one label, or may correspond to multiple labels.
  • Different AI models can be applied to different application scenarios, and the same AI model can also be applied to different application scenarios.
  • When the application scenarios of the AI model differ, the task types of the AI model may differ. Because the task types differ, the evaluation indicators and data features used for the AI model also differ. Therefore, after the AI model is obtained, the evaluation indicators and data features corresponding to the task type of the AI model can be obtained.
  • When the evaluation system stores multiple task types, each with its own corresponding evaluation indicators and data features, the evaluation indicators and data features of the task type of the AI model can be obtained once that task type is known.
  • The evaluation indicators of a task type may include at least one evaluation index, and the data features of a task type may include at least one data feature.
  • Data characteristics are the abstraction of the characteristics of the data itself.
  • the data feature may be one or more, and each data feature is used to represent one aspect of the evaluation data in the evaluation data set.
  • FIG. 7 is a schematic diagram of a task creation interface disclosed in an embodiment of the present application.
  • The task creation interface can include a data set, a model type, a model source, and inference code.
  • The task creation interface can also include other content, which is not limited here.
  • The box next to the data set can be used for the user to upload the evaluation data set or to enter the storage path of the evaluation data set.
  • The box next to the model type can be used for the user to select the task type of the AI model from the stored task types or to input the task type of the AI model.
  • The box next to the model source can be used for the user to upload the AI model or to input the storage path of the AI model.
  • The box next to the inference code can be used for the user to upload the inference code or to input the storage path of the inference code. It can be seen that once the task is created, the task type of the AI model has been determined.
  • The inference code is used to call the AI model to perform inference on the evaluation data set.
  • The inference code may include calling code, which calls the AI model to perform inference on the evaluation data set.
  • The inference code may also include preprocessing code, which preprocesses the evaluation data in the evaluation data set; the calling code then calls the AI model to perform inference on the preprocessed evaluation data.
  • The inference code may also include post-processing code, which processes the raw output of the inference to obtain the final inference result.
  • The value of the data feature of each evaluation data in the evaluation data set can be calculated from the multiple evaluation data included in the evaluation data set and the labels of the multiple evaluation data.
  • The value of a data feature is a value that measures that feature of the data.
  • There can be one data feature or multiple data features. When there are multiple data features, the value of each of the data features can be calculated for each evaluation data in the evaluation data set.
  • For example, when the task type is image classification, each evaluation data in the evaluation data set is an image, and the data features can include general image features such as the aspect ratio of the image, the mean and standard deviation of the RGB values of all images, the colorfulness of the image, the resolution of the image, the blurriness of the image, the brightness of the image, and the saturation of the image.
  • The aspect ratio of the image is the ratio of the width to the height of the image.
  • The aspect ratio AS of the image can be expressed as follows:
  • $$AS = \frac{ImageW}{ImageH}$$
  • ImageH is the height of the image, and ImageW is the width of the image.
  • The mean of the RGB values of all images consists of the average of the R channel values, the average of the G channel values, and the average of the B channel values over all the images included in the evaluation data set.
  • The mean $T_{mean}$ of the RGB values of all images can be expressed as follows:
  • $$T_{mean} = \frac{1}{n}\sum_{i=1}^{n}(R,G,B)_i$$
  • n is the number of images included in the evaluation data set. In $(R,G,B)_i$, R is the sum of the R channel values of all pixels in the i-th image included in the evaluation data set, G is the sum of the G channel values of all pixels in the i-th image, and B is the sum of the B channel values of all pixels in the i-th image.
  • The mean of the RGB values of all images can be split into the following three formulas:
  • $$T_{mean,R} = \frac{1}{n}\sum_{i=1}^{n}R_i, \qquad T_{mean,G} = \frac{1}{n}\sum_{i=1}^{n}G_i, \qquad T_{mean,B} = \frac{1}{n}\sum_{i=1}^{n}B_i$$
  • $T_{mean,R}$ is the average of the R channel values of the n images, $T_{mean,G}$ is the average of the G channel values of the n images, and $T_{mean,B}$ is the average of the B channel values of the n images.
  • $R_i$ is the sum of the R channel values of all pixels in the i-th image included in the evaluation data set, $G_i$ is the sum of the G channel values of all pixels in the i-th image, and $B_i$ is the sum of the B channel values of all pixels in the i-th image.
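  • The per-channel averages described above can be sketched in Python as follows. This is a minimal illustration, assuming each image is held in memory as a list of (R, G, B) pixel tuples; the function name `channel_means` and this data representation are hypothetical and are not part of the application.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]


def channel_means(images: List[List[Pixel]]) -> Tuple[float, float, float]:
    """Return (T_mean_R, T_mean_G, T_mean_B) over n images.

    Following the text, R_i, G_i and B_i are the per-image sums of the
    channel values over all pixels of image i, and each mean is the
    average of those sums over the n images.
    """
    n = len(images)
    if n == 0:
        raise ValueError("evaluation data set is empty")
    sums = [0.0, 0.0, 0.0]
    for pixels in images:
        for c in range(3):
            sums[c] += sum(p[c] for p in pixels)  # adds R_i, G_i or B_i
    return (sums[0] / n, sums[1] / n, sums[2] / n)


images = [
    [(10, 20, 30), (30, 40, 50)],  # image 1: R_1 = 40, G_1 = 60, B_1 = 80
    [(0, 0, 0), (20, 40, 60)],     # image 2: R_2 = 20, G_2 = 40, B_2 = 60
]
print(channel_means(images))  # (30.0, 50.0, 70.0)
```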
  • the standard deviation T STD of RGB of all images can be expressed as follows:
  • The colorfulness of the image is the richness of the colors in the image, and the colorfulness CO of the image can be expressed as follows:
  • STD() denotes the standard deviation of the content in parentheses.
  • The resolution of an image is the number of pixels per inch.
  • The blurriness of the image is the degree to which the image is blurred.
  • the brightness of the image is the brightness of the picture in the image, and the brightness BR of the image can be expressed as follows:
  • the saturation of the image is the purity of the colors in the image, and the saturation SA of the image can be expressed as follows:
  • m is the number of pixels included in an image, $\max(R,G,B)_j$ is the maximum of the R channel value, the G channel value, and the B channel value of the j-th pixel in the image, and $\min(R,G,B)_j$ is the minimum of the R channel value, the G channel value, and the B channel value of the j-th pixel in the image.
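  • As an illustration only, one saturation measure that uses exactly the symbols defined above (m, the per-pixel maximum, and the per-pixel minimum) is the mean HSV-style saturation, (max - min) / max averaged over the m pixels. Whether this matches the formula of the application is an assumption; the function name `mean_saturation` is hypothetical.

```python
from typing import List, Tuple


def mean_saturation(pixels: List[Tuple[int, int, int]]) -> float:
    """Average HSV-style saturation over the m pixels of one image.

    Assumed form: SA = (1/m) * sum_j (max_j - min_j) / max_j, where a pixel
    whose channels are all zero contributes 0.
    """
    m = len(pixels)
    if m == 0:
        raise ValueError("image has no pixels")
    total = 0.0
    for r, g, b in pixels:
        hi, lo = max(r, g, b), min(r, g, b)
        total += (hi - lo) / hi if hi > 0 else 0.0
    return total / m


gray = [(128, 128, 128)] * 4  # no color at all
red = [(255, 0, 0)] * 4       # fully saturated color
print(mean_saturation(gray), mean_saturation(red))  # 0.0 1.0
```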
  • When the task type is object detection, each evaluation data in the evaluation data set is an image. The data features can include features based on the label box, such as the number of label boxes, the proportion of the image area occupied by the label box, the variance of the label box areas, the distance of the label box from the edge of the image, and the overlap degree of the label boxes, as well as general image features such as the aspect ratio of the image, the resolution of the image, the blurriness of the image, the brightness of the image, and the saturation of the image.
  • the label box is the label of the training image in the training data set.
  • The proportion of the image occupied by the label box is the ratio of the area of the label box to the area of the image, and this proportion AR can be expressed as follows:
  • $$AR = \frac{BboxW \times BboxH}{ImageW \times ImageH}$$
  • BboxW is the width of the label box, that is, the width of the label box corresponding to the label included in the evaluation data.
  • BboxH is the height of the label box, that is, the height of the label box corresponding to the label included in the evaluation data.
  • the overlap degree of the label frame is the proportion of the label frame covered by other label frames.
  • the overlap degree OV of the label frame can be expressed as follows:
  • M is the number of label boxes included in an image minus 1, that is, the number of label boxes other than the target box.
  • C is the region of the target box among the label boxes included in this image, and area(C) is the area of the target box.
  • $G_k$ is the region of the k-th label box, other than the target box, among the label boxes included in this image.
  • $C \cap G_k$ is the region where the target box and the k-th label box overlap, and $area(C \cap G_k)$ is the area of that overlap region.
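  • The label-box features above can be sketched in Python as follows. This is a hedged illustration: boxes are assumed to be axis-aligned and given as (x_min, y_min, x_max, y_max), and the overlap degree is assumed to be the sum over the M other boxes of area(C ∩ G_k) divided by area(C). The exact formula of the application is not reproduced in this text, and all function names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def area(box: Box) -> float:
    """Area of a box; degenerate or inverted boxes count as 0."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def intersection(a: Box, b: Box) -> Box:
    """Overlap region of two boxes (may be inverted if they do not overlap)."""
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))


def area_ratio(box: Box, image_w: float, image_h: float) -> float:
    """Proportion of the image area occupied by one label box."""
    return area(box) / (image_w * image_h)


def overlap_degree(target: Box, others: List[Box]) -> float:
    """Assumed form: sum_k area(C ∩ G_k) / area(C) over the other boxes."""
    c = area(target)
    if c == 0:
        return 0.0
    return sum(area(intersection(target, g)) for g in others) / c


target = (0.0, 0.0, 10.0, 10.0)
others = [(5.0, 0.0, 15.0, 10.0), (20.0, 20.0, 30.0, 30.0)]
print(overlap_degree(target, others))  # 0.5: half of the target is covered
print(area_ratio(target, 40.0, 25.0))  # 0.1
```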
  • the distance MA of the label frame from the edge of the image can be expressed as follows:
  • imgx is the coordinate of the center point of an image on the x axis, and imgy is the coordinate of the center point of this image on the y axis.
  • x is the coordinate of the center point of the label box in this image on the x axis, and y is the coordinate of the center point of the label box in this image on the y axis.
  • When the evaluation data is text, the data features can include the number of words, the number of non-repeated words, the length, the number of stop words, the number of punctuation marks, the number of title-case words, the average length of words, term frequency (TF), inverse document frequency (IDF), etc.
  • The number of words counts the words in each line of text.
  • The number of non-repeated words counts the words that appear only once in each line of text.
  • The length measures how much storage space each line of text occupies (including spaces, symbols, letters, etc.).
  • The number of stop words counts stop words such as "between", "but", "about", and "very".
  • The number of punctuation marks counts the punctuation marks contained in each line of text.
  • The number of uppercase words counts the words written entirely in uppercase.
  • The number of title-case words counts the words whose first letter is capitalized and whose other letters are lowercase.
  • The average length of words is the average length of the words in each line of text.
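  • The per-line text features listed above can be sketched with the Python standard library as follows. This is a minimal illustration under stated assumptions: words are split on whitespace, surrounding punctuation is stripped, and `STOP_WORDS` is a tiny hypothetical list built from the four examples in the text; TF and IDF are omitted because they need a whole corpus.

```python
import string

STOP_WORDS = {"between", "but", "about", "very"}  # illustrative, not exhaustive


def text_features(line: str) -> dict:
    """Compute simple data feature values for one line of text."""
    words = [t.strip(string.punctuation) for t in line.split()]
    words = [w for w in words if w]
    lowered = [w.lower() for w in words]
    return {
        "word_count": len(words),
        # words that occur exactly once in the line (case-insensitive)
        "unique_word_count": sum(1 for w in set(lowered) if lowered.count(w) == 1),
        # storage size of the line in bytes, assuming UTF-8 encoding
        "length": len(line.encode("utf-8")),
        "stop_word_count": sum(1 for w in lowered if w in STOP_WORDS),
        "punctuation_count": sum(1 for ch in line if ch in string.punctuation),
        "uppercase_word_count": sum(1 for w in words if w.isupper()),
        "title_word_count": sum(1 for w in words if w.istitle()),
        "average_word_length": sum(len(w) for w in words) / len(words) if words else 0.0,
    }


features = text_features("The AI model is very fast, but the Model is big.")
print(features)
```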
  • When the evaluation data is audio, the data features can include the short-term average zero-crossing rate, short-term energy, entropy of energy, spectral centroid, spectral spread, spectral entropy, spectral flux, etc.
  • The short-term average zero-crossing rate is the number of zero crossings of the signal in each frame, and it reflects the frequency characteristics of the signal.
  • The short-term energy is the sum of the squares of the signal samples in each frame, and it reflects the strength of the signal energy.
  • The entropy of energy is similar to spectral entropy, but it describes the time-domain distribution of the signal and reflects its continuity.
  • The spectral centroid is also known as the first-order moment of the spectrum.
  • The spectral spread, also known as the second-order central moment of the spectrum, describes the distribution of the signal around the spectral centroid.
  • Spectral entropy: by the properties of entropy, the more uniform the distribution, the greater the entropy, so the spectral entropy reflects how uniform the spectrum of each frame is. For example, the spectrum of a speaker is uneven because of formants, while the spectrum of white noise is more uniform. For this reason, spectral entropy can be used in voice activity detection (VAD).
  • The spectral flux describes the change in the spectrum between adjacent frames.
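  • Two of the time-domain audio features above, the per-frame zero-crossing count and the short-term energy, can be sketched in Python as follows. This is an illustration only: the signal is assumed to be a list of float samples, frames are assumed to be non-overlapping, and the function names are hypothetical.

```python
from typing import List, Sequence


def frames(signal: Sequence[float], frame_len: int) -> List[Sequence[float]]:
    """Split a signal into consecutive non-overlapping frames (tail dropped)."""
    last_start = len(signal) - frame_len
    return [signal[i:i + frame_len] for i in range(0, last_start + 1, frame_len)]


def zero_crossings(frame: Sequence[float]) -> int:
    """Number of sign changes between adjacent samples in one frame."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))


def short_term_energy(frame: Sequence[float]) -> float:
    """Sum of the squared samples in one frame."""
    return sum(x * x for x in frame)


signal = [1.0, -1.0, 1.0, -1.0, 0.5, 0.5, 0.5, 0.5]
for frame in frames(signal, 4):
    print(zero_crossings(frame), short_term_energy(frame))
# first frame:  3 crossings, energy 4.0
# second frame: 0 crossings, energy 1.0
```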
  • the value of the data feature of each evaluation data in the evaluation data set can be calculated according to the method or formula similar to that given above.
  • According to the distribution of the data feature values, the evaluation data in the evaluation data set can be divided into at least one evaluation data subset. That is, the evaluation data in the evaluation data set is classified according to the value of the data feature to obtain the evaluation data subsets.
  • The evaluation data set can be divided according to each data feature. For example, when the task type is image classification and the data features include image brightness and image saturation, after the brightness and saturation values of each image in the evaluation data set are calculated, the evaluation data in the evaluation data set can be divided into at least one evaluation data subset according to the distribution of brightness values, and can also be divided into at least one evaluation data subset according to the distribution of saturation values.
  • When the evaluation data in the evaluation data set is divided according to the distribution of data feature values, it may be divided according to thresholds, according to percentages, or in other ways, which is not limited here.
  • For example, suppose the data feature is the brightness of the image and the evaluation data set includes 100 images.
  • The 100 images can be sorted by brightness value, from large to small or from small to large, and the sorted 100 images can then be divided by percentage into four evaluation data subsets, each of which can include 25 images. When dividing by percentage, the division can be equal or unequal.
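  • The percentage-based division in this example can be sketched in Python as follows. This is a minimal illustration that assumes an equal split; the function name `split_by_percentage`, the image identifiers, and the brightness values are hypothetical.

```python
from typing import Dict, List


def split_by_percentage(values: Dict[str, float], parts: int = 4) -> List[List[str]]:
    """Sort items by feature value and cut them into `parts` nearly equal subsets."""
    ordered = sorted(values, key=values.get)  # ascending by feature value
    n = len(ordered)
    bounds = [round(k * n / parts) for k in range(parts + 1)]
    return [ordered[bounds[k]:bounds[k + 1]] for k in range(parts)]


# 100 hypothetical images with brightness values 0.00, 0.01, ..., 0.99
brightness = {f"img_{i:03d}": i / 100 for i in range(100)}
subsets = split_by_percentage(brightness)
print([len(s) for s in subsets])  # [25, 25, 25, 25]
```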
  • As another example, suppose the data feature is the brightness of the image and the evaluation data set includes 100 images.
  • The 100 images can be sorted by brightness value, from large to small or from small to large.
  • Images whose brightness value is greater than or equal to the first threshold can be divided into the first evaluation data subset, images whose brightness value is less than the first threshold and greater than or equal to the second threshold can be divided into the second evaluation data subset, images whose brightness value is less than the second threshold and greater than or equal to the third threshold can be divided into the third evaluation data subset, and images whose brightness value is less than the third threshold can be divided into the fourth evaluation data subset.
  • The first threshold, the second threshold, and the third threshold decrease in that order, and the numbers of images included in the first, second, third, and fourth evaluation data subsets may be the same or different.
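  • The threshold-based division in this example can be sketched in Python as follows. This is a minimal illustration; the function name `split_by_thresholds`, the image identifiers, and the threshold values 0.75, 0.5, and 0.25 are hypothetical.

```python
from typing import Dict, List, Sequence


def split_by_thresholds(values: Dict[str, float],
                        thresholds: Sequence[float]) -> List[List[str]]:
    """Divide items into len(thresholds) + 1 subsets.

    `thresholds` must be in decreasing order (first > second > third ...).
    An item goes into subset k if its value is >= thresholds[k] and below
    every earlier threshold; items below all thresholds go into the last subset.
    """
    subsets: List[List[str]] = [[] for _ in range(len(thresholds) + 1)]
    for name, value in values.items():
        for k, threshold in enumerate(thresholds):
            if value >= threshold:
                subsets[k].append(name)
                break
        else:
            subsets[-1].append(name)
    return subsets


brightness = {"a": 0.9, "b": 0.75, "c": 0.6, "d": 0.3, "e": 0.1}
subsets = split_by_thresholds(brightness, [0.75, 0.5, 0.25])
print(subsets)  # [['a', 'b'], ['c'], ['d'], ['e']]
```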
  • The data feature values of all evaluation data in each evaluation data subset obtained after the division satisfy the same set of conditions.
  • The condition can be that the values of the data feature of all the evaluation data in the evaluation data subset are within a specific numerical range (for example, the brightness values of the images of all the evaluation data are within the range of 0-20%), or that the values of the data feature of all the evaluation data in the evaluation data subset have a specific property (for example, the aspect ratio of the image of all the evaluation data is an even number).
  • The evaluation data set may also be divided according to multiple data features to obtain at least one evaluation data subset. In that case, the values of the multiple data features of the evaluation data in each resulting evaluation data subset satisfy multiple sub-conditions in the same set of conditions, that is, the value of each data feature of the evaluation data in the evaluation data subset satisfies the sub-condition corresponding to that data feature.
  • For example, suppose the evaluation data is an image and there are two data features: the brightness of the image and the aspect ratio of the image.
  • The images in the evaluation data set whose brightness is within a first threshold range and whose aspect ratio is within a second threshold range can be divided into one evaluation data subset, that is, for all evaluation data in this evaluation data subset, the value of each of the two data features satisfies its corresponding sub-condition.
  • the evaluation data subset is a subset of the evaluation data set, that is, the evaluation data included in the evaluation data subset is part of the evaluation data included in the evaluation data set.
  • After obtaining the AI model and the evaluation data set, or after dividing the evaluation data in the evaluation data set into at least one evaluation data subset according to the distribution of the data feature values of the evaluation data, the AI model can be used to perform inference on the evaluation data of each evaluation data subset in the at least one evaluation data subset to obtain the inference result.
  • The evaluation data in each evaluation data subset can be input into the AI model to perform inference on the evaluation data in the evaluation data subset.
  • The AI model can be invoked by inference code to perform inference on the evaluation data in the evaluation data subset.
  • The inference code may include calling code for calling the AI model to perform inference on the evaluation data in the evaluation data subset.
  • Before inference, the evaluation data in the evaluation data subset may be preprocessed, so the inference code may also include preprocessing code for preprocessing the evaluation data in the evaluation data subset.
  • The inference code may also include post-processing code for post-processing the result of the inference.
  • The preprocessing code, the calling code, and the post-processing code are executed in sequence. Under the system architecture corresponding to FIG. 1, the inference code is developed based on the AI model. Under the system architecture corresponding to FIG. 2, the inference code is provided by the user.
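  • The sequence of preprocessing code, calling code, and post-processing code can be sketched in Python as follows. This is a minimal illustration in which the AI model is assumed to be any callable; `run_inference`, `toy_model`, and the sign-based post-processing are hypothetical stand-ins, not the inference code of the application.

```python
from typing import Any, Callable, Iterable, List


def run_inference(
    model: Callable[[Any], Any],
    data: Iterable[Any],
    preprocess: Callable[[Any], Any] = lambda x: x,
    postprocess: Callable[[Any], Any] = lambda y: y,
) -> List[Any]:
    """Run preprocessing, the model call, and post-processing in sequence."""
    results = []
    for item in data:
        model_input = preprocess(item)            # preprocessing code
        raw_output = model(model_input)           # calling code
        results.append(postprocess(raw_output))   # post-processing code
    return results


def toy_model(x: float) -> float:
    """Stand-in for a trained AI model: returns a score for one input."""
    return 2 * x - 1


results = run_inference(
    toy_model,
    [0, 128, 255],
    preprocess=lambda pixel: pixel / 255,  # scale raw values to [0, 1]
    postprocess=lambda score: "positive" if score > 0 else "negative",
)
print(results)  # ['negative', 'positive', 'positive']
```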
  • Steps 603 and 604 need not be performed in a fixed order: the AI model may also first be used to perform inference on all the evaluation data in the evaluation data set.
  • After the AI model has been used to perform inference on the evaluation data in the at least one evaluation data subset to obtain the inference result, the inference result of each evaluation data can first be compared with the label of that evaluation data. When the inference result of the evaluation data is the same as the label of the evaluation data, the AI model's inference result for the evaluation data is considered accurate, and the comparison result is correct; when the inference result of the evaluation data differs from the label of the evaluation data, the AI model's inference result for the evaluation data is considered inaccurate, and the comparison result is incorrect. According to the comparison results, the accuracy of the AI model's inference for each evaluation data subset can be calculated to obtain the evaluation result.
  • Specifically, based on the comparison results, the evaluation index value, under each evaluation index, of the AI model's inference results for the evaluation data of each evaluation data subset in the at least one evaluation data subset can be calculated to obtain the evaluation result.
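  • The comparison of inference results with labels, and the resulting accuracy for each evaluation data subset, can be sketched in Python as follows. This is a minimal illustration that assumes one label per evaluation data and uses plain accuracy as the evaluation index; the function name, subset names, and sample identifiers are hypothetical.

```python
from typing import Dict, Hashable, List


def subset_accuracy(
    subsets: Dict[str, List[str]],
    predictions: Dict[str, Hashable],
    labels: Dict[str, Hashable],
) -> Dict[str, float]:
    """Fraction of evaluation data in each subset whose inference result equals its label."""
    result = {}
    for name, ids in subsets.items():
        if not ids:
            continue  # an empty subset has no defined accuracy
        correct = sum(1 for i in ids if predictions[i] == labels[i])
        result[name] = correct / len(ids)
    return result


labels = {"img1": "cat", "img2": "dog", "img3": "cat", "img4": "dog"}
predictions = {"img1": "cat", "img2": "dog", "img3": "dog", "img4": "dog"}
subsets = {"bright": ["img1", "img2"], "dark": ["img3", "img4"]}
print(subset_accuracy(subsets, predictions, labels))  # {'bright': 1.0, 'dark': 0.5}
```

  • In this toy example the lower accuracy on the "dark" subset is the kind of signal that relates a data feature (here, brightness) to the performance of the AI model.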
  • The accuracy can be measured using one or more evaluation indicators of the AI model.
  • For example, when the task type is image classification, the evaluation indicators can include the confusion matrix, accuracy rate, precision rate, recall rate, receiver operating characteristic (ROC) curve, F1 score, and so on.
  • For binary classification, the categories can include a positive class and a negative class.
  • The samples can be divided into true positive (TP), true negative (TN), false positive (FP), and false negative (FN) according to their true and predicted categories.
  • TP is the number of samples that the AI model predicts as positive and whose true category is positive, that is, the label marks the sample as positive and the inference result for the sample is positive.
  • TN is the number of samples that the AI model predicts as negative and whose true category is negative, that is, the label marks the sample as negative and the inference result for the sample is negative.
  • FP is the number of samples that the AI model predicts as positive and whose true category is negative, that is, the label marks the sample as negative and the inference result for the sample is positive.
  • FN is the number of samples that the AI model predicts as negative and whose true category is positive, that is, the label marks the sample as positive and the inference result for the sample is negative.
  • The confusion matrix includes TP, TN, FP, and FN. The confusion matrix can be shown in Table 1:
  • Table 1. Confusion matrix (rows: true category; columns: predicted category)
  • True category positive: predicted positive = TP, predicted negative = FN
  • True category negative: predicted positive = FP, predicted negative = TN
  • The accuracy rate is the ratio of the number of samples that are correctly predicted to the total number of samples.
  • The accuracy rate AC can be expressed as follows:
  • $$AC = \frac{TP + TN}{TP + TN + FP + FN}$$
  • The precision rate is the ratio of the number of samples that are correctly predicted to be positive to the number of samples that are predicted to be positive.
  • The precision rate PR can be expressed as follows:
  • $$PR = \frac{TP}{TP + FP}$$
  • The recall rate is the ratio of the number of samples correctly predicted to be positive to the number of all positive samples.
  • The recall rate RE can be expressed as follows:
  • $$RE = \frac{TP}{TP + FN}$$
  • The F1 value is the harmonic mean of the precision rate and the recall rate, and the F1 value can be expressed as follows:
  • $$F1 = \frac{2 \times PR \times RE}{PR + RE}$$
  • the ROC curve is a curve with the true positive rate (TPR) on the vertical axis and the false positive rate (FPR) on the horizontal axis.
  • TPR is the ratio of the number of samples that are predicted to be positive and are actually positive to the number of samples that are actually positive.
  • FPR is the ratio of the number of samples that are predicted to be positive but are actually negative to the number of samples that are actually negative.
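The indicator definitions above can be illustrated with a minimal Python sketch. It assumes binary labels encoded as 1 (positive) and 0 (negative); the function names and the convention of returning 0.0 for an empty denominator are illustrative choices, not part of the application.

```python
from typing import Dict, List


def confusion_counts(labels: List[int], predictions: List[int]) -> Dict[str, int]:
    """Count TP, TN, FP and FN for binary labels (1 = positive, 0 = negative)."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for truth, pred in zip(labels, predictions):
        if truth == 1 and pred == 1:
            counts["TP"] += 1
        elif truth == 0 and pred == 0:
            counts["TN"] += 1
        elif truth == 0 and pred == 1:
            counts["FP"] += 1
        else:  # truth == 1 and pred == 0
            counts["FN"] += 1
    return counts


def safe_div(numerator: float, denominator: float) -> float:
    """Return 0.0 instead of raising when a denominator is zero (assumed convention)."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(labels: List[int], predictions: List[int]) -> Dict[str, float]:
    """Compute AC, PR, RE, F1, TPR and FPR from the confusion counts."""
    c = confusion_counts(labels, predictions)
    tp, tn, fp, fn = c["TP"], c["TN"], c["FP"], c["FN"]
    pr = safe_div(tp, tp + fp)
    re = safe_div(tp, tp + fn)
    return {
        "AC": safe_div(tp + tn, tp + tn + fp + fn),
        "PR": pr,
        "RE": re,
        "F1": safe_div(2 * pr * re, pr + re),
        "TPR": re,  # TPR is the same quantity as the recall rate
        "FPR": safe_div(fp, fp + tn),
    }
```

Evaluating `binary_metrics` at several decision thresholds and plotting the resulting (FPR, TPR) pairs would trace out an ROC curve.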
  • the evaluation indicators may include mean average precision (mAP), the precision-recall (P-R) curve, etc.
  • the P-R curve is a curve with the recall rate on the abscissa and the precision rate on the ordinate.
  • mAP is the average value of average precision (AP): $mAP = \frac{1}{Q} \sum_{q=1}^{Q} AP(q)$
  • AP is the area enclosed by the P-R curve: $AP = \sum_{idx=1}^{N} (RE_{idx} - RE_{idx-1}) \cdot PR_{idx}$
  • Q is the number of labels
  • AP(q) is the average precision of the qth label
  • N is the number of predicted labeled boxes
  • $RE_{idx}$ is the recall rate of the idx-th predicted labeled box
  • $RE_{idx-1}$ is the recall rate of the (idx-1)-th predicted labeled box
  • $PR_{idx}$ is the precision rate of the idx-th predicted labeled box.
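As a hedged illustration of the symbols defined above, the following Python sketch accumulates AP as a sum of rectangle areas under the P-R curve and averages it over labels. It assumes that the recall before the first predicted box is 0 and that the recall and precision lists are already ordered by predicted box; both assumptions, and the function names, are illustrative.

```python
from typing import Sequence


def average_precision(recalls: Sequence[float], precisions: Sequence[float]) -> float:
    """AP as the sum over idx of (RE_idx - RE_{idx-1}) * PR_idx, assuming RE_0 = 0."""
    ap = 0.0
    previous_recall = 0.0
    for re_idx, pr_idx in zip(recalls, precisions):
        ap += (re_idx - previous_recall) * pr_idx
        previous_recall = re_idx
    return ap


def mean_average_precision(ap_per_label: Sequence[float]) -> float:
    """mAP as the mean of AP(q) over the Q labels."""
    return sum(ap_per_label) / len(ap_per_label)
```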
  • the evaluation indicators may include accuracy, precision, recall, F1 value, and so on.
  • the evaluation index may include accuracy rate, precision rate, recall rate, F1 value, and so on.
  • the evaluation index value under the evaluation index can be calculated according to the above formula, or can be calculated according to other methods, and it is not limited here.
  • the evaluation result may include, for each data feature, the evaluation index values under the evaluation index of the AI model's inference results for the evaluation data in the corresponding evaluation data subsets. For one evaluation index and one data feature, multiple data feature values under this data feature can correspond to one evaluation index value under this evaluation index.
  • the evaluation result may also include phenomena derived from these evaluation index values, for example, that the brightness of the image has a relatively large influence on the accuracy.
  • the task type is face detection
  • the data features include the proportion of the area of the labeled frame in the image
  • the evaluation indicators include the recall rate.
  • the above method may further include: generating an optimization suggestion for the AI model according to the evaluation result.
  • the optimization suggestion may be, based on the current evaluation result of the AI model for each evaluation data subset, a recommendation to add new data that meets the same condition as the evaluation data in one or more evaluation data subsets and to continue training the AI model with it.
  • the accuracy of the current AI model's inference on the one or more evaluation data subsets does not meet the model requirements, or is lower than its accuracy on other evaluation data subsets.
  • the optimization suggestion can be to train the AI model with new data in which the area of the labeled frame accounts for 0%-20% of the image.
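A minimal sketch of how such a suggestion might be derived from per-subset evaluation index values is shown below. The function name, the threshold parameter, and the example numbers in the usage note are hypothetical and are not taken from the application.

```python
from typing import Dict, List


def suggest_training_data(subset_scores: Dict[str, float], required_score: float) -> List[str]:
    """List the subset conditions whose index value is below the requirement, worst first."""
    weak = [(score, condition) for condition, score in subset_scores.items()
            if score < required_score]
    return [f"Train the AI model with new data whose {condition}"
            for _score, condition in sorted(weak)]
```

For example, with made-up recall values of 0.52, 0.88 and 0.93 for label-frame area proportions of 0%-20%, 20%-50% and 50%-100%, and a required recall of 0.8, only the 0%-20% condition would be returned.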
  • the new data that is continuously used for training obtained according to the optimization suggestion may be re-collected data, or may be data after adjusting the data feature values of the data in the original training data.
  • the sensitivity of the data feature to the evaluation index can be determined according to the evaluation result. Specifically, regression analysis can be performed on the values of the data feature and the evaluation index values of the AI model's inference results for the evaluation data of each evaluation data subset corresponding to each data feature, to obtain the sensitivity of the data feature to the evaluation index. That is, the values of the data feature are used as the input, the evaluation index values of the AI model's inference results for the evaluation data of each evaluation data subset corresponding to each data feature are used as the output, and regression analysis is performed.
  • the values of a set of data features form the vector $z_t$, for example the brightness value, sharpness value, resolution value, and saturation value of the image.
  • the evaluation index value of the inference results for the evaluation data of the evaluation data subset corresponding to the data features is $f(z_t)$, and the fitted vector W is the influence weight of each data feature on each evaluation index, namely the sensitivity.
  • the optimization suggestion for the AI model can be generated according to the sensitivity of each data feature to each evaluation index.
  • if the sensitivity is greater than a certain value, it can be considered that the data feature has a relatively large impact on the evaluation index, and corresponding optimization suggestions can be generated for this phenomenon.
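The regression step can be sketched as follows. The text does not state the regression form, so this sketch assumes an ordinary least-squares linear model $f(z_t) \approx W \cdot z_t + b$, solved through the normal equations in pure Python; the function names and the threshold helper are illustrative. The solver assumes the feature vectors are not collinear, otherwise the system is singular.

```python
from typing import List


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_sensitivity(features: List[List[float]], index_values: List[float]) -> List[float]:
    """Least-squares fit of f(z_t) ~ W . z_t + b; returns the weights W followed by b."""
    rows = [list(z) + [1.0] for z in features]  # append a bias column
    k = len(rows[0])
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, index_values)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def sensitive_features(weights: List[float], threshold: float) -> List[int]:
    """Indices of data features whose absolute weight exceeds the threshold."""
    return [i for i, w in enumerate(weights) if abs(w) > threshold]
```

Here each row of `features` would hold, for one evaluation data subset, values such as brightness and sharpness, and `index_values` the corresponding evaluation index value; features should be scaled comparably before their weights are compared.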
  • after the current AI model continues to be trained with new data according to the optimization suggestion, the reasoning ability of the AI model is more likely to be improved.
  • the above method may further include: generating an evaluation report, and sending the evaluation report.
  • the evaluation report may include at least one of evaluation results and optimization suggestions.
  • after the AI model's inference accuracy for each evaluation data subset is calculated and the evaluation results are obtained, and/or after the optimization suggestions for the AI model are generated based on the evaluation results, an evaluation report containing the evaluation results and/or the optimization suggestions can be generated.
  • the foregoing method may further include: calculating the accuracy of the overall reasoning of the AI model on the evaluation data set.
  • the AI model can first determine the inference results of the evaluation data in the evaluation data set, then compare the inference result of each evaluation data with the label of that evaluation data, and finally calculate the accuracy of the AI model's inference on the evaluation data set based on the comparison results. The accuracy of this inference is the AI model's evaluation result for global data.
  • evaluating the overall reasoning ability of the AI model on the evaluation data set makes it possible to assess the reasoning ability of the AI model on global data, that is, on any kind of data that can be used as the input of the AI model.
  • the global data in this application is data that is not classified according to any data feature, and it can represent any kind of data that can be used as the input of the AI model.
  • the aforementioned evaluation report may also include the accuracy of the AI model's reasoning in the evaluation data set.
  • the above method may further include: obtaining performance parameters.
  • the performance parameters can be obtained by monitoring the use information of hardware resources, the use time of the operators included in the AI model, and the number of operators.
  • Hardware resources may include central processing unit (CPU), graphics processing unit (GPU), physical memory, GPU video memory, and so on.
  • the performance monitoring process can be used to monitor the inference process.
  • GPU performance monitoring tools such as NVIDIA system management interface (SMI)
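One way to collect GPU usage with `nvidia-smi` is sketched below. It assumes an NVIDIA driver is installed and uses the tool's CSV query output; the parsing helpers and the peak/mean summary are illustrative and are not the monitoring process of the application.

```python
import subprocess
from typing import Dict, List, Tuple


def query_gpu_sample() -> str:
    """Run nvidia-smi once and return 'utilization, memory' CSV lines (one per GPU)."""
    return subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu,memory.used",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout


def parse_gpu_sample(csv_text: str) -> List[Tuple[float, float]]:
    """Parse CSV lines into (GPU utilization in %, used GPU memory in MiB) pairs."""
    samples = []
    for line in csv_text.strip().splitlines():
        utilization, memory = (float(field) for field in line.split(","))
        samples.append((utilization, memory))
    return samples


def peak_and_mean(values: List[float]) -> Dict[str, float]:
    """Summarize a series of samples collected during inference."""
    return {"peak": max(values), "mean": sum(values) / len(values)}
```

Calling `query_gpu_sample` periodically while the inference process runs and passing the collected utilization values to `peak_and_mean` yields peak and mean figures for GPU usage and GPU memory.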
  • optimization suggestions may also include optimization suggestions generated according to performance parameters.
  • optimization suggestions for the AI model can be generated according to the performance parameters.
  • the performance optimization suggestions for the AI model can be generated according to the usage information of the hardware resources, the usage duration of the operators included in the AI model, the number of operators used, and the performance tuning knowledge base.
  • the performance tuning knowledge base may include phenomena corresponding to the usage information of hardware resources, phenomena corresponding to the usage of operators, and the performance optimization methods corresponding to these phenomena.
  • the performance optimization suggestion can be to adjust the precision of the parameters of the AI model to 8-bit quantization, or to enable operator fusion.
  • when the phenomenon corresponding to the usage information of hardware resources is high memory consumption, the corresponding performance optimization method may be adjusting the precision of the parameters of the AI model to half precision or int8 quantization.
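A performance tuning knowledge base of this kind could be represented as a list of rules, as in the hedged sketch below. The field names, the numeric thresholds, and the pairing of a large operator count with operator fusion are assumptions made for illustration; only the optimization methods themselves come from the text.

```python
from typing import Callable, Dict, List, Tuple

# Each rule: (phenomenon, predicate over the performance parameters, optimization method).
Rule = Tuple[str, Callable[[Dict[str, float]], bool], str]

KNOWLEDGE_BASE: List[Rule] = [
    ("high memory consumption",
     lambda p: p.get("gpu_memory_mb_peak", 0) > 1500,  # hypothetical threshold
     "adjust parameter precision to half precision or int8 quantization"),
    ("large number of operators",
     lambda p: p.get("operator_count", 0) > 500,  # hypothetical threshold
     "enable operator fusion"),
]


def performance_suggestions(performance: Dict[str, float],
                            knowledge_base: List[Rule] = KNOWLEDGE_BASE) -> List[str]:
    """Return one suggestion per rule whose phenomenon is observed."""
    return [f"{phenomenon}: {method}"
            for phenomenon, applies, method in knowledge_base
            if applies(performance)]
```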
  • the above steps can be performed multiple times, that is, multiple evaluations.
  • the execution steps are the same each time; the difference is that the evaluation data set used each time is slightly different.
  • the evaluation data set used for the first time is the received evaluation data set uploaded by the user or sent by the terminal device
  • the evaluation data set used subsequently is the evaluation data set after adjusting the data characteristics of the evaluation data in the received evaluation data set.
  • the adjustment may be adding noise, changing the brightness value of some data in an evaluation data, or adjusting other data characteristics of the evaluation data, which is not limited here.
  • the multiple evaluation reports and optimization suggestions can be integrated to get more accurate recommendations and reports, which can improve the robustness of the evaluation.
  • the evaluation data set used for the second time has added noise relative to the received evaluation data set.
  • the accuracy rate and precision rate in the second evaluation report are reduced, indicating that noise has a relatively large impact on the AI model, so noise interference can be avoided as much as possible.
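The adjustment of data features between evaluation rounds can be sketched for a grayscale image stored as nested lists of 0-255 pixel values. The representation, the Gaussian noise model, and the function names are assumptions for illustration.

```python
import random
from typing import List

Image = List[List[int]]


def clamp(value: float) -> int:
    """Round and clamp a pixel value to the 0-255 range."""
    return min(255, max(0, round(value)))


def adjust_brightness(image: Image, delta: int) -> Image:
    """Shift every pixel by delta to change the brightness value of the image."""
    return [[clamp(pixel + delta) for pixel in row] for row in image]


def add_gaussian_noise(image: Image, sigma: float, seed: int = 0) -> Image:
    """Add zero-mean Gaussian noise with standard deviation sigma to every pixel."""
    rng = random.Random(seed)  # seeded so that evaluation rounds are repeatable
    return [[clamp(pixel + rng.gauss(0, sigma)) for pixel in row] for row in image]
```

Re-running the evaluation on the perturbed data set and comparing the evaluation index values with those of the first round indicates how strongly the AI model reacts to noise or brightness changes.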
  • the evaluation of the AI model in the embodiments of the present application can also call engine-related tools, such as the profiler tool provided by TensorFlow or the profiler tool provided by MXNet, to analyze the structure of the AI model, the operators included in the AI model, the time complexity of the operators, the space complexity of the operators, and so on.
  • the structure of the AI model can include residual structure, multi-level feature extraction, and so on.
  • the above optimization suggestion may also include a structural modification suggestion given to the AI model based on the above analysis. For example, in the case where it is analyzed that the AI model does not include a batch normalization (BN) layer, since this brings a risk of overfitting, a suggestion to add a BN layer can be generated.
  • when the structure of the AI model includes multi-level features for feature extraction to classification, and the label frames to be recognized include multiple scales, it may not be possible to recognize label frames of all scales; only label frames of some scales can be recognized.
  • the time complexity and space complexity of the operator can be linear complexity or exponential complexity. In the case that the space complexity of the operator is exponential, it indicates that the structure of the AI model is relatively complicated, and a pruning suggestion can be generated, that is, a suggestion to adjust the structure of the AI model.
  • the above suggestions and reports can be provided to users through a GUI, can be provided to users as JavaScript Object Notation (JSON) documents, or can be sent to users' terminal devices.
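A report delivered as a JSON document might be assembled as in the following sketch. The field names `evaluation_results`, `optimization_suggestions` and `performance_data` are hypothetical; the application does not specify a schema.

```python
import json
from typing import Any, Dict, List, Optional


def build_evaluation_report(evaluation_results: Dict[str, Any],
                            optimization_suggestions: List[str],
                            performance_data: Optional[Dict[str, Any]] = None) -> str:
    """Serialize the evaluation report as a JSON document (hypothetical schema)."""
    report: Dict[str, Any] = {
        "evaluation_results": evaluation_results,
        "optimization_suggestions": optimization_suggestions,
    }
    if performance_data is not None:
        report["performance_data"] = performance_data
    return json.dumps(report, indent=2, ensure_ascii=False)
```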
  • FIG. 8 is a schematic flowchart of another AI model evaluation method disclosed in an embodiment of the present application.
  • the evaluation method of the AI model is applied to the evaluation system.
  • the evaluation method of the AI model may include the following steps.
  • for a detailed description of step 801, refer to step 601.
  • step 802 differs from step 604 in that step 802 performs reasoning on the evaluation data in the evaluation data set without dividing the evaluation data set, whereas step 604 performs reasoning on the evaluation data in at least one evaluation data subset, which first requires dividing the evaluation data in the evaluation data set into at least one evaluation data subset.
  • the performance data is used to represent the performance of the hardware performing the inference process during the AI model inferring the evaluation data, or the usage of the operators included in the AI model during the AI model inferring the evaluation data.
  • the usage of the operator represents the usage time of each operator in the AI model in the inference process or the number of each operator used in the AI model.
  • optimization suggestions for the AI model can be generated based on the performance data.
  • the optimization suggestions may include adjustments to the structure of the AI model, or optimization training for the operators of the AI model.
  • the above method may further include: generating an evaluation report, and sending the evaluation report.
  • an evaluation report can be generated and sent, which can be sent to the terminal device or sent to the user's mailbox, etc.
  • the evaluation report may include at least one of performance data and optimization suggestions.
  • the foregoing method may further include: calculating the accuracy of the AI model's reasoning on the evaluation data set.
  • the AI model can first determine the inference results of the evaluation data in the evaluation data set, then compare the inference result of each evaluation data with the label of that evaluation data, and finally calculate the accuracy of the AI model's reasoning on the evaluation data set based on the comparison results. For a detailed description, please refer to the relevant description above.
  • the above steps are performed for the AI model in which the evaluation data in the evaluation data set is microbial images and the task type is object detection.
  • the inference results include the detected epithelial cells, subspores, cocci, white blood cells, spores, fungi and clue cells.
  • the evaluation result in the evaluation report may include the F1 values of the AI model for the evaluation data of the 4 evaluation data subsets divided according to the brightness value distribution, as shown in Table 3:
  • the microbial images can be arranged in order of brightness value from large to small or from small to large; the first 25% (i.e. 0-25%) of the evaluation data is then determined as the first evaluation data subset, the next 25% (i.e. 25%-50%) as the second evaluation data subset, the next 25% (i.e. 50%-75%) as the third evaluation data subset, and the final 25% (i.e. 75%-100%) as the fourth evaluation data subset.
  • the F1 values of epithelial cells, subspores, cocci, leukocytes, spores, fungi and clue cells in the first through fourth evaluation data subsets are calculated respectively.
  • the mAP of the F1 values of epithelial cells, subspores, cocci, leukocytes, spores, fungi and clue cells in the first through fourth data subsets can also be calculated, as well as the F1 values of epithelial cells, subspores, cocci, leukocytes, spores, fungi and clue cells calculated for all evaluation data; the standard deviation (STD) of the F1 values is the sensitivity.
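The quartile split and the STD-based sensitivity can be sketched as follows. The sketch assumes that each evaluation item exposes a numeric data-feature value through a key function and that the population standard deviation is meant; the text does not say whether the population or sample form is used.

```python
import statistics
from typing import Any, Callable, List, Sequence


def split_into_quartiles(items: Sequence[Any],
                         feature_value: Callable[[Any], float]) -> List[List[Any]]:
    """Sort by the data-feature value and split into four near-equal subsets."""
    ordered = sorted(items, key=feature_value)
    n = len(ordered)
    bounds = [round(n * q / 4) for q in range(5)]  # 0%, 25%, 50%, 75%, 100%
    return [ordered[bounds[i]:bounds[i + 1]] for i in range(4)]


def sensitivity_from_scores(subset_scores: Sequence[float]) -> float:
    """Population standard deviation of the per-subset scores, used as the sensitivity."""
    return statistics.pstdev(subset_scores)
```

For instance, `split_into_quartiles(images, lambda im: im["brightness"])` would give the four brightness subsets, and the F1 value computed on each subset would then be passed to `sensitivity_from_scores`.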
  • the evaluation result in the evaluation report may include the F1 values of the AI model for the evaluation data of the 4 evaluation data subsets divided according to the size distribution of the label box, as shown in Table 4:
  • FIG. 9 is a distribution diagram of the brightness of the labeled frames for microbial detection disclosed in an embodiment of the present application. As shown in FIG. 9, the brightness of the areas where the label frames are located is mostly concentrated between 50 and 170.
  • please refer to FIG. 10. FIG. 10 is a distribution diagram of the proportion of the image occupied by the area of the labeled frames for microbial detection disclosed in an embodiment of the present application. As shown in FIG. 10, the proportion of the image occupied by the label-frame area is mostly concentrated in the range of 0-0.05.
  • the evaluation report can also include performance data, and the usage information of hardware resources in the acquired performance data can be shown in Table 5:
  • Hardware resource usage information    Peak     Mean
    GPU usage                              65%      30%
    CPU usage                              60%      40%
    Physical memory                        390M     270M
    GPU memory                             1570M    1240M
  • FIG. 11 is a schematic diagram of mAP before and after retraining of an AI model corresponding to microbial cells disclosed in an embodiment of the present application.
  • the mAP before retraining is 0.4421.
  • the mAP after retraining after randomly scaling the image is 0.4482, and the mAP after retraining after adjusting the brightness of the image is 0.45. It can be seen that the AI model after retraining according to the recommendations is better than the one before retraining.
  • the above steps are performed for the trained AI model whose evaluation data in the evaluation data set is a person image and the task type is object detection.
  • the inference results include five categories, namely, no safety helmet, white safety helmet, yellow safety helmet, red safety helmet, and blue safety helmet.
  • FIG. 12 is a curve of the F1 value versus the confidence threshold of an AI model for helmet detection disclosed in an embodiment of the present application.
  • the F1 value is obtained in the step of calculating, according to the comparison result, the accuracy of the AI model's reasoning for each evaluation data subset to obtain the evaluation result. As shown in FIG. 12, as the confidence threshold increases, the F1 value first increases and then decreases.
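The relationship between the F1 value and the confidence threshold can be reproduced with a small sweep, sketched below. It assumes that each detection has already been matched against the labels, so that it carries a confidence and a correct/incorrect flag; the matching rule and the function names are assumptions.

```python
from typing import Dict, List, Sequence, Tuple

Detection = Tuple[float, bool]  # (confidence, matched a labeled box correctly)


def f1_at_threshold(detections: Sequence[Detection], num_ground_truth: int,
                    threshold: float) -> float:
    """F1 = 2TP / (2TP + FP + FN) over detections with confidence >= threshold."""
    kept = [correct for confidence, correct in detections if confidence >= threshold]
    tp = sum(kept)
    fp = len(kept) - tp
    fn = num_ground_truth - tp
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def f1_curve(detections: Sequence[Detection], num_ground_truth: int,
             thresholds: List[float]) -> Dict[float, float]:
    """F1 value at each confidence threshold."""
    return {t: f1_at_threshold(detections, num_ground_truth, t) for t in thresholds}
```

A low threshold keeps many false positives and a high threshold discards true positives, which is why such a curve typically rises and then falls.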
  • FIG. 13 is a P-R curve of an AI model for helmet detection disclosed in an embodiment of the present application.
  • the P-R curve is obtained in the step of calculating, according to the comparison result, the accuracy of the AI model's inference for each evaluation data subset to obtain the evaluation result.
  • the P-R curves of the five types of test results are different.
  • the evaluation report may include the AI model's recall rate values for the evaluation data of the 4 evaluation data subsets divided according to the blurriness distribution, as shown in Table 7:
  • Table 7: Recall rate values for the evaluation data of the 4 evaluation data subsets divided according to the blurriness distribution
  • the evaluation report can include the AI model's recall rate values for the evaluation data of the 4 evaluation data subsets divided according to the distribution of the number of labeled boxes, as shown in Table 8:
  • Table 8: Recall rate values for the evaluation data of the 4 evaluation data subsets divided according to the distribution of the number of labeled boxes. From Table 8 it can be seen that the blurriness of the image has a relatively large impact on the no-safety-helmet, yellow-safety-helmet, and white-safety-helmet categories; correspondingly, a suggestion can be given to train the AI model with more images whose number of labeled frames falls in the 85%-100% range.
  • FIG. 14 is a schematic structural diagram of another evaluation system 1400 disclosed in an embodiment of the present application.
  • the evaluation system 1400 may include an I/O module 1401, a data analysis module 1402, and an inference module 1403.
  • the evaluation system 1400 may further include a diagnosis module 1404.
  • the evaluation system 1400 may further include a performance monitoring module 1405.
  • the evaluation system 1400 may further include a model analysis module 1406.
  • for detailed descriptions of the I/O module 1401, the data analysis module 1402, the inference module 1403, the performance monitoring module 1405, and the model analysis module 1406 in the evaluation system 1400, please refer to the method embodiment corresponding to FIG. 6.
  • FIG. 15 is a schematic structural diagram of another evaluation system 1500 disclosed in an embodiment of the present application.
  • the evaluation system 1500 may include an I/O module 1501, an inference module 1502, a performance monitoring module 1503, and a diagnosis module 1504.
  • the evaluation system 1500 may further include a model analysis module 1505.
  • for detailed descriptions of the I/O module 1501, the reasoning module 1502, the performance monitoring module 1503, the diagnosis module 1504, and the model analysis module 1505 in the evaluation system 1500, please refer to the method embodiment corresponding to FIG. 8.
  • FIG. 16 is a schematic structural diagram of a computing device disclosed in an embodiment of the application.
  • the computing device 1600 includes a memory 1601, a processor 1602, a communication interface 1603, and a bus 1604.
  • the memory 1601, the processor 1602, and the communication interface 1603 implement communication connections between each other through the bus 1604.
  • the memory 1601 may be a read only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM).
  • the memory 1601 may store a program. When the program stored in the memory 1601 is executed by the processor 1602, the processor 1602 and the communication interface 1603 are used to execute the aforementioned method of FIG. 6 or FIG. 8 for evaluating the AI model for the user.
  • the memory 1601 may also store evaluation data sets.
  • the processor 1602 may be a general-purpose central processing unit (CPU), a microprocessor, an application specific integrated circuit (ASIC), a graphics processing unit (GPU), or one or more integrated circuits.
  • the communication interface 1603 uses a transceiver module such as but not limited to a transceiver to implement communication between the computing device 1600 and other devices or a communication network.
  • the evaluation data set can be obtained through the communication interface 1603.
  • the bus 1604 may include a path for transferring information between various components of the computing device 1600 (for example, the memory 1601, the processor 1602, and the communication interface 1603).
  • FIG. 17 is a schematic structural diagram of another computing device disclosed in an embodiment of this application.
  • the computing device includes multiple computers, and each computer includes a memory, a processor, a communication interface, and a bus.
  • the memory, the processor, and the communication interface realize the communication connection between each other through the bus.
  • the memory can be ROM, static storage device, dynamic storage device or RAM.
  • the memory may store a program. When the program stored in the memory is executed by the processor, the processor and the communication interface are used to execute part of the method used by the evaluation system to evaluate the AI model for the user.
  • the memory can also store evaluation data sets. For example, a part of the storage resources in the memory is divided into a data set storage module for storing the evaluation data set required by the evaluation system, and a part of the storage resources in the memory is divided into a result storage module for storing evaluation reports.
  • the processor may be a general-purpose CPU, microprocessor, ASIC, GPU, or one or more integrated circuits.
  • the communication interface uses a transceiver module such as but not limited to a transceiver to implement communication between the computer and other devices or communication networks.
  • the evaluation data set can be obtained through the communication interface.
  • the bus may include a path for transferring information between various components of the computer (for example, a memory, a processor, and a communication interface).
  • a communication path is established between each of the above-mentioned computers through a communication network.
  • Any one or more modules of the evaluation system 500, the evaluation system 1400, and the evaluation system 1500 are run on each computer.
  • Any computer can be a computer in a cloud data center (for example, a server), a computer in an edge data center, or a terminal computing device.
  • the above-mentioned embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
  • when implemented in software, they may be implemented in whole or in part in the form of a computer program product.
  • the computer program product that provides evaluation includes one or more computer instructions for evaluation. When these computer program instructions are loaded and executed on a computer, the process or function described in FIG. 6 or FIG. 8 according to the embodiment of the present invention is generated in whole or in part.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (such as coaxial cable, optical fiber, or digital subscriber line) or a wireless manner (such as infrared, wireless, or microwave).
  • the computer-readable storage medium stores the computer program instructions that provide evaluation.
  • the computer readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or a data center integrated with one or more available media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, an SSD).

Abstract

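This application discloses an evaluation method for an artificial intelligence (AI) model and relates to the AI field. The method includes: a computing device obtains an AI model and an evaluation data set, where the evaluation data set includes multiple pieces of evaluation data, each carrying a label that represents the real result corresponding to that evaluation data; the evaluation data in the evaluation data set is classified according to a data feature to obtain an evaluation data subset, where the values of the data feature of all evaluation data in the evaluation data subset satisfy a condition; the inference results of the AI model for the evaluation data in the evaluation data subset are determined, the inference result of each piece of evaluation data in the evaluation data subset is compared with its label, and the accuracy of the AI model's inference on the evaluation data subset is calculated according to the comparison result, to obtain the AI model's evaluation result for data whose data-feature value satisfies the condition. The method yields an evaluation result of the AI model for a specific class of data, which can better guide the direction of optimizing the AI model.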

Description

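Evaluation method, system, and device for an artificial intelligence (AI) model
Technical Field
This application relates to the field of artificial intelligence (AI), and in particular to an evaluation method, system, and device for an AI model.
Background
With the continuous development of deep learning technology, AI models for different scenarios are constantly being trained, for example, AI models trained for image classification and AI models trained for object recognition. A trained AI model may have problems; for example, a trained AI model for image classification may have low classification accuracy for all or some of its input images. Therefore, trained AI models need to be evaluated.
In the prior art, it is not possible to make an instructive evaluation of an AI model.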
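Summary
This application discloses an evaluation method, system, and device for an AI model, which are used to evaluate AI models more effectively.
A first aspect discloses an evaluation method for an AI model. A computing device obtains an AI model and an evaluation data set that includes multiple pieces of labeled evaluation data, and classifies the evaluation data in the evaluation data set according to a data feature to obtain an evaluation data subset, where the evaluation data subset is a subset of the evaluation data set and the values of the data feature of all evaluation data in the evaluation data subset satisfy a condition. The computing device further determines the inference results of the AI model for the evaluation data in the evaluation data subset, compares the inference result of each piece of evaluation data in the evaluation data subset with the label of that evaluation data, and calculates the accuracy of the AI model's inference on the evaluation data subset according to the comparison result, to obtain the AI model's evaluation result for data whose data-feature value satisfies the condition.
The above method yields an evaluation result of the AI model for a specific class of data, and this evaluation result can be used to better guide further optimization of the AI model. The label of each piece of evaluation data represents the real result corresponding to that evaluation data.
In a possible implementation, the computing device may generate an optimization suggestion for the AI model. The optimization suggestion may include: training the AI model with new data whose data-feature value satisfies the condition. A more specific optimization suggestion given on the basis of the evaluation result obtained in this application can effectively optimize the AI model, improve the reasoning ability of the optimized AI model, and avoid the poor optimization results that arise when technicians optimize the AI model based on experience alone.
In a possible implementation, the computing device may generate an evaluation report that includes the evaluation result and/or the optimization suggestion, and send the evaluation report to the user's device or system, so that the user can learn from the evaluation report the AI model's evaluation result for a specific class of data and optimize the AI model according to the evaluation report.
In a possible implementation, the computing device may obtain performance data. The performance data may represent the performance of the hardware that executes the inference process while the AI model performs inference on the evaluation data, and/or the usage of the operators included in the AI model while the AI model performs inference on the evaluation data, so that the user can learn from the performance data how the AI model affects the hardware and how the operators in the AI model are used, and can optimize the AI model accordingly.
In a possible implementation, the performance data may include one or more of: central processing unit (CPU) usage, graphics processing unit (GPU) usage, memory usage, video memory usage, operator usage duration, and the number of operators used.
In a possible implementation, there may be multiple data features, and the condition may include multiple sub-conditions, with the multiple data features and the multiple sub-conditions in one-to-one correspondence. When classifying the evaluation data in the evaluation data set according to the data features to obtain the evaluation data subset, the computing device may classify the evaluation data according to the multiple data features. Each of the values of the multiple data features of all evaluation data in the evaluation data subset satisfies the corresponding sub-condition in the condition. By classifying the evaluation data set according to multiple data features, the method yields an evaluation result of the AI model for a specific class of data, and this evaluation result can better guide the direction of further optimization of the AI model.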
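The classification by multiple data features and the per-subset accuracy can be sketched in Python as below. The sketch assumes each piece of evaluation data is a dictionary with `data`, `label`, and `features` entries, that the AI model is a callable returning an inference result, and that accuracy is the fraction of results equal to the label; these structures and names are hypothetical.

```python
from typing import Any, Callable, Dict, List, Optional

Item = Dict[str, Any]
SubConditions = Dict[str, Callable[[float], bool]]  # one sub-condition per data feature


def select_subset(dataset: List[Item], sub_conditions: SubConditions) -> List[Item]:
    """Keep the items whose every data-feature value satisfies its sub-condition."""
    return [item for item in dataset
            if all(check(item["features"][name])
                   for name, check in sub_conditions.items())]


def subset_accuracy(model: Callable[[Any], Any], subset: List[Item]) -> Optional[float]:
    """Fraction of items whose inference result equals the label; None for an empty subset."""
    if not subset:
        return None
    correct = sum(1 for item in subset if model(item["data"]) == item["label"])
    return correct / len(subset)
```

For example, sub-conditions such as `{"brightness": lambda v: v < 50, "sharpness": lambda v: v >= 0.5}` would select dark but sharp images, and `subset_accuracy` would then give the evaluation result for that class of data.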
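In a possible implementation, the computing device may determine the inference results of the AI model for the evaluation data in the evaluation data set and, according to the result of comparing these inference results with the labels of the evaluation data in the evaluation data set, calculate the accuracy of the AI model's inference on the evaluation data set, to obtain the AI model's evaluation result for global data. This method gives a direct view of the AI model's overall reasoning ability on global data.
In a possible implementation, the evaluation data in the evaluation data set may be images or audio.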
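A second aspect discloses an evaluation system for an AI model. The system includes:
an input/output (I/O) module, configured to obtain the AI model and an evaluation data set, where the evaluation data set includes multiple pieces of labeled evaluation data, and the label of each piece of evaluation data represents the real result corresponding to that evaluation data;
a data analysis module, configured to classify the evaluation data in the evaluation data set according to a data feature to obtain an evaluation data subset, where the evaluation data subset is a subset of the evaluation data set, and the values of the data feature of all evaluation data in the evaluation data subset satisfy a condition;
an inference module, configured to determine the inference results of the AI model for the evaluation data in the evaluation data subset;
the data analysis module is further configured to compare the inference result of each piece of evaluation data in the evaluation data subset with the label of that evaluation data, and calculate the accuracy of the AI model's inference on the evaluation data subset according to the comparison result, to obtain the AI model's evaluation result for data whose data-feature value satisfies the condition.
In a possible implementation, the system further includes:
a diagnosis module, configured to generate an optimization suggestion for the AI model, where the optimization suggestion includes: training the AI model with new data whose data-feature value satisfies the condition.
In a possible implementation, the diagnosis module is further configured to generate an evaluation report, where the evaluation report includes the evaluation result and/or the optimization suggestion;
the I/O module is further configured to send the evaluation report.
In a possible implementation, the system further includes:
a performance monitoring module, configured to obtain performance data, where the performance data represents the performance of the hardware that executes the inference process while the AI model performs inference on the evaluation data, or the usage of the operators included in the AI model while the AI model performs inference on the evaluation data.
In a possible implementation, the performance data includes one or more of the following: central processing unit (CPU) usage, graphics processing unit (GPU) usage, memory usage, video memory usage, operator usage duration, and the number of operators used.
In a possible implementation, the inference module is further configured to determine the inference results of the AI model for the evaluation data in the evaluation data set;
the system further includes:
a model analysis module, configured to calculate, according to the result of comparing the inference results of the evaluation data in the evaluation data set with the labels of the evaluation data in the evaluation data set, the accuracy of the AI model's inference on the evaluation data set, to obtain the AI model's evaluation result for global data.
In a possible implementation, there are multiple data features, the condition includes multiple sub-conditions, and the multiple data features and the multiple sub-conditions are in one-to-one correspondence;
the data analysis module is specifically configured to classify the evaluation data in the evaluation data set according to the multiple data features to obtain the evaluation data subset, where each of the values of the multiple data features of all evaluation data in the evaluation data subset satisfies the corresponding sub-condition in the condition.
In a possible implementation, the evaluation data in the evaluation data set is images or audio.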
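A third aspect discloses a computing device. The computing device includes a memory and a processor; the memory is configured to store a set of computer instructions, and the processor executes the set of computer instructions stored in the memory, so that the computing device performs the method disclosed in the first aspect or any possible implementation of the first aspect.
A fourth aspect discloses a computer-readable storage medium. The computer-readable storage medium stores computer program code, and when the computer program code is executed by a computing device, the computing device performs the method disclosed in the first aspect or any possible implementation of the first aspect. The storage medium includes, but is not limited to, volatile memory such as random access memory, and non-volatile memory such as flash memory, a hard disk drive (HDD), or a solid state drive (SSD).
A fifth aspect discloses a computer program product. The computer program product includes computer program code, and when the computer program code is executed by a computing device, the computing device performs the method disclosed in the first aspect or any possible implementation of the first aspect. The computer program product may be a software installation package; when the method disclosed in the first aspect or any possible implementation of the first aspect needs to be used, the computer program product can be downloaded and executed on a computing device.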
第六方面公开一种AI模型的评估方法,计算设备可以获取AI模型和包括多个携带标签的评估数据的评估数据集,利用AI模型对评估数据集中的评估数据进行推理,获取性能数据,根据性能数据,生成对AI模型的优化建议。上述方法根据本申请的评估方法获取的性能数据给出对AI模型更具体的优化建议,避免了技术人员仅根据经验进行AI模型的优化而带来的优化效果不佳的问题。性能数据用于表示在AI模型对评估数据进行推理的过程中,执行推理过程的硬件的性能表现,或者在AI模型对评估数据进行推理的过程中AI模型包括的算子的使用情况。优化建议可以包括对AI模型的结构进行调整,或者,对AI模型的算子进行优化训练。
作为一种可能的实施方式,计算设备可以生成包括性能数据和/或优化建议的评估报告,发送评估报告,以便用户可以根据评估报告了解AI模型基于数据特征的推理能力,以及根据评估报告对AI模型进行优化。
作为一种可能的实施方式,在AI模型对所述评估数据进行推理的过程中所述AI模型包括的算子的使用情况,包括:AI模型的算子的使用时长、AI模型的算子的使用数量。
作为一种可能的实施方式,在AI模型对评估数据进行推理的过程中,执行推理过程的硬件的性能表现,包括:CPU的使用率、GPU的使用率、内存使用量和显存使用量中的一种或多种。
作为一种可能的实施方式,计算设备可以确定AI模型对评估数据集中的评估数据的推理结果,根据评估数据集中的评估数据的推理结果和评估数据集中的评估数据的标签的比较结果,计算AI模型对所述评估数据集的推理的准确度,以获得所述AI模型对全局数据的评估结果。上述方法可以直观地得到AI模型对全局数据的整体推理能力。
作为一种可能的实施方式,评估数据集中的评估数据可以为图像,也可以为音频。
第七方面公开一种AI模型的评估系统,所述系统包括:
I/O模块,用于获取所述AI模型和评估数据集,所述评估数据集包括多个携带标签的评估数据,每个评估数据的标签用于表示所述评估数据对应的真实结果;
推理模块,用于利用所述AI模型对所述评估数据集中的评估数据进行推理;
性能监测模块,用于获取性能数据,所述性能数据用于表示在所述AI模型对所述评估数据进行推理的过程中,执行所述推理过程的硬件的性能表现,或者在所述AI模型对所述评估数据进行推理的过程中所述AI模型包括的算子的使用情况;
诊断模块,用于根据所述性能数据,生成对所述AI模型的优化建议,所述优化建议包括:对所述AI模型的结构进行调整,或者,针对所述AI模型的算子进行优化训练。
作为一种可能的实施方式,所述诊断模块,还用于生成评估报告,所述评估报告包括所述性能数据和/或所述优化建议;
所述I/O模块,还用于发送所述评估报告。
作为一种可能的实施方式,在所述AI模型对所述评估数据进行推理的过程中所述AI模型包括的算子的使用情况,包括:所述AI模型的算子的使用时长、所述AI模型的算子的使用数量。
作为一种可能的实施方式,在AI模型对评估数据进行推理的过程中,执行推理过程的硬件的性能表现,包括:CPU的使用率、GPU的使用率、内存使用量和显存使用量中的一种或多种。
作为一种可能的实施方式,所述推理模块,还用于确定AI模型对评估数据集中的评估数据的推理结果;
所述系统还包括:
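A model analysis module, configured to calculate the accuracy of the AI model's inference on the evaluation data set based on a comparison between the inference results of the evaluation data in the evaluation data set and the labels of that evaluation data, so as to obtain an evaluation result of the AI model on global data.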
作为一种可能的实施方式,所述评估数据集中的评估数据为图像或音频。
第八方面公开一种计算设备,所述计算设备包括存储器和处理器,所述存储器用于存储一组计算机指令;所述处理器执行所述存储器存储的一组计算机指令,以使得所述计算设备执行第六方面或第六方面的任意一种可能的实施方式公开的方法。
第九方面公开一种计算机可读存储介质,所述计算机可读存储介质存储有计算机程序代码,当所述计算机程序代码被计算设备执行时,所述计算设备执行前述第六方面或第六方面的任意一种可能的实施方式中公开的方法。该存储介质包括但不限于易失性存储器,例如随机访问存储器,非易失性存储器,例如快闪存储器、硬盘(hard disk drive,HDD)、固态硬盘(solid state drive,SSD)。
第十方面公开一种计算机程序产品,所述计算机程序产品包括计算机程序代码,在所述计算机程序代码被计算设备执行时,所述计算设备执行前述第六方面或第六方面的任意可能的实施方式中公开的方法。该计算机程序产品可以为一个软件安装包,在需要使用前述第六方面或第六方面的任意可能的实施方式中公开的方法的情况下,可以下载该计算机程序产品并在计算设备上执行该计算机程序产品。
附图说明
图1是本申请实施例公开的一种系统架构100的示意图;
图2是本申请实施例公开的另一种系统架构200的示意图;
图3是本申请实施例公开的一种评估系统的部署示意图;
图4是本申请实施例公开的另一种评估系统的部署示意图;
图5是本申请实施例公开的一种评估系统的结构示意图;
图6是本申请实施例公开的一种AI模型的评估方法的流程示意图;
图7是本申请实施例公开的一种任务创建界面示意图;
图8是本申请实施例公开的另一种AI模型的评估方法的流程示意图;
图9是本申请实施例公开的微生物检测的标注框亮度的分布图;
图10是本申请实施例公开的微生物检测的标注框的面积占图像的比重的分布图;
图11是本申请实施例公开的一种微生物细胞对应的模型重新训练前后mAP示意图;
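FIG. 12 is a curve of the F1 score against the confidence threshold for an AI model used for safety helmet detection according to an embodiment of this application;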
图13是本申请实施例公开的一种用于安全帽检测的AI模型的P-R曲线;
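FIG. 14 is a schematic structural diagram of another evaluation system 1400 according to an embodiment of this application;
FIG. 15 is a schematic structural diagram of yet another evaluation system 1500 according to an embodiment of this application;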
图16为本申请实施例公开的一种计算设备的结构示意图;
图17为本申请实施例公开的另一种计算设备的结构示意图。
具体实施方式
本申请实施例公开了一种人工智能(artificial intelligence,AI)模型的评估方法、系统及设备,用于有效地评估AI模型。以下分别进行详细说明。
目前,AI受到了学术界和工业界的广泛关注,其在不少应用领域都发挥了超乎普通人类的水平。例如:AI技术在机器视觉领域(如人脸识别、图像分类、物体检测等)的应用使得机器视觉的准确率高于人类,AI技术在自然语言处理和推荐系统等领域也有较好的应用。
机器学习是一种实现AI的核心手段,计算机针对要解决的技术问题,根据已有的数据构建一种AI模型,再利用AI模型对未知数据进行推理,获得推理结果。这种方法就好像计算机像人类一样学习了某一能力(如认知能力、辨别能力、分类能力等),因此,将这种方法称为机器学习。
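Implementing AI applications through machine learning relies on a variety of AI models (for example, neural network models). An AI model is a mathematical algorithm model that applies machine learning ideas to solve practical problems. It contains a large number of parameters and calculation formulas (or calculation rules). The parameters are values obtained by training the AI model on a data set; for example, they are the weights of the formulas or factors in the model. An AI model can be divided into multiple layers or nodes, each of which includes one type of calculation rule and one or more parameters (representing a mapping, relationship, or transformation). The calculation rule and the parameters used by each layer or node are called an operator. An AI model may include a large number of operators. For example, in a neural network an operator may be a layer, such as a convolutional layer, a pooling layer, or a fully connected layer. Convolutional layers extract features, pooling layers downsample, and fully connected layers extract features or classify. AI models include deep convolutional neural networks, residual networks (ResNet), Visual Geometry Group (VGG) networks, Inception networks, Faster region-based convolutional neural networks (Faster R-CNN), single shot multibox detector (SSD) networks, you only look once (YOLO) networks, and so on.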
在将一个AI模型用在一个特定的应用场景以解决一个技术问题之前,先需要对初始AI模型进行训练,之后对训练后的AI模型进行评估,进而根据评估结果决定该AI模型是否需要继续优化,优化后再评估。只有在AI模型评估结果较好的情况下,才能使用AI模型。随着深度学习的不断发展,逐渐形成了AI平台。AI平台是向个人或企业等用户提供AI模型的训练、评估、优化等服务的系统,AI平台可以通过接口接收用户的需求和数据,为用户训练和优化符合用户需求的各种AI模型,也可以为用户评估AI模型的性能,还可以为用户根据评估结果继续优化AI模型。
目前,AI平台在对初始AI模型进行训练,得到AI模型之后,AI平台使用AI模型对评估数据集进行推理得到推理结果,之后可以根据推理结果和评估数据集中的评估数据的标签确定AI模型对评估数据集的推理结果的准确度,准确度用于表示AI模型对评估数据集中的评估数据的推理结果与该评估数据集中的评估数据的真实结果之间的相近程度,准确度可以用很多指标来衡量,例如准确率、召回率等。上述方法中,研发人员只能得到AI模型对整个评估数据集的准确度的值,无法得到更具体的信息,如数据特征对AI模型的推理结果的影响等,以致评估结果较为笼统,不能给AI模型的进一步优化提供更多的信息。
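Inference is the process of using an AI model to make predictions on the evaluation data in an evaluation data set. For example, when the task type is face recognition, inference may be using the AI model to identify the name of the person whose face appears in an image in the evaluation data set. Specifically, the AI model can be called through inference code to perform inference on the evaluation data. The inference code may include calling code, which calls the AI model to perform inference. The inference code may also include preprocessing code, which preprocesses the evaluation data before the calling code invokes the AI model on the preprocessed data. The inference code may further include post-processing code, which performs further processing, such as statistical analysis, on the inference results.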
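The three parts of the inference code described above can be sketched as follows. This is only an illustration: `preprocess`, `postprocess`, `run_inference`, and the toy `model` are hypothetical names and behaviors, not part of the disclosed system.

```python
from typing import Callable, Iterable, List


def preprocess(sample: List[float]) -> List[float]:
    """Hypothetical preprocessing: scale values into [0, 1]."""
    peak = max(sample) or 1.0
    return [v / peak for v in sample]


def postprocess(score: float, threshold: float = 0.5) -> str:
    """Hypothetical post-processing: turn a raw score into a class name."""
    return "positive" if score >= threshold else "negative"


def run_inference(model: Callable[[List[float]], float],
                  dataset: Iterable[List[float]]) -> List[str]:
    """Run preprocessing, the model call, and post-processing in that order."""
    return [postprocess(model(preprocess(sample))) for sample in dataset]


# Toy stand-in for a trained AI model: the mean of the scaled inputs.
model = lambda xs: sum(xs) / len(xs)
results = run_inference(model, [[2.0, 4.0], [1.0, 10.0, 1.0]])
print(results)
```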
数据特征是对数据本身特性或特征的抽象,用于表示数据的特征或特性。例如,在评估数据为图像的情况下,数据特征可以为图像的长宽比、图像的色彩度、图像的分辨率、图像的模糊度、图像的亮度、图像的饱和度等。不同的数据在同一数据特征下对应有不同的数据特征值,根据数据特征可以对多个数据进行分类,每个分类下的数据为具有相似的数据特征的数据。例如:不同尺寸图像的图像的长宽比不同,可以分别计算10张图像的长宽比的值,得到一组图像的长宽比的值为[0.4、0.3、0.35、0.9、0.1、1.2、1.4、0.3、0.89、0.7],可以将上述图像按照图像的长宽比分为三类,一类为图像的长宽比的值为[0-0.5]的图像,共5张;一类为图像的长宽比的值为(0.5-1]的图像,共3张;还有一类为图像的长宽比的值为(1-1.5]的图像,共2张。
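The aspect ratio example above can be reproduced with a few lines of code. The sketch below uses the ten values and the three class ranges from the text; the helper name `bin_index` is an assumption made for illustration.

```python
aspect_ratios = [0.4, 0.3, 0.35, 0.9, 0.1, 1.2, 1.4, 0.3, 0.89, 0.7]

# Class boundaries from the example: [0, 0.5], (0.5, 1], (1, 1.5].
bins = [(0.0, 0.5), (0.5, 1.0), (1.0, 1.5)]


def bin_index(value):
    """Return the index of the class whose range contains value."""
    for i, (low, high) in enumerate(bins):
        # The first class is closed on the left; the others are open on the left.
        above_low = value >= low if i == 0 else value > low
        if above_low and value <= high:
            return i
    raise ValueError(f"{value} falls outside all classes")


counts = [0] * len(bins)
for ratio in aspect_ratios:
    counts[bin_index(ratio)] += 1

print(counts)  # [5, 3, 2]
```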
本申请实施例公开了一种AI模型的评估方法、系统及设备,该方法可以得到AI模型对特定分类的数据的评估结果,使评估结果可以更有效地用于指导AI模型的进一步优化。
为了更好地理解本申请实施例公开的一种AI模型的评估方法、系统及设备,下面先对本申请实施例使用的系统架构进行描述。请参阅图1,图1是本申请实施例公开的一种系统架构100的示意图。如图1所示,在该系统架构100中,可以包括训练系统11、评估系统12和终端设备13,其中,训练系统11和评估系统12可以通过AI平台为用户提供AI模型的训练和评估服务。
训练系统11,用于接收用户通过终端设备13发送的训练数据集,根据训练数据集对初始AI模型进行训练,以及将训练好的AI模型发送给评估系统12。
可选地,训练系统11,还用于接收用户通过终端设备13在AI平台上输入或选择的任务类型,根据任务类型确定初始AI模型。
可选地,训练系统11,还用于将接收的任务类型发送给评估系统12。
可选地,训练系统11,还用于接收用户通过终端设备13上传的初始AI模型。
评估系统12,用于接收来自训练系统11的AI模型,接收用户通过终端设备13上传的评估数据集,使用AI模型对评估数据集进行推理得到推理结果,根据评估数据集和推理结果生成包括评估结果和/或对AI模型的优化建议的评估报告,发送评估报告至终端设备13。
可选地,评估系统12,还用于接收来自训练系统11的任务类型。
可选地,评估系统12,还用于接收用户通过终端设备13在AI平台上输入或选择的任务类型。
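The terminal device 13 is configured to send data and information to the training system 11 and the evaluation system 12 according to user operations, or to receive information sent by the training system 11 or the evaluation system 12.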
请参阅图2,图2是本申请实施例公开的另一种系统架构200的示意图。如图2所示,该系统架构200可以包括终端设备21和评估系统22。
终端设备21,用于根据用户的操作将训练好的AI模型、评估数据集和推理代码发送至评估系统22。
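The evaluation system 22 is configured to receive the trained AI model, the evaluation data set, and the inference code from the terminal device 21, call the AI model through the inference code to perform inference on the evaluation data in the evaluation data set and obtain inference results, generate an evaluation report that includes evaluation results and optimization suggestions for the AI model based on the evaluation data set and the inference results, and send the evaluation report to the terminal device 21.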
可选地,评估系统22,还用于接收用户通过终端设备21发送的任务类型。
应理解,在一些实施例中,本申请提供的AI模型的评估方法由评估系统执行,例如:评估系统可以是上述评估系统12或者上述评估系统22。
请参阅图3,图3是本申请实施例公开的一种评估系统的部署示意图。如图3所示,评估系统可以部署在云环境。云环境是云计算模式下利用基础资源向用户提供云服务的实体。云环境包括云数据中心和云服务平台,云数据中心包括云服务提供商拥有的大量基础资源(包括计算资源、存储资源和网络资源),云数据中心包括的计算资源可以是大量的计算设备(例如服务器)。评估系统可以独立地部署在云数据中心中的服务器或虚拟机上,评估系统也可以分布式地部署在云数据中心中的多台服务器上、或者分布式地部署在云数据中心中的多台虚拟机上、再或者分布式地部署在云数据中心中的服务器和虚拟机上。如图3所示,评估系统由云服务提供商在云服务平台抽象成一种评估云服务提供给用户,用户在云服务平台购买该云服务后(可预充值再根据最终资源的使用情况进行结算),云环境利用部署在云数据中心的评估系统向用户提供评估云服务。应理解,评估系统提供的功能也可以与其他系统提供的功能共同抽象成一项云服务,例如:云服务提供商将评估系统提供的对AI模型评估的功能,以及训练系统提供的对初始AI模型进行训练的功能共同抽象成一种AI平台云服务。
评估系统还可以部署在边缘环境,边缘环境是指距离用户较近的数据中心或者边缘计算设备的集合,边缘环境包括一个或多个边缘计算设备。评估系统可以独立地部署在边缘计算设备上,评估系统也可以分布式地部署在多台边缘服务器上、或者分布式地部署在多台拥有计算力的边缘小站上、再或者分布式地部署在边缘服务器和拥有计算力的边缘小站上。此外,评估系统还可以部署在其它环境,例如终端计算设备集群。评估系统可以是一个软件系统,运行在服务器等计算设备上。评估系统也可以是AI平台的一个后台系统,在AI平台上可以是一项AI模型评估服务,该服务由评估系统后台提供。
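Refer to FIG. 4, which is a schematic deployment diagram of another evaluation system according to an embodiment of this application. As shown in FIG. 4, the evaluation system provided in this application can also be deployed in a distributed manner across different environments. The evaluation system can be logically divided into multiple parts, each with a different function. These parts can be deployed in any two or all three of the terminal computing device, the edge environment, and the cloud environment. Terminal computing devices include terminal servers, smartphones, laptops, tablets, personal desktop computers, smart cameras, and so on. The edge environment is an environment that includes a set of edge computing devices close to the terminal computing devices; edge computing devices include edge servers, edge stations with computing power, and so on. The parts of the evaluation system deployed in different environments or devices work together to implement the AI model evaluation function. It should be understood that this application does not restrict which parts of the evaluation system are deployed in which environment. In practice, deployment can be adapted to the computing capability of the terminal computing devices, the resource occupancy of the edge and cloud environments, or specific application requirements.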
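In some embodiments, the AI platform includes a training system and an evaluation system. The two systems may be deployed in the same environment, such as a cloud environment or an edge environment, or in different environments; for example, the training system is deployed in a cloud environment and the evaluation system in an edge environment. The training system and the evaluation system may each be deployed independently or in a distributed manner.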
请参阅图5,图5是本申请实施例公开的一种评估系统500的结构示意图。如图5所示,评估系统500可以包括输入输出(input output,I/O)模块501、数据集存储模块502、推理模块503、性能监测模块504、模型分析模块505、数据分析模块506、诊断模块507和结果存储模块508。评估系统500可以包括上述模块中的全部模块或部分模块。下面先对评估系统500中的各个模块的功能进行描述。
I/O模块501,用于接收来自训练系统或终端设备发送的AI模型,接收用户通过终端设备上传的评估数据集和推理代码,发送评估报告至终端设备。
可选地,I/O模块501,还用于接收用户通过终端设备上传的任务类型。
数据集存储模块502,用于存储接收的评估数据集。
推理模块503,用于使用AI模型对数据集存储模块502存储的评估数据集或接收的评估数据集进行推理。
性能监测模块504,用于在推理模块503进行推理的过程中监测AI模型推理过程中对硬件资源的使用信息以及AI模型包括的算子的使用时长、算子的使用数量。算子的使用数量为算子在推理模块503进行推理的过程使用的次数。算子的使用时长为每个算子在推理模块503进行推理的过程使用的总时长和/或平均时长。
模型分析模块505,用于根据推理模块503的推理结果和评估数据集中评估数据的标签计算AI模型对评估数据集中的评估数据的推理结果的准确度。
数据分析模块506,用于计算评估数据集中的评估数据在一种或多种数据特征下的数据特征的值,根据数据特征的值对评估数据集中的评估数据进行分类,获得至少一个评估数据子集,根据推理模块503的推理结果和每个评估数据子集中评估数据的标签计算AI模型对每个评估数据子集中的评估数据的推理结果的准确度。
诊断模块507,用于根据性能监测模块504的监测结果、模型分析模块505的分析结果和数据分析模块506的分析结果中的任意一个或多个生成评估报告。
结果存储模块508,用于存储性能监测模块504的监测结果、模型分析模块505的分析结果、数据分析模块506的分析结果和诊断模块507的诊断结果。
由于上述各模块的功能,本申请实施例提供的评估系统可向用户提供评估AI模型的业务,且该评估系统可以深度分析不同数据特征对AI模型的影响等分析结果,进一步向用户提供AI模型优化建议。
基于图1或图2所示的系统架构,请参阅图6,图6是本申请实施例公开的一种AI模型的评估方法的流程示意图。其中,该AI模型的评估方法应用于评估系统。由于评估系统独立地或分布式地部署在计算设备上,因此,该AI模型的评估方法应用于计算设备,即本申请的AI模型的评估方法可以由计算设备中的处理器通过执行存储器存储的计算机指令来执行。如图6所示,该AI模型的评估方法可以包括以下步骤。
601、接收AI模型和评估数据集。
The AI model is a trained model. It may be sent by the training system or uploaded by the user through a terminal device.
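The evaluation data set may include multiple pieces of evaluation data and their labels. Each piece of evaluation data corresponds to one or more labels, and a label represents the real result corresponding to that evaluation data. The pieces of evaluation data are all of the same type, which may be image, video, audio, text, and so on. Depending on the task type, the evaluation data in the evaluation data set may differ or may be the same. For example, when the task type is image classification or object detection, the evaluation data are all images; when the task type is speech recognition, the evaluation data are audio. The form of the label also varies with the task type and the evaluation data. For example, when the evaluation data is an image and the task is to identify the type of a target in the image, the label is the target's real type. As another example, when the evaluation data is an image and the task is to detect targets in the image, the label may be the detection box corresponding to a target in the evaluation image; the detection box may be a rectangle, a circle, a line, or another shape, which is not limited here. In other words, a label is a value with a specific meaning that is associated with the labeled evaluation data; it may represent the type, position, or another attribute of that data. As a further example, when the evaluation data is audio, the label may indicate the type of audio, such as pop music or classical music.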
不同AI模型可以应用到不同的应用场景,而同一AI模型也可以应用到不同的应用场景。AI模型的应用场景不同,AI模型的任务类型可能不同。由于AI模型的任务类型不同,AI模型的评估指标和数据特征也不同。因此,获取到AI模型之后,可以获取AI模型的任务类型的评估指标和数据特征,即获取AI模型的任务类型对应的评估指标和数据特征。在评估系统包括多个任务类型,且每个任务类型分别设置有相应的评估指标和数据特征的情况下,可以获取AI模型的任务类型的评估指标和数据特征。在评估系统包括一个任务类型的情况下,可以获取这个任务类型的评估指标和数据特征。一个任务类型的评估指标可以包括至少一个评估指标,一个任务类型的数据特征可以包括至少一个数据特征。数据特征是对数据本身特性的抽象。数据特征可以为一个或多个,每个数据特征用于表示评估数据集中的评估数据的一方面特征。
在评估系统包括多个任务类型的情况下,任务类型可以是用户预先通过评估系统中的I/O模块输入或选择的。请参阅图7,图7是本申请实施例公开的一种任务创建界面示意图。如图7所示,任务创建界面可以包括数据集、模型类型、模型来源和推理代码。此外,任务创建界面还可以包括其他内容,在此不加限定。数据集后面的框可以用于用户上传评估数据集,也可以用于用户输入评估数据集的存储路径。模型类型后面的框可以用于用户从存储的任务类型中选取AI模型的任务类型,也可以用于用户输入AI模型的任务类型。模型来源后面的框可以用于用户上传AI模型,也可以用于用户输入AI模型的存储路径。推理代码后面的框可以用于用户上传推理代码,也可以用于用户输入推理代码的存储路径。可见,在任务创建完成之后,AI模型的任务类型就已确定。推理代码用于调用AI模型对评估数据集进行推理。推理代码可以包括调用代码,调用代码可以调用AI模型对评估数据集进行推理。推理代码还可以包括预处理代码,预处理代码用于对评估数据集中的评估数据进行预处理,之后调用代码调用AI模型对预处理后的评估数据集进行推理。推理代码还可以包括后处理代码,后处理代码用于对推理的结果进行处理得到推理结果。
602、计算评估数据集中每个评估数据的数据特征的值。
接收到AI模型和评估数据集之后,可以计算评估数据集中每个评估数据的数据特征的值,即根据数据集包括的多个评估数据以及多个评估数据的标签计算评估数据集中每个评估数据的数据特征的值。数据特征的值是用于衡量数据特性的值。数据特征可以是一个,也可以是多个。在数据特征为多个的情况下,可以计算评估数据集中每个评估数据的多个数据特征中每个数据特征的值。
在任务类型为图像分类的情况下,评估数据集中每个评估数据为图像,数据特征可以包括图像的长宽比、所有图像的RGB的均值和标准差、图像的色彩度、图像的分辨率、图像的模糊度、图像的亮度、图像的饱和度等通用图像特征。图像的长宽比为图像的宽度与高度的比值,图像的长宽比AS可以表示如下:
$$AS = \frac{ImageW}{ImageH}$$
ImageH为图像的高,ImageW为图像的宽。所有图像的RGB的均值为评估数据集包括的所有图像中R通道的值的平均值、G通道的值的平均值和B通道的值的平均值。所有图像的RGB的均值T mean可以表示如下:
$$T_{mean} = \frac{1}{n}\sum_{i=1}^{n}(R, G, B)_i$$
n为评估数据集包括的图像的数量。(R,G,B) i中的R为评估数据集包括的第i张图像中所有像素点R通道的值的和,(R,G,B) i中的G为评估数据集包括的第i张图像中所有像素点G通道的值的和,(R,G,B) i中的B为评估数据集包括的第i张图像中所有像素点B通道的值的和。所有图像的RGB的均值可以拆分为以下三个公式:
$$T_{mean,R} = \frac{1}{n}\sum_{i=1}^{n}R_i$$
$$T_{mean,G} = \frac{1}{n}\sum_{i=1}^{n}G_i$$
$$T_{mean,B} = \frac{1}{n}\sum_{i=1}^{n}B_i$$
T mean,R为n张图像的R通道的值的平均值,T mean,G为n张图像的G通道的值的平均值,T mean,B为n张图像的B通道的值的平均值。R i为评估数据集包括的第i张图像中所有像素点R通道的值的和,G i为评估数据集包括的第i张图像中所有像素点G通道的值的和,B i为评估数据集包括的第i张图像中所有像素点B通道的值的和。所有图像的RGB的标准差T STD可以表示如下:
Figure PCTCN2020097651-appb-000006
图像的色彩度为图像的色彩的丰富程度,图像的色彩度CO可以表示如下:
Figure PCTCN2020097651-appb-000007
STD()为对括号内的内容进行标准差计算。图像的分辨率为单位英寸中所包含的像素点数。图像的模糊度为图像的模糊程度。图像的亮度为图像中画面的明亮程度,图像的亮度BR可以表示如下:
Figure PCTCN2020097651-appb-000008
图像的饱和度为图像中色彩的纯度,图像的饱和度SA可以表示如下:
Figure PCTCN2020097651-appb-000009
m为一张图像包括的像素点的数量,max(R,G,B) j为一张图像中第j个像素点中R通道的值、G通道的值和B通道的值中的最大值,min(R,G,B) j为一张图像中第j个像素点中R通道的值、G通道的值和B通道的值中的最小值。
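Two of the image features defined in words above, the aspect ratio AS and the dataset-wide RGB mean T mean, can be computed as in the sketch below. It follows only the textual definitions (width divided by height; the average over n images of the per-image channel sums). The nested-list image representation and the function names are assumptions made for illustration.

```python
def aspect_ratio(image):
    """AS: image width divided by image height."""
    height, width = len(image), len(image[0])
    return width / height


def channel_sums(image):
    """(R, G, B)_i: per-channel sums over all pixels of one image."""
    sums = [0, 0, 0]
    for row in image:
        for pixel in row:
            for c in range(3):
                sums[c] += pixel[c]
    return sums


def dataset_rgb_mean(images):
    """T_mean: average of the per-image channel sums over the n images."""
    n = len(images)
    totals = [0, 0, 0]
    for image in images:
        for c, s in enumerate(channel_sums(image)):
            totals[c] += s
    return [t / n for t in totals]


# Two tiny images stored as rows of (R, G, B) pixels.
img_a = [[(10, 20, 30), (10, 20, 30)]]   # 1 row, 2 columns
img_b = [[(0, 0, 0)], [(20, 40, 60)]]    # 2 rows, 1 column

print(aspect_ratio(img_a), aspect_ratio(img_b))
print(dataset_rgb_mean([img_a, img_b]))
```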
在任务类型为物体检测的情况下,评估数据集中每个评估数据为图像,数据特征可以包括标注框的数量、标注框的面积占图像的比重、标注框的面积方差、标注框距离图像边缘的程度、标注框的重叠度、图像的长宽比等基于标注框的特征、图像的分辨率、图像的模糊度、图像的亮度、图像的饱和度等。标注框即训练数据集中的训练图像的标签,在训练图像中,待识别的一类或多类物体采用标注框进行标注,使得在对AI模型进行训练的过程中,AI模型将学习到训练图像中标注框内的物体的特征,进而使得AI模型具备检测图像中的该一类或多类物体的能力。标注框的面积占图像的比重为标注框的面积占图像面积的比例,标注框的面积占图像的比重AR可以表示如下:
$$AR = \frac{BboxW \times BboxH}{ImageW \times ImageH}$$
BboxW为标注框的宽,即评估数据包括的标签对应的标注框的宽。BboxH为标注框的高,即评估数据包括的标签对应的标注框的高。标注框的重叠度为一个标注框被其它标注框覆盖部分所占这个标注框的比例,标注框的重叠度OV可以表示如下:
Figure PCTCN2020097651-appb-000011
M为一张图像包括的标注框的数量与1的差值,C为这张图像包括的标注框中的目标框的区域,area(C)为目标框的面积,G k为这张图像包括的标注框中除目标框之外的第k个标注框的区域,C∩G k为目标标注框的区域与第k个标注框的区域的重叠区域,area(C∩G k)为目标标注框的区域与第k个标注框的区域的重叠区域的面积。标注框距离图像边缘的程度MA可以表示如下:
Figure PCTCN2020097651-appb-000012
imgx为一张图像的中心点在x轴的坐标,imgy为这张图像的中心点在y轴的坐标,x为这张图像中标注框的中心点在x轴的坐标,y为这张图像中标注框的中心点在y轴的坐标。
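As a small illustration of the annotation-box features, the sketch below computes AR from its definition as the ratio of the annotation box area to the image area. The function name and the sample dimensions are assumptions.

```python
def box_area_ratio(bbox_w, bbox_h, image_w, image_h):
    """AR: annotation box area divided by image area."""
    return (bbox_w * bbox_h) / (image_w * image_h)


# A 50 x 40 annotation box in a 200 x 100 image covers 10% of the image.
ratio = box_area_ratio(bbox_w=50, bbox_h=40, image_w=200, image_h=100)
print(ratio)  # 0.1
```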
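When the task type is text classification in natural language processing, the data features may include the word count, number of non-repeated words, length, number of stop words, number of punctuation marks, number of title-case words, average word length, term frequency (TF), inverse document frequency (IDF), and so on. The word count is the number of words in each line of text. The number of non-repeated words is the number of words that appear only once in each line of text. The length is the amount of storage space each line of text occupies (including spaces, symbols, letters, and so on). The number of stop words counts words such as between, but, about, and very. The number of punctuation marks is the number of punctuation marks in each line of text. The number of uppercase words counts words written in uppercase. The number of title-case words counts words whose first letter is uppercase and whose other letters are lowercase. The average word length is the average length of the words in each line of text.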
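When the task type is sound classification in audio, the data features may include the short-time average zero crossing rate, short-time energy, entropy of energy, spectral centroid, spectral spread, spectral entropy, spectral flux, and so on. The short-time average zero crossing rate is the number of times the signal crosses zero within each frame and reflects frequency characteristics. Short-time energy is the sum of squares of the signal in each frame and reflects signal strength. Entropy of energy is somewhat similar to spectral entropy, but it describes the time-domain distribution of the signal and reflects continuity. The spectral centroid, also called the first-order moment of the spectrum, is smaller when more spectral energy is concentrated in the low-frequency range; for example, voice usually has a lower spectral centroid than music. Spectral spread, also called the second-order central moment of the spectrum, describes how the signal is distributed around the spectral centroid. For spectral entropy, the more uniform the distribution, the larger the entropy, so spectral entropy reflects the uniformity of each frame: a speaker's spectrum is uneven because of formants, whereas white noise has a more uniform spectrum. Voice activity detection (VAD) is one application of this property. Spectral flux describes how the spectrum changes between adjacent frames.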
可以根据类似上述给出的方式或公式计算评估数据集中每个评估数据的数据特征的值。
603、按照评估数据集中每个评估数据的数据特征的值,将评估数据集中的评估数据划分为至少一个评估数据子集。
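After the data feature value of each piece of evaluation data in the evaluation data set is calculated, the evaluation data can be divided into at least one evaluation data subset according to the distribution of those values or according to preset division thresholds. That is, the evaluation data is classified by data feature value to obtain evaluation data subsets. Evaluation data can have several data features, and the evaluation data set can be divided according to each of them. For example, when the task type is image classification and the data features include image brightness and image saturation, after the brightness and saturation of each image are calculated, the evaluation data can be divided into at least one evaluation data subset by the distribution of brightness values, and also into at least one evaluation data subset by the distribution of saturation values. The division by data feature value may be based on thresholds, on percentages, or on other criteria, which is not limited here.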
举例说明,以按照百分比进行划分为例进行说明。数据特征包括图像的亮度,评估数据集包括100张图像。可以先将这100张图像按照图像的亮度值从大到小或从小到大的顺序进行排序,之后将排序后的100张图像按照百分比划分为四个评估数据子集,这四个评估数据子集中每个评估数据子集可以包括25张图像。按照百分比划分时,可以是均分的,也可以是不均分的。
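The percentage-based division in this example can be sketched as follows. The function `split_by_percentile` and the synthetic brightness values are assumptions; the sketch shows the equal-split case with 100 images and four subsets.

```python
def split_by_percentile(samples, key, parts=4):
    """Sort samples by a data feature value and cut them into equal shares."""
    ordered = sorted(samples, key=key)
    size, remainder = divmod(len(ordered), parts)
    subsets, start = [], 0
    for i in range(parts):
        # Spread any remainder over the first subsets.
        end = start + size + (1 if i < remainder else 0)
        subsets.append(ordered[start:end])
        start = end
    return subsets


# 100 hypothetical images, each stored as (name, brightness value).
images = [(f"img_{i:03d}", i * 2.5) for i in range(100)]
subsets = split_by_percentile(images, key=lambda item: item[1])
print([len(s) for s in subsets])  # [25, 25, 25, 25]
```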
举例说明,以按照阈值划分为例进行说明。数据特征包括图像的亮度,评估数据集包括100张图像。可以先将这100张图像按照图像的亮度值从大到小或从小到大的顺序进行排序。之后可以将亮度值大于或等于第一阈值的图像划分为第一评估数据子集,可以将亮度值小于第一阈值且大于或等于第二阈值的图像划分为第二评估数据子集,可以将亮度值小于第二阈值且大于或等于第三阈值的图像划分为第三评估数据子集,可以将亮度值小于第三阈值的图像划分为第四评估数据子集。第一阈值、第二阈值和第三阈值依次减小,第一数据子集、第二数据子集、第三数据子集和第四数据子集包括的图像的数量可以相同,也可以不同。
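The threshold-based division can be sketched in the same way. The three decreasing thresholds below are arbitrary values chosen for illustration, and `split_by_thresholds` is a hypothetical helper.

```python
def split_by_thresholds(samples, key, thresholds):
    """Divide samples using strictly decreasing thresholds [t1, t2, t3, ...].

    Subset 0 holds values >= t1, subset 1 holds values in [t2, t1), and so on;
    the last subset holds values below the final threshold.
    """
    subsets = [[] for _ in range(len(thresholds) + 1)]
    for sample in samples:
        value = key(sample)
        for i, threshold in enumerate(thresholds):
            if value >= threshold:
                subsets[i].append(sample)
                break
        else:
            subsets[-1].append(sample)
    return subsets


brightness_values = [0.95, 0.10, 0.55, 0.75, 0.30, 0.50]
first, second, third, fourth = split_by_thresholds(
    brightness_values, key=lambda v: v, thresholds=[0.75, 0.5, 0.25])
print(first, second, third, fourth)
```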
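All evaluation data in each evaluation data subset obtained by the division have data feature values that satisfy the same set of conditions. A condition may be that the data feature values of all evaluation data in the subset fall within a specific numerical range (for example, the image brightness of all evaluation data is in the 0-20% range), or that they have a specific property (for example, the image aspect ratio of all evaluation data is an even number).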
在另一种实施例中,还可以根据多个数据特征对评估数据集进行划分,以获得至少一个评估数据子集,由此划分得到的评估数据子集中的评估数据的多个数据特征的值满足同一组条件中的多个子条件,即评估数据子集中的评估数据的每个数据特征的值满足该数据特征对应的一个子条件。例如,评估数据为图像,其数据特征包括两个:图像的亮度和图像的长宽比。可以将评估数据集中图像的亮度在第一阈值范围内,且图像的长宽比在第二阈值范围内的图像划分为一个评估数据子集,即该评估数据子集中的所有评估数据对应的两个数据特征的值分别满足对应的一个子条件。评估数据子集为评估数据集的子集,即评估数据子集包括的评估数据为评估数据集包括的评估数据中的部分数据。
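Division by several data features, with one sub-condition per feature, can be sketched as a filter in which every sub-condition must hold. The feature names, ranges, and sample values below are assumptions made for illustration.

```python
samples = [
    {"id": "a", "brightness": 0.15, "aspect_ratio": 1.2},
    {"id": "b", "brightness": 0.15, "aspect_ratio": 0.8},
    {"id": "c", "brightness": 0.60, "aspect_ratio": 1.3},
    {"id": "d", "brightness": 0.05, "aspect_ratio": 1.5},
]

# One sub-condition per data feature (a one-to-one correspondence).
sub_conditions = {
    "brightness": lambda v: 0.0 <= v <= 0.2,
    "aspect_ratio": lambda v: 1.0 < v <= 1.5,
}

# A sample joins the subset only if every feature meets its sub-condition.
subset = [
    s for s in samples
    if all(check(s[name]) for name, check in sub_conditions.items())
]
print([s["id"] for s in subset])  # ['a', 'd']
```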
604、使用AI模型对至少一个评估数据子集中的评估数据进行推理得到推理结果。
获取到AI模型和评估数据集之后,或者按照评估数据集中每个评估数据在数据特征下的数据特征值的分布,将评估数据集中的评估数据划分为至少一个评估数据子集之后,可以使用AI模型对至少一个评估数据子集中每个评估数据子集的评估数据进行推理得到推理结果。可以将每个评估数据子集中的评估数据输入AI模型对该评估数据子集中的评估数据进行推理。可以通过推理代码调用AI模型对评估数据子集中的评估数据进行推理。推理代码可以包括调用代码,用于调用AI模型对评估数据子集中的评估数据进行推理。在使用AI模型对评估数据子集中的评估数据进行推理之前,为了保证评估数据在某些方面的一致性,例如,在评估数据为图像的情况下,为了保证图像大小的一致性,可以先对评估数据子集中的评估数据进行预处理。推理代码还可以包括预处理代码,用于对评估数据子集中的评估数据进行预处理。在使用AI模型对评估数据子集中的评估数据进行推理之后,可能需要对推理的结果进行处理。可选地,推理代码还可以包括后处理代码,用于对推理的结果进行后处理。预处理代码、调用代码和后处理代码是依次执行的。在图1对应的系统架构下,推理代码是根据AI模型开发的。在图2对应的系统架构下,推理代码是客户提供的。
值得注意的是,在另一些实施例中,在执行对AI模型进行评估的方法时,可以不按照上述步骤603和步骤604的顺序,可以是先使用AI模型对评估数据集中所有评估数据进行推理,获得评估数据集中所有评估数据的推理结果,再根据评估数据集中每个评估数据在数据特征下的数据特征值的分布将评估数据集划分为至少一个评估数据子集,获得每个评估数据子集中的评估数据对应的推理结果。
605、将每个评估数据的推理结果和每个评估数据的标签进行比较,根据比较结果计算AI模型对每个评估数据子集的推理的准确度,获得评估结果。
使用AI模型对至少一个评估数据子集中的评估数据进行推理得到推理结果之后,可以先将每个评估数据的推理结果和每个评估数据的标签进行比较,当评估数据的推理结果和评估数据的标签相同时,可认为AI模型对该评估数据的推理结果是准确的,比较结果为正确;当评估数据的推理结果和评估数据的标签不相同时,可认为AI模型对该评估数据的推理结果是不准确的,比较结果为不正确。根据比较结果可以计算AI模型对每个 评估数据子集的推理的准确度,获得评估结果。根据比较结果计算AI模型对每个评估数据子集的推理的准确度获得评估结果时,可以根据比较结果计算AI模型对至少一个评估数据子集中每个评估数据子集的评估数据的推理结果在评估指标下的评估指标值,得到评估结果。准确度可以使用该AI模型的一个或多个评估指标来衡量。
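The comparison step can be sketched as a per-subset accuracy calculation: each inference result is compared with its label, and the share of matches is reported for each evaluation data subset. The subset names, predictions, and labels are made up for illustration.

```python
def subset_accuracy(predictions, labels):
    """Share of samples whose inference result equals the label."""
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


# Hypothetical inference results and labels for two brightness subsets.
subsets = {
    "low_brightness": {
        "pred": ["cat", "dog", "cat", "cat"],
        "label": ["cat", "dog", "dog", "cat"],
    },
    "high_brightness": {
        "pred": ["dog", "dog", "cat", "cat"],
        "label": ["dog", "dog", "cat", "cat"],
    },
}

report = {
    name: subset_accuracy(data["pred"], data["label"])
    for name, data in subsets.items()
}
print(report)  # {'low_brightness': 0.75, 'high_brightness': 1.0}
```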
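When the task type is image classification, the evaluation metrics may include the confusion matrix, accuracy, precision, recall, the receiver operating characteristic (ROC) curve, the F1 score, and so on. When the image classification is binary, the classes are positive and negative, and samples can be divided by their real class and predicted class into true positive (TP), true negative (TN), false positive (FP), and false negative (FN). TP is the number of samples predicted as positive whose real class is positive, that is, samples labeled positive whose inference result is positive. TN is the number of samples predicted as negative whose real class is negative, that is, samples labeled negative whose inference result is negative. FP is the number of samples predicted as positive whose real class is negative, that is, samples labeled negative whose inference result is positive. FN is the number of samples predicted as negative whose real class is positive, that is, samples labeled positive whose inference result is negative. The confusion matrix consists of TP, TN, FP, and FN, as shown in Table 1: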
Figure PCTCN2020097651-appb-000013
表1 混淆矩阵
准确率为预测正确的样本数占总样本数的比例,在图像分类为二分类的情况下,准确率AC可以表示如下:
$$AC = \frac{TP + TN}{TP + TN + FP + FN}$$
精确率为正确预测为正的样本数占所有预测为正的样本数的比例,在图像分类为二分类的情况下,精确率PR可以表示如下:
$$PR = \frac{TP}{TP + FP}$$
召回率为正确预测为正的样本数占所有正样本数的比例,在图像分类为二分类的情况下,召回率RE可以表示如下:
$$RE = \frac{TP}{TP + FN}$$
The F1 score is the harmonic mean of precision and recall, and can be expressed as follows:
$$F1 = \frac{2 \times PR \times RE}{PR + RE}$$
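The ROC curve plots the true positive rate (TPR) on the vertical axis against the false positive rate (FPR) on the horizontal axis. TPR is the proportion of samples predicted as positive and actually positive among all actually positive samples. FPR is the proportion of samples predicted as positive but actually negative among all actually negative samples. In binary image classification, FPR and TPR can be expressed as follows:
$$FPR = \frac{FP}{FP + TN}$$
$$TPR = \frac{TP}{TP + FN}$$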
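The binary classification metrics defined above can be computed directly from the four confusion matrix counts. The sketch below uses the standard definitions; the function name and the sample counts are assumptions, and no guard against division by zero is included.

```python
def binary_metrics(tp, tn, fp, fn):
    """Compute the binary classification metrics from confusion matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        # F1 is the harmonic mean of precision and recall.
        "f1": 2 * precision * recall / (precision + recall),
        "tpr": recall,             # true positive rate equals recall
        "fpr": fp / (fp + tn),     # false positive rate
    }


metrics = binary_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```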
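When the task type is object detection, the evaluation metrics may include mean average precision (mAP), the precision-recall (P-R) curve, and so on. The P-R curve plots recall on the horizontal axis and precision on the vertical axis. mAP is the mean of the average precision (AP), and AP is the area under the P-R curve. mAP and AP can be expressed as follows:
$$mAP = \frac{1}{Q}\sum_{q=1}^{Q}AP(q)$$
$$AP = \sum_{idx=1}^{N}\left(RE_{idx} - RE_{idx-1}\right)PR_{idx}$$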
Q为标签的数量,AP(q)为第q个标签的平均精度,N为预测出的标注框的数量,RE idx为预测出的第idx个标注框的召回率,RE idx-1为预测出的第idx-1个标注框的召回率,PR idx为预测出的第idx个标注框的精确率。
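Using the symbols defined above, AP can be accumulated as the sum of precision values weighted by the recall increments, and mAP as the mean of AP over the Q labels. The sketch below assumes that the recall before the first predicted box is 0, which the text does not state; the sample recall and precision values are made up.

```python
def average_precision(recalls, precisions):
    """AP as the sum of (RE_idx - RE_{idx-1}) * PR_idx over predicted boxes.

    Assumes the recall before the first predicted box is 0.
    """
    ap, previous_recall = 0.0, 0.0
    for recall, precision in zip(recalls, precisions):
        ap += (recall - previous_recall) * precision
        previous_recall = recall
    return ap


def mean_average_precision(ap_per_label):
    """mAP: mean of the AP values over all labels."""
    return sum(ap_per_label) / len(ap_per_label)


# Hypothetical label A: predictions are hit, miss, hit (2 ground-truth boxes).
ap_a = average_precision([0.5, 0.5, 1.0], [1.0, 0.5, 2 / 3])
# Hypothetical label B: a single correct prediction.
ap_b = average_precision([1.0], [1.0])
map_value = mean_average_precision([ap_a, ap_b])
print(ap_a, ap_b, map_value)
```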
在任务类型为自然语言中的文本分类的情况下,评估指标可以包括准确率、精确率、召回率、F1值等。在任务类型为音频中的声音分类的情况下,评估指标可以包括准确率、精确率、召回率、F1值等。
在评估指标下的评估指标值可以根据上述公式进行计算,也可以根据其他方式计算,在此不加限定。评估结果可以包括AI模型对每个数据特征对应的评估数据子集中的评估数据的推理结果在评估指标下的评估指标值。针对一个评估指标和一个数据特征,在这个数据特征下的多个数据特征值可以对应在这个评估指标下的一个评估指标值。评估结果还可以包括根据AI模型对每个数据特征对应的评估数据子集中的评估数据的推理结果在评估指标下的评估指标值得到的现象,如图像的亮度对准确率的影响较大等。例如,任务类型为人脸检测,数据特征包括标注框的面积占图像的比重,评估指标包括召回率,评估结果可以如表2所示:
Figure PCTCN2020097651-appb-000022
表2 评估结果
可选地,执行完上述步骤601-步骤605后,上述方法还可以包括:根据评估结果,生成对AI模型的优化建议,优化建议可以是根据AI模型目前对各个评估数据子集的评估结果,建议继续增加与其中一个或多个评估数据子集中的评估数据满足同一组条件的新数 据继续训练AI模型,通常当前AI模型对该一个或多个评估数据子集的推理的准确度还不满足模型需求或者当前AI模型对该一个或多个评估数据子集的推理的准确度相较于其他评估数据子集较低。例如,对于表2中的评估结果,优化建议可以为用标注框的面积占图像的比重满足0%-20%这个条件的新数据训练AI模型。应理解,对于根据优化建议获得的继续用于训练的新数据可以是重新采集的数据,也可以是对原来的训练数据中的数据的数据特征的值进行调整后的数据。
可选地,可以根据评估结果确定数据特征对评估指标的敏感度。具体地,可以对数据特征的值和AI模型对每个数据特征对应的每个评估数据子集的评估数据的推理结果在评估指标下的评估指标值进行回归分析,得到数据特征对评估指标的敏感度。即可以将在数据特征的值作为输入,将AI模型对每个数据特征对应的每个评估数据子集的评估数据的推理结果在评估指标下的评估指标值作为输出,进行回归分析,可以得到数据特征对评估指标的敏感度。例如,使用线性回归f(z t)=W Tz t,一组数据特征的值为z t向量,如包含图像的亮度值、清晰度值、分辨率值和饱和度值4个维度,将数据特征对应的评估数据子集的评估数据的推理结果在评估指标下的评估指标值作为f(z t),拟合出的W向量就是每个数据特征对每个评估指标的影响权重,即敏感度。
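The regression idea can be illustrated with the simplest case of a single data feature. The sketch below fits a least-squares line to subset feature values and metric values and reads the slope as the sensitivity. It is a simplification of the vector form f(z t)=W Tz t: it uses one feature and adds an intercept term, and the brightness and recall numbers are invented for illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = w * x + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    w = covariance / variance
    return w, mean_y - w * mean_x


# Hypothetical midpoints of four brightness subsets and the recall on each.
brightness = [0.125, 0.375, 0.625, 0.875]
recall = [0.60, 0.70, 0.80, 0.90]

sensitivity, intercept = fit_line(brightness, recall)
print(sensitivity, intercept)
```

A slope with a large absolute value suggests that the metric changes strongly with the feature, which matches the notion of sensitivity used in the text.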
计算出数据特征中每个数据特征对每个评估指标的敏感度之后,可以根据每个数据特征对每个评估指标的敏感度,生成对AI模型的优化建议。可以在敏感度大于一定值的情况下认为该数据特征对评估指标的影响较大,同时针对该现象可以生成对应的优化建议。例如,在图像的亮度对准确度影响较大的情况下,可以给出增加图像的亮度值在一个或多个范围内的图像继续训练AI模型,由于当前AI模型对该一个或多个范围内的图像的推理的准确度还有提升空间,根据该优化建议用新的数据继续训练当前AI模型后,AI模型的推理能力较大概率地可以提升。
可选地,上述方法还可以包括:生成评估报告,发送评估报告。评估报告可以包括评估结果和优化建议中的至少一种。根据比较结果计算出AI模型对每个评估数据子集的推理的准确度获得评估结果之后,和/或根据评估结果生成对AI模型的优化建议之后,可以生成包括评估结果和/或优化建议的评估报告。
可选地,上述方法还可以包括:计算AI模型对评估数据集的整体推理的准确度。具体地,可以先确定AI模型对评估数据集中的评估数据的推理结果,之后将每个评估数据的推理结果和每个评估数据的标签进行比较,最后根据比较结果计算AI模型对评估数据集的推理的准确度,得到AI模型对全局数据的评估结果。此处与上面不同在于,此处不需要将评估数据集划分为多个评估数据子集,而是将评估数据集作为一种整体来进行计算的,由于评估数据集中的所有评估数据为没有进行特别选择的数据,通过AI模型对评估数据集整体的推理能力进行评估,可以评估AI模型对全局数据的推理能力,即AI模型对任何一种可以作为该AI模型的输入的数据的推理能力。本申请中的全局数据为未根据任何一种数据特征进行分类获得的数据,其可以代表任何一种可以用作该AI模型的输入的数据。
可选地,上述评估报告还可以包括AI模型对评估数据集中的推理的准确度。
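Optionally, the method may further include obtaining performance parameters. While the AI model performs inference on the evaluation data in the evaluation data set, the usage of hardware resources and the usage duration and usage count of the operators in the AI model can be monitored to obtain the performance parameters. Hardware resources may include the central processing unit (CPU), graphics processing unit (GPU), physical memory, GPU memory, and so on. A performance monitoring process can be used to monitor the inference process. Specifically, a GPU performance monitoring tool, such as the NVIDIA system management interface (SMI), can be called to collect GPU usage and GPU memory usage. CPU performance monitoring tools, such as top, vmstat, and iostat, can be called to collect CPU usage and memory usage. An operator performance monitoring tool, such as a profiler, can be called to collect the usage duration and usage count of the operators in the AI model.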
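The operator-level part of this monitoring (usage count and usage duration per operator) can be illustrated with a small wrapper that times each call. This is a toy sketch, not the profiler of any real framework: `OperatorProfiler` and the two stand-in operators are hypothetical.

```python
import time
from collections import defaultdict


class OperatorProfiler:
    """Toy profiler that records call counts and total time per operator."""

    def __init__(self):
        self.total_time = defaultdict(float)
        self.calls = defaultdict(int)

    def wrap(self, name, fn):
        """Return a version of fn whose calls are counted and timed."""
        def timed(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                self.total_time[name] += time.perf_counter() - start
                self.calls[name] += 1
        return timed

    def summary(self):
        """Usage count, total duration, and mean duration per operator."""
        return {
            name: {
                "calls": self.calls[name],
                "total_s": self.total_time[name],
                "mean_s": self.total_time[name] / self.calls[name],
            }
            for name in self.calls
        }


profiler = OperatorProfiler()
# Stand-ins for two operators of a model.
flatten = profiler.wrap("flatten", lambda rows: [x for row in rows for x in row])
relu = profiler.wrap("activation", lambda xs: [max(0.0, x) for x in xs])

for _ in range(3):
    output = relu(flatten([[1.0, -2.0], [3.0, -4.0]]))

stats = profiler.summary()
print(stats)
```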
可选地,上述优化建议还可以包括根据性能参数生成的优化建议。获取到性能参数之后,可以根据性能参数生成对AI模型的优化建议。可以根据硬件资源的使用信息、AI模型包括的算子的使用时长、算子的使用数量以及性能调优知识库,生成对AI模型的性能优化建议。性能调优知识库可以包括硬件资源的使用信息对应的现象、算子的使用情况对应的现象以及硬件资源的使用信息对应的现象和算子的使用情况对应的现象对应的性能优化方式。例如,在硬件资源的使用信息对应的现象为显存消耗较多的情况下,性能优化建议可以为将AI模型的参数的精度调整为8bit量化,也可以为启用算子融合。再例如,在硬件资源的使用信息对应的现象为显存消耗较多,硬件资源的使用信息对应现象对应的性能优化方式可以为将AI模型的参数的精度调整为半精度或int8量化。
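A performance tuning knowledge base of this kind can be pictured as a list of rules that map an observed phenomenon to an optimization method. The sketch below is hypothetical throughout: the rule structure, the field names, and both thresholds are assumptions, while the sample measurements reuse figures from the worked example later in this document.

```python
# Each rule pairs a phenomenon with a check and a suggested optimization.
KNOWLEDGE_BASE = [
    {
        "phenomenon": "high GPU memory consumption",
        # Assumed threshold: peak GPU memory of at least 1024 MB.
        "matches": lambda perf: perf["gpu_mem_peak_mb"] >= 1024,
        "suggestion": "adjust parameter precision to half precision or int8 quantization",
    },
    {
        "phenomenon": "one operator dominates inference time",
        # Assumed threshold: a single operator takes at least 30% of the time.
        "matches": lambda perf: (
            max(perf["op_time_ms"].values()) / sum(perf["op_time_ms"].values()) >= 0.3
        ),
        "suggestion": "optimize the most time-consuming operator",
    },
]


def suggest(perf):
    """Return the suggestions of every rule whose phenomenon is observed."""
    return [rule["suggestion"] for rule in KNOWLEDGE_BASE if rule["matches"](perf)]


perf = {
    "gpu_mem_peak_mb": 1570,
    "op_time_ms": {
        "contrib_Proposal": 1329.748,
        "convolution+activation": 1221.938,
        "flatten": 130.858,
    },
}
suggestions = suggest(perf)
print(suggestions)
```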
可选地,上述步骤可以被多次执行,即进行多次评估。每一次的执行步骤相同,区别在于每一次使用的评估数据集有些许差别。例如,第一次使用的评估数据集为接收的用户上传的或终端设备发送的评估数据集,后续使用的评估数据集是对接收的评估数据集中评估数据的数据特征进行调整后的评估数据集,但调整前后的评估数据不会影响视觉效果。调整可以是加噪声,也可以是改变一个评估数据中部分数据的亮度值,还可以是调整评估数据的其它数据特征,在此不加限定。之后可以综合这多次的评估报告和优化建议得到更加准确地建议和报告,从而可以提高评估的鲁棒性。例如,第二次使用的评估数据集相对接收的评估数据集增加了噪声,第二次的评估报告与第一次的评估报告相比,准确率和精确率降低了,表明噪声对AI模型的影响较大,因此,可以尽量避免噪声的干扰。
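Optionally, the evaluation of the AI model in the embodiments of this application may also call engine-related tools, such as the profiler provided by TensorFlow or the profiler provided by MXNet, to analyze the structure of the AI model, the operators it includes, and the time complexity and space complexity of those operators. The structure of an AI model may include residual structures, multi-level feature extraction, and so on. The optimization suggestions may further include suggestions for modifying the structure of the AI model based on this analysis. For example, if the analysis shows that the AI model contains no batch normalization (BN) layer, which brings a risk of overfitting, a suggestion to add a BN layer can be generated. As another example, when the structure of the AI model includes multi-level features from feature extraction to classification and the annotation boxes to be recognized span multiple scales, the model may fail to recognize annotation boxes at all scales and recognize only those at some scales. The time complexity and space complexity of an operator may be linear or exponential. When the space complexity of an operator is exponential, the structure of the AI model is relatively complex, and a pruning suggestion, that is, a suggestion to adjust the structure of the AI model, can be generated.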
其中,上述建议和报告可以通过GUI提供给用户,也可以通过java脚本对象简谱(java script object notation,JSON)文档提供给用户,也可以发送到用户的终端设备。
请参阅图8,图8是本申请实施例公开的另一种AI模型的评估方法的流程示意图。其中,该AI模型的评估方法应用于评估系统。如图8所示,该AI模型的评估方法可以包括以下步骤。
801、获取AI模型和评估数据集。
其中,步骤801的详细描述可以参考步骤601。
802、利用AI模型对评估数据集中的评估数据进行推理。
其中,步骤802的详细描述可以参考步骤604。步骤802与步骤604不同在于,步骤802是对评估数据集中的评估数据进行推理,不需要对评估数据集进行划分,而步骤604是对评估数据集中的评估数据划分为至少一个评估数据子集中的评估数据进行推理,先需要将评估数据集中的评估数据划分为至少一个评估数据子集。
803、获取性能数据。
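While the AI model performs inference on the evaluation data in the evaluation data set, the performance of the hardware during inference (that is, the usage of hardware resources) and the usage duration and usage count of the operators in the AI model can be monitored to obtain the performance data. In other words, the performance data represents the performance of the hardware that executes the inference process while the AI model performs inference on the evaluation data, or the usage of the operators included in the AI model during that inference. The operator usage indicates how long each type of operator in the AI model is used during inference, or how many times each type of operator is used in the AI model. For a detailed description of step 803, refer to the related description above.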
804、根据性能数据生成对AI模型的优化建议。
获取到性能数据之后,可以根据性能数据生成对AI模型的优化建议。优化建议可以包括对AI模型的结构进行调整,也可以包括针对AI模型的算子进行优化训练。其中,步骤804的详细描述可以参考上面的相关描述。
可选地,上述方法还可以包括:生成评估报告,发送评估报告。根据性能数据生成对AI模型的优化建议之后,可以生成评估报告,并发送评估报告,可以是发送给终端设备,也可以是发送给用户的邮箱等。评估报告可以包括性能数据和优化建议中的至少一个。
可选地,上述方法还可以包括:计算AI模型对评估数据集的推理的准确度。具体地,可以先确定AI模型对评估数据集中的评估数据的推理结果,之后将每个评估数据的推理结果和每个评估数据的标签进行比较,最后根据比较结果计算AI模型对评估数据集的推理的准确度。详细描述可以参考上面的相关描述。
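The following applies the above steps to a concrete example in which the evaluation data are microorganism images and the task type of the AI model is object detection. After the AI model performs inference on the evaluation data, the inference results include detected epithelial cells, blastospores, cocci, white blood cells, spores, fungi, and clue cells. When the data features include image brightness and the evaluation metrics include the F1 score, the evaluation results in the evaluation report may include the F1 scores of the AI model on the four evaluation data subsets divided by brightness distribution, as shown in Table 3:
Distribution range  Epithelial cells  Blastospores  Cocci   White blood cells  Spores  Fungi   Clue cells  mAP
0-25%               0.6437            0.6876        0.0274  0.5005             0.7976  0.5621  0.5638      0.5404
25%-50%             0.425             0.5359        -       0.6904             0.746   0.5651  0.106       0.5114
50%-75%             0.413             0.5414        0.0334  0.6456             0.7263  0.5543  0.1429      0.4367
75%-100%            0.5084            0.4632        -       0.6818             0.6744  0.5683  0.2065      0.5171
STD                 0.092             0.081         0.0003  0.076              0.044   0.005   0.182       0.039
Table 3 F1 scores of the four evaluation data subsets divided by brightness distribution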
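The STD row of Table 3 can be checked with a short calculation. The sketch below takes the F1 scores of two columns from the table and computes the population standard deviation across the four subsets, which reproduces the listed values to three decimal places. The dictionary layout is an assumption made for illustration.

```python
from statistics import pstdev

# F1 scores from Table 3 for the four brightness subsets (0-25% ... 75%-100%).
f1_by_class = {
    "epithelial cells": [0.6437, 0.425, 0.413, 0.5084],
    "clue cells": [0.5638, 0.106, 0.1429, 0.2065],
}

# Population standard deviation across subsets, used as the sensitivity.
sensitivity = {name: round(pstdev(values), 3) for name, values in f1_by_class.items()}
print(sensitivity)  # {'epithelial cells': 0.092, 'clue cells': 0.182}
```

The larger the standard deviation, the more the F1 score of that class varies with brightness.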
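As shown in Table 3, in step 603 the microorganism images can be sorted by brightness value in descending or ascending order. The first 25% (0-25%) of the evaluation data is then taken as the first evaluation data subset, the next 25% (25%-50%) as the second, the next 25% (50%-75%) as the third, and the last 25% (75%-100%) as the fourth. In step 605, the F1 scores for epithelial cells, blastospores, cocci, white blood cells, spores, fungi, and clue cells are calculated for each of the four subsets. In addition, step 605 may also calculate, for each subset, the mean of the F1 scores over these classes (the mAP column), and, for each class, the standard deviation (STD) of the F1 scores across the subsets, which serves as the sensitivity. Table 3 leads to the conclusion that image brightness has a large influence on epithelial cells and clue cells. Accordingly, a suggestion can be given to add images whose brightness is in the 25%-50% range and in the 50%-75% range to train the AI model. When the data features include annotation box size and the evaluation metrics include the F1 score, the evaluation results in the evaluation report may include the F1 scores of the AI model on the four evaluation data subsets divided by annotation box size distribution, as shown in Table 4: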
Figure PCTCN2020097651-appb-000023
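Table 4 F1 scores of the four evaluation data subsets divided by annotation box size distribution
The process for Table 4 is similar to that for Table 3 and is not described again. Table 4 leads to the conclusion that annotation box size has a large influence on epithelial cells and clue cells. Accordingly, a suggestion can be given to add images whose annotation box size is in the 0-25%, 25%-50%, and 50%-75% ranges to train the AI model. Refer to FIG. 9, which is a distribution diagram of annotation box brightness for microorganism detection according to an embodiment of this application. As shown in FIG. 9, the brightness of the regions covered by annotation boxes is mostly concentrated between 50 and 170. Refer to FIG. 10, which is a distribution diagram of the proportion of the image occupied by annotation box area for microorganism detection according to an embodiment of this application. As shown in FIG. 10, this proportion is mostly concentrated between 0 and 0.05. The evaluation report may also include performance data. The hardware resource usage in the obtained performance data may be as shown in Table 5: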
硬件资源的使用信息 峰值 均值
GPU使用率 65% 30%
CPU使用率 60% 40%
物理内存 390M 270M
GPU显存 1570M 1240M
表5 硬件资源的使用信息
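Table 5 leads to the conclusion that GPU memory consumption is high. Accordingly, a suggestion can be given to adjust the parameter precision of the AI model to half precision or int8 quantization. The operator usage in the obtained performance data may be as shown in Table 6: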
算子 总使用时长 平均使用时长 使用数量
检测框生成(contrib_Proposal) 1329.748ms 120.886ms 11
卷积(convolution)、激活(activation) 1221.938ms 9.257ms 132
卷积、激活、池化(pooling) 1162.373ms 23.722ms 49
全连接(fullyconnected)、激活 260.557ms 13.028ms 20
归一化(softmax) 138.426ms 12.584ms 11
降维(flatten) 130.858ms 13.086ms 10
重置形状(reshape) 32.838ms 2.985ms 11
表6 算子的使用情况
根据表6可以得出检测框生成算子耗时较多的结论,相应地,可以给出对检测框生成算子进行优化的建议。执行完一次评估之后,可以根据上述给出的建议重新训练微生物细胞对应的AI模型。请参阅图11,图11是本申请实施例公开的一种微生物细胞对应的AI模型重新训练前后mAP示意图。如图11所示,重新训练前的mAP为0.4421。对图像进行随机缩放后重新训练后的mAP为0.4482,对图像的亮度调整后重新训练后的mAP为0.45。可见,根据建议重新训练后的AI模型优于重新训练前的。
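The following applies the above steps to a concrete example in which the evaluation data are images of people and the trained AI model performs object detection. After the AI model performs inference on the evaluation data, the inference results fall into five classes: not wearing a safety helmet, wearing a white safety helmet, wearing a yellow safety helmet, wearing a red safety helmet, and wearing a blue safety helmet. Refer to FIG. 12, which is a curve of the F1 score against the confidence threshold for an AI model used for safety helmet detection according to an embodiment of this application. The F1 score is calculated in the step of calculating, from the comparison results, the accuracy of the AI model's inference on each evaluation data subset to obtain the evaluation results. As shown in FIG. 12, as the confidence threshold increases, the F1 score first rises and then falls. The F1 score is highest at a confidence threshold of 0.37, so the confidence threshold can be set to 0.37. Refer to FIG. 13, which is the P-R curve of an AI model used for safety helmet detection according to an embodiment of this application. The P-R curve is obtained in the same step. As shown in FIG. 13, the P-R curves of the five detection classes differ. When the data features include blurriness and the evaluation metrics include recall, the evaluation report may include the recall of the AI model on the four evaluation data subsets divided by blurriness distribution, as shown in Table 7: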
Figure PCTCN2020097651-appb-000024
表7 按照模糊度分布划分的4个评估数据子集的评估数据的召回率值
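Table 7 shows that image blurriness has a large influence on the class of not wearing a safety helmet. Accordingly, a suggestion can be given to add images whose blurriness is in the 50%-85% range and in the 85%-100% range to train the AI model. When the data features include the number of annotation boxes and the evaluation metrics include recall, the evaluation report may include the recall of the AI model on the four evaluation data subsets divided by the distribution of the number of annotation boxes, as shown in Table 8: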
Figure PCTCN2020097651-appb-000025
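Table 8 Recall of the evaluation data in the four evaluation data subsets divided by the distribution of the number of annotation boxes
Table 8 shows that the number of annotation boxes has a large influence on the classes of not wearing a safety helmet, wearing a yellow safety helmet, and wearing a white safety helmet. Accordingly, a suggestion can be given to add images whose number of annotation boxes is in the 85%-100% range to train the AI model.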
请参阅图14,图14是本申请实施例公开的另一种评估系统1400的结构示意图。如图14所示,该评估系统1400可以包括I/O模块1401、数据分析模块1402、推理模块1403。
可选地,该评估系统1400还可以包括诊断模块1404。
可选地,该评估系统1400还可以包括性能监测模块1405。
可选地,该评估系统1400还可以包括模型分析模块1406。
该评估系统1400中I/O模块1401、数据分析模块1402、推理模块1403、性能监测模块1405和模型分析模块1406的详细描述可以参考图6对应的方法实施例。
请参阅图15,图15是本申请实施例公开的又一种评估系统1500的结构示意图。如图15所示,该评估系统1500可以包括I/O模块1501、推理模块1502、性能监测模块1503和诊断模块1504。
可选地,该评估系统1500还可以包括模型分析模块1505。
该评估系统1500中I/O模块1501、推理模块1502、性能监测模块1503、诊断模块1504和模型分析模块1505的详细描述可以参考图8对应的方法实施例。
请参阅图16,图16为本申请实施例公开的一种计算设备的结构示意图。如图16所示,计算设备1600包括存储器1601、处理器1602、通信接口1603以及总线1604。其中,存储器1601、处理器1602、通信接口1603通过总线1604实现彼此之间的通信连接。
存储器1601可以是只读存储器(read only memory,ROM),静态存储设备,动态存储设备或者随机存取存储器(random access memory,RAM)。存储器1601可以存储程序,当存储器1601中存储的程序被处理器1602执行时,处理器1602和通信接口1603用于执行前述图6或者图8为用户对AI模型进行评估的方法。存储器1601还可以存储评估数据集。
处理器1602可以采用通用的中央处理器(central processing unit,CPU),微处理器,应用专用集成电路(application specific integrated circuit,ASIC),图形处理器(graphics processing unit,GPU)或者一个或多个集成电路。
通信接口1603使用例如但不限于收发器一类的收发模块,来实现计算设备1600与其他设备或通信网络之间的通信。例如,可以通过通信接口1603获取评估数据集。
总线1604可包括在计算设备1600各个部件(例如,存储器1601、处理器1602、通信接口1603)之间传送信息的通路。
由于本申请提供的评估系统500、评估系统1400、评估系统1500中的各个模块可以分布式地部署在同一环境或不同环境中的多个计算机上,因此,请参阅图17,图17为本申请实施例公开的另一种计算设备的结构示意图。如图17所示的计算设备,该计算设备包括多个计算机,每个计算机包括存储器、处理器、通信接口以及总线。其中,存储器、处理器、通信接口通过总线实现彼此之间的通信连接。
存储器可以是ROM,静态存储设备,动态存储设备或者RAM。存储器可以存储程序,当存储器中存储的程序被处理器执行时,处理器和通信接口用于执行评估系统为用户对AI模型进行评估的部分方法。存储器还可以存储评估数据集,例如:存储器中的一部分存储资源被划分成一个数据集存储模块,用于存储评估系统所需的评估数据集,存储器中的一部分存储资源被划分成一个结果存储模块,用于存储评估报告。
处理器可以采用通用的CPU,微处理器,ASIC,GPU或者一个或多个集成电路。
通信接口使用例如但不限于收发器一类的收发模块,来实现计算机与其他设备或通信网络之间的通信。例如,可以通过通信接口获取评估数据集。
总线可包括在计算机各个部件(例如,存储器、处理器、通信接口)之间传送信息的通路。
上述每个计算机间通过通信网络建立通信通路。每个计算机上运行评估系统500、评估系统1400、评估系统1500中的任意一个或多个模块。任一计算机可以为云数据中心中的计算机(例如:服务器),或边缘数据中心中的计算机,或终端计算设备。
上述各个附图对应的流程的描述各有侧重,某个流程中没有详述的部分,可以参见其他流程的相关描述。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。提供评估的计算机程序产品包括一个或多个进行评估的计算机指令,在计算机上加载和执行这些计算机程序指令时,全部或部分地产生按照本发明实施例图6或图8所述的流程或功能。
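The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium stores the computer program instructions that provide the evaluation. The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device, such as a server or data center, that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, an SSD).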

Claims (20)

  1. 一种人工智能AI模型的评估方法,其特征在于,包括:
    计算设备获取所述AI模型和评估数据集,所述评估数据集包括多个携带标签的评估数据,每个评估数据的标签用于表示所述评估数据对应的真实结果;
    所述计算设备根据数据特征对所述评估数据集中的评估数据进行分类,获得评估数据子集,所述评估数据子集为所述评估数据集的子集,所述评估数据子集中的所有评估数据的所述数据特征的值满足条件;
    所述计算设备确定所述AI模型对所述评估数据子集中的评估数据的推理结果,将所述评估数据子集中的每个评估数据的推理结果和所述评估数据子集中的每个评估数据的标签进行比较,根据比较结果计算所述AI模型对所述评估数据子集的推理的准确度,以获得所述AI模型对所述数据特征的值满足所述条件的数据的评估结果。
  2. 如权利要求1所述的方法,其特征在于,所述方法还包括:
    所述计算设备生成对所述AI模型的优化建议,所述优化建议包括:用所述数据特征的值满足所述条件的新数据训练所述AI模型。
  3. 如权利要求1或2所述的方法,所述方法还包括:
    所述计算设备获取性能数据,所述性能数据表示在所述AI模型对所述评估数据进行推理的过程中,执行所述推理过程的硬件的性能表现,和/或,在所述AI模型对所述评估数据进行推理的过程中所述AI模型包括的算子的使用情况。
  4. 如权利要求1-3任一项所述的方法,其特征在于,所述方法还包括:
    所述计算设备确定所述AI模型对所述评估数据集中的评估数据的推理结果;
    所述计算设备根据所述评估数据集中的评估数据的推理结果和所述评估数据集中的评估数据的标签的比较结果,计算所述AI模型对所述评估数据集的推理的准确度,以获得所述AI模型对全局数据的评估结果。
  5. 如权利要求1-4任一项所述的方法,其特征在于,所述数据特征的数量为多个,所述条件包括多个子条件,所述多个数据特征和所述多个子条件的关系为一一对应;
    所述计算设备根据数据特征对所述评估数据集中的评估数据进行分类,获得评估数据子集,包括:
    所述计算设备根据所述多个数据特征对所述评估数据集中的评估数据进行分类,获得评估数据子集,其中,所述评估数据子集中的所有评估数据的所述多个数据特征的值中的每个值满足所述条件中对应的子条件。
  6. 如权利要求1-5任一项所述的方法,其特征在于,所述评估数据集中的评估数据为图像或者音频。
  7. 一种人工智能AI模型的评估方法,其特征在于,包括:
    计算设备获取所述AI模型和评估数据集,所述评估数据集包括多个携带标签的评估数据,每个评估数据的标签用于表示所述评估数据对应的真实结果;
    所述计算设备利用所述AI模型对所述评估数据集中的评估数据进行推理;
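    The computing device obtains performance data, where the performance data represents the performance of the hardware that executes the inference process while the AI model performs inference on the evaluation data, and/or the usage of the operators included in the AI model while the AI model performs inference on the evaluation data;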
    所述计算设备根据所述性能数据,生成对所述AI模型的优化建议,所述优化建议包括:对所述AI模型的结构进行调整,和/或,对所述AI模型的算子进行优化训练。
  8. 如权利要求7所述的方法,其特征在于,在所述AI模型对所述评估数据进行推理的过程中所述AI模型包括的算子的使用情况,包括:所述AI模型的算子的使用时长、所述AI模型的算子的使用数量。
  9. 如权利要求7或8所述的方法,其特征在于,所述评估数据集中的评估数据为图像或音频。
  10. 一种人工智能AI模型的评估系统,其特征在于,所述系统包括:
    输入输出I/O模块,用于获取所述AI模型和评估数据集,所述评估数据集包括多个携带标签的评估数据,每个评估数据的标签用于表示所述评估数据对应的真实结果;
    数据分析模块,用于根据数据特征对所述评估数据集中的评估数据进行分类,获得评估数据子集,所述评估数据子集为所述评估数据集的子集,所述评估数据子集中的所有评估数据的所述数据特征的值满足条件;
    推理模块,用于确定所述AI模型对所述评估数据子集中的评估数据的推理结果;
    所述数据分析模块,还用于将所述评估数据子集中的每个评估数据的推理结果和所述评估数据子集中的每个评估数据的标签进行比较,根据比较结果计算所述AI模型对所述评估数据子集的推理的准确度,以获得所述AI模型对所述数据特征的值满足所述条件的数据的评估结果。
  11. 如权利要求10所述的系统,其特征在于,所述系统还包括:
    诊断模块,用于生成对所述AI模型的优化建议,所述优化建议包括:用所述数据特征的值满足所述条件的新数据训练所述AI模型。
  12. 如权利要求10或11所述的系统,其特征在于,所述系统还包括:
    性能监测模块,用于获取性能数据,所述性能数据表示在所述AI模型对所述评估数据进行推理的过程中,执行所述推理过程的硬件的性能表现,和/或,在所述AI模型对所述评估数据进行推理的过程中所述AI模型包括的算子的使用情况。
  13. 如权利要求10-12任一项所述的系统,其特征在于,所述推理模块,还用于确定所述AI模型对所述评估数据集中的评估数据的推理结果;
    所述系统还包括:
    模型分析模块,用于根据所述评估数据集中的评估数据的推理结果和所述评估数据集中的评估数据的标签的比较结果,计算所述AI模型对所述评估数据集的推理的准确度,以获得所述AI模型对全局数据的评估结果。
  14. 如权利要求10-13任一项所述的系统,其特征在于,所述数据特征的数量为多个,所述条件包括多个子条件,所述多个数据特征和所述多个子条件的关系为一一对应;
    所述数据分析模块,具体用于根据所述多个数据特征对所述评估数据集中的评估数据进行分类,获得评估数据子集,其中,所述评估数据子集中的所有评估数据的所述多个数据特征的值中的每个值满足所述条件中对应的子条件。
  15. 如权利要求10-14任一项所述的系统,其特征在于,所述评估数据集中的评估数据为图像或者音频。
  16. 一种人工智能AI模型的评估系统,其特征在于,所述系统包括:
    输入输出I/O模块,用于获取所述AI模型和评估数据集,所述评估数据集包括多个携带标签的评估数据,每个评估数据的标签用于表示所述评估数据对应的真实结果;
    推理模块,用于利用所述AI模型对所述评估数据集中的评估数据进行推理;
    性能监测模块,用于获取性能数据,所述性能数据用于表示在所述AI模型对所述评估数据进行推理的过程中,执行所述推理过程的硬件的性能表现,或者在所述AI模型对所述评估数据进行推理的过程中所述AI模型包括的算子的使用情况;
    诊断模块,用于根据所述性能数据,生成对所述AI模型的优化建议,所述优化建议包括:对所述AI模型的结构进行调整,和/或,对所述AI模型的算子进行优化训练。
  17. 如权利要求16所述的系统,其特征在于,在所述AI模型对所述评估数据进行推理的过程中所述AI模型包括的算子的使用情况,包括:所述AI模型的算子的使用时长、所述AI模型的算子的使用数量。
  18. 如权利要求16或17所述的系统,其特征在于,所述评估数据集中的评估数据为图像或音频。
  19. 一种计算设备,其特征在于,所述计算设备包括存储器和处理器,所述存储器用于存储一组计算机指令;
    所述处理器执行所述存储器存储的一组计算机指令,以执行上述权利要求1至9中任一项所述的方法。
  20. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质存储有计算机程序代码,当所述计算机程序代码被计算设备执行时,所述计算设备执行上述权利要求1至9中任一项所述的方法。
PCT/CN2020/097651 2019-09-16 2020-06-23 人工智能ai模型的评估方法、系统及设备 WO2021051917A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP20865555.5A EP4024297A4 (en) 2019-09-16 2020-06-23 ARTIFICIAL INTELLIGENCE (AI) MODEL EVALUATION METHOD AND SYSTEM, AND DEVICE
US17/696,040 US20220207397A1 (en) 2019-09-16 2022-03-16 Artificial Intelligence (AI) Model Evaluation Method and System, and Device

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN201910872910 2019-09-16
CN201910872910.1 2019-09-16
CN201911425487.7A CN112508044A (zh) 2019-09-16 2019-12-31 人工智能ai模型的评估方法、系统及设备
CN201911425487.7 2019-12-31

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/696,040 Continuation US20220207397A1 (en) 2019-09-16 2022-03-16 Artificial Intelligence (AI) Model Evaluation Method and System, and Device

Publications (1)

Publication Number Publication Date
WO2021051917A1 true WO2021051917A1 (zh) 2021-03-25

Family

ID=74883928

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/097651 WO2021051917A1 (zh) 2019-09-16 2020-06-23 人工智能ai模型的评估方法、系统及设备

Country Status (3)

Country Link
US (1) US20220207397A1 (zh)
EP (1) EP4024297A4 (zh)
WO (1) WO2021051917A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116701923A (zh) * 2022-10-13 2023-09-05 荣耀终端有限公司 算子性能的评估方法和评估装置
WO2023208840A1 (en) * 2022-04-29 2023-11-02 Interdigital Ce Patent Holdings, Sas Methods, architectures, apparatuses and systems for distributed artificial intelligence
CN118172589A (zh) * 2024-02-02 2024-06-11 北京视觉世界科技有限公司 自动化模型质量评估方法、装置、设备及存储介质

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
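JP6800453B1 (ja) * 2020-05-07 2020-12-16 株式会社 情報システムエンジニアリング Information processing device and information processing method
WO2024031984A1 (zh) * 2022-08-10 2024-02-15 华为云计算技术有限公司 Task processing system, and task processing method and apparatus
WO2024064025A1 (en) * 2022-09-22 2024-03-28 Apple Inc. Feedback-based ai/ml models adaptation in wireless networks
CN116528282B (zh) * 2023-07-04 2023-09-22 亚信科技(中国)有限公司 Coverage scenario identification method and apparatus, electronic device, and readable storage medium
CN117371943A (zh) * 2023-10-17 2024-01-09 江苏润和软件股份有限公司 Data-driven AI middle-platform model management method and AI middle-platform system
JP7539738B1 (ja) 2023-11-20 2024-08-26 株式会社Adansons Program, method, information processing device, and system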

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160232457A1 (en) * 2015-02-11 2016-08-11 Skytree, Inc. User Interface for Unified Data Science Platform Including Management of Models, Experiments, Data Sets, Projects, Actions and Features
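CN106250986A (zh) * 2015-06-04 2016-12-21 波音公司 Advanced analytical infrastructure for machine learning
CN109376419A (zh) * 2018-10-16 2019-02-22 北京字节跳动网络技术有限公司 Data modeling method and apparatus, electronic device, and readable medium
CN110135592A (zh) * 2019-05-16 2019-08-16 腾讯科技(深圳)有限公司 Classification effect determination method and apparatus, intelligent terminal, and storage medium
CN110210558A (zh) * 2019-05-31 2019-09-06 北京市商汤科技开发有限公司 Method and apparatus for evaluating neural network performance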

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6460095B2 (ja) * 2014-03-28 2019-01-30 日本電気株式会社 Learning model selection system, learning model selection method, and program
US20170220930A1 (en) * 2016-01-29 2017-08-03 Microsoft Technology Licensing, Llc Automatic problem assessment in machine learning system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4024297A4 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023208840A1 (en) * 2022-04-29 2023-11-02 Interdigital Ce Patent Holdings, Sas Methods, architectures, apparatuses and systems for distributed artificial intelligence
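CN116701923A (zh) * 2022-10-13 2023-09-05 荣耀终端有限公司 Operator performance evaluation method and evaluation apparatus
CN116701923B (zh) * 2022-10-13 2024-05-17 荣耀终端有限公司 Operator performance evaluation method and evaluation apparatus
CN118172589A (zh) * 2024-02-02 2024-06-11 北京视觉世界科技有限公司 Automated model quality evaluation method, apparatus, device, and storage medium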

Also Published As

Publication number Publication date
EP4024297A4 (en) 2022-11-09
EP4024297A1 (en) 2022-07-06
US20220207397A1 (en) 2022-06-30

Similar Documents

Publication Publication Date Title
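WO2021051917A1 (zh) Artificial Intelligence (AI) Model Evaluation Method and System, and Device
CN112508044A (zh) Artificial Intelligence (AI) Model Evaluation Method and System, and Device
CN108921206B (zh) Image classification method and apparatus, electronic device, and storage medium
CN108090902B (zh) No-reference objective image quality assessment method based on a multi-scale generative adversarial network
WO2021077841A1 (zh) Signal modulation recognition method and apparatus based on a recurrent residual network
Mittal et al. Blind image quality assessment without human training using latent quality factors
WO2019228089A1 (zh) Human body attribute recognition method, apparatus, device, and medium
CN108171191B (zh) Method and apparatus for detecting a human face
CN109783879B (zh) Method and system for evaluating the recognition effectiveness of radar emitter signals
WO2019242627A1 (zh) Data processing method and apparatus
CN111612261B (zh) Blockchain-based financial big data analysis system
WO2018006631A1 (zh) Method and system for automatic user level classification
CN106663210A (zh) Perception-based multimedia processing
CN111353377A (zh) Deep-learning-based elevator passenger count detection method
US20220245448A1 (en) Method, device, and computer program product for updating model
CN107623924A (zh) Method and apparatus for verifying key performance indicators (KPIs) that affect a key quality indicator (KQI)
CN113221721A (zh) Image recognition method, apparatus, device, and medium
Alasadi et al. A fairness-aware fusion framework for multimodal cyberbullying detection
CN113128329A (zh) Visual analytics platform for updating object detection models in autonomous driving applications
CN115309985A (zh) Fairness evaluation method for recommendation algorithms and AI model selection method
CN113343123B (zh) Training method and detection method for a generative adversarial multi-relational graph network
CN108268877A (zh) Method and apparatus for identifying a target terminal
CN116468479A (zh) Method for determining page quality evaluation dimensions, and page quality evaluation method and apparatus
WO2023086918A1 (en) Methods and systems for identifying and reducing gender bias amplification
CN111461199B (zh) Security attribute selection method for distribution-based spam classification data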

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 20865555

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2020865555

Country of ref document: EP

Effective date: 20220328