CN110807566A

CN110807566A - Artificial intelligence model evaluation method, device, equipment and storage medium

Info

Publication number: CN110807566A
Application number: CN201910849376.2A
Authority: CN
Inventors: 柴华
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2019-09-09
Filing date: 2019-09-09
Publication date: 2020-02-18

Abstract

The embodiment of the invention discloses an artificial intelligence model evaluation method, device and storage medium. The method comprises: obtaining an artificial intelligence model to be evaluated and an evaluation attribute of the model; obtaining an evaluation parameter according to the model to be evaluated and the evaluation attribute; constructing an evaluation set according to the evaluation parameter; and evaluating the model to be evaluated according to the evaluation set and outputting an evaluation result. The scheme enables a more refined verification of the degree and capability of artificial intelligence.

Description

Artificial intelligence model evaluation method, device, equipment and storage medium

Technical Field

The invention relates to the technical field of artificial intelligence, and in particular to an artificial intelligence model evaluation method, device, equipment and storage medium.

Background

With the development of the artificial intelligence industry, the demand for evaluating artificial intelligence products is increasing.

At present, the evaluations proposed by the Artificial Intelligence Industry Development Alliance are mainly conducted from the perspective of a judge: a fixed standard index value is given, and the evaluation determines whether an artificial intelligence product meets the passing standard. Such product evaluations usually focus on whether the product's characteristics and its functions in a given scenario meet requirements, but they verify the degree and capability of artificial intelligence only coarsely and cannot support more detailed comparative evaluation or defect mining.

Disclosure of Invention

The embodiments of the invention provide an artificial intelligence model evaluation method, device, equipment and storage medium, which can improve the accuracy and efficiency of evaluation.

In order to solve the above technical problems, an embodiment of the present invention provides an artificial intelligence model evaluation method, specifically providing the following technical solutions:

acquiring an artificial intelligence model to be evaluated and evaluation attributes of the artificial intelligence model to be evaluated;

obtaining an evaluation parameter according to the artificial intelligence model to be evaluated and the evaluation attribute;

constructing an evaluation set according to the evaluation parameters;

and evaluating the artificial intelligence model to be evaluated according to the evaluation set, and outputting an evaluation result.
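The four steps above can be pictured as a small pipeline. The following is a minimal sketch under stated assumptions, not the patented implementation: the lookup table `PARAMETER_TABLE`, the attribute value `"product_end"`, the toy intent classifier, and exact-match accuracy as the evaluation result are all hypothetical choices made for illustration. In this sketch the parameter list only names what should be collected; it does not drive the later steps.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical lookup: (model type, evaluation attribute) -> evaluation parameters.
PARAMETER_TABLE: Dict[Tuple[str, str], List[str]] = {
    ("smart_speaker", "product_end"): [
        "user_satisfaction",
        "task_completion_rate",
        "word_error_rate",
    ],
}


def obtain_parameters(model_type: str, attribute: str) -> List[str]:
    # Step 2: derive evaluation parameters from the model and its attribute.
    return list(PARAMETER_TABLE.get((model_type, attribute), []))


def build_evaluation_set(samples: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    # Step 3: keep (input, expected output) pairs whose input is non-empty.
    return [(x, y) for x, y in samples if x.strip()]


def evaluate(model: Callable[[str], str], eval_set: List[Tuple[str, str]]) -> float:
    # Step 4: fraction of items where the model output equals the expected output.
    if not eval_set:
        return 0.0
    hits = sum(1 for x, y in eval_set if model(x) == y)
    return hits / len(eval_set)


def toy_model(text: str) -> str:
    # Step 1: a stand-in for the model to be evaluated (hypothetical intent classifier).
    return "music" if "music" in text else "device_control"


params = obtain_parameters("smart_speaker", "product_end")
eval_set = build_evaluation_set(
    [("turn on the light", "device_control"), ("   ", "none"), ("play music", "music")]
)
print(params, evaluate(toy_model, eval_set))
```

Keeping each step as a separate function mirrors the separate units of the device described below, so any one step can be swapped out without touching the others.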

Optionally, in some embodiments, the evaluation attribute includes a granularity of the artificial intelligence model to be evaluated, and obtaining an evaluation parameter according to the artificial intelligence model to be evaluated and the evaluation attribute includes:

obtaining an evaluation target of the artificial intelligence model to be evaluated according to the granularity of the artificial intelligence model to be evaluated;

and acquiring corresponding evaluation parameters according to the artificial intelligence model to be evaluated and the evaluation target.

Optionally, in some embodiments, obtaining the corresponding evaluation parameter according to the artificial intelligence model to be evaluated and the evaluation target includes:

acquiring a corresponding evaluation parameter mapping relation according to the evaluation target;

obtaining an evaluation parameter corresponding to the artificial intelligence model to be evaluated according to the evaluation parameter mapping relation;

displaying the evaluation parameters through a display interface;

and confirming the selected evaluation parameters based on the selection instruction received by the display interface.

Optionally, in some embodiments, the evaluation attribute includes an evaluation classification, and obtaining an evaluation parameter according to the artificial intelligence model to be evaluated and the evaluation attribute includes:

acquiring a development stage of the artificial intelligence model to be evaluated;

determining the evaluation classification of the artificial intelligence model to be evaluated according to the development stage;

and obtaining evaluation parameters according to the artificial intelligence model to be evaluated and the evaluation classification.

Optionally, in some embodiments, the evaluation classification includes offline evaluation and online evaluation, and obtaining the evaluation parameter according to the artificial intelligence model to be evaluated and the evaluation classification includes:

if the evaluation classification is determined to be offline evaluation, acquiring the evaluation parameters for the artificial intelligence model to be evaluated in an offline acquisition mode;

and if the evaluation classification is determined to be online evaluation, acquiring the evaluation parameters for the artificial intelligence model to be evaluated online within a preset time period in an online acquisition mode.

Optionally, in some embodiments, constructing an evaluation set according to the evaluation parameters includes:

preprocessing the obtained evaluation parameters to obtain preprocessed evaluation parameters;

acquiring a label of the preprocessed evaluation parameter;

and constructing an evaluation set according to the preprocessed evaluation parameters and the corresponding labels.

Optionally, in some embodiments, after the object to be evaluated is evaluated according to the evaluation set and an evaluation result is obtained, the method further includes:

obtaining an adjustment parameter of the object to be evaluated according to the evaluation result;

and monitoring and adjusting the artificial intelligence model to be evaluated based on the adjustment parameters.

Correspondingly, an embodiment of the invention further provides an artificial intelligence model evaluation device, which comprises:

a first obtaining unit, which is used for obtaining an artificial intelligence model to be evaluated and evaluation attributes of the artificial intelligence model to be evaluated;

the second obtaining unit is used for obtaining an evaluation parameter according to the artificial intelligence model to be evaluated and the evaluation attribute;

the construction unit is used for constructing an evaluation set according to the evaluation parameters;

and the evaluation unit is used for evaluating the artificial intelligence model to be evaluated according to the evaluation set and outputting an evaluation result.

Optionally, in some embodiments, the second obtaining unit includes:

the first obtaining subunit is used for obtaining an evaluation target of the artificial intelligence model to be evaluated according to the granularity of the artificial intelligence model to be evaluated;

and the second obtaining subunit is used for obtaining corresponding evaluation parameters according to the artificial intelligence model to be evaluated and the evaluation target.

Optionally, in some embodiments, the second obtaining subunit includes:

the obtaining module is used for obtaining a corresponding evaluation parameter mapping relation according to the evaluation target; obtaining an evaluation parameter corresponding to the artificial intelligence model to be evaluated according to the evaluation parameter mapping relation;

the display module is used for displaying the evaluation parameters through a display interface;

and the confirming module is used for confirming the selected evaluation parameters based on the selection instruction received by the display interface.

Optionally, in some embodiments, the second obtaining unit further includes:

the third obtaining subunit is used for obtaining the development stage of the artificial intelligence model to be evaluated;

the determining subunit is used for determining the evaluation classification of the artificial intelligence model to be evaluated according to the development stage;

and the fourth obtaining subunit is used for obtaining the evaluation parameters according to the artificial intelligence model to be evaluated and the evaluation classification.

Optionally, in some embodiments, the fourth obtaining subunit includes:

the offline obtaining module is used for obtaining the evaluation parameters for the artificial intelligence model to be evaluated in an offline acquisition mode if the evaluation classification is determined to be offline evaluation;

and the online obtaining module is used for obtaining the evaluation parameters for the artificial intelligence model to be evaluated online within a preset time period in an online acquisition mode if the evaluation classification is determined to be online evaluation.

Optionally, in some embodiments, the construction unit includes:

the preprocessing subunit is used for preprocessing the obtained evaluation parameters to obtain preprocessed evaluation parameters;

the fifth obtaining subunit is used for obtaining labels of the preprocessed evaluation parameters;

and the construction subunit is used for constructing an evaluation set according to the preprocessed evaluation parameters and the corresponding labels.

Optionally, in some embodiments, the artificial intelligence model evaluation device further includes:

the third obtaining unit is used for obtaining the adjustment parameters of the object to be evaluated according to the evaluation result;

and the adjusting unit is used for monitoring and adjusting the artificial intelligence model to be evaluated based on the adjustment parameters.

In addition, an embodiment of the present invention further provides an evaluation apparatus, including a processor and a memory, where the memory stores a plurality of instructions, and the processor loads the instructions stored in the memory to execute any one of the artificial intelligence model evaluation methods provided by the embodiments of the present invention.

In addition, an embodiment of the present invention further provides a computer-readable storage medium, where multiple instructions are stored in the computer-readable storage medium, and the instructions are suitable for being loaded by a processor to perform any one of the artificial intelligence model evaluation methods provided in the embodiments of the present invention.

In the embodiments of the invention, the artificial intelligence model to be evaluated and its evaluation attribute are obtained; an evaluation parameter is then obtained according to the model and the evaluation attribute, and an evaluation set is constructed according to the evaluation parameter, so that the constructed evaluation set matches the model to be evaluated. When the model is then evaluated according to this evaluation set, the output evaluation result is more accurate, and the degree and capability of its artificial intelligence are verified more precisely.

Drawings

To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and a person skilled in the art can obtain other drawings from these drawings without creative effort.

FIG. 1 is a flow chart of a method for evaluating an artificial intelligence model according to an embodiment of the present invention;

FIG. 2 is a flowchart of a method for evaluating an artificial intelligence model according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of a basic component flow of an artificial intelligence model evaluation method provided by an embodiment of the invention;

FIG. 4a is a schematic structural diagram of an artificial intelligence model evaluation device according to an embodiment of the present invention;

FIG. 4b is a schematic structural diagram of an artificial intelligence model evaluation device according to an embodiment of the present invention;

FIG. 4c is another schematic structural diagram of an artificial intelligence model evaluation device according to an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of a server according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The embodiments of the invention provide an artificial intelligence model evaluation method, device, equipment and computer-readable storage medium, each of which is described in detail below.

The artificial intelligence model evaluation device may be integrated in a terminal, such as a mobile phone, a tablet computer, or a notebook computer, or in a server, such as a cloud server.

In a specific implementation, when the artificial intelligence model evaluation device is integrated in a server, the artificial intelligence model to be evaluated is obtained first; it may be, for example, a smart speaker, an artificial intelligence robot, or a smart home device, without limitation. The evaluation attributes of the model to be evaluated are then obtained. These may include the granularity of the model, the evaluation classification, and the like: the granularity may include product-end evaluation, abstract module evaluation, technical solution evaluation, and the like, and the evaluation classification may include online evaluation and offline evaluation. Because different types of artificial intelligence models require different evaluation parameters, specific evaluation parameters are obtained according to the model to be evaluated and its evaluation attributes, a corresponding evaluation set is constructed according to those parameters, and the model is evaluated against the constructed evaluation set to obtain and output an evaluation result.

Artificial Intelligence (AI) is a theory, method, technology, and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making.

Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, machine learning/deep learning, and the like.

Computer Vision (CV) is the science of how to make machines "see". More specifically, it uses cameras and computers in place of human eyes to identify, track, and measure targets, and further processes the images so that they become more suitable for human observation or for transmission to instruments for inspection. As a scientific discipline, computer vision studies related theories and techniques in an attempt to build artificial intelligence systems that can obtain information from images or multidimensional data. Computer vision technologies generally include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technologies, virtual reality, augmented reality, simultaneous localization and mapping, and other technologies, as well as common biometric technologies such as face recognition and fingerprint recognition.

The key technologies of Speech Technology are automatic speech recognition (ASR), speech synthesis (TTS), and voiceprint recognition. Enabling computers to listen, see, speak, and feel is the future direction of human-computer interaction, and voice is expected to become one of the most promising modes of human-computer interaction.

Natural Language Processing (NLP) is an important direction in computer science and artificial intelligence. It studies theories and methods that enable effective communication between humans and computers in natural language. Natural language processing is a science that integrates linguistics, computer science, and mathematics. Because research in this field involves natural language, that is, the language people use every day, it is closely related to linguistics. Natural language processing technologies typically include text processing, semantic understanding, machine translation, robot question answering, knowledge graphs, and the like.

Machine Learning (ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It specializes in studying how computers simulate or implement human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve their own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied across all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.

Autonomous driving technology generally includes high-precision maps, environment perception, behavior decision-making, path planning, motion control, and other technologies, and has broad application prospects.

With the research and progress of artificial intelligence technology, it has been developed and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, unmanned aerial vehicles, robots, smart medical care, and smart customer service.

The solutions provided by the embodiments of this application involve various artificial intelligence technologies and are explained through the following embodiments. The order in which the embodiments are described is not intended to indicate an order of preference.

In this embodiment, the description is given from the perspective of the artificial intelligence model evaluation device, which may be integrated in a network device such as a terminal or a server.

As shown in FIG. 1, the specific flow of the artificial intelligence model evaluation method may be as follows:

101, obtaining an artificial intelligence model to be evaluated and an evaluation attribute of the artificial intelligence model to be evaluated.

In a specific implementation, the artificial intelligence model to be evaluated may be obtained on demand. For example, when a user needs to evaluate the technical implementation adopted by a smart speaker, the obtained model to be evaluated is the smart speaker. Alternatively, models may be obtained periodically according to their evaluation periods; for example, if the evaluation period of an artificial intelligence robot is set to 24 hours, the robot is obtained as the model to be evaluated each time its evaluation time is reached. The specific model obtained is not limited herein. The evaluation parameters corresponding to the model may be obtained together with the model; for example, if the obtained model is a smart speaker, the corresponding evaluation parameters may include user satisfaction, the task completion rate of the task-oriented dialog system, the word error rate or sentence error rate of speech recognition, and the like.
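Two of the smart speaker parameters mentioned above, word error rate and sentence error rate, have standard definitions that can be computed directly. The sketch below assumes whitespace-tokenized transcripts; for Chinese speech recognition a character-level variant is often used instead, which this sketch does not cover. Word error rate is the minimum number of substitutions, deletions, and insertions (Levenshtein distance over words) divided by the number of reference words.

```python
from typing import List, Sequence, Tuple


def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the first i-1 reference words and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or exact match)
            )
        prev = curr
    return prev[-1] / len(ref)


def sentence_error_rate(pairs: Sequence[Tuple[str, str]]) -> float:
    # A sentence counts as wrong if its word sequence differs from the reference at all.
    if not pairs:
        raise ValueError("need at least one (reference, hypothesis) pair")
    wrong = sum(1 for ref, hyp in pairs if ref.split() != hyp.split())
    return wrong / len(pairs)
```

For example, the reference "turn on the light" recognized as "turn the lights" has one deletion and one substitution, giving a word error rate of 2 / 4 = 0.5.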

102, obtaining an evaluation parameter according to the artificial intelligence model to be evaluated and the evaluation attribute.

When the evaluation attribute is the granularity of the artificial intelligence model to be evaluated, an evaluation target of the model is obtained according to the granularity, and the corresponding evaluation parameters are obtained according to the model and the evaluation target. When the evaluation attribute is the evaluation classification, the development stage of the model to be evaluated is obtained, the evaluation classification is determined according to the development stage, and the evaluation parameters are obtained according to the model and the evaluation classification.

Specifically, when the evaluation attribute is the granularity of the artificial intelligence model to be evaluated, step 102 includes:

A1, obtaining the evaluation target of the artificial intelligence model to be evaluated according to the granularity of the artificial intelligence model to be evaluated.

The evaluation target of the artificial intelligence model to be evaluated is obtained according to its granularity. The granularity may include product-end evaluation, abstract module evaluation, and technical solution evaluation. Product-end evaluation may cover complete products, such as smart speakers, in-vehicle voice assistants, simultaneous interpretation, and face recognition, as well as sub-products, such as semantic understanding services, speech recognition services, machine translation services, and object detection services. Abstract modules may include a domain intent classification module, a named entity recognition module, an entity error correction module, a voice activity detection module, a wake-up module, an acoustic model, a language model, an image Chinese character recognition module, a text-to-speech module, and the like. Technical solutions may include classifiers such as logistic regression, named entity recognition algorithms, conditional random field algorithms, speech recognition language models, recurrent neural networks, and the like.

Before implementation, a mapping relationship between each granularity and its corresponding evaluation targets may be established. For example, if the granularity is product-end evaluation, the evaluation targets may include understanding the product's capability position in the industry, identifying the head defects of the current product, predicting the product's online service satisfaction, and performing horizontal classification and vertical analysis of Badcases (bad cases); a mapping relationship between product-end evaluation and these targets is then established. Similarly, if the granularity is abstract module evaluation, a mapping relationship between abstract module evaluation and its corresponding targets is established. Alternatively, the mapping relationship between granularity and evaluation targets may be obtained directly from a server or a local database, and the corresponding evaluation targets are then obtained according to that mapping relationship.
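The granularity-to-target mapping described above can be held in a simple dictionary. This is a minimal sketch, assuming hypothetical keys such as `"product_end"` and an optional `remote_map` argument standing in for a mapping fetched from a server. Only the product-end targets come from the description; the lists for the other granularities are left empty because the text does not enumerate them.

```python
from typing import Dict, List, Optional

# Local mapping; product-end targets follow the description, others are not enumerated there.
LOCAL_TARGET_MAP: Dict[str, List[str]] = {
    "product_end": [
        "industry capability position",
        "head defects of the current product",
        "predicted online service satisfaction",
        "bad-case horizontal classification and vertical analysis",
    ],
    "abstract_module": [],
    "technical_solution": [],
}


def get_evaluation_targets(
    granularity: str, remote_map: Optional[Dict[str, List[str]]] = None
) -> List[str]:
    # Prefer a mapping obtained from a server (hypothetical remote_map) when it covers this granularity.
    if remote_map and granularity in remote_map:
        return list(remote_map[granularity])
    # Otherwise fall back to the locally stored mapping.
    if granularity not in LOCAL_TARGET_MAP:
        raise KeyError(f"no target mapping for granularity {granularity!r}")
    return list(LOCAL_TARGET_MAP[granularity])
```

Returning copies keeps callers from accidentally modifying the shared mapping.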

A2, acquiring corresponding evaluation parameters according to the artificial intelligence model to be evaluated and the evaluation target.

Next, the corresponding evaluation parameters are acquired according to the model to be evaluated and the evaluation target; the evaluation parameters have a mapping relationship with both. For example, when the user satisfaction of a smart speaker needs to be evaluated, that is, when the model to be evaluated is a smart speaker and the evaluation target is user satisfaction, the acquired evaluation parameters are the users' evaluations, specifically including user ratings, text reviews, and the like. This mapping relationship may also be obtained from a server or a local database, and it is established according to the specific evaluation parameters, evaluation targets, and evaluated models, which is not limited herein.

There may be multiple evaluation parameters corresponding to the model to be evaluated and the evaluation target, in which case specific evaluation parameters may be further selected according to the user's actual needs; that is, step A2 may include:

B1, acquiring a corresponding evaluation parameter mapping relationship according to the evaluation target;

B2, obtaining the evaluation parameters corresponding to the artificial intelligence model to be evaluated according to the evaluation parameter mapping relationship;

B3, displaying the evaluation parameters through a display interface;

B4, confirming the selected evaluation parameters based on the selection instruction received by the display interface.

Specifically, the evaluation parameter mapping relationship corresponding to the evaluation target is obtained, the evaluation parameters corresponding to the model to be evaluated are obtained according to that mapping relationship, and the evaluation parameters are displayed on a display interface for the user to choose from. The user may trigger a selection instruction by clicking or touching the display interface; when the selection instruction is received, the selected evaluation parameters are confirmed based on it.
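Steps B1 to B4 can be sketched as a lookup followed by a selection. In this sketch, which is an illustration rather than the described system, `PARAMETER_MAP` and its keys are hypothetical; the display interface is replaced by a numbered text listing, and the user's selection instruction is modeled as a list of option indices. The smart speaker entry reuses the user ratings and text reviews named in the example above.

```python
from typing import Dict, List, Sequence

# Hypothetical B1 mapping: evaluation target -> {model type -> candidate parameters}.
PARAMETER_MAP: Dict[str, Dict[str, List[str]]] = {
    "user_satisfaction": {
        "smart_speaker": ["user rating", "user text review"],
    },
}


def candidate_parameters(target: str, model_type: str) -> List[str]:
    # B1: mapping for the target; B2: parameters for this model type.
    return list(PARAMETER_MAP.get(target, {}).get(model_type, []))


def render_options(params: Sequence[str]) -> str:
    # B3: stand-in for the display interface, one numbered option per line.
    return "\n".join(f"[{i}] {p}" for i, p in enumerate(params))


def confirm_selection(params: Sequence[str], selected: Sequence[int]) -> List[str]:
    # B4: the selection instruction is modeled as option indices; reject unknown ones.
    if any(i < 0 or i >= len(params) for i in selected):
        raise IndexError("selection refers to an option that was not displayed")
    return [params[i] for i in sorted(set(selected))]
```

Validating indices in B4 guards against a selection instruction that no longer matches what was displayed, for example if the mapping changed between display and confirmation.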

Further, when the evaluation attribute includes the evaluation classification, step 102 includes:

C1, obtaining the development stage of the artificial intelligence model to be evaluated;

C2, determining the evaluation classification of the artificial intelligence model to be evaluated according to the development stage;

C3, obtaining evaluation parameters according to the artificial intelligence model to be evaluated and the evaluation classification.

Specifically, the evaluation classification of the artificial intelligence model to be evaluated may be determined according to its development stage. The development stages of a product may include technology selection, product development, and online operation. In general, the technology selection and product development stages use offline evaluation, and the online operation stage uses online evaluation. Therefore, once the development stage of the model to be evaluated is obtained, its evaluation classification can be determined as offline or online evaluation, and the evaluation parameters can then be obtained according to the model and that classification.
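The general rule above, with pre-launch stages evaluated offline and the online operation stage evaluated online, maps naturally onto two enumerations. This is a minimal sketch; the enum names are hypothetical, and since the description says "in general", a real system might allow exceptions that this sketch does not model.

```python
from enum import Enum


class Stage(Enum):
    TECHNOLOGY_SELECTION = "technology selection"
    PRODUCT_DEVELOPMENT = "product development"
    ONLINE_OPERATION = "online operation"


class EvalClass(Enum):
    OFFLINE = "offline"
    ONLINE = "online"


def classify(stage: Stage) -> EvalClass:
    # C2: the online operation stage uses online evaluation; earlier stages use offline evaluation.
    if stage is Stage.ONLINE_OPERATION:
        return EvalClass.ONLINE
    return EvalClass.OFFLINE
```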

Specifically, step C3 may include:

D1, if the evaluation classification is determined to be offline evaluation, acquiring the evaluation parameters for the artificial intelligence model to be evaluated in an offline acquisition mode;

D2, if the evaluation classification is determined to be online evaluation, acquiring the evaluation parameters for the artificial intelligence model to be evaluated online within a preset time period in an online acquisition mode.

If the evaluation classification is offline evaluation, the evaluation parameters are acquired in an offline acquisition mode, that is, collected offline for the model to be evaluated, for example from crowdsourcing tasks, open-source tasks, assembly tasks, and generation tasks. If the evaluation classification is online evaluation, the evaluation parameters are acquired online within a preset time period, for example by sampling online at fixed time intervals for the model to be evaluated, using methods such as user ID hash sampling and importance sampling.
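User ID hash sampling selects a stable fraction of users by hashing each ID into a bucket, so the same user is consistently in or out of the sample across sampling rounds. The following is a minimal sketch of that idea, not the described system: the bucket count, the SHA-256 hash, and the default salt `"eval-round-1"` are assumptions. Changing the salt draws a different, but equally stable, sample. Importance sampling is not covered here.

```python
import hashlib
from typing import Iterable, List

BUCKETS = 10_000


def in_sample(user_id: str, rate: float, salt: str = "eval-round-1") -> bool:
    # Hash the salted user ID to a stable bucket in [0, BUCKETS).
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % BUCKETS
    # The user is sampled when the bucket falls below the rate threshold.
    return bucket < int(rate * BUCKETS)


def sample_users(user_ids: Iterable[str], rate: float, salt: str = "eval-round-1") -> List[str]:
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be in [0, 1]")
    return [u for u in user_ids if in_sample(u, rate, salt)]
```

Because the threshold only grows with the rate, users sampled at a 10% rate are also included at a 20% rate with the same salt, which keeps samples nested when the sampling rate is raised.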

103, constructing an evaluation set according to the evaluation parameters.

After the evaluation attributes and the evaluation parameters are determined, a corresponding evaluation set can be constructed according to the specific attributes and the obtained evaluation parameters. At the top level, evaluation sets fall into two types: offline evaluation sets and online evaluation sets.

The specific evaluation set construction process may include:

e1, preprocessing the obtained evaluation parameters to obtain preprocessed evaluation parameters;

e2, obtaining the label of the preprocessed evaluation parameter;

e3, constructing an evaluation set according to the preprocessed evaluation parameters and the corresponding labels.

That is, the obtained evaluation parameters are first preprocessed, where the preprocessing may include proportioning or screening, such as deleting invalid data sources or sampling; labels are then obtained for the preprocessed evaluation parameters; and the evaluation set is constructed from the preprocessed evaluation parameters and their corresponding labels.
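Steps e1 to e3 can be sketched as a single function. This is a minimal illustration, assuming records are dicts with an `input` field, that screening means dropping invalid and duplicate inputs, and that proportioning is reduced to a simple size cap; `build_evaluation_set`, `is_valid`, and `get_label` are hypothetical names.

```python
from typing import Callable, Dict, Iterable, List, Optional


def build_evaluation_set(
    raw: Iterable[Dict[str, str]],
    is_valid: Callable[[Dict[str, str]], bool],
    get_label: Callable[[Dict[str, str]], Optional[str]],
    max_items: Optional[int] = None,
) -> List[Dict[str, str]]:
    # e1 (screening): drop invalid records and exact duplicate inputs.
    seen = set()
    cleaned = []
    for rec in raw:
        if not is_valid(rec) or rec["input"] in seen:
            continue
        seen.add(rec["input"])
        cleaned.append(rec)
    # e1 (proportioning): here simply a cap on the number of items.
    if max_items is not None:
        cleaned = cleaned[:max_items]
    # e2: obtain a label for each preprocessed item; unlabeled items are dropped.
    # e3: the (input, label) pairs form the evaluation set.
    evaluation_set = []
    for rec in cleaned:
        label = get_label(rec)
        if label is not None:
            evaluation_set.append({"input": rec["input"], "label": label})
    return evaluation_set


raw = [{"input": "play jazz"}, {"input": "  "}, {"input": "play jazz"}, {"input": "set an alarm"}]
labels = {"play jazz": "music", "set an alarm": "alarm"}
ev = build_evaluation_set(raw, lambda r: bool(r["input"].strip()), lambda r: labels.get(r["input"]))
# ev == [{"input": "play jazz", "label": "music"}, {"input": "set an alarm", "label": "alarm"}]
```

Passing the validity check and the label source in as callables lets the same routine serve manual labels, crowdsourced labels, or an automatic labeler.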

The evaluation set may include an offline evaluation set and an online evaluation set, and when the evaluation set is the offline evaluation set, the specific construction process may include:

the offline evaluation set mainly comprises three elements: evaluation resources (Resource), result labels (Label), and label classifications (Tag). The evaluation resources can be grouped by session (Session), which preserves the context of the usage scenario and allows metrics to be output at the session level.

Evaluation resources (Resource): the raw input data used for evaluation, for example the original text corpus of a text dialog system, the original audio for speech recognition, the original text for text-to-speech, or the source text for translation.

Result labels (Label): the expected result, or candidate results, that the evaluated system should return in the ideal case, generally annotated manually as the ground truth, for example the correct domain in domain classification.

Label classifications (Tag): subdivisions of the resources used for per-class statistics when metrics are output, for example tags marking dialog-system texts that contain a single entity or contain wrongly written words, or tags for speaker gender and age in speech recognition.
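The three elements and the session grouping can be modelled as a small record type. The sketch below is an assumption-laden illustration: `EvalItem` and its fields are hypothetical, and "every turn in the session is correct" is only one possible session-level metric; it also shows how a Tag restricts a metric to a subdivision of the resources.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class EvalItem:
    resource: str                                # Resource: raw input, e.g. an utterance
    label: str                                   # Label: expected (ground-truth) result
    tags: Set[str] = field(default_factory=set)  # Tag: subdivision for per-class statistics
    session_id: str = ""                         # Session: groups turns that share context


def session_success_rate(items: List[EvalItem], predictions: List[str]) -> float:
    """Fraction of sessions in which every turn was answered correctly."""
    session_ok = defaultdict(lambda: True)
    for item, pred in zip(items, predictions):
        session_ok[item.session_id] = session_ok[item.session_id] and pred == item.label
    return sum(session_ok.values()) / len(session_ok)


def accuracy_for_tag(items: List[EvalItem], predictions: List[str], tag: str) -> float:
    """Turn-level accuracy restricted to items carrying `tag`."""
    pairs = [(it, p) for it, p in zip(items, predictions) if tag in it.tags]
    if not pairs:
        raise ValueError(f"no items carry tag {tag!r}")
    return sum(p == it.label for it, p in pairs) / len(pairs)


items = [
    EvalItem("play some jazz", "music", {"single_entity"}, "s1"),
    EvalItem("a bit louder", "volume_up", set(), "s1"),
    EvalItem("weather in Chengdu", "weather", {"single_entity"}, "s2"),
]
predictions = ["music", "volume_down", "weather"]
```

Here session `s1` fails because its second turn is wrong, so the session-level rate is 0.5 even though two of three turns are correct.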

The offline evaluation set construction process may include: obtaining each data source, proportioning or screening the data (for example, deleting invalid data sources), then obtaining the labels, and storing the result, which completes the construction of the evaluation set.

The evaluation set may be stored in a database, such as a MySQL or MongoDB database, to facilitate querying and synchronizing modifications, and also in file format for version control, backup, and offline transfer.
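The dual storage described above (a database for querying and synchronized edits, a file for version control and backup) might look like the following sketch. SQLite from the Python standard library stands in for MySQL here, and the table layout, the JSON Lines file format, and the function names are all assumptions for illustration.

```python
import io
import json
import sqlite3
from typing import Iterable, TextIO


def save_to_db(conn: sqlite3.Connection, items: Iterable[dict]) -> None:
    """Store items in a relational table (SQLite stands in for MySQL here)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS eval_set ("
        "id INTEGER PRIMARY KEY, resource TEXT, label TEXT, tags TEXT)"
    )
    conn.executemany(
        "INSERT INTO eval_set (resource, label, tags) VALUES (?, ?, ?)",
        [(it["resource"], it["label"], json.dumps(sorted(it["tags"]))) for it in items],
    )
    conn.commit()


def export_jsonl(conn: sqlite3.Connection, out: TextIO) -> None:
    """Write one JSON object per line: a diff-friendly file for version control and backup."""
    rows = conn.execute("SELECT id, resource, label, tags FROM eval_set ORDER BY id")
    for row_id, resource, label, tags in rows:
        record = {"id": row_id, "resource": resource, "label": label, "tags": json.loads(tags)}
        out.write(json.dumps(record, ensure_ascii=False) + "\n")


conn = sqlite3.connect(":memory:")
save_to_db(conn, [{"resource": "播放周杰伦的歌", "label": "music", "tags": {"single_entity"}}])
buf = io.StringIO()
export_jsonl(conn, buf)
```

`ensure_ascii=False` keeps Chinese corpus text readable in the exported file, which matters when the file is reviewed in version control.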

The offline evaluation set itself also needs to be continuously corrected and improved. Because annotation accuracy is limited, the offline evaluation set must be corrected frequently; each correction can be recorded as a Patch iteration, and every time a Label in the evaluation set is updated, the Patch is updated to the current update time. Patch updates allow metrics that were evaluated offline earlier to be backtracked, that is, recalculated against the corrected labels.
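The Patch mechanism can be illustrated with a small class that records label corrections, moves the Patch timestamp to the update time, and rescores stored system outputs against the corrected labels (backtracking). This is a hypothetical sketch; `PatchedEvalSet` and the use of plain accuracy as the metric are assumptions.

```python
from collections.abc import Hashable
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple


class PatchedEvalSet:
    """Evaluation-set labels plus a Patch timestamp and a correction history."""

    def __init__(self, labels: Dict[Hashable, str]):
        self.labels = dict(labels)               # item id -> current label
        self.patch: Optional[datetime] = None    # time of the latest label correction
        self.history: List[Tuple[datetime, Hashable, str, str]] = []

    def apply_patch(self, corrections: Dict[Hashable, str],
                    when: Optional[datetime] = None) -> None:
        """Correct labels and move the Patch to the update time."""
        when = when or datetime.now(timezone.utc)
        for item_id, new_label in corrections.items():
            self.history.append((when, item_id, self.labels[item_id], new_label))
            self.labels[item_id] = new_label
        self.patch = when

    def accuracy(self, stored_outputs: Dict[Hashable, str]) -> float:
        """Backtracking: rescore previously stored system outputs against current labels."""
        hits = sum(stored_outputs.get(i) == label for i, label in self.labels.items())
        return hits / len(self.labels)


ev = PatchedEvalSet({1: "music", 2: "alarm", 3: "weather"})
outputs_v1 = {1: "music", 2: "timer", 3: "weather"}  # stored outputs of an earlier system
before = ev.accuracy(outputs_v1)                    # 2/3 under the original labels
ev.apply_patch({2: "timer"})                        # item 2 turned out to be mislabeled
after = ev.accuracy(outputs_v1)                     # 1.0 after backtracking
```

Because the old system outputs are kept, the earlier metric can be recomputed without rerunning the system, so old and new results are measured against the same corrected labels.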

When the evaluation set is an online evaluation set, the specific construction process may include:

the online evaluation set mainly comes from real data on a business product line, and labeling can be performed in each evaluation period (such as daily or weekly) either manually or by an automatic labeling method.

The online evaluation set construction process may include: acquiring data online, sampling the acquired data, labeling the sampled data, and then either passing it on through a Pipeline or storing it, which completes the construction of the evaluation set.

One way of storing the online evaluation set is a direct Pipeline, in which the sampled data goes straight to manual labeling, or the labeling results are output by a calculation module; the data is also backed up in database or file format.
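Importance sampling, mentioned among the online sampling methods, keeps each logged request with a chosen probability and weights it by the inverse of that probability, so that rare but important traffic can be over-represented in the labeled sample without biasing the estimated metric. The sketch below is a minimal version under assumptions: the per-record probabilities, the self-normalised weighting, and all names are illustrative, not taken from the text.

```python
import random
from typing import Callable, List, Sequence, Tuple


def importance_sample(
    records: Sequence[dict],
    prob_of: Callable[[dict], float],
    rng: random.Random,
) -> List[Tuple[dict, float]]:
    """Keep each record with probability p = prob_of(record), weighted by 1/p."""
    kept = []
    for rec in records:
        p = prob_of(rec)
        if p > 0 and rng.random() < p:
            kept.append((rec, 1.0 / p))
    return kept


def weighted_rate(sample: List[Tuple[dict, float]], is_hit: Callable[[dict], bool]) -> float:
    """Self-normalised (Hajek) estimate of the population rate of `is_hit`."""
    total = sum(w for _, w in sample)
    hits = sum(w for rec, w in sample if is_hit(rec))
    return hits / total if total else 0.0


# Usage: over-sample a rare domain (all kept) and thin out a frequent one (10% kept).
logs = [{"domain": "recipe" if i % 10 == 0 else "music", "error": i % 4 == 0} for i in range(1000)]
sample = importance_sample(logs, lambda r: 1.0 if r["domain"] == "recipe" else 0.1, random.Random(7))
estimated_error_rate = weighted_rate(sample, lambda r: r["error"])
```

Only the sampled records need manual labels, while the weights let the labeled subset stand in for the full day's traffic.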

104, evaluating the artificial intelligence model to be evaluated according to the evaluation set, and outputting an evaluation result.

The artificial intelligence model to be evaluated is then evaluated according to the constructed evaluation set, and the evaluation result is output.

After the evaluation result is output, the artificial intelligence model to be evaluated may be further monitored and adjusted according to the evaluation result, that is, after step 104, the method further includes:

f1, obtaining the adjustment parameters of the object to be evaluated according to the evaluation result;

f2, monitoring and adjusting the artificial intelligence model to be evaluated based on the adjustment parameters.

After the evaluation result is output, the adjustment parameters of the artificial intelligence model to be evaluated, that is, the optimization target, need to be formulated. The current capability value of a comparable product, such as a competing product, is usually used as the reference for formulating the adjustment parameters; for example, a goal may first be set to raise the capability to the level of the current competitor within six months. The adjustment parameters of the object to be evaluated are then obtained according to the output evaluation result, the current capability value of the competing product, and the determined time period. Based on the adjustment parameters, the artificial intelligence model to be evaluated is monitored and adjusted, which is a continuous iterative tracking process within the set time period. In particular, improvement proceeds iteratively by means of versions and patches. A Version applies to the evaluation set as a whole and may add, delete, or adjust evaluation data; a Patch applies to the labels of the evaluation set and mainly corrects inaccurate manual labels. A very important feature of patches is that they support backtracking: metrics are recalculated from the earlier system outputs and the latest patches, so that old metric results are updated and every set of comparable metrics comes from the same evaluation set. Alternating version and patch updates, for example one version per month and one patch per week, makes the evaluation process smoother.
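The adjustment parameters described above combine the current evaluation result, the competitor's capability value, and a time period. One simple way to turn these into trackable numbers, assumed here for illustration only, is to interpolate evenly spaced per-period targets and flag the first period that falls short; `milestone_targets` and `first_missed` are hypothetical helpers, and the 0.82/0.88 figures are invented example values.

```python
from typing import List, Optional


def milestone_targets(current: float, reference: float, periods: int) -> List[float]:
    """Evenly spaced per-period targets from the current value to the reference value."""
    step = (reference - current) / periods
    return [current + step * k for k in range(1, periods + 1)]


def first_missed(measured: List[float], targets: List[float],
                 higher_is_better: bool = True) -> Optional[int]:
    """Index of the first period whose measured value missed its target, or None."""
    for i, (value, target) in enumerate(zip(measured, targets)):
        missed = value < target if higher_is_better else value > target
        if missed:
            return i
    return None


# Usage: task completion rate 0.82 now, competitor at 0.88, six monthly milestones.
targets = milestone_targets(0.82, 0.88, 6)    # about 0.83, 0.84, ..., 0.88
print(first_missed([0.835, 0.838], targets))  # 1 (month 2 fell short of about 0.84)
```

For a metric where lower is better, such as word error rate, the same helpers apply with `higher_is_better=False`.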

In this embodiment, an artificial intelligence model to be evaluated and its evaluation attributes are obtained; evaluation parameters are then obtained according to the model and the evaluation attributes, and an evaluation set is constructed according to the evaluation parameters, so that the constructed evaluation set matches the artificial intelligence model to be evaluated. When the model is evaluated according to this evaluation set, the output evaluation result is more accurate, and the verification of the degree and capability of artificial intelligence is more refined.

The method according to the preceding embodiment is illustrated in further detail below by way of example.

In this embodiment, the artificial intelligence model evaluation device is integrated in a terminal, and an artificial intelligence speaker is taken as an example of the artificial intelligence model to be evaluated. The terminal may include a mobile phone or the like.

As shown in fig. 2, a specific process of the artificial intelligence model evaluation method may be as follows:

201, obtaining an artificial intelligence speaker model to be evaluated and evaluation attributes of the artificial intelligence speaker model to be evaluated;

the artificial intelligence speaker model to be evaluated may be in the technology type selection stage, the product development stage, or the online operation stage, and its evaluation attributes may include users' satisfaction with the artificial intelligence speaker model, the task completion rate of its task dialog system, the word error rate or sentence error rate of its speech recognition, and the like.
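Word error rate (WER) and sentence error rate (SER), named above as evaluation attributes, are conventionally computed from the Levenshtein edit distance between reference and recognized token sequences. A minimal sketch follows; whitespace tokenization is an assumption that suits English, and for Chinese the same functions can be called with `tokenize=list` to score characters instead.

```python
from typing import Callable, List, Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance (unit-cost substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, 1):
            cur[j] = min(prev[j] + 1,             # delete r
                         cur[j - 1] + 1,          # insert h
                         prev[j - 1] + (r != h))  # substitute, or match at no cost
        prev = cur
    return prev[-1]


def word_error_rate(refs: List[str], hyps: List[str],
                    tokenize: Callable[[str], Sequence] = str.split) -> float:
    """Total edit distance divided by the total number of reference tokens."""
    errors = sum(edit_distance(tokenize(r), tokenize(h)) for r, h in zip(refs, hyps))
    return errors / sum(len(tokenize(r)) for r in refs)


def sentence_error_rate(refs: List[str], hyps: List[str],
                        tokenize: Callable[[str], Sequence] = str.split) -> float:
    """Fraction of sentences whose hypothesis is not an exact token match."""
    return sum(tokenize(r) != tokenize(h) for r, h in zip(refs, hyps)) / len(refs)


refs = ["turn on the light", "play some jazz"]
hyps = ["turn on a light", "play some jazz"]
# word_error_rate(refs, hyps) is 1/7; sentence_error_rate(refs, hyps) is 0.5
```

WER is pooled over all reference words rather than averaged per sentence, so long utterances weigh more; this is the usual convention.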

202, obtaining evaluation parameters according to the artificial intelligence speaker model to be evaluated and the evaluation attributes.

When the evaluation attribute is the granularity of the artificial intelligence speaker model to be evaluated, an evaluation target of the model can be obtained according to its granularity, and corresponding evaluation parameters are obtained according to the model and the evaluation target. When the evaluation attribute is the evaluation classification, the development stage of the artificial intelligence speaker model to be evaluated is obtained, the evaluation classification of the model is determined according to the development stage, and evaluation parameters are obtained according to the model and the evaluation classification.

The granularity of the artificial intelligence speaker model to be evaluated may include product-end evaluation of the artificial intelligence speaker, evaluation of abstract modules in the artificial intelligence speaker, and evaluation of the technical solutions of the artificial intelligence speaker. The product end may be the complete artificial intelligence speaker, or a sub-product of it, such as its semantic understanding service, speech recognition service, or machine translation service. The abstract modules of the artificial intelligence speaker may include a domain intention classification module, a named entity recognition module, an entity error correction module, a voice activity detection module, a wake-up module, an acoustic model, a language model, an image Chinese character recognition module, a text-to-speech module, and the like. The technical solutions of the artificial intelligence speaker may include classifiers such as the logistic regression algorithm, named entity recognition algorithms, the conditional random field algorithm, speech recognition language models, recurrent neural networks, and the like.

Before implementation, a mapping relationship between each granularity of the artificial intelligence speaker model to be evaluated and the corresponding evaluation target can be established. For example, if the granularity is product-end evaluation of the artificial intelligence speaker, the evaluation targets may include learning where the speaker's capability stands in the industry, identifying its current head (most prominent) defects, predicting its online service satisfaction, and performing horizontal classification and longitudinal analysis of Badcases (bad cases); a mapping relationship between product-end evaluation of the artificial intelligence speaker and these evaluation targets can then be established. If the granularity is abstract module evaluation, a mapping relationship between abstract module evaluation and its corresponding evaluation targets is established. Alternatively, the mapping relationship between the granularity and the corresponding evaluation target can be acquired directly from a server or a local database, and the corresponding evaluation target is then obtained according to the acquired mapping relationship.
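The granularity-to-target mapping, with a fallback to a server lookup, can be sketched as a dictionary plus a lookup function. Only the product-end targets come from the text; the other keys, the `fetch_remote` callback, and all names are hypothetical.

```python
from typing import Callable, Dict, List, Optional

# Product-end targets as listed in the text; other granularities would be
# filled in per project or fetched from a server.
LOCAL_TARGETS: Dict[str, List[str]] = {
    "product_end": [
        "capability position in the industry",
        "current head defects",
        "predicted online service satisfaction",
        "horizontal classification and longitudinal analysis of bad cases",
    ],
}


def evaluation_targets(
    granularity: str,
    local: Dict[str, List[str]] = LOCAL_TARGETS,
    fetch_remote: Optional[Callable[[str], List[str]]] = None,
) -> List[str]:
    """Look the granularity up in the local mapping, then fall back to a server lookup."""
    if granularity in local:
        return list(local[granularity])  # copy, so callers cannot mutate the mapping
    if fetch_remote is not None:
        targets = fetch_remote(granularity)
        if targets:
            return targets
    raise KeyError(f"no evaluation targets mapped for granularity {granularity!r}")
```

Injecting `fetch_remote` as a callable keeps the lookup testable without a real server.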

Corresponding evaluation parameters are then acquired according to the artificial intelligence speaker model to be evaluated and the evaluation target, the evaluation parameters having a mapping relationship with the model and the evaluation target. For example, when the user satisfaction of the artificial intelligence speaker is to be evaluated, the evaluation parameters can include users' usage evaluations: when the artificial intelligence model to be evaluated is the artificial intelligence speaker and the evaluation target is user satisfaction, the corresponding evaluation parameters are the users' evaluations of the artificial intelligence speaker, which may specifically include user scores, written reviews, and the like. The specific mapping relationship may also be obtained from a server or a local database and is established according to the specific evaluation parameters, evaluation target, and evaluated model, which is not limited herein.

A plurality of evaluation parameters may correspond to the artificial intelligence speaker model to be evaluated and the evaluation target. In this case, specific evaluation parameters can be further selected according to the user's actual requirements: the corresponding evaluation parameter mapping relationship is obtained according to the evaluation target, the evaluation parameters corresponding to the artificial intelligence speaker model to be evaluated are obtained according to that mapping relationship, and the evaluation parameters are shown on a display interface for the user to choose from. The user can trigger a selection instruction by clicking or touching the display interface, and when the selection instruction is received, the selected evaluation parameters are confirmed based on it.
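Selecting among several candidate evaluation parameters, as described above, can be sketched as a mapping lookup, a rendered list of options, and the resolution of a selection instruction. This is a minimal text-based illustration: the mapping entry follows the user-satisfaction example in the text, while the function names and the index-based selection are assumptions standing in for a real display interface.

```python
from typing import Dict, List, Sequence, Tuple

# (model, evaluation target) -> candidate evaluation parameters; the entry
# below follows the user-satisfaction example in the text.
PARAMETER_MAPPING: Dict[Tuple[str, str], List[str]] = {
    ("ai_speaker", "user satisfaction"): ["user score", "user text review"],
}


def candidate_parameters(model: str, target: str) -> List[str]:
    return PARAMETER_MAPPING.get((model, target), [])


def render_choices(candidates: Sequence[str]) -> str:
    """Text stand-in for the display interface: one numbered option per line."""
    return "\n".join(f"[{i}] {name}" for i, name in enumerate(candidates))


def confirm_selection(candidates: Sequence[str], selected: Sequence[int]) -> List[str]:
    """Resolve a selection instruction (indices the user clicked) to parameters."""
    chosen: List[str] = []
    for i in selected:
        if not 0 <= i < len(candidates):
            raise IndexError(f"selection {i} is out of range")
        if candidates[i] not in chosen:  # ignore repeated clicks on the same option
            chosen.append(candidates[i])
    return chosen


options = candidate_parameters("ai_speaker", "user satisfaction")
print(render_choices(options))
# [0] user score
# [1] user text review
```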

The evaluation attributes further include the evaluation classification, which can be determined according to the development stage of the artificial intelligence speaker model to be evaluated. The development stages of the artificial intelligence speaker may include technology type selection, product development, and online operation. In general, evaluation in the technology type selection and product development stages is classified as offline evaluation, and evaluation in the online operation stage is classified as online evaluation. Therefore, by obtaining the development stage of the artificial intelligence speaker model to be evaluated, its evaluation classification can be determined as offline or online evaluation, and the evaluation parameters can then be obtained according to the model and the evaluation classification.

Specifically, if the evaluation classification is determined to be offline evaluation, the evaluation parameters are acquired for the artificial intelligence speaker model to be evaluated in an offline acquisition mode, for example collected offline from crowdsourcing tasks, open source tasks, assembling tasks, and generating tasks. If the evaluation classification is determined to be online evaluation, the evaluation parameters are acquired for the artificial intelligence speaker model to be evaluated in an online acquisition mode within a preset time period, for example by online sampling at fixed time intervals, such as user ID-Hash sampling or importance sampling.

203, constructing an evaluation set according to the evaluation parameters.

204, evaluating the artificial intelligence speaker model to be evaluated according to the evaluation set, and outputting an evaluation result.

205, obtaining the adjustment parameter of the object to be evaluated according to the evaluation result.

206, monitoring and adjusting the artificial intelligence speaker model to be evaluated based on the adjustment parameters.

After the evaluation result is output, the adjustment parameters of the artificial intelligence speaker model to be evaluated, that is, the optimization target, need to be formulated. The current capability value of a comparable product, such as a competing product, is usually used as the reference for formulating the adjustment parameters; for example, a goal may first be set to raise the capability to the level of the current competitor within six months. The adjustment parameters of the object to be evaluated are then obtained according to the output evaluation result, the current capability value of the competing product, and the determined time period. Based on the adjustment parameters, the artificial intelligence speaker model to be evaluated is monitored and adjusted, which is a continuous iterative tracking process within the set time period. In particular, improvement proceeds iteratively by means of versions and patches. A Version applies to the evaluation set as a whole and may add, delete, or adjust evaluation data; a Patch applies to the labels of the evaluation set and mainly corrects inaccurate manual labels. A very important feature of patches is that they support backtracking: metrics are recalculated from the earlier system outputs and the latest patches, so that old metric results are updated and every set of comparable metrics comes from the same evaluation set. Alternating version and patch updates, for example one version per month and one patch per week, makes the evaluation process smoother.

As an example of horizontal classification analysis, in a speech recognition task the input evaluation resources are recording segments. To obtain more information, attribute labels can be attached to each segment, such as "age: 25-30, gender: female, noise: 50-60 dB, loudness: 40-45 dB, frequency: 600-650 Hz, dialect: Sichuan dialect, accent: flat/retroflex tongue". With these attributes attached to an evaluation set of about 10,000 items, statistics can reveal which types of corpus the system supports poorly; for example, if the WER/SER for "middle-aged women speaking Sichuan dialect" is very high, a plan should be made to optimize recognition for the corpus of this population.
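The horizontal (attribute-based) analysis in this example reduces to grouping per-utterance errors by attribute tags and ranking the slices. The sketch below assumes each utterance already carries a word-level edit-distance error count and a reference word count; the field names, the `min_words` reliability threshold, and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

SliceKey = Tuple[str, ...]


def slice_wer(utterances: Iterable[dict], keys: Sequence[str]) -> Dict[SliceKey, Tuple[float, int]]:
    """WER and reference word count per attribute slice.

    Each utterance dict holds 'attrs' (attribute tags), 'errors' (word-level
    edit distance, computed beforehand) and 'words' (reference length).
    """
    errors: Dict[SliceKey, int] = defaultdict(int)
    words: Dict[SliceKey, int] = defaultdict(int)
    for u in utterances:
        key = tuple(u["attrs"].get(k, "unknown") for k in keys)
        errors[key] += u["errors"]
        words[key] += u["words"]
    return {k: (errors[k] / words[k], words[k]) for k in words if words[k] > 0}


def worst_slices(utterances: Iterable[dict], keys: Sequence[str],
                 min_words: int = 100, top: int = 3) -> List[Tuple[SliceKey, float]]:
    """Highest-WER slices, ignoring slices too small to be reliable."""
    stats = slice_wer(utterances, keys)
    ranked = sorted(((rate, k) for k, (rate, n) in stats.items() if n >= min_words), reverse=True)
    return [(k, rate) for rate, k in ranked[:top]]


utts = [
    {"attrs": {"gender": "female", "age": "45-60", "dialect": "Sichuan"}, "errors": 30, "words": 100},
    {"attrs": {"gender": "male", "age": "25-30", "dialect": "Mandarin"}, "errors": 5, "words": 100},
    {"attrs": {"gender": "female", "age": "45-60", "dialect": "Sichuan"}, "errors": 20, "words": 100},
]
```

Pooling errors and words per slice (rather than averaging per-utterance rates) keeps each slice's WER consistent with the overall WER definition.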

Longitudinal component analysis examines the system by decomposing it internally, and mainly evaluates how well each subdivided module of the system performs. For longitudinal component analysis, the system to be evaluated needs to be broken down into a stream of components; for a dialog system, for example, the basic component stream can be seen in fig. 3.
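One common way to carry out longitudinal component analysis, assumed here for illustration, is to label the expected output of every component in the stream and attribute each end-to-end failure to the first component whose output is wrong. The component names below are placeholders, since the actual basic component stream is the one shown in fig. 3.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Sequence, Tuple

# Hypothetical component stream; the actual basic stream is the one in fig. 3.
DIALOG_STREAM = ("asr", "domain", "intent", "slots", "response")


def first_failure(expected: Dict[str, str], actual: Dict[str, str],
                  stream: Sequence[str] = DIALOG_STREAM) -> Optional[str]:
    """First component (in stream order) whose output differs from its label."""
    for component in stream:
        if component in expected and actual.get(component) != expected[component]:
            return component
    return None


def attribute_failures(cases: Iterable[Tuple[Dict[str, str], Dict[str, str]]],
                       stream: Sequence[str] = DIALOG_STREAM) -> Counter:
    """Count end-to-end failures by the component where the error first appears."""
    counts: Counter = Counter()
    for expected, actual in cases:
        component = first_failure(expected, actual, stream)
        if component is not None:
            counts[component] += 1
    return counts


cases = [
    ({"asr": "play jazz", "domain": "music", "intent": "play"},
     {"asr": "play jazz", "domain": "music", "intent": "play"}),        # fully correct
    ({"asr": "play jazz", "domain": "music"},
     {"asr": "play chess", "domain": "game"}),                          # ASR error propagates
    ({"asr": "weather", "domain": "weather", "intent": "query"},
     {"asr": "weather", "domain": "weather", "intent": "alarm"}),       # intent error
]
```

Attributing a failure only to its first faulty component avoids blaming downstream modules for errors they merely inherited.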

In this embodiment, an artificial intelligence speaker model to be evaluated and its evaluation attributes are obtained; evaluation parameters are then obtained according to the model and the evaluation attributes, and an evaluation set is constructed according to the evaluation parameters, so that the constructed evaluation set matches the artificial intelligence speaker model to be evaluated. When the artificial intelligence speaker model to be evaluated is evaluated according to this evaluation set, the output evaluation result is more accurate, and the verification of the degree and capability of artificial intelligence is more refined.

In order to better implement the above method, an embodiment of the present invention may further provide an artificial intelligence model evaluation device, where the artificial intelligence model evaluation device may be specifically integrated in a network device, and the network device may be a terminal or a server.

For example, as shown in fig. 4a, the artificial intelligence model evaluating apparatus may include a first obtaining unit 401, a second obtaining unit 402, a constructing unit 403, and an evaluating unit 404, as follows:

(1) a first obtaining unit 401, configured to obtain an artificial intelligence model to be evaluated and an evaluation attribute of the artificial intelligence model to be evaluated;

(2) a second obtaining unit 402, configured to obtain evaluation parameters according to the artificial intelligence model to be evaluated and the evaluation attribute;

(3) a constructing unit 403, configured to construct an evaluation set according to the evaluation parameters;

(4) an evaluating unit 404, configured to evaluate the artificial intelligence model to be evaluated according to the evaluation set and output an evaluation result.
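The division into units 401 to 404 can be mirrored in code by composing four callables. The following Python sketch is a hypothetical illustration of that structure only; the toy model, the evaluation data, and the accuracy metric are assumptions chosen to keep it runnable.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class EvaluationDevice:
    first_obtaining: Callable[[], Tuple[Any, Dict[str, str]]]      # unit 401: model + evaluation attribute
    second_obtaining: Callable[[Any, Dict[str, str]], List[dict]]  # unit 402: evaluation parameters
    constructing: Callable[[List[dict]], List[dict]]               # unit 403: evaluation set
    evaluating: Callable[[Any, List[dict]], Dict[str, float]]      # unit 404: evaluation result

    def run(self) -> Dict[str, float]:
        model, attribute = self.first_obtaining()
        params = self.second_obtaining(model, attribute)
        evaluation_set = self.constructing(params)
        return self.evaluating(model, evaluation_set)


# Toy wiring: a keyword "model" evaluated offline on two labeled utterances.
def toy_model(text: str) -> str:
    return "music" if "play" in text else "other"


def accuracy(model: Callable[[str], str], evaluation_set: List[dict]) -> Dict[str, float]:
    hits = sum(model(item["input"]) == item["label"] for item in evaluation_set)
    return {"accuracy": hits / len(evaluation_set)}


device = EvaluationDevice(
    first_obtaining=lambda: (toy_model, {"classification": "offline"}),
    second_obtaining=lambda model, attr: [
        {"input": "play jazz", "label": "music"},
        {"input": "set an alarm", "label": "alarm"},
        {"input": "", "label": ""},
    ],
    constructing=lambda params: [p for p in params if p["input"]],  # drop empty records
    evaluating=accuracy,
)
print(device.run())  # {'accuracy': 0.5}
```

Because each unit is a plain callable, any one of them can be swapped (for example, an online acquisition routine for unit 402) without touching the others.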

Specifically, as shown in fig. 4b, the second obtaining unit 402 includes a first obtaining subunit 405 and a second obtaining subunit 406;

the first obtaining subunit 405 is configured to obtain an evaluation target of the artificial intelligence model to be evaluated according to the granularity of the artificial intelligence model to be evaluated.

the second obtaining subunit 406 is configured to obtain corresponding evaluation parameters according to the artificial intelligence model to be evaluated and the evaluation target.

Specifically, the second obtaining subunit 406 includes an obtaining module, a display module, and a confirming module:

Specifically, as shown in fig. 4c, the second obtaining unit 402 may further include a third obtaining subunit 407, a determining subunit 408, and a fourth obtaining subunit 409:

a third obtaining subunit 407, configured to obtain a development stage of the artificial intelligence model to be evaluated;

a determining subunit 408, configured to determine an evaluation classification of the artificial intelligence model to be evaluated according to the development phase;

a fourth obtaining subunit 409, configured to obtain evaluation parameters according to the artificial intelligence model to be evaluated and the evaluation classification.

As can be seen from the above, in the artificial intelligence model evaluation device of this embodiment, the first obtaining unit 401 obtains the artificial intelligence speaker model to be evaluated and its evaluation attributes; the second obtaining unit 402 obtains evaluation parameters according to the model and the evaluation attributes; and the constructing unit 403 constructs an evaluation set according to the evaluation parameters, so that the constructed evaluation set matches the artificial intelligence speaker model to be evaluated. When the evaluating unit 404 evaluates the artificial intelligence speaker model to be evaluated according to the evaluation set, the output evaluation result is more accurate, and the verification of the degree and capability of artificial intelligence is more refined.

An embodiment of the present invention further provides a server, as shown in fig. 5, which shows a schematic structural diagram of the server according to the embodiment of the present invention, specifically:

the server may include components such as a processor 701 of one or more processing cores, memory 702 of one or more computer-readable storage media, a power supply 703, and an input unit 704. Those skilled in the art will appreciate that the server architecture shown in FIG. 5 is not meant to be limiting, and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.

Wherein:

the processor 701 is a control center of the server, connects various parts of the entire server using various interfaces and lines, and performs various functions of the server and processes data by running or executing software programs and/or modules stored in the memory 702 and calling data stored in the memory 702, thereby performing overall monitoring of the server. Optionally, processor 701 may include one or more processing cores; preferably, the processor 701 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 701.

The memory 702 may be used to store software programs and modules, and the processor 701 executes various functional applications and data processing by running the software programs and modules stored in the memory 702. The memory 702 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, application programs required by at least one function (such as a sound playing function or an image playing function), and the like, and the data storage area may store data created through use of the server, and the like. Further, the memory 702 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. Accordingly, the memory 702 may also include a memory controller to provide the processor 701 with access to the memory 702.

The server further includes a power supply 703 for supplying power to each component. Preferably, the power supply 703 may be logically connected to the processor 701 through a power management system, so that charging, discharging, and power consumption are managed through the power management system. The power supply 703 may further include one or more components such as a DC or AC power source, a recharging system, a power failure detection circuit, a power converter or inverter, and a power status indicator.

The server may also include an input unit 704, and the input unit 704 may be used to receive input numeric or character information and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.

Although not shown, the server may further include a display unit and the like, which will not be described in detail herein. Specifically, in this embodiment, the processor 701 in the server loads the executable file corresponding to the process of one or more application programs into the memory 702 according to the following instructions, and the processor 701 runs the application program stored in the memory 702, thereby implementing various functions as follows:

constructing an evaluation set according to the evaluation parameters;

For the specific implementation of the above operations, refer to the foregoing embodiments; details are not repeated here.

It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be performed by instructions or by associated hardware controlled by the instructions, which may be stored in a computer readable storage medium and loaded and executed by a processor.

To this end, embodiments of the present invention provide a storage medium (i.e., a computer-readable storage medium) having stored therein a plurality of instructions, which can be loaded by a processor to perform any of the steps of the method for evaluating an artificial intelligence model provided by the embodiments of the present invention. For example, the instructions may perform the steps of:

constructing an evaluation set according to the evaluation parameters;

Wherein the computer-readable storage medium may include: Read-Only Memory (ROM), Random Access Memory (RAM), magnetic disks, optical disks, and the like.

Since the instructions stored in the computer-readable storage medium can execute the steps in any artificial intelligence model evaluation method provided by the embodiment of the present invention, the beneficial effects that can be achieved by any artificial intelligence model evaluation method provided by the embodiment of the present invention can be achieved, which are detailed in the foregoing embodiments and will not be described herein again.

The method, the device, the equipment and the computer-readable storage medium for evaluating the artificial intelligence model provided by the embodiment of the invention are described in detail, a specific example is applied in the description to explain the principle and the implementation mode of the invention, and the description of the embodiment is only used for helping to understand the method and the core idea of the invention; meanwhile, for those skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims

1. An artificial intelligence model evaluation method is characterized by comprising the following steps:

constructing an evaluation set according to the evaluation parameters;

2. The method for evaluating an artificial intelligence model according to claim 1, wherein the evaluation attribute includes a granularity of the artificial intelligence model to be evaluated, and the obtaining of the evaluation parameter according to the artificial intelligence model to be evaluated and the evaluation attribute comprises:

3. The method for evaluating an artificial intelligence model according to claim 2, wherein the obtaining of corresponding evaluation parameters according to the artificial intelligence model to be evaluated and an evaluation target comprises:

displaying the evaluation parameters through a display interface;

4. The method for evaluating an artificial intelligence model according to claim 1, wherein the evaluation attributes comprise evaluation classifications, and the obtaining of evaluation parameters according to the artificial intelligence model to be evaluated and the evaluation attributes comprises:

5. The method for evaluating an artificial intelligence model according to claim 4, wherein the evaluation classification includes off-line evaluation and on-line evaluation, and the obtaining of evaluation parameters according to the artificial intelligence model to be evaluated and the evaluation classification includes:

6. The method for evaluating an artificial intelligence model according to claim 1, wherein the constructing an evaluation set according to the evaluation parameters comprises:

acquiring a label of the preprocessed evaluation parameter;

7. The artificial intelligence model evaluation method according to any one of claims 1 to 6, wherein after the object to be evaluated is evaluated according to the evaluation set and an evaluation result is obtained, the method further comprises:

8. An artificial intelligence model evaluating apparatus, comprising:

9. An artificial intelligence model evaluating apparatus, comprising: a processor and a memory; the memory stores a plurality of instructions, and the processor loads the instructions stored in the memory to execute the artificial intelligence model evaluation method according to any one of claims 1 to 7.

10. A computer-readable storage medium storing instructions adapted to be loaded by a processor to perform a method for evaluating an artificial intelligence model according to any one of claims 1 to 7.