CN113468892A

CN113468892A - Model testing method and device for model testing

Info

Publication number: CN113468892A
Application number: CN202110688290.3A
Authority: CN
Inventors: 刘琮玮; 张静军; 姜琳
Original assignee: Beijing Sogou Technology Development Co Ltd
Current assignee: Beijing Sogou Technology Development Co Ltd
Priority date: 2021-06-21
Filing date: 2021-06-21
Publication date: 2021-10-01

Abstract

Embodiments of the invention provide a model testing method, a model testing apparatus, and a device for model testing. The method comprises: generating test data of a preset semantic dimension corresponding to a target model according to the function of the target model in that preset semantic dimension, wherein the preset semantic dimension comprises any one or more of a synonym dimension, an antonym dimension, a robustness dimension, a named entity dimension, a temporal relation dimension, a negation word dimension, a reference relation dimension, and a word order relation dimension; and testing the target model with the test data of the preset semantic dimension to obtain a test result of the target model for that preset semantic dimension. Embodiments of the invention can evaluate the capability of the target model more systematically, more comprehensively, and more accurately.

Description

Model testing method and device for model testing

Technical Field

The invention relates to the field of computer technology, and in particular to a model testing method, a model testing apparatus, and a device for model testing.

Background

NLP (Natural Language Processing) is a subfield of AI (Artificial Intelligence), and NLP models are widely applied in scenarios such as text classification, speech recognition, and machine translation.

Currently, developers usually define evaluation metrics matched to the business and compute model scores on a test data set to evaluate the capability of an NLP model. This standard evaluation method divides the data set into a training set, a validation set, and a test set, and evaluates metrics such as the accuracy and recall of the model on the test set. The method has certain limitations: for example, the test set data is not comprehensive and shares the same bias as the training set, which leads to inaccurate evaluation results.

Disclosure of Invention

Embodiments of the invention provide a model testing method, a model testing apparatus, and a device for model testing, which can evaluate the capability of a target model more systematically, more comprehensively, and more accurately.

In order to solve the above problem, an embodiment of the present invention discloses a model testing method, including:

generating test data of a preset semantic dimension corresponding to a target model according to the function of the target model in that preset semantic dimension, wherein the preset semantic dimension comprises any one or more of a synonym dimension, an antonym dimension, a robustness dimension, a named entity dimension, a temporal relation dimension, a negation word dimension, a reference relation dimension, and a word order relation dimension;

and testing the target model by using the test data of the preset semantic dimension corresponding to the target model to obtain the test result of the preset semantic dimension corresponding to the target model.

Optionally, the generating test data of the preset semantic dimension corresponding to the target model according to the function of the preset semantic dimension corresponding to the target model includes:

determining a test method corresponding to the preset semantic dimension according to the function of the target model in the preset semantic dimension, wherein the test method comprises at least one of a basic test, an invariance test, and a specified directionality test;

and generating test data of the test method corresponding to the preset semantic dimension.

Optionally, the generating test data of the test method corresponding to the preset semantic dimension includes:

based on a given template text, determining two filling words having a preset semantic relationship according to the preset semantic dimension, and filling the slot of the template text with each of the two filling words to obtain test data of the basic test corresponding to the preset semantic dimension; and/or

based on a given target text, transforming the target text according to a first transformation rule corresponding to the preset semantic dimension so that the semantics of the transformed text are unchanged, to obtain test data of the invariance test corresponding to the preset semantic dimension; and/or

based on a given target text, transforming the target text according to a second transformation rule corresponding to the preset semantic dimension so that the semantics of the transformed text conform to an expected direction, to obtain test data of the specified directionality test corresponding to the preset semantic dimension.

Optionally, the first transformation rule comprises any one or more of: replacing target words in a target text with synonyms, replacing named entities in the target text with related named entities of the same type, performing position interchange on two target words with a parallel relation in the target text, adding target characters at a target position in the target text, performing active and passive relation interchange on two target words with an active and passive relation in the target text, and replacing target words in the target text with pronouns;

the second transformation rule comprises any one or more of: replacing target words in the target text with antonyms, adding negation words at the target position in the target text, and performing active and passive relationship interchange on two target words with an active and passive relationship in the target text.

Optionally, the method further comprises:

locating the target semantic dimension in which the target model is abnormal according to the test result of the preset semantic dimension corresponding to the target model;

if the abnormality of the target semantic dimension is determined to originate from the data level, generating test data of the target model corresponding to the target semantic dimension;

and training and optimizing the target model by taking the test data of the target semantic dimension as training data.

Optionally, after generating the test data of each semantic dimension, the method further includes:

predicting a sentence fluency weight of the test data by using a first model, and filtering out the test data whose sentence fluency weight is smaller than a first threshold, wherein the sentence fluency weight represents how fluent the sentences of the test data are; and/or

predicting a word fitness weight of the test data by using a second model, and filtering out the test data whose word fitness weight is smaller than a second threshold, wherein the word fitness weight represents how well the words appearing in a sentence of the test data fit that sentence.
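The two optional filtering steps can be sketched as follows. This is only an illustrative sketch: the source does not specify the first or second model, so both are assumed here to be arbitrary callables that map a sentence to a weight, and the names `filter_test_data`, `toy_fluency`, `toy_fitness`, and the default thresholds of 0.5 are hypothetical.

```python
from typing import Callable, List, Optional

# Hypothetical interface: a scorer maps one sentence to a weight.
Scorer = Callable[[str], float]


def filter_test_data(
    samples: List[str],
    fluency_model: Optional[Scorer] = None,
    fitness_model: Optional[Scorer] = None,
    first_threshold: float = 0.5,
    second_threshold: float = 0.5,
) -> List[str]:
    """Keep only the samples whose predicted weights reach the thresholds."""
    kept = []
    for text in samples:
        # Drop samples whose sentence fluency weight is below the first threshold.
        if fluency_model is not None and fluency_model(text) < first_threshold:
            continue
        # Drop samples whose word fitness weight is below the second threshold.
        if fitness_model is not None and fitness_model(text) < second_threshold:
            continue
        kept.append(text)
    return kept


# Toy stand-ins for the two models, for demonstration only.
def toy_fluency(text: str) -> float:
    return 0.9 if len(text.split()) >= 3 else 0.1


def toy_fitness(text: str) -> float:
    return 0.2 if "zzz" in text else 0.8
```

Either filter can be skipped by leaving its model as `None`, which mirrors the "and/or" wording of the two steps.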

Optionally, the test data includes labeling information of corresponding semantic dimensions, and after the test result representing each semantic dimension corresponding to the target model is obtained, the method further includes:

and judging whether the test result of the corresponding semantic dimension is consistent with the labeling information of the test data, and if not, recording an abnormal case.

In another aspect, an embodiment of the present invention discloses a model testing apparatus, including:

the test data generation module is used for generating test data of a preset semantic dimension corresponding to the target model according to the function of the target model in that preset semantic dimension, wherein the preset semantic dimension comprises any one or more of a synonym dimension, an antonym dimension, a robustness dimension, a named entity dimension, a temporal relation dimension, a negation word dimension, a reference relation dimension, and a word order relation dimension;

and the target model testing module is used for testing the target model by using the testing data of the preset semantic dimension corresponding to the target model to obtain the testing result of the preset semantic dimension corresponding to the target model.

Optionally, the test data generating module includes:

the test method determination submodule is used for determining a test method corresponding to the preset semantic dimension according to the function of the target model in the preset semantic dimension, wherein the test method comprises at least one of a basic test, an invariance test, and a specified directionality test;

and the test data generation submodule is used for generating test data of the test method corresponding to the preset semantic dimension.

Optionally, the test data generation sub-module includes:

the first generation unit is used for determining, based on a given template text, two filling words having a preset semantic relationship according to the preset semantic dimension, and filling the slot of the template text with each of the two filling words to obtain test data of the basic test corresponding to the preset semantic dimension; and/or

the second generation unit is used for transforming, based on a given target text, the target text according to a first transformation rule corresponding to the preset semantic dimension so that the semantics of the transformed text are unchanged, to obtain test data of the invariance test corresponding to the preset semantic dimension; and/or

and the third generating unit is used for transforming the target text according to a second transformation rule corresponding to the preset semantic dimension based on the given target text, so that the transformed text semantics conform to the expected direction, and obtaining test data of the specified directionality test corresponding to the preset semantic dimension.

Optionally, the apparatus further comprises:

the anomaly locating module is used for locating the target semantic dimension in which the target model is abnormal according to the test result of the preset semantic dimension corresponding to the target model;

the training data generation module is used for generating test data of the target model corresponding to the target semantic dimension if the abnormality of the target semantic dimension is determined to originate from the data level;

and the target model optimization module is used for training and optimizing the target model by taking the test data of the target semantic dimension as training data.

Optionally, the apparatus further comprises:

the first filtering module is used for predicting a sentence fluency weight of the test data by using a first model and filtering out the test data whose sentence fluency weight is smaller than a first threshold, wherein the sentence fluency weight represents how fluent the sentences of the test data are; and/or

and the second filtering module is used for predicting word fitness weight of the test data by using the second model and filtering the test data with the word fitness weight smaller than a second threshold, wherein the word fitness weight represents the matching degree between words and sentences appearing in the sentences of the test data.

In yet another aspect, an embodiment of the present invention discloses an apparatus for model testing, comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, and the one or more programs comprise instructions for performing one or more of the model testing methods described above.

In yet another aspect, embodiments of the invention disclose a machine-readable medium having instructions stored thereon, which when executed by one or more processors of an apparatus, cause the apparatus to perform a model testing method as described in one or more of the preceding.

The embodiment of the invention has the following advantages:

According to embodiments of the invention, starting from semantic understanding, which is the core capability of the target model, test data of a preset semantic dimension corresponding to the target model is generated according to the function of the target model in that preset semantic dimension, and the target model is tested with this test data, so that a test result of the target model for the preset semantic dimension is obtained. The test result represents the semantic understanding capability of the target model in the preset semantic dimension and thus reveals whether that understanding is defective. Based on the test result, developers can accurately find the problems of the target model, formulate an optimization scheme, and optimize the model. Further, the preset semantic dimension includes any one or more of a synonym dimension, an antonym dimension, a robustness dimension, a named entity dimension, a temporal relation dimension, a negation word dimension, a reference relation dimension, and a word order relation dimension. By subdividing the functions of the target model along semantic dimensions and generating test data for multiple levels of preset semantic dimensions, embodiments of the invention make the interpretation of test results more fine-grained and diversified, so that the test result can evaluate the capability of the target model more systematically, more comprehensively, and more accurately.

Drawings

In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.

FIG. 1 is a flow chart of the steps of one embodiment of a model testing method of the present invention;

FIG. 2 is a block diagram of a model test apparatus according to an embodiment of the present invention;

FIG. 3 is a block diagram of an apparatus 800 for model testing in accordance with the present invention;

FIG. 4 is a schematic diagram of a server in some embodiments of the invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Method embodiment

Referring to fig. 1, a flowchart illustrating steps of an embodiment of a model testing method according to the present invention is shown, which may specifically include the following steps:

Step 101, generating test data of a preset semantic dimension corresponding to a target model according to the function of the target model in that preset semantic dimension, wherein the preset semantic dimension comprises any one or more of a synonym dimension, an antonym dimension, a robustness dimension, a named entity dimension, a temporal relation dimension, a negation word dimension, a reference relation dimension, and a word order relation dimension;

Step 102, testing the target model by using the test data of the preset semantic dimension corresponding to the target model to obtain the test result of the preset semantic dimension corresponding to the target model.

The model testing method provided by the invention can be applied to electronic equipment, including but not limited to: a server, a smart phone, a voice recorder, a tablet computer, an e-book reader, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (MPEG-4 Part 14) player, a laptop, an in-vehicle computer, a desktop computer, a set-top box, a smart TV, a wearable device, and the like.

The target model refers to an NLP model to be tested, and the model testing method provided by the invention can be used for testing the capability of the NLP model. The embodiment of the invention does not limit the specific type of the NLP model. For example, the NLP model includes, but is not limited to, a text matching model, a text classification model, a sentiment analysis model, a machine translation model, and the like.

In a specific implementation, a test task is first designed according to the test requirements. For every type of NLP model, the core is the semantic understanding capability of the model: although different types of NLP models have different functions, each implements its function on the basis of accurate semantic understanding. Therefore, from the semantic perspective, the embodiment of the invention generates test data of a preset semantic dimension corresponding to the target model according to the function of the target model in that preset semantic dimension, and tests the target model with this test data to obtain a test result of the target model for the preset semantic dimension. The test result represents the semantic understanding capability of the target model in the preset semantic dimension and thus reveals whether that understanding is defective. Based on the test result, developers can accurately find the problems of the target model, formulate an optimization scheme, and optimize the model.

The preset semantic dimensions include, but are not limited to, any one or more of the following: a synonym dimension, an antonym dimension, a robustness dimension, a named entity dimension, a temporal relation dimension, a negation word dimension, a reference relation dimension, and a word order relation dimension.

The synonym dimension represents the capability of the target model to recognize that synonyms have similar semantics. The antonym dimension represents the capability of the target model to recognize that antonyms have mutually exclusive semantics. The robustness dimension represents the capability of the target model to recognize punctuation, special characters, and meaningless particles. The named entity dimension represents the capability of the target model to identify related named entities. The temporal relation dimension represents the capability of the target model to identify temporal relations. The negation word dimension represents the capability of the target model to recognize that the semantics are reversed after a negation word is added. The reference relation dimension represents the capability of the target model to identify reference relations such as parallel relations and active and passive relations. The word order relation dimension represents the capability of the target model to identify word order relations such as symmetric relations, asymmetric relations, and active and passive relations.

In one example, taking a text matching model as the target model, the text matching model should be able to recognize that synonyms have similar semantics, that antonyms have mutually exclusive semantics, and that the semantics are reversed after a negation word is added. Therefore, for the text matching model, test data can be generated for preset semantic dimensions such as the synonym dimension, the antonym dimension, and the negation word dimension, so as to test the semantic understanding capability of the text matching model in each of these dimensions.

In a specific implementation, the NLP model is typically used to identify the semantic relationship between two texts, and thus the input to the NLP model is typically a text pair. The test data generated by embodiments of the present invention may include at least one text pair. Taking the synonym dimension as an example, the test data generated for the synonym dimension may include the following historical text pair: "what season is the most beautiful in Huangshan" (denoted as sentence 1) and "what time to go to Huangshan is the most suitable" (denoted as sentence 2). The two texts in the historical text pair (sentence 1 and sentence 2) are semantically similar. If a word in either text of the historical text pair is replaced with a synonym, the resulting transformed text pair is still semantically similar. For example, replacing "beautiful" in sentence 1 with the synonym "nice" yields the following transformed text pair: "what season is best seen in Huangshan" (denoted as sentence 1') and "what time to go to Huangshan is best seen" (denoted as sentence 2'). The two texts in the transformed text pair (sentence 1' and sentence 2') are still semantically similar. The historical text pair and the transformed text pair can be used as test data to test a text matching model, so as to obtain a test result of the text matching model for the synonym dimension; the test result represents the semantic understanding capability of the text matching model in the synonym dimension. If the text matching model can accurately identify the semantic similarity of the two texts in the historical text pair and also the semantic similarity of the two texts in the transformed text pair, the text matching model can be considered to have semantic understanding capability in the synonym dimension.
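The synonym example above can be sketched in code. This is a minimal sketch under stated assumptions: the synonym table is supplied by the caller, replacement is naive substring replacement applied to both texts of the pair, and `predict_similar` is a hypothetical stand-in for the text matching model under test. None of these names come from the source.

```python
from typing import Callable, Dict, List, Tuple

TextPair = Tuple[str, str]


def synonym_invariance_cases(pair: TextPair, synonyms: Dict[str, str]) -> List[TextPair]:
    """Return the historical pair plus one transformed pair per applicable synonym."""
    cases = [pair]
    for word, synonym in synonyms.items():
        # Naive substring replacement; a real generator would tokenize first.
        if any(word in text for text in pair):
            cases.append((pair[0].replace(word, synonym), pair[1].replace(word, synonym)))
    return cases


def run_invariance_test(
    cases: List[TextPair], predict_similar: Callable[[str, str], bool]
) -> List[TextPair]:
    """Return the pairs the model fails to recognize as semantically similar."""
    return [pair for pair in cases if not predict_similar(*pair)]
```

An empty list from `run_invariance_test` means the hypothetical model treated both the historical pair and every transformed pair as similar, which is the expected outcome for an invariance test.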

By testing the target model with the test data of the different preset semantic dimensions, a test result of the target model for each preset semantic dimension is finally obtained; that is, the semantic understanding capability of the target model in each preset semantic dimension is obtained. For example, it can be determined whether the target model can recognize that synonyms have similar semantics, whether it can recognize that antonyms have mutually exclusive semantics, whether it can recognize that the semantics are reversed after a negation word is added, whether it can understand temporal relations, and so on.

After the test result of the target model corresponding to the preset semantic dimension is obtained, the target model can be optimized according to the test result so as to improve the semantic understanding capability of the target model.

In an optional embodiment of the invention, the method may further comprise:

Step S11, locating the target semantic dimension in which the target model is abnormal according to the test result of the preset semantic dimension corresponding to the target model;

Step S12, if the abnormality of the target semantic dimension is determined to originate from the data level, generating test data of the target model corresponding to the target semantic dimension;

Step S13, training and optimizing the target model by taking the test data of the target semantic dimension as training data.

An abnormality of the model means that the test result does not meet expectations. For example, a text matching model should have semantic understanding capability in the synonym dimension. If the text matching model is tested with test data of the synonym dimension and its performance in recognizing synonyms does not meet expectations, for example, its accuracy in recognizing synonyms does not reach a preset threshold, the text matching model can be considered abnormal, and the target semantic dimension in which the abnormality occurs is the synonym dimension.
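Locating the abnormal dimension by comparing per-dimension accuracy with a preset threshold can be sketched as follows. The data layout (a list of predicted/expected label pairs per dimension), the label strings, and the function names are assumptions made for illustration; the source only states that an accuracy below a preset threshold indicates an abnormality.

```python
from typing import Dict, List, Tuple

# Each record pairs the model's prediction with the label carried by the test data.
Record = Tuple[str, str]


def accuracy(records: List[Record]) -> float:
    """Fraction of records where the prediction matches the expected label."""
    return sum(predicted == expected for predicted, expected in records) / len(records)


def locate_abnormal_dimensions(
    results: Dict[str, List[Record]], threshold: float
) -> List[str]:
    """Return the semantic dimensions whose accuracy is below the threshold."""
    return [
        dimension
        for dimension, records in results.items()
        if records and accuracy(records) < threshold
    ]
```

Dimensions with no records are skipped rather than reported, since no conclusion can be drawn without test data.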

In a specific application, an abnormality of the model mainly originates from one of two levels: the data level and the model structure level. For NLP models, the capability of the model is closely related to the training data. The data level means that the abnormality of the model is related to its training data. In addition, the capability of the model is also strongly related to the structure of the model itself. The model structure level means that the abnormality of the model is related to its structure.

Therefore, in the process of optimizing the target model according to the test result, the target semantic dimension in which the target model is abnormal is first located, and it is then judged whether the abnormality of the target semantic dimension originates from the data level or the model structure level. The embodiment of the present invention does not limit the specific manner of determining the source of the abnormality. For example, a data-level abnormality may be located by analyzing the data distribution. For an abnormality at the model structure level, a model interpretation or model visualization method can be used to locate it.

Further, if the test result shows that, in a certain preset semantic dimension, the target model has a high semantic understanding capability for long character strings but a low semantic understanding capability for short character strings, it can be determined that the abnormality of the target model originates from the model structure level. A long character string is a character string whose number of characters is greater than or equal to a preset number, for example, 5 or more characters; a short character string is a character string whose number of characters is smaller than the preset number, for example, fewer than 5 characters.

In one example, long character strings are taken to be words and short character strings are taken to be single characters. For example, when test data is generated, a task text is transformed in units of words, such as "beauty" to "beauty". The target model is tested with the test data obtained by word-level transformation, and the semantic understanding capability measured for the target model is high. As another example, the task text is transformed in units of characters, such as "beauty" to "ugly". The target model is tested with the test data obtained by character-level transformation, and the semantic understanding capability measured for the target model is low. In this case, it can be determined that the abnormality of the target model originates from the model structure level.

If the abnormality of the target semantic dimension is determined to originate from the data level, training data of the target model corresponding to the target semantic dimension is generated, and the target model is trained and optimized with the training data of the target semantic dimension. In the embodiment of the invention, the method for generating test data of a preset semantic dimension can also be used to generate training data of the target semantic dimension. For example, when the target semantic dimension in which the text matching model is abnormal is determined to be the synonym dimension, and the abnormality of the synonym dimension is determined to originate from the data level, training data of the synonym dimension can be generated and used as special training data to train the text matching model, thereby improving the semantic understanding capability of the text matching model in the synonym dimension and, in turn, its capability to recognize synonyms.

Further, if the abnormality of the target model in the target semantic dimension is resolved after the target model is trained with the special training data, the target model has been optimized. If the abnormality of the target model in the target semantic dimension remains after such training, this indicates that the abnormality is caused not at the data level but at the model structure level, and the target model can then be optimized by adjusting its model structure.
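The decision flow described above (train with special training data, retest, then attribute the abnormality) can be sketched as follows. Everything here is hypothetical scaffolding: `generate_training_data`, `train`, and `accuracy_on` are caller-supplied callables standing in for steps the source leaves unspecified, and the returned strings are labels chosen for illustration.

```python
from typing import Any, Callable, List, Tuple


def diagnose_and_optimize(
    model: Any,
    dimension: str,
    generate_training_data: Callable[[str], List[Any]],
    train: Callable[[Any, List[Any]], Any],
    accuracy_on: Callable[[Any, str], float],
    threshold: float,
) -> Tuple[Any, str]:
    """Train on dimension-specific data, retest, and report the likely source."""
    # Generate special training data for the abnormal target semantic dimension.
    special_data = generate_training_data(dimension)
    tuned = train(model, special_data)
    # If retesting meets the threshold, the abnormality is treated as data-level.
    if accuracy_on(tuned, dimension) >= threshold:
        return tuned, "data level"
    # Otherwise the remaining abnormality is attributed to the model structure,
    # and the caller would adjust the structure of the model.
    return tuned, "model structure level"
```

The function does not adjust the model structure itself; it only reports which of the two levels the remaining abnormality points to.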

In an optional embodiment of the present invention, generating, in step 101, test data of a preset semantic dimension corresponding to the target model according to the function of the target model in that preset semantic dimension includes:

Step S21, determining a test method corresponding to the preset semantic dimension according to the function of the target model in the preset semantic dimension, wherein the test method comprises at least one of a basic test, an invariance test, and a specified directionality test;

Step S22, generating test data of the test method corresponding to the preset semantic dimension.

The basic test is to generate a text pair through a specified template text, and the generated text pair is used as test data to test the semantic understanding ability of a target model on a preset semantic dimension.

The invariance test refers to converting the collected historical texts according to a preset rule to obtain converted texts, enabling the semantics of the converted texts to be unchanged, and using a text pair consisting of the original historical texts and the converted texts as test data to test the semantic understanding capability of a target model on a preset semantic dimension.

The specified directionality test refers to transforming the collected historical texts according to a preset rule to obtain transformed texts whose semantics conform to an expected direction, and using a text pair consisting of the original historical text and the transformed text as test data to test the semantic understanding capability of the target model in a preset semantic dimension.
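One way to picture a specified directionality test is adding a negation word, where the expected direction is that the transformed text no longer matches the original. The sketch below assumes whitespace-tokenized English text, a caller-chosen target word after which the negation word is inserted, and a hypothetical expected label "dissimilar"; the example sentence is invented for illustration.

```python
from typing import Tuple


def add_negation(text: str, target: str, negation: str = "not") -> str:
    """Insert a negation word right after the first token equal to target."""
    words = text.split()
    index = words.index(target)  # raises ValueError if target is absent
    return " ".join(words[: index + 1] + [negation] + words[index + 1 :])


def directional_case(text: str, target: str) -> Tuple[str, str, str]:
    """Build (original, transformed, expected label) for a negation test."""
    # The expected direction: semantics are reversed, so the pair should not match.
    return text, add_negation(text, target), "dissimilar"
```

For instance, `directional_case("this route is safe", "is")` pairs the original sentence with "this route is not safe" and labels the pair as expected to be dissimilar.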

It should be noted that, the embodiment of the present invention does not limit the text content and the specific source of the history text. The historical text may comprise a single text, or the historical text may comprise pairs of historical texts of known semantic relationships. The historical text may include words, phrases, sentences, paragraphs, and the like.

In one example, for both the synonym dimension and the antonym dimension, a basic test or an invariance test may be selected. For the robustness dimension, an invariance test may be selected. For the named entity dimension, an invariance test or a specified directionality test may be selected. For the temporal relation dimension, a basic test may be selected. For the negation word dimension, a specified directionality test may be selected. For the reference relation dimension, an invariance test or a specified directionality test may be selected. For the word order relation dimension, an invariance test or a specified directionality test may be selected.
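The example assignment of test methods to semantic dimensions can be written down as a lookup table. This is a sketch of that one example only, not a fixed mapping; the identifier names and the dimension strings used as keys are chosen here for illustration.

```python
from enum import Enum
from typing import Dict, Iterable, Tuple


class Method(Enum):
    BASIC = "basic test"
    INVARIANCE = "invariance test"
    DIRECTIONAL = "specified directionality test"


# Example selection taken from the paragraph above; other selections are allowed.
METHODS_BY_DIMENSION: Dict[str, Tuple[Method, ...]] = {
    "synonym": (Method.BASIC, Method.INVARIANCE),
    "antonym": (Method.BASIC, Method.INVARIANCE),
    "robustness": (Method.INVARIANCE,),
    "named entity": (Method.INVARIANCE, Method.DIRECTIONAL),
    "temporal relation": (Method.BASIC,),
    "negation word": (Method.DIRECTIONAL,),
    "reference relation": (Method.INVARIANCE, Method.DIRECTIONAL),
    "word order relation": (Method.INVARIANCE, Method.DIRECTIONAL),
}


def methods_for(dimensions: Iterable[str]) -> Dict[str, Tuple[Method, ...]]:
    """Look up the example test methods for the requested dimensions."""
    return {dimension: METHODS_BY_DIMENSION[dimension] for dimension in dimensions}
```

Keeping the mapping as plain data makes it easy to swap in a different selection per target model.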

It can be understood that, in specific implementation, for different preset semantic dimensions, a corresponding test method can be flexibly selected, and test data of the corresponding preset semantic dimensions are generated according to the selected test method. The embodiment of the invention does not limit the test method for selecting different preset semantic dimensions.

In an optional embodiment of the present invention, the generating test data of the test method corresponding to the preset semantic dimension in step S22 includes:

based on a given template text, determining two filling words with a preset semantic relation according to the preset semantic dimension, and respectively filling slot positions in the template text by using the two filling words to obtain test data of a basic test corresponding to the preset semantic dimension.

In the embodiment of the invention, the test data of the basic test can be generated by template filling. Specifically, based on a given template text, two filler words with a preset semantic relationship are determined according to the preset semantic dimension. The preset semantic relationship is determined according to the function of the preset semantic dimension. For example, if the preset semantic dimension is the synonym dimension, the preset semantic relationship may be that the semantics are the same or similar, so two filler words with the same or similar semantics are determined. The slot of the given template text is then filled with each of the two filler words in turn, which yields the test data of the basic test corresponding to the preset semantic dimension.

In one example, taking the synonym dimension as an example, assume that the given template text is: "how to become more ()". Two filler words with the same or similar semantics, such as "extroverted" and "outgoing", can be determined according to the function of the synonym dimension. Filling the slot of the template text "how to become more ()" with each of the two filler words yields the following text pair: "how to become more extroverted" and "how to become more outgoing". The generated text pair can be used as test data of the basic test corresponding to the synonym dimension.

In another example, taking the time sequence relationship dimension as an example, assume that the given template text is: "Zhang San () is the class monitor of Class 1". Two filler words having a time sequence relationship, such as "now" and "before", can be determined according to the function of the time sequence relationship dimension. Filling the slot of the template text with "now" and "before" respectively yields the following text pair: "Zhang San was before the class monitor of Class 1" and "Zhang San is now the class monitor of Class 1". The generated text pair can be used as test data of the basic test corresponding to the time sequence relationship dimension. As another example, the two filler words having a time sequence relationship may also be "now" and "later", in which case the following text pair may be generated: "Zhang San is now the class monitor of Class 1" and "Zhang San will later be the class monitor of Class 1". This text pair can also be used as test data of the basic test corresponding to the time sequence relationship dimension.
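The template filling described above can be sketched as follows. This is an illustrative sketch, not the claimed implementation: the function name `fill_template`, the `()` slot marker and the English filler words are assumptions made for the example.

```python
def fill_template(template, fillers, slot="()"):
    """Fill the single slot of a template with each of two filler words."""
    if template.count(slot) != 1:
        raise ValueError("template must contain exactly one slot")
    first, second = fillers
    return template.replace(slot, first), template.replace(slot, second)


# Synonym dimension: the two filler words have the same or similar semantics.
pair = fill_template("how to become more ()", ("extroverted", "outgoing"))
```

Here `pair` holds the two texts of one basic-test case; the expected relation between them (same or similar for synonyms, ordered in time for the time sequence dimension) is decided by the dimension that chose the filler words.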

Based on a given target text, the target text is transformed according to a first transformation rule corresponding to the preset semantic dimension such that the semantics of the transformed text remain unchanged, thereby obtaining test data of the invariance test corresponding to the preset semantic dimension.

In the embodiment of the invention, the test data of the invariance test can be obtained by transforming the target text through the first transformation rule. The target text may be a specific historical text, or the target text may be a specific text in a specific historical text pair. Further, the target text may be any text in a certain historical text pair, or may be two texts in a certain historical text pair.

The invariance test expects that the semantics of the transformed text and the text before transformation remain unchanged. The test data of the invariance test can be obtained by transforming a specified certain historical text. For example, a certain historical text is transformed to obtain a transformed text, and the semantic of the transformed text is not changed, so that the semantic of the transformed text is the same as or similar to that of the historical text, and a text pair consisting of the transformed text and the historical text can be used as test data for invariance test.

Alternatively, the test data of the invariance test can be obtained by transforming the target text in the historical text pair with known semantic relationship. For example, if the semantics of two texts in a certain historical text pair are similar, the target text in the historical text pair is transformed by using the first transformation rule, and the semantics of the two texts in the transformed text pair are also similar.

For another example, when the semantics of two texts in a certain historical text pair are mutually exclusive, the target text in the historical text pair is transformed by using the first transformation rule, and the semantics of the two texts in the obtained transformed text pair are also mutually exclusive. The transformed text pair may be used as a test data for the invariance test and/or the historical text pair before transformation may be used as a test data for the invariance test.

For example, for the historical text "are smoking and health related", the positions of the two target words "smoking" and "health" in the historical text are interchanged using the first transformation rule, giving the transformed text "are health and smoking related". The transformed text "are health and smoking related" and the historical text "are smoking and health related" have the same semantics, that is, the semantics are unchanged by the transformation. Therefore, the text pair consisting of the transformed text "are health and smoking related" and the historical text "are smoking and health related" can be used as test data of the invariance test, and this test data can be used to test the semantic understanding ability of the target model in the word order relationship dimension.

As another example, consider the historical text pair "which season is the most beautiful in Huangshan" and "when is the best time to go to Huangshan", whose two texts are known to be semantically similar. The target text "which season is the most beautiful in Huangshan" in the historical text pair is transformed using the first transformation rule: the target word "beautiful" is replaced with its synonym "good-looking", giving the transformed text "which season is the most good-looking in Huangshan", whose semantics are unchanged. The transformed text pair, "which season is the most good-looking in Huangshan" and "when is the best time to go to Huangshan", can therefore be used as test data of the invariance test to test the semantic understanding ability of the target model in the synonym dimension. Further, the historical text pair before transformation can also be used as test data for the synonym dimension, and the historical text pair and the transformed text pair are expected to yield the same test result during testing.

In the embodiment of the present invention, the first transformation rule may include any one of:

replacing a target word in the target text with a replacement word;

or adding a target character at a target position in the target text;

or interchanging (including position interchange or relationship interchange) two target words in the target text.

Here, a target word refers to a word in the target text that can be transformed. Replacement words may include synonyms, related named entities, pronouns, and the like. Target characters may include punctuation marks, special characters, meaningless auxiliary words (particles), and the like.

Further, the first transformation rule may include, but is not limited to, any one or more of:

replacing target words in the target text with similar words;

replacing named entities in the target text with related named entities of the same type;

performing position interchange on two target words with a parallel relation in a target text;

adding a target character at a target position in the target text;

performing active and passive relationship interchange on two target words with active and passive relationships in a target text;

and replacing the target words in the target text with pronouns.

In the embodiment of the invention, the test data of the invariance test can be used for testing the near meaning word dimension, the named entity dimension, the robustness dimension, the reference relationship dimension, the word order relationship dimension and the like of the target model.

For the synonym dimension, the first transformation rule may include: replacing a target word in the target text with a replacement word, where the replacement word is a synonym of the target word, so that the converted text has the same or similar semantics as the target text.

For the named entity dimension, the first transformation rule may include: replacing a target word in the target text with a replacement word, where the target word is a named entity and the replacement word is a related named entity of the same type, so that the converted text has the same or similar semantics as the target text.

For the robustness dimension, the first transformation rule may include: adding target characters at target positions in the target text, where the target characters include punctuation marks, special characters, meaningless auxiliary words (particles), and the like, so that the converted text has the same or similar semantics as the target text.

For the reference relationship dimension, the first transformation rule may include: replacing a target word in the target text with a replacement word, where the replacement word is the pronoun corresponding to the target word, so that the converted text has the same or similar semantics as the target text.

For the word order relationship dimension, the first transformation rule may include: interchanging two target words having a word order relationship in the target text (including position interchange or relationship interchange), where the relationship interchange includes symmetric relationship interchange, asymmetric relationship interchange, active-passive relationship interchange, and the like, so that the converted text has the same or similar semantics as the target text.

In an optional embodiment of the present invention, the transforming the target text according to the first transformation rule corresponding to the preset semantic dimension may include: carrying out syntactic analysis on the target text to obtain a syntactic analysis result; determining a target word or a target position in the target text according to a syntactic analysis result; according to the determined target word or the target position, transforming the target text according to a first transformation rule corresponding to a preset semantic dimension to obtain a transformed text, so that the semantics of the transformed text and the target text are the same or similar; and obtaining the test data of the invariance test corresponding to the preset semantic dimension according to the converted text. Wherein the syntactic analysis includes, but is not limited to, any one or more of: word segmentation, part of speech tagging, named entity recognition, dependency syntax analysis, semantic role tagging and the like.

Taking the synonym dimension as an example, syntactic analysis is first performed on a target text in a given historical text pair to determine a target word in the target text; a synonym of the target word is then determined as the replacement word; finally, the target word in the target text is replaced with the replacement word to obtain a converted text whose semantics remain unchanged. In one example, assume that a given historical text pair comprises "which season is the most beautiful in Huangshan" and "when is the best time to go to Huangshan", and that the target text is "which season is the most beautiful in Huangshan". Syntactic analysis of the target text determines that the target word is "beautiful". A synonym of the target word can be queried as the replacement word by means of a knowledge graph or a crawler; for example, the replacement word is determined to be "good-looking". Replacing the target word "beautiful" in the target text with the replacement word "good-looking" yields the converted text pair "which season is the most good-looking in Huangshan" and "when is the best time to go to Huangshan", which can be used as test data of the invariance test corresponding to the synonym dimension. Of course, the replacement word may also be determined to be another synonym of the target word, and the converted text pair obtained in the same way can likewise be used as test data of the invariance test corresponding to the synonym dimension. The semantic understanding ability of the target model in the synonym dimension is tested using the converted text pair and the historical text pair before conversion, and the target model is expected to output the same test result for both.
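The synonym replacement steps above can be sketched as follows. This is a minimal sketch under stated assumptions: the `SYNONYMS` lexicon is hypothetical and stands in for the knowledge-graph or crawler lookup, a whole-word regular expression match stands in for the syntactic analysis, and the helper names are chosen for illustration.

```python
import re

# Hypothetical synonym lexicon; the text obtains synonyms from a knowledge graph or crawler.
SYNONYMS = {"beautiful": ["good-looking", "pretty"]}


def synonym_variants(target_text, synonyms=SYNONYMS):
    """Replace each known target word with each of its synonyms, one at a time."""
    variants = []
    for word, replacements in synonyms.items():
        pattern = re.compile(rf"\b{re.escape(word)}\b")
        if pattern.search(target_text):
            for replacement in replacements:
                variants.append(pattern.sub(replacement, target_text, count=1))
    return variants


def invariance_cases(history_pair, synonyms=SYNONYMS):
    """Transform the first text of the pair and keep the second text as is."""
    target_text, other_text = history_pair
    return [(variant, other_text) for variant in synonym_variants(target_text, synonyms)]


history_pair = (
    "which season is the most beautiful in Huangshan",
    "when is the best time to go to Huangshan",
)
cases = invariance_cases(history_pair)
```

Each element of `cases` is a converted text pair; the target model would be run on it and on `history_pair`, with the same result expected for both.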

Taking the robustness dimension as an example, syntactic analysis is first performed on a target text in a given historical text pair to determine target positions in the target text; target characters such as punctuation marks or meaningless auxiliary words (particles) are then added at the target positions to obtain a converted text whose semantics remain unchanged.

In one example, assume that a given historical text pair comprises "which season is the most beautiful in Huangshan" and "when is the best time to go to Huangshan", and that the target text is "which season is the most beautiful in Huangshan". Syntactic analysis of the target text determines that the target positions are the positions shown by the parentheses in "which season is the most beautiful () in Huangshan ()". Target characters such as punctuation marks or meaningless auxiliary words are added at the target positions so that the semantics of the converted text are unchanged. For example, the converted text after adding punctuation marks is: "which season is the most beautiful (|) in Huangshan (,)". For another example, the converted text after adding a meaningless auxiliary word is: "which season is the most beautiful (o) in Huangshan". A converted text pair can thus be obtained: "which season is the most beautiful (|) in Huangshan (,)" and "when is the best time to go to Huangshan", which can be used as test data of the invariance test corresponding to the robustness dimension. Alternatively, the converted text pair "which season is the most beautiful (o) in Huangshan" and "when is the best time to go to Huangshan" can be obtained and used as test data of the invariance test corresponding to the robustness dimension. The semantic understanding ability of the target model in the robustness dimension is tested using the converted text pair and the historical text pair before conversion, and the target model is expected to output the same test result for both.
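Adding target characters at target positions can be sketched as below. This is an illustrative sketch: the helper `insert_at`, the use of character offsets as target positions, and the specific marks "," and "?" are assumptions for the example, not details taken from the embodiment.

```python
def insert_at(text, insertions):
    """Insert strings at character offsets of the original text.

    `insertions` maps an offset to the string inserted there. Offsets are
    applied from right to left so that smaller offsets stay valid.
    """
    for offset in sorted(insertions, reverse=True):
        if not 0 <= offset <= len(text):
            raise ValueError(f"offset {offset} is outside the text")
        text = text[:offset] + insertions[offset] + text[offset:]
    return text


target_text = "which season is the most beautiful in Huangshan"
after_beautiful = target_text.index("beautiful") + len("beautiful")
converted = insert_at(target_text, {after_beautiful: ",", len(target_text): "?"})
```

In a fuller pipeline the offsets would come from the syntactic analysis (for example, clause boundaries) rather than from a string search.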

Taking the named entity dimension as an example, syntactic analysis is first performed on a target text in a given historical text pair to determine a named entity in the target text as the target word; a related named entity of the same type as the named entity is then determined as the replacement word; finally, the target word in the target text is replaced with the replacement word to obtain a converted text whose semantics remain unchanged. The named entity in the target text can be determined by an existing entity recognition method. Named entities may include place names, organization names, person names, and the like, and related named entities of the same type as the named entity in the target text can be queried from a preset database. The preset database may include pre-collected named entities of different types.

In one example, assume that a given historical text pair comprises "how do Chinese people eat potatoes" and "how do American people eat potatoes". Assuming that the target text includes both texts in the historical text pair, syntactic analysis of the two target texts determines the named entity "potatoes" in both target texts as the target word, and the related named entity "beef", of the same type as "potatoes", as the replacement word. Replacing the target word "potatoes" in both target texts with the replacement word "beef" yields the converted text pair "how do Chinese people eat beef" and "how do American people eat beef", which can be used as test data of the invariance test corresponding to the named entity dimension. The semantic understanding ability of the target model in the named entity dimension is tested using the converted text pair and the historical text pair before conversion, and the target model is expected to output the same test result for both.
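A minimal sketch of this replacement follows, assuming a small in-memory dictionary in place of the preset database. `ENTITY_DB`, its type labels and both helper functions are hypothetical names introduced for illustration; entity recognition is reduced to passing the entity in explicitly.

```python
# Hypothetical preset database of pre-collected named entities grouped by type.
ENTITY_DB = {
    "food": ["potatoes", "beef", "rice"],
    "place": ["Beijing", "Shanghai"],
}


def related_entities(entity, db=ENTITY_DB):
    """Return the other entities that share a type with `entity`."""
    for names in db.values():
        if entity in names:
            return [name for name in names if name != entity]
    return []


def replace_in_both(history_pair, entity, db=ENTITY_DB):
    """Replace the entity in both texts, giving one converted pair per related entity."""
    return [
        tuple(text.replace(entity, replacement) for text in history_pair)
        for replacement in related_entities(entity, db)
    ]


history_pair = ("how do Chinese people eat potatoes", "how do American people eat potatoes")
converted_pairs = replace_in_both(history_pair, "potatoes")
```

Because the same replacement is applied to both texts, the relation between the two texts is expected to be the same before and after conversion.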

Taking the reference relation dimension as an example, the reference relation dimension is used for representing the semantic understanding ability of the target model to pronouns in the text. Firstly, syntactic analysis is carried out on a given target text, and nouns are recognized in the target text to be used as target words; and then determining that the pronouns corresponding to the target words are replacement words, and finally replacing the target words in the target text with the replacement words to obtain a converted text and enable the converted text to have unchanged semantics.

In one example, assume that the given target text is the following historical text: "if Xiaoming and Xiaoli get married, do you think Xiaoming will be happy". Syntactic analysis is performed on the target text, the second occurrence of the noun "Xiaoming" is determined as the target word, and the target word is replaced with the corresponding pronoun "he", giving the converted text "if Xiaoming and Xiaoli get married, do you think he will be happy", whose semantics are unchanged. The text pair consisting of the historical text "if Xiaoming and Xiaoli get married, do you think Xiaoming will be happy" and the converted text "if Xiaoming and Xiaoli get married, do you think he will be happy" can be used as test data of the invariance test corresponding to the reference relationship dimension. The test data is input into the target model to test whether the target model can understand the pronoun in the sentence, for example, whether it can recognize that the pronoun "he" in the text "if Xiaoming and Xiaoli get married, do you think he will be happy" has the same meaning as "Xiaoming" in the text "if Xiaoming and Xiaoli get married, do you think Xiaoming will be happy".
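Replacing only a later mention of a noun with its pronoun can be sketched as follows. The helper `replace_nth` is a hypothetical name, and choosing the pronoun "he" by hand stands in for the syntactic analysis that would normally identify the noun and its pronoun.

```python
def replace_nth(text, word, replacement, n):
    """Replace the n-th (1-based) occurrence of `word` in `text`."""
    if n < 1:
        raise ValueError("n must be at least 1")
    start = -1
    for _ in range(n):
        start = text.find(word, start + 1)
        if start == -1:
            raise ValueError(f"{word!r} occurs fewer than {n} times")
    return text[:start] + replacement + text[start + len(word):]


history_text = "if Xiaoming and Xiaoli get married, do you think Xiaoming will be happy"
converted_text = replace_nth(history_text, "Xiaoming", "he", 2)
test_case = (history_text, converted_text)
```

Keeping the first mention intact matters here: the pronoun only has a referent because the noun still appears earlier in the sentence.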

Taking the word order relationship dimension as an example, syntactic analysis is first performed on a given target text to determine two target words having a word order relationship in the target text; the two target words are then interchanged (including position interchange or relationship interchange) such that the semantics of the converted text are unchanged, where the relationship interchange includes symmetric relationship interchange, asymmetric relationship interchange, active-passive relationship interchange, and the like.

In one example, assume that the given target text is the following historical text: "are smoking and health related". Syntactic analysis of the target text determines that the two target words with a symmetric word order relationship are "smoking" and "health". The positions of the two target words are then interchanged to obtain the converted text "are health and smoking related", whose semantics are unchanged. The text pair consisting of the historical text "are smoking and health related" and the converted text "are health and smoking related" can be used as test data of the invariance test corresponding to the word order relationship dimension. In another example, assume that the given target text is the following historical text: "the water on the table was spilled by Xiaoming". Syntactic analysis of the target text determines that the two target words with an active-passive word order relationship are "Xiaoming" and "water". The active-passive relationship of the two target words is then interchanged to obtain the converted text "Xiaoming spilled the water on the table", whose semantics are unchanged. The text pair consisting of the historical text "the water on the table was spilled by Xiaoming" and the converted text "Xiaoming spilled the water on the table" can be used as test data of the invariance test corresponding to the word order relationship dimension.
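The position interchange of two target words in a symmetric relationship can be sketched as below. This is a sketch under assumptions: `swap_words` is a hypothetical helper, the two target words are supplied directly instead of being found by syntactic analysis, and active-passive interchange is not covered because it requires rewriting the sentence structure.

```python
import re


def swap_words(text, first, second):
    """Interchange every whole-word occurrence of `first` and `second` in one pass."""
    pattern = re.compile(rf"\b({re.escape(first)}|{re.escape(second)})\b")
    return pattern.sub(lambda m: second if m.group(0) == first else first, text)


history_text = "are smoking and health related"
converted_text = swap_words(history_text, "smoking", "health")
```

Doing both replacements in a single regular expression pass avoids the classic mistake of replacing the first word with the second and then turning both occurrences back.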

In the embodiment of the present invention, the test data for specifying the directionality test may be obtained by transforming the target text according to the second transformation rule. The target text may be a specific historical text, or the target text may be a specific text in a specific historical text pair. Further, the target text may be any text in a certain historical text pair, or may be two texts in a certain historical text pair.

The specified directionality test expects the semantics of the converted text to conform to an expected direction. For example, the expected direction may include:

for a given historical text, the semantic meaning of the text after transformation is opposite to the semantic meaning of the text before transformation;

or the semantic meaning of the text after conversion is the same as or similar to the semantic meaning of the text before conversion;

or for a given historical text pair, transforming a target text in the historical text pair through a second transformation rule to obtain a transformed text pair, wherein the semantic similarity of two texts in the transformed text pair is reduced relative to the semantic similarity of two texts in the historical text pair;

or for a given historical text pair, converting the target text in the historical text pair through a second conversion rule to obtain a converted text pair, wherein the semantic similarity of the two texts in the converted text pair is improved relative to the semantic similarity of the two texts in the historical text pair.
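The four expected directions listed above can be encoded as a small check on model output. The following is a sketch only: it assumes the target model returns a similarity score between 0 and 1, and the `Direction` enum, the `direction_met` helper and the `low` and `high` thresholds are all hypothetical choices not specified in the embodiment.

```python
from enum import Enum


class Direction(Enum):
    OPPOSITE = "converted text is opposite to the original text"
    SAME = "converted text is the same as or similar to the original text"
    SIMILARITY_DOWN = "pair similarity is reduced after conversion"
    SIMILARITY_UP = "pair similarity is improved after conversion"


def direction_met(direction, score, baseline=None, low=0.3, high=0.7):
    """Check a similarity score against an expected direction.

    For OPPOSITE and SAME, `score` is the similarity of the original and the
    converted text. For the other two, `score` is the similarity of the
    converted pair and `baseline` is the similarity of the historical pair.
    """
    if direction is Direction.OPPOSITE:
        return score <= low
    if direction is Direction.SAME:
        return score >= high
    if baseline is None:
        raise ValueError("baseline is required for pair-level directions")
    if direction is Direction.SIMILARITY_DOWN:
        return score < baseline
    return score > baseline
```

For instance, `direction_met(Direction.SIMILARITY_DOWN, 0.5, baseline=0.7)` holds, while the same scores fail `Direction.SIMILARITY_UP`.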

The test data for specifying the directionality test may be obtained by converting the target text by the second conversion rule. The second transformation rule may include any one of:

replacing target words in the target text with replacement words;

or adding target characters at a target position in the target text;

Wherein, the target word refers to a word which can be transformed in the target text.

Further, the second transformation rule includes any one or more of:

replacing target words in the target text with antonyms;

adding a negative word at a target position in the target text;

and performing active and passive relationship interchange on two target words with active and passive relationships in the target text.

In an embodiment of the present invention, the test data of the specified directionality test may be used to test the antonym dimension, the named entity dimension, the negative word dimension, the word order relationship dimension, and the like of the target model.

For the antonym dimension, the second transformation rule may include: replacing a target word in the target text with a replacement word, where the replacement word is an antonym of the target word, so that the converted text has semantics opposite to those of the target text.

For the named entity dimension, the second transformation rule may include: replacing a target word in the target text with a replacement word, where the target word is a named entity and the replacement word is a related named entity of the same type, so that the converted text and the target text have different semantics.

For the negative word dimension, the second transformation rule may include: adding a negative word at a target position in the target text, where the negative words include "not" and the like, so that the converted text and the target text have opposite semantics.

For the word order relationship dimension, the second transformation rule may include: interchanging two target words having a word order relationship in the target text (including position interchange or relationship interchange), where the relationship interchange includes symmetric relationship interchange, asymmetric relationship interchange, active-passive relationship interchange, and the like, so that the converted text and the target text have different semantics.

In an optional embodiment of the present invention, the transforming the target text according to the second transformation rule corresponding to the preset semantic dimension includes: carrying out syntactic analysis on the target text to obtain a syntactic analysis result; determining a target word or a target position in the target text according to the syntactic analysis result; according to the determined target words or target positions, converting the target text according to a second conversion rule corresponding to the preset semantic dimension to obtain a converted text, so that the converted text semantics conform to the expected direction; and obtaining test data of the preset semantic dimension corresponding to the specified directivity test according to the converted text. Wherein the syntactic analysis includes, but is not limited to, any one or more of: word segmentation, part of speech tagging, named entity recognition, dependency syntax analysis, semantic role tagging and the like.

Taking the antonym dimension as an example, syntactic analysis is first performed on a given target text to determine a target word in the target text; an antonym of the target word is then determined as the replacement word; finally, the target word in the target text is replaced with the replacement word to obtain a converted text whose semantics conform to the expected direction, for example, semantics opposite to those of the target text.

In one example, assume that the given target text is the following historical text: "only a few days later, the peach blossoms in the garden withered". Syntactic analysis of the target text determines that the target word is "withered", and an antonym of the target word is queried as the replacement word by means of a knowledge graph or a crawler. For example, the replacement word is determined to be "bloomed". The target word "withered" in the target text is replaced with the replacement word "bloomed", giving the converted text "only a few days later, the peach blossoms in the garden bloomed", whose semantics are opposite to those of the target text "only a few days later, the peach blossoms in the garden withered". The text pair consisting of the converted text "only a few days later, the peach blossoms in the garden bloomed" and the target text "only a few days later, the peach blossoms in the garden withered" can be used as test data of the specified directionality test corresponding to the antonym dimension. The test data is input into the target model for testing, and the expected test result is that the semantics of the two texts are opposite or their similarity is low.

In addition, test data of the specified directionality test corresponding to the antonym dimension can also be generated from a given template text. In another example, assume that the given template text is: "how to become more ()". The slot of the template text is filled with each word of a pair of antonyms to obtain test data of the specified directionality test corresponding to the antonym dimension. For example, filling the slot of the template text with the antonyms "extroverted" and "introverted" respectively yields the text pair "how to become more extroverted" and "how to become more introverted". The semantics of the two texts in the pair conform to the expected direction, which here is that the semantics of the two texts are opposite. The text pair "how to become more extroverted" and "how to become more introverted" can be used as test data of the specified directionality test corresponding to the antonym dimension. The test data is input into the target model for testing, and the expected test result is that the semantics of the two texts are opposite or their semantic similarity is low.

Further, in semantic recognition, since a double negation expresses an affirmation, testing the antonym dimension should also cover the ability to recognize double negation. When generating test data of the specified directionality test corresponding to the antonym dimension, a negative word may also be added before one word of the pair of antonyms. For example, for the pair of antonyms "extroverted" and "introverted", the negative word "not" is added before "introverted", and the slot of the template text "how to become more ()" is then filled with "extroverted" and "not introverted" respectively, yielding the text pair "how to become more extroverted" and "how to become more not introverted". The semantics of the two texts in the pair conform to the expected direction, which here is that the semantics of the two texts are the same or similar. The text pair "how to become more extroverted" and "how to become more not introverted" can also be used as test data of the specified directionality test corresponding to the antonym dimension. The test data is input into the target model for testing, and the expected test result is that the semantics of the two texts are the same or their semantic similarity is high.
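Both template-based antonym cases, the plain antonym pair and the double-negation pair, can be generated together. The sketch below is illustrative: `antonym_cases`, the expectation labels, the `()` slot marker and the English negator "not " are assumptions made for the example.

```python
def antonym_cases(template, word, antonym, negator="not ", slot="()"):
    """Build two directionality cases from one template and one antonym pair.

    Each case is (text_a, text_b, expected relation between the two texts).
    """
    base = template.replace(slot, word)
    return [
        # Plain antonym: the two texts are expected to be opposite.
        (base, template.replace(slot, antonym), "opposite"),
        # Negated antonym (double negation): expected to be the same or similar.
        (base, template.replace(slot, negator + antonym), "same_or_similar"),
    ]


cases = antonym_cases("how to become more ()", "extroverted", "introverted")
```

Pairing the two cases is useful because a model that scores both pairs as highly similar, or both as dissimilar, is likely reacting to surface overlap rather than to the negation.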

Taking the dimension of a named entity as an example, firstly, carrying out syntactic analysis on a target text in a given historical text pair, and determining the named entity as a target word in the target text; and then determining related named entities of the same type as the named entities as replacement words, finally replacing the target words in the target text with the replacement words to obtain a converted text, and enabling the converted text semantics to be different from the semantics of the target text.

In one example, assume that the given historical text pair comprises "how Chinese people eat potatoes" and "how American people eat potatoes". Assuming that the target text is "how Chinese people eat potatoes", the named entity "potatoes" in the target text is determined as the target word through syntactic analysis of the target text, and the related named entity "beef" is determined as the replacement word. Replacing the target word "potatoes" in the target text with the replacement word "beef" yields the converted text pair "how Chinese people eat beef" and "how American people eat potatoes". The semantics of the two texts in the converted text pair conform to the expected direction, namely that the semantic similarity of the two texts in the converted text pair is reduced relative to the semantic similarity of the two texts in the historical text pair before conversion. Both the converted text pair and the historical text pair before conversion can be used as test data for the specified directionality test in the named entity dimension. The target model is tested with these two pieces of test data and is expected to output different test results for them.
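The named entity replacement above can be sketched as follows. The sketch assumes that the named entity and its type have already been found by syntactic analysis, which is not shown; the lexicon RELATED_ENTITIES and the functions replace_entity, convert_pair and direction_holds are hypothetical names introduced only for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical lexicon of same-type named entities, made up for this sketch.
RELATED_ENTITIES: Dict[str, List[str]] = {"food": ["potatoes", "beef", "rice"]}


def replace_entity(text: str, entity: str, entity_type: str,
                   lexicon: Dict[str, List[str]]) -> Optional[str]:
    """Replace `entity` with the first different entity of the same type."""
    if entity not in text:
        return None
    for candidate in lexicon.get(entity_type, []):
        if candidate != entity:
            return text.replace(entity, candidate, 1)
    return None


def convert_pair(pair: Tuple[str, str], entity: str,
                 entity_type: str) -> Optional[Tuple[str, str]]:
    """Convert only the target text (first element); keep the other text."""
    converted = replace_entity(pair[0], entity, entity_type, RELATED_ENTITIES)
    return None if converted is None else (converted, pair[1])


def direction_holds(similarity_before: float, similarity_after: float) -> bool:
    """Expected direction: the converted pair is less similar than before."""
    return similarity_after < similarity_before


historical = ("how Chinese people eat potatoes", "how American people eat potatoes")
converted = convert_pair(historical, "potatoes", "food")
print(converted)
```

In use, the target model would score both the historical pair and the converted pair, and direction_holds would be applied to the two similarity values it returns.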

Taking the negative word dimension as an example, syntactic analysis is first performed on a given target text to determine a target position in the target text; a negative word is then added at the target position to obtain a converted text whose semantics are opposite to those of the target text.

In one example, assume that the given target text is "how to become a healthy person". Syntactic analysis is performed on the target text, and the target position is determined to be the position shown by the parentheses in "how to become a () healthy person". After a negative word is added at the target position in the target text, the converted text "how to become an unhealthy person" is obtained. The semantics of the converted text conform to the expected direction, namely that the semantics of the converted text are opposite to those of the target text. The text pair consisting of the target text "how to become a healthy person" and the converted text "how to become an unhealthy person" is used as one piece of test data for the specified directionality test in the negative word dimension. The test data is input into the target model for testing, and the target model is expected to output a test result indicating that the two texts have opposite semantics or low semantic similarity.
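Adding a negative word at a target position can be sketched as a simple token insertion. The sketch assumes that the target position is already known as a token index (the syntactic analysis that finds it is not shown) and uses the hypothetical function insert_negation with the negative word "not"; the resulting English text is less natural than "unhealthy" but carries the same negation.

```python
from typing import List


def insert_negation(tokens: List[str], position: int,
                    negation: str = "not") -> List[str]:
    """Return a new token list with the negative word inserted at `position`."""
    if not 0 <= position <= len(tokens):
        raise ValueError("target position is outside the text")
    return tokens[:position] + [negation] + tokens[position:]


target_text = "how to become a healthy person"
# Token index 4 corresponds to the slot in "how to become a () healthy person".
converted_text = " ".join(insert_negation(target_text.split(), 4))

# The pair is expected to have opposite semantics (hypothetical label name).
test_case = (target_text, converted_text, "opposite")
print(test_case)
```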

Taking the word sequence relation dimension as an example, syntactic analysis is first performed on a given target text to determine two target words that have a word order relationship in the target text; the two target words are then interchanged (by position interchange or relation interchange) to obtain a converted text whose semantics conform to the expected direction, for example being different from the semantics of the target text. The relation interchange includes symmetric relation interchange, asymmetric relation interchange, active-passive relation interchange, and the like.

In one example, assume that the given target text is the following historical text: "Xiaoming is marked by Xiaowang". Syntactic analysis is performed on the target text, and the two target words with a symmetrical word order relationship in the target text are determined to be "Xiaoming" and "Xiaowang". The positions of the two target words are then interchanged to obtain the converted text "Xiaowang is marked by Xiaoming". The semantics of the converted text conform to the expected direction, namely that the semantics of the converted text are different from those of the target text. The text pair formed by the target text "Xiaoming is marked by Xiaowang" and the converted text "Xiaowang is marked by Xiaoming" can be used as test data for the specified directionality test in the word sequence relation dimension. The test data is input into the target model for testing, and the target model is expected to output a test result indicating that the two texts have opposite semantics or low semantic similarity.
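The position interchange of two target words can be sketched as follows. This covers only the position interchange; relation interchanges such as an active-passive interchange require a syntactic rewrite that is not attempted here. The function name swap_words is hypothetical, and the two target words are assumed to have been identified already.

```python
import re


def swap_words(text: str, first: str, second: str) -> str:
    """Interchange every occurrence of the two target words in one pass."""
    mapping = {first: second, second: first}
    # Try the longer word first so that a word which is a prefix of the
    # other one cannot shadow it in the alternation.
    alternatives = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(word) for word in alternatives))
    return pattern.sub(lambda match: mapping[match.group(0)], text)


target_text = "Xiaoming is marked by Xiaowang"
converted_text = swap_words(target_text, "Xiaoming", "Xiaowang")
print(converted_text)
```

A single regular-expression pass is used instead of two chained str.replace calls, because chained replacements would turn both names into the same word.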

In another example, assume that the given target text is the following historical text: "Does Xiaoming support Xiaowang's view?". Syntactic analysis is performed on the target text, and the two target words with an active-passive word order relationship in the target text are determined to be "Xiaoming" and "Xiaowang". The active-passive relation of the two target words is then interchanged to obtain the converted text "Is Xiaowang's view supported by Xiaoming?". The semantics of the converted text conform to the expected direction, namely that the semantics of the converted text are the same as or similar to those of the target text. The text pair formed by the target text "Does Xiaoming support Xiaowang's view?" and the converted text "Is Xiaowang's view supported by Xiaoming?" can be used as test data for the specified directionality test in the word sequence relation dimension. The test data is input into the target model for testing, and the target model is expected to output a test result indicating that the two texts have the same semantics or high semantic similarity.

In specific implementations, the generated test data may include unreasonable test data with semantic problems such as non-fluent sentences or improperly used words, and such data can make the test results of the target model inaccurate. The embodiment of the invention adopts certain strategies to avoid this situation, so that the generated test data is more reasonable and the test results are more accurate.

In one example, to avoid generating unreasonable test data, the embodiment of the invention selects replacement words with a high word frequency wherever possible when converting the target text to generate test data. For example, in the process of converting the target text, if the target word in the target text is determined to correspond to multiple replacement words, the multiple replacement words may be sorted by word frequency, and the top n replacement words after sorting (n being a positive integer) are selected as target replacement words; alternatively, replacement words whose word frequency meets a preset value are selected as target replacement words. Still taking the above target text "in which season is Yellow Mountain the most beautiful" as an example, the target word in the target text can be determined to be "beautiful" according to the result of syntactic analysis of the target text. The target word "beautiful" may have multiple synonyms, such as "pretty", "handsome" and "good-looking", and these synonyms can all be used as replacement words for the target word "beautiful". To ensure that the test data generated after replacement is reasonable, the embodiment of the invention can sort the multiple replacement words and select the top n replacement words after sorting as target replacement words. The target word in the target text is then replaced with the target replacement words.
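The two selection strategies described above (top n by word frequency, or word frequency meeting a preset value) can be sketched as follows. The function select_replacements, its parameters top_n and min_freq, and the word frequencies in WORD_FREQ are hypothetical and made up for illustration; a real system would take frequencies from a corpus.

```python
from typing import Dict, List, Optional

# Hypothetical word frequencies, made up for this sketch.
WORD_FREQ: Dict[str, int] = {"pretty": 900, "good-looking": 650, "handsome": 400}


def select_replacements(candidates: List[str], word_freq: Dict[str, int],
                        top_n: Optional[int] = None,
                        min_freq: Optional[int] = None) -> List[str]:
    """Rank candidates by word frequency, then apply the chosen strategy."""
    # Words missing from the frequency table are treated as frequency 0.
    ranked = sorted(candidates, key=lambda w: word_freq.get(w, 0), reverse=True)
    if min_freq is not None:
        ranked = [w for w in ranked if word_freq.get(w, 0) >= min_freq]
    if top_n is not None:
        ranked = ranked[:top_n]
    return ranked


candidates = ["handsome", "pretty", "good-looking"]
top_two = select_replacements(candidates, WORD_FREQ, top_n=2)
frequent = select_replacements(candidates, WORD_FREQ, min_freq=500)
print(top_two, frequent)
```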

In an optional embodiment of the present invention, after the generating the test data of each semantic dimension in step 101, the method may further include:

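predicting a sentence fluency weight of the test data by using a first model, and filtering out the test data whose sentence fluency weight is smaller than a first threshold, wherein the sentence fluency weight represents the degree of sentence fluency of the test data; and/or

predicting a word fitness weight of the text of the test data by using a second model, and filtering out the test data whose word fitness weight is smaller than a second threshold, wherein the word fitness weight represents the degree to which the words in the text of the test data match the text.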

In specific implementations, even if replacement words with a high word frequency are selected, the generated test data may still be unreasonable. For example, the text of the generated test data may still have semantic problems such as non-fluent sentences or improperly used words. The embodiment of the invention therefore further filters the generated test data by using the first model, so as to filter out test data with non-fluent sentences.

The first model may be used to predict the sentence fluency weight of the test data. The predicted sentence fluency weight represents the degree of sentence fluency of the test data. If the sentence fluency weight is smaller than the first threshold, the sentence fluency of the test data does not meet the preset requirement, and the test data can be considered unreasonable test data. The first threshold can be set according to practical experience or the requirements of the specific scenario.

The embodiment of the invention does not limit the model type and the model structure of the first model. For example, the first model may be a BERT model. The BERT model is a pre-trained language model. Pre-training refers to training performed in advance, before a model is trained with task-specific sample data. Pre-training aims to train in advance the lower-layer and middle-layer parts of the model that capture what the downstream tasks have in common; each downstream task then trains its own model with its own sample data, which can greatly speed up convergence. The pre-trained BERT model can be fine-tuned (the fine-tuning stage) when it is subsequently used for a specific NLP task, and is therefore applicable to a variety of NLP tasks.

After fine-tuning, the pre-trained BERT model can be used to predict the sentence fluency weight of the test data, which in turn makes it possible to judge whether the sentences of the test data are fluent and whether the test data should be retained or filtered out.
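The threshold-based filtering with the first model can be sketched as follows. The first model is represented here by a plain callable; toy_fluency_weight and its scores are made-up stand-ins for a fine-tuned language model, and the function name filter_by_fluency and the threshold value 0.5 are assumptions for illustration only.

```python
from typing import Callable, Dict, Iterable, List

# Made-up scores standing in for the output of a fine-tuned first model.
TOY_WEIGHTS: Dict[str, float] = {
    "how to become a healthy person": 0.93,
    "how become to person healthy a": 0.08,
}


def toy_fluency_weight(text: str) -> float:
    """Stand-in scorer: look up a made-up sentence fluency weight."""
    return TOY_WEIGHTS.get(text, 0.0)


def filter_by_fluency(texts: Iterable[str],
                      fluency_weight: Callable[[str], float],
                      first_threshold: float) -> List[str]:
    """Keep test data whose sentence fluency weight is not below the threshold."""
    return [t for t in texts if fluency_weight(t) >= first_threshold]


kept = filter_by_fluency(TOY_WEIGHTS, toy_fluency_weight, first_threshold=0.5)
print(kept)
```

Passing the scorer as a parameter keeps the filter independent of the model type, in line with the statement that the model type and structure of the first model are not limited.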

The embodiment of the invention can use the first model to filter out unreasonable test data with non-fluent sentences, and can also use the second model to filter out unreasonable test data with improperly used words.

The second model may be used to predict word fitness weights for text of the test data. The predicted word fitness weight may be used to represent how well words in the text of the test data match the text. If the word fitness weight is smaller than the second threshold, it indicates that words which do not match the text exist in the text of the test data, and the test data can be considered as unreasonable test data. The second threshold value can be set according to actual experience or the requirements of a specific scene.

In one example, assume that the test data of the synonym dimension generated according to the first transformation rule includes the text "in which season is Yellow Mountain the most handsome". The text conforms to the grammar rules, and the sentence also meets the fluency requirement. In practical usage, however, "handsome" is not applicable to a season. Therefore, when the word fitness weights of the words in the text are predicted through the second model, the word fitness weight of the word "handsome" is likely to be small; if it is smaller than the second threshold, the word does not match the text, and the test data can be filtered out as unreasonable test data.
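The word-level check with the second model can be sketched as follows. The second model is represented by a callable that returns one word fitness weight per word; toy_word_fitness assigns made-up weights regardless of context, whereas a real second model would score each word in its context. The names unfit_words and filter_by_word_fitness and the threshold 0.5 are hypothetical.

```python
from typing import Callable, Dict, Iterable, List

# Made-up low weight standing in for the output of the second model.
TOY_LOW_WEIGHTS: Dict[str, float] = {"handsome": 0.12}


def toy_word_fitness(text: str) -> Dict[str, float]:
    """Stand-in scorer: every word fits (0.9) unless listed as a poor fit."""
    return {word: TOY_LOW_WEIGHTS.get(word, 0.9) for word in text.split()}


def unfit_words(word_weights: Dict[str, float],
                second_threshold: float) -> List[str]:
    """Words whose word fitness weight is smaller than the second threshold."""
    return [w for w, weight in word_weights.items() if weight < second_threshold]


def filter_by_word_fitness(texts: Iterable[str],
                           word_fitness: Callable[[str], Dict[str, float]],
                           second_threshold: float) -> List[str]:
    """Keep only texts in which no word falls below the second threshold."""
    return [t for t in texts
            if not unfit_words(word_fitness(t), second_threshold)]


texts = [
    "in which season is Yellow Mountain the most beautiful",
    "in which season is Yellow Mountain the most handsome",
]
kept = filter_by_word_fitness(texts, toy_word_fitness, second_threshold=0.5)
print(kept)
```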

The embodiment of the invention does not limit the model type and the model structure of the second model. For example, the second model may be an ELECTRA model. The ELECTRA model is a pre-trained language model, and has higher computational efficiency and fewer training parameters compared with the BERT model.

It should be noted that selecting replacement words with a high word frequency in the process of generating test data, filtering out unreasonable test data with non-fluent sentences by using the first model, and filtering out unreasonable test data with improperly used words by using the second model are all optional strategies for filtering unreasonable test data in the embodiment of the invention.

In an optional embodiment of the present invention, the test data includes label information of corresponding semantic dimensions, and after obtaining the test result representing each semantic dimension corresponding to the target model, the method may further include:

The test data of the preset semantic dimension generated by the embodiment of the invention can include labeling information of the corresponding semantic dimension. In the embodiment of the present invention, the generated test data of the preset semantic dimension may be a text pair, and the labeling information may indicate the degree of semantic correlation between the two texts in the test data of the preset semantic dimension, such as identical semantics, high semantic similarity, opposite semantics, or low semantic similarity.

For example, in the above example, the test data of the synonym dimension generated via the first transformation rule includes the text pair "in which season is Yellow Mountain the most beautiful" and "in which season is Yellow Mountain the most good-looking". The test data may contain labeling information indicating that the two texts in the text pair have identical semantics or high semantic similarity. The test data is input into the target model for testing, and the target model is expected to output a test result indicating that the two texts have the same semantics or high semantic similarity. If the test result output by the target model does not conform to the labeling information, the test data can be regarded as an abnormal case. An abnormal case is different from a lack of model capability. If the capability of a model is poor, for example if a text matching model has poor recognition capability in the synonym dimension, the test results of the text matching model on a large amount of test data will not conform to the labeling information. If the test results of the text matching model fail to conform to the labeling information only on individual test data, this is an isolated phenomenon, referred to as an abnormal case, and the abnormal case does not reflect the capability of the model.

According to the embodiment of the invention, in the process of testing the target model, the abnormal cases generated in the testing process can be recorded, and the quality of the target model can be more comprehensively and accurately evaluated by combining the recorded abnormal cases and the testing result of the preset semantic dimension corresponding to the target model obtained by testing.
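Comparing model outputs with the labeling information and recording the mismatches can be sketched as follows. This is a minimal illustration under stated assumptions: LabeledPair, run_tests and the label strings are hypothetical names, and toy_overlap_model is a deliberately naive stand-in for the target model that judges two texts similar whenever they share most of their words.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class LabeledPair:
    dimension: str
    text_a: str
    text_b: str
    label: str  # labeling information, e.g. "similar" or "opposite"


def toy_overlap_model(text_a: str, text_b: str) -> str:
    """Naive stand-in for the target model, based on word overlap only."""
    a, b = set(text_a.split()), set(text_b.split())
    return "similar" if len(a & b) / len(a | b) > 0.5 else "opposite"


def run_tests(cases: List[LabeledPair], predict: Callable[[str, str], str]
              ) -> Tuple[Dict[str, float], List[LabeledPair]]:
    """Return the pass rate per semantic dimension and the recorded mismatches."""
    totals: Dict[str, int] = defaultdict(int)
    passed: Dict[str, int] = defaultdict(int)
    mismatches: List[LabeledPair] = []
    for case in cases:
        totals[case.dimension] += 1
        if predict(case.text_a, case.text_b) == case.label:
            passed[case.dimension] += 1
        else:
            mismatches.append(case)  # output does not conform to the label
    rates = {dim: passed[dim] / totals[dim] for dim in totals}
    return rates, mismatches


cases = [
    LabeledPair("synonym",
                "in which season is Yellow Mountain the most beautiful",
                "in which season is Yellow Mountain the most pretty", "similar"),
    LabeledPair("negative_word", "how to become a healthy person",
                "how to become a not healthy person", "opposite"),
]
rates, mismatches = run_tests(cases, toy_overlap_model)
print(rates, mismatches)
```

With many test cases per dimension, a low pass rate in one dimension would point to a weak capability of the model in that dimension, while a few isolated mismatches under an otherwise high pass rate would correspond to abnormal cases.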

To sum up, on the premise that the core capability of the target model is semantic understanding, the embodiment of the present invention generates test data of the preset semantic dimension corresponding to the target model according to the function of the target model corresponding to the preset semantic dimension, and tests the target model by using the test data of the preset semantic dimension, thereby obtaining the test result of the preset semantic dimension corresponding to the target model. The test result represents the semantic understanding capability of the target model in the preset semantic dimension, and can therefore reveal whether the semantic understanding of the target model in the preset semantic dimension has problems. The test result can effectively help developers to accurately locate the problems of the target model, formulate an optimization scheme, and optimize the model. Further, the preset semantic dimension includes any one or more of a synonym dimension, an antonym dimension, a robustness dimension, a named entity dimension, a time sequence relation dimension, a negative word dimension, a reference relation dimension, and a word sequence relation dimension. By subdividing the function of the target model in the semantic dimensions, the embodiment of the invention generates test data of multiple levels of preset semantic dimensions, so that the test results can be interpreted in a more refined and diversified way and can evaluate the capability of the target model more systematically, more comprehensively and more accurately.

It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the illustrated order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments of the present invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred and that no particular act is required to implement the invention.

Device embodiment

Referring to fig. 2, a block diagram of a model testing apparatus according to an embodiment of the present invention is shown, the apparatus may include:

the test data generating module 201 is configured to generate test data of a preset semantic dimension corresponding to a target model according to a function of the target model corresponding to the preset semantic dimension, where the preset semantic dimension includes any one or more of a synonym dimension, an antonym dimension, a robustness dimension, a named entity dimension, a time sequence relation dimension, a negative word dimension, a reference relation dimension, and a word sequence relation dimension;

the target model testing module 202 is configured to test the target model by using the test data of the preset semantic dimension corresponding to the target model, so as to obtain a test result of the preset semantic dimension corresponding to the target model.

Optionally, the test data generating module includes:

Optionally, the test data generation sub-module includes:

Optionally, the apparatus further comprises:

For the device embodiment, since it is basically similar to the method embodiment, the description is relatively brief; for relevant details, refer to the corresponding description of the method embodiment.

The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.

With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.

An embodiment of the present invention provides an apparatus for model testing, comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs comprising instructions for: generating test data of a preset semantic dimension corresponding to a target model according to the function of the target model corresponding to the preset semantic dimension, wherein the preset semantic dimension comprises any one or more of a synonym dimension, an antonym dimension, a robustness dimension, a named entity dimension, a time sequence relation dimension, a negative word dimension, a reference relation dimension and a word sequence relation dimension; and testing the target model by using the test data of the preset semantic dimension corresponding to the target model to obtain the test result of the preset semantic dimension corresponding to the target model.

FIG. 3 is a block diagram illustrating an apparatus 800 for model testing in accordance with an exemplary embodiment. For example, the apparatus 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.

Referring to fig. 3, the apparatus 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.

The processing component 802 generally controls overall operation of the device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.

The memory 804 is configured to store various types of data to support operation at the device 800. Examples of such data include instructions for any application or method operating on device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.

Power components 806 provide power to the various components of device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 800.

The multimedia component 808 includes a screen that provides an output interface between the device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front-facing camera and/or the rear-facing camera may receive external multimedia data when the device 800 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.

The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the apparatus 800 is in an operational mode, such as a call mode, a recording mode, and a voice information processing mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.

The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.

The sensor assembly 814 includes one or more sensors for providing various aspects of state assessment for the device 800. For example, the sensor assembly 814 may detect the open/closed state of the device 800 and the relative positioning of components, such as the display and keypad of the apparatus 800. The sensor assembly 814 may also detect a change in the position of the apparatus 800 or of a component of the apparatus 800, the presence or absence of user contact with the apparatus 800, the orientation or acceleration/deceleration of the apparatus 800, and temperature changes of the apparatus 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 816 is configured to facilitate communications between the apparatus 800 and other devices in a wired or wireless manner. The device 800 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the apparatus 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.

In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions, such as the memory 804 comprising instructions, executable by the processor 820 of the device 800 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.

Fig. 4 is a schematic diagram of a server in some embodiments of the invention. The server 1900 may vary widely by configuration or performance and may include one or more Central Processing Units (CPUs) 1922 (e.g., one or more processors) and memory 1932, one or more storage media 1930 (e.g., one or more mass storage devices) storing applications 1942 or data 1944. Memory 1932 and storage medium 1930 can be, among other things, transient or persistent storage. The program stored in the storage medium 1930 may include one or more modules (not shown), each of which may include a series of instructions operating on a server. Still further, a central processor 1922 may be provided in communication with the storage medium 1930 to execute a series of instruction operations in the storage medium 1930 on the server 1900.

The server 1900 may also include one or more power supplies 1926, one or more wired or wireless network interfaces 1950, one or more input-output interfaces 1958, one or more keyboards 1956, and/or one or more operating systems 1941, such as Windows Server, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.

A non-transitory computer readable storage medium in which instructions, when executed by a processor of an apparatus (server or terminal), enable the apparatus to perform the model testing method shown in fig. 1.

A non-transitory computer readable storage medium in which instructions, when executed by a processor of an apparatus (server or terminal), enable the apparatus to perform a model testing method, the method comprising: generating test data of a preset semantic dimension corresponding to a target model according to the function of the target model corresponding to the preset semantic dimension, wherein the preset semantic dimension comprises any one or more of a synonym dimension, an antonym dimension, a robustness dimension, a named entity dimension, a time sequence relation dimension, a negative word dimension, a reference relation dimension and a word sequence relation dimension; and testing the target model by using the test data of the preset semantic dimension corresponding to the target model to obtain the test result of the preset semantic dimension corresponding to the target model.

Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.

It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

The above detailed description is provided for a model testing method, a model testing device and a device for model testing, and the specific examples are applied herein to explain the principle and the implementation of the present invention, and the description of the above embodiments is only used to help understand the method and the core idea of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims

1. A method of model testing, the method comprising:

2. The method according to claim 1, wherein the generating test data corresponding to a preset semantic dimension of the target model according to the function corresponding to the preset semantic dimension of the target model comprises:

3. The method of claim 2, wherein the generating test data for the test method corresponding to the preset semantic dimension comprises:

4. The method of claim 3, wherein the first transformation rule comprises any one or more of: replacing target words in a target text with synonyms, replacing named entities in the target text with related named entities of the same type, performing position interchange on two target words with a parallel relation in the target text, adding target characters at a target position in the target text, performing active and passive relation interchange on two target words with an active and passive relation in the target text, and replacing target words in the target text with pronouns;

5. The method of claim 1, further comprising:

6. The method of claim 1, wherein after generating the test data for each semantic dimension, the method further comprises:

7. The method of claim 1, wherein the test data includes label information of corresponding semantic dimensions, and after obtaining the test result representing each semantic dimension corresponding to the target model, the method further comprises:

8. A model testing apparatus, characterized in that the apparatus comprises:

9. The apparatus of claim 8, wherein the test data generation module comprises:

10. The apparatus of claim 9, wherein the test data generation submodule comprises:

11. The apparatus of claim 10, wherein the first transformation rule comprises any one or more of: replacing target words in a target text with synonyms, replacing named entities in the target text with related named entities of the same type, performing position interchange on two target words with a parallel relation in the target text, adding target characters at a target position in the target text, performing active and passive relation interchange on two target words with an active and passive relation in the target text, and replacing target words in the target text with pronouns;

12. The apparatus of claim 8, further comprising:

13. The apparatus of claim 8, further comprising:

14. An apparatus for model testing comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs comprising instructions for performing the model testing method of any of claims 1-7.

15. A machine-readable medium having stored thereon instructions which, when executed by one or more processors of an apparatus, cause the apparatus to perform the model testing method of any of claims 1 to 7.