CN113673702A - Method and device for evaluating pre-training language model and storage medium

Info

Publication number: CN113673702A
Application number: CN202110852575.6A
Authority: CN (China)
Prior art keywords: probe, knowledge, evaluating, language model, trained
Legal status: Granted; Active
Other languages: Chinese (zh)
Other versions: CN113673702B
Inventors: 闫真, 胡韧奋
Current Assignee: Beijing Normal University
Original Assignee: Beijing Normal University

Application filed by Beijing Normal University; priority to CN202110852575.6A
Publication of CN113673702A
Application granted; publication of CN113673702B

Classifications

    • G06N 5/02: Computing arrangements using knowledge-based models; Knowledge representation; Symbolic representation
    • G06F 18/22: Pattern recognition; Analysing; Matching criteria, e.g. proximity measures
    • G06F 40/211: Handling natural language data; Natural language analysis; Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G06F 40/216: Handling natural language data; Natural language analysis; Parsing using statistical methods
    • G06F 40/30: Handling natural language data; Semantic analysis
    • G06N 5/04: Computing arrangements using knowledge-based models; Inference or reasoning models

Abstract

The application discloses a method, an apparatus, and a storage medium for evaluating a pre-trained language model. The method comprises: determining a probe task for evaluating a specified language knowledge capability of the pre-trained language model; acquiring a probe question related to the probe task, wherein the probe question comprises at least one sentence for evaluating the specified language knowledge capability, and the sentence comprises a test text for evaluating the specified language knowledge capability; generating, using the pre-trained language model, a set of vectors associated with the sentence, the set comprising context vectors corresponding to the texts of the sentence; and evaluating the specified language knowledge capability from the context vector corresponding to the test text using a preset probe model, wherein neither the pre-trained language model nor the probe model needs to be trained for the evaluation of the specified language knowledge capability.

Description

Method and device for evaluating pre-training language model and storage medium
Technical Field
The present application relates to the field of natural language processing, and in particular to a method and an apparatus for evaluating a pre-trained language model, and a storage medium.
Background
In the field of natural language processing, pre-trained language models such as BERT are widely used, so evaluating the capabilities (e.g., knowledge capabilities) of pre-trained language models is increasingly important. Most researchers demonstrate that a model possesses more knowledge through its performance gains on target tasks. Representative evaluation benchmarks are GLUE and SuperGLUE. Referring to FIG. 1, the conventional evaluation method evaluates a pre-trained language model according to the output of a neural network structure that is connected to the pre-trained language model and executes a downstream task. Although these two benchmarks cover different text types, tasks, difficulties, and scales, the following problems remain. First, the pre-trained language model requires fine-tuning on the target task, so it cannot be judged whether the pre-trained language model learned the knowledge during pre-training. Second, the pre-trained language model may rely on shallow heuristics for prediction; although it performs well on each task, even exceeding human performance, it does not necessarily truly understand the language. Thus, what the benchmarks test is the fine-tuning capability of the pre-trained language model rather than its general language understanding capability. That is, this evaluation method is easily affected by model fine-tuning (e.g., adjustment of the parameters of the pre-trained language model) and by the classifier, so the knowledge evaluation result for the pre-trained language model suffers substantial interference.
In response to the above problems, researchers have proposed using probing tasks for knowledge detection and diagnosis. For example, sentence-level probing tasks have been proposed to evaluate the surface, syntactic, and semantic knowledge of a pre-trained language model; edge probing tasks have also been proposed, which decompose structured prediction tasks into a universal format and test the syntactic and semantic knowledge of partial spans of a sentence. Unlike the evaluation benchmarks, a probing task does not require fine-tuning the pre-trained language model and does not disturb its information, but it does require additionally training a probe classifier. It therefore cannot be judged whether the knowledge is encoded in the pre-trained language model, resides in the probe classifier, or arises from the joint action of the two, so current probing tasks cannot test the knowledge of the pre-trained language model alone.
Aiming at the technical problem in the prior art that methods for evaluating a pre-trained language model must fine-tune the pre-trained language model or additionally train a probe classifier, thereby interfering with the knowledge evaluation result of the pre-trained language model, no effective solution has yet been proposed.
Disclosure of Invention
The embodiments of the present disclosure provide an evaluation method, an evaluation apparatus, and a storage medium for a pre-trained language model, so as to at least solve the technical problem in the prior art that methods for evaluating a pre-trained language model must fine-tune the pre-trained language model or additionally train a probe classifier, thereby interfering with the knowledge evaluation result of the pre-trained language model.
According to an aspect of the embodiments of the present disclosure, there is provided a method for evaluating a pre-trained language model, comprising: determining a probe task for evaluating a specified language knowledge capability of the pre-trained language model; acquiring a probe question related to the probe task, wherein the probe question comprises at least one sentence for evaluating the specified language knowledge capability, and the sentence comprises a test text for evaluating the specified language knowledge capability; generating, using the pre-trained language model, a set of vectors associated with the sentence, the set comprising context vectors corresponding to the texts of the sentence; and evaluating the specified language knowledge capability from the context vector corresponding to the test text using a preset probe model, wherein neither the pre-trained language model nor the probe model needs to be trained for the evaluation of the specified language knowledge capability.
According to another aspect of the embodiments of the present disclosure, there is also provided a storage medium comprising a stored program, wherein the method described above is performed by a processor when the program runs.
According to another aspect of the embodiments of the present disclosure, there is also provided an evaluation apparatus for a pre-trained language model, comprising: a probe task determination module for determining a probe task for evaluating a specified language knowledge capability of the pre-trained language model; a probe question acquisition module for acquiring a probe question related to the probe task, wherein the probe question comprises at least one sentence for evaluating the specified language knowledge capability, and the sentence comprises a test text for evaluating the specified language knowledge capability; a vector set generation module for generating, using the pre-trained language model, a set of vectors associated with the sentence, the set comprising context vectors corresponding to the texts of the sentence; and a capability evaluation module for evaluating the specified language knowledge capability from the context vector corresponding to the test text using a preset probe model, without training either the pre-trained language model or the probe model for the evaluation of the specified language knowledge capability.
According to another aspect of the embodiments of the present disclosure, there is also provided an evaluation apparatus for a pre-trained language model, comprising: a processor; and a memory coupled to the processor for providing the processor with instructions for the following processing steps: determining a probe task for evaluating a specified language knowledge capability of the pre-trained language model; acquiring a probe question related to the probe task, wherein the probe question comprises at least one sentence for evaluating the specified language knowledge capability, and the sentence comprises a test text for evaluating the specified language knowledge capability; generating, using the pre-trained language model, a set of vectors associated with the sentence, the set comprising context vectors corresponding to the texts of the sentence; and evaluating the specified language knowledge capability from the context vector corresponding to the test text using a preset probe model, wherein neither the pre-trained language model nor the probe model needs to be trained for the evaluation of the specified language knowledge capability.
Therefore, in the technical solution of this embodiment, when a probe task for evaluating a language knowledge capability of a pre-trained language model is constructed, a probe question related to the probe task is first constructed, and the sentence of the probe question includes a test text (for example, a MASK token or a polysemous word) for evaluating that language knowledge capability. The sentence of the probe question is then input to the pre-trained language model to obtain a set of vectors corresponding to the sentence, and the language knowledge capability is evaluated by the probe model from the context vector corresponding to the test text. Since the language knowledge capability of the pre-trained language model is evaluated from the context vector corresponding to the test text, the pre-trained language model does not need to be trained (i.e., fine-tuned) during the evaluation, and the probe model does not need to be additionally trained, so the evaluation of the pre-trained language model is not interfered with. This solves the technical problem that prior-art methods for evaluating a pre-trained language model must fine-tune the pre-trained language model or additionally train a probe classifier, thereby interfering with the knowledge evaluation result of the pre-trained language model.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the disclosure and together with the description serve to explain the disclosure and not to limit the disclosure. In the drawings:
FIG. 1 is a schematic diagram of a solution for evaluating a pre-trained language model according to the prior art;
FIG. 2 is a block diagram of a hardware structure of a computing device for implementing the method according to embodiment 1 of the present disclosure;
FIG. 3 is a schematic diagram of a pre-trained language model evaluation system according to embodiment 1 of the present disclosure;
FIG. 4 is a flowchart illustrating a method for evaluating a pre-trained language model according to a first aspect of embodiment 1 of the present disclosure;
FIG. 5 is a flowchart illustrating the evaluation of a pre-trained language model according to the first aspect of embodiment 1 of the present disclosure;
FIG. 6 is a schematic diagram of an evaluation device of a pre-trained language model according to embodiment 2 of the present disclosure; and
FIG. 7 is a schematic diagram of an evaluation device of a pre-trained language model according to embodiment 3 of the present disclosure.
Detailed Description
In order to make those skilled in the art better understand the technical solutions of the present disclosure, the technical solutions in the embodiments of the present disclosure will be described clearly and completely below with reference to the drawings. It is to be understood that the described embodiments are merely some, and not all, of the embodiments of the present disclosure. All other embodiments obtained by a person skilled in the art from the disclosed embodiments without creative effort shall fall within the protection scope of the present disclosure.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
According to the present embodiment, there is provided an embodiment of a method for evaluating a pre-trained language model. It is noted that the steps illustrated in the flowcharts of the drawings may be performed in a computer system, such as a set of computer-executable instructions, and that, although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order different from that presented herein.
The method embodiment provided by the present embodiment may be executed in a mobile terminal, a computer terminal, a server, or a similar computing device. FIG. 2 shows a block diagram of the hardware structure of a computing device for implementing the method for evaluating a pre-trained language model. As shown in FIG. 2, the computing device may include one or more processors (which may include, but are not limited to, processing devices such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory for storing data, and a transmission device for communication functions. In addition, the computing device may also include: a display, an input/output interface (I/O interface), a Universal Serial Bus (USB) port (which may be included as one of the ports of the I/O interface), a network interface, a power source, and/or a camera. It will be understood by those skilled in the art that the structure shown in FIG. 2 is only an illustration and does not limit the structure of the electronic device. For example, the computing device may also include more or fewer components than shown in FIG. 2, or have a different configuration from that shown in FIG. 2.
It should be noted that the one or more processors and/or other data processing circuitry described above may be referred to generally as "data processing circuitry" in this embodiment. The data processing circuitry may be embodied in whole or in part in software, hardware, firmware, or any combination thereof. Further, the data processing circuitry may be a single, stand-alone processing module, or incorporated in whole or in part into any of the other elements in the computing device. As referred to in the embodiments of the disclosure, the data processing circuitry acts as a processor control (e.g., selection of a variable-resistance termination path connected to the interface).
The memory may be configured to store software programs and modules of application software, such as program instructions/data storage devices corresponding to the evaluation method for the pre-trained language model in the embodiments of the disclosure; the processor executes various functional applications and data processing by running the software programs and modules stored in the memory, thereby implementing the above-described evaluation method for the pre-trained language model. The memory may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some instances, the memory may further include memory located remotely from the processor, which may be connected to the computing device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device is used for receiving or transmitting data via a network. Specific examples of such networks may include wireless networks provided by communication providers of the computing devices. In one example, the transmission device includes a Network adapter (NIC) that can be connected to other Network devices through a base station to communicate with the internet. In one example, the transmission device may be a Radio Frequency (RF) module, which is used for communicating with the internet in a wireless manner.
The display may be, for example, a touch screen type Liquid Crystal Display (LCD) that may enable a user to interact with a user interface of the computing device.
It should be noted here that, in some alternative embodiments, the computing device shown in FIG. 2 may include hardware elements (including circuitry), software elements (including computer code stored on a computer-readable medium), or a combination of both hardware and software elements. It should also be noted that FIG. 2 is only one particular example and is intended to illustrate the types of components that may be present in the computing device described above.
FIG. 3 is a schematic diagram of the pre-trained language model evaluation system according to the present embodiment. Referring to FIG. 3, the system 100 includes: a pre-trained language model input interface 110 and a probe module 120. The pre-trained language model input interface 110 is configured to receive the pre-trained language model to be evaluated and information on the language knowledge capability to be evaluated, and the probe module 120 selects the corresponding probe task to evaluate the pre-trained language model according to the received information.
With further reference to FIG. 3, the probe module 120 includes a plurality of probe tasks for evaluating different language knowledge capabilities of the pre-trained language model. For example, the probe module 120 includes a syntactic knowledge probe task for evaluating the syntactic knowledge capability of the pre-trained language model, a semantic knowledge probe task for evaluating its semantic knowledge capability, a factual knowledge probe task for evaluating its factual knowledge capability, and an inference-ability probe task for evaluating its reasoning ability.
By way of example, the present embodiment designs the syntactic knowledge probe task around the inflectional forms of English words (e.g., subject-predicate agreement), because these reflect important syntactic properties, including verb tense, the singular/plural number of nouns, grammatical person, and the comparative and superlative degrees of adjectives. Further, the present embodiment frames the task as choice questions, each with 2-4 options, each option representing a different form of the target word. Table 1 below contains specific examples; the model must identify the correct answers (daughters, bless, and open, respectively) from contextual clues such as two, soul, and stepped.
As an example, the present embodiment defines the semantic knowledge probe task from the perspective of word sense disambiguation, requiring the pre-trained language model to determine whether a polysemous word has the same meaning in different sentences. Specifically, the present embodiment defines the task as a semantic knowledge test question. Each semantic knowledge test question proposed by this embodiment consists of three example sentences containing the same polysemous word, where the target word in one sentence has a meaning different from the other two. As shown in Table 1, the sense of appeal differs across the three sentences: in sentences a and b it refers to a serious or urgent request, while in sentence c it refers to attraction or interest. The pre-trained language model should identify which two of the three example sentences use the same sense.
As an example, the present embodiment defines the factual knowledge probe task as a fill-in-the-blank task: a sentence carrying a piece of factual knowledge is selected, an entity in the sentence is removed to generate a question, and the pre-trained language model must complete the sentence. The example in Table 1 asks for the nationality of the famous writer Jack London; the pre-trained language model must correctly fill in the missing part.
By way of example, the present embodiment converts a coreference resolution problem into a choice question, thereby defining the inference-ability probe task. As shown in Table 1, the pre-trained language model must understand the meaning of fit and infer, from the fact that the cup does not fit in the suitcase, that the cup is bigger than the suitcase, thereby determining the respective sizes of the cup and the suitcase and making the correct selection.
Table 1: task examples
(Table 1 is provided as an image in the original publication.)
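For illustration only, probe questions such as those in Table 1 can be represented as simple records, as in the following Python sketch; the schema and field names are assumptions of this description, not part of the patented method.

    from dataclasses import dataclass
    from typing import List, Optional

    @dataclass
    class ProbeQuestion:
        """One item of a probe data set (hypothetical schema)."""
        task: str                        # "syntactic", "inference", "factual", or "semantic"
        sentences: List[str]             # one masked sentence, or three sentences for the semantic task
        candidates: Optional[List[str]]  # options for choice questions; None for fill-in-the-blank
        answer: str                      # reference answer used to judge the prediction

    # A choice question (syntactic task): the candidates are forms of one target word.
    q1 = ProbeQuestion(task="syntactic",
                       sentences=["By 1980, they had two [MASK] ..."],
                       candidates=["daughter", "daughters"],
                       answer="daughters")

    # A fill-in-the-blank question (factual task): no candidates are given.
    q2 = ProbeQuestion(task="factual",
                       sentences=["Jack London was a [MASK] writer."],  # illustrative wording only
                       candidates=None,
                       answer="american")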
In summary, the probe tasks proposed by this embodiment involve three types of questions: choice questions for testing syntactic knowledge and reasoning ability, vector-similarity questions for testing semantic knowledge, and fill-in-the-blank questions for testing factual knowledge. Choice questions and fill-in-the-blank questions are solved with the same method; the only difference is whether candidates are given. All three types of questions can be solved directly by the pre-trained language model without additional fine-tuning or training. In constructing the data sets, the present invention carefully selects the vocabulary and the source corpora to ensure that: (1) the target words can be retrieved from the vocabularies of mainstream pre-trained language models; (2) the knowledge is extracted from natural text rather than generated from templates; and (3) the data sets are large-scale and balanced. Table 2 gives an overview of the data sets constructed in this embodiment; the specific construction methods are as follows.
Constructing the syntactic knowledge probe data set: to construct the target vocabulary, this embodiment first obtains the common part of the BERT and RoBERTa vocabularies. All words are verified with WordNet, the Enchant spell-checking tool, and several English lexical resources, and irregular words and sub-words are removed. This embodiment then uses the Pattern tool to identify the morphological variants of each word (the singular/plural forms of nouns, verb inflections, and the comparative and superlative degrees of adjectives), so that each word can be restored to its base form. Data sampling is then performed. For data balance, 1,062 groups of words are finally selected, 3,186 words in total; each group consists of 2, 3, or 4 words sharing the same base form. For each target word, this embodiment randomly extracts 10 sentences from the COHA (Corpus of Historical American English) balanced corpus, ensuring that the target word appears only once in each sentence and that sentence lengths lie between 10 and 60 words, bucketed into intervals of 5, such as [10,14], [15,19], or [20,24]. The target word in each sentence is then replaced with [MASK], and the remaining words in the same group as the target word serve as distractors; 31,860 items are finally obtained, forming the syntactic knowledge probe data set.
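The masking-and-distractor step of this construction might be sketched as follows; the helper names and the pre-grouped word lists are assumptions, and the sentence pool stands in for the COHA sampling described above.

    import random

    def build_syntactic_items(word_groups, corpus_sentences, per_word=10):
        """word_groups: lists of inflectional variants sharing one base form,
        e.g. ["daughter", "daughters"]; corpus_sentences: candidate sentences."""
        items = []
        for group in word_groups:
            for target in group:
                # keep sentences in which the target occurs exactly once and whose
                # length lies in the 10-60 word range used by this embodiment
                pool = [s for s in corpus_sentences
                        if s.split().count(target) == 1
                        and 10 <= len(s.split()) <= 60]
                for sent in random.sample(pool, min(per_word, len(pool))):
                    masked = " ".join("[MASK]" if w == target else w
                                      for w in sent.split())
                    items.append({"sentence": masked,
                                  "candidates": group,  # other group members act as distractors
                                  "answer": target})
        return items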
Constructing the semantic knowledge probe data set: this embodiment uses the vocabulary of the syntactic knowledge probe data set to retrieve paraphrases and example sentences of words from the Oxford dictionary, retaining polysemous words that have at least two senses. For each sense of a polysemous word, this embodiment selects two example sentences as positive samples, with the example sentences of the remaining senses as negative samples. Ensuring that the target word appears only once in each example sentence, the length of each example sentence is limited to between 10 and 25 words. Finally, 35,735 items covering 4,676 words are obtained, forming the semantic knowledge probe data set.
Constructing the factual knowledge probe data set: the invention constructs the factual knowledge probe data set from FewRel and KBP37, two large-scale relation-extraction data sets. The knowledge-triple information in these two data sets is human-annotated and accompanied by original natural sentences, so they serve as high-quality data sources. On this basis, this embodiment deletes sentences whose annotated entities contain pronouns or numerals and requires each entity to appear only once in its sentence; it then retains the relation types with more than 200 samples to ensure that the relations tested are common and general; finally, the entities in the sentences are replaced with [MASK]. This embodiment obtained 5,742 questions from FewRel, covering 17 relations, and 7,190 questions from KBP37, covering 10 relations; combined, they form the factual knowledge probe data set.
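The filtering described here can be sketched as follows, assuming each relation-extraction instance carries a sentence, the entity span to be hidden, and a relation label; the field names are hypothetical.

    from collections import Counter

    PRONOUNS = {"he", "she", "it", "they", "him", "her", "them", "his", "its", "their"}

    def build_factual_items(instances, min_per_relation=200):
        """instances: dicts with keys 'sentence', 'entity', 'relation' (assumed layout)."""
        # drop entities containing pronouns or numerals, and entities that
        # do not appear exactly once in their sentence
        kept = [x for x in instances
                if not any(t.lower() in PRONOUNS or t.isdigit()
                           for t in x["entity"].split())
                and x["sentence"].count(x["entity"]) == 1]
        # retain only relation types with more than `min_per_relation` samples
        counts = Counter(x["relation"] for x in kept)
        kept = [x for x in kept if counts[x["relation"]] > min_per_relation]
        # replace the entity with [MASK] to form a fill-in-the-blank question
        return [{"sentence": x["sentence"].replace(x["entity"], "[MASK]"),
                 "answer": x["entity"], "relation": x["relation"]}
                for x in kept]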
Constructing the inference-ability probe data set: the invention uses WinoGrande as the data source to construct the inference-ability probe data set. Systematic bias, i.e., questions part of whose correct answers can be guessed without understanding the text, has been removed from this data set, making it suitable for evaluating the reasoning ability of a model. The invention replaces the pronouns in the sentences with [MASK] and directly uses the candidates in the source data as options, finally obtaining 3,248 questions.
Table 2: probe dataset overview
(Table 2 is provided as an image in the original publication.)
Thus, in the above manner, the construction of the probe data set of probe questions for each probe task is completed.
In addition, it should be noted that the pre-trained language model evaluation system shown in FIG. 3 is applicable to the hardware structure described above with reference to FIG. 2.
Under the above operating environment, according to the first aspect of the present embodiment, a method for evaluating a pre-trained language model is provided, implemented by the pre-trained language model evaluation system 100 shown in FIG. 3. FIG. 4 shows a flowchart of the method; referring to FIG. 4, the method comprises:
S402: determining a probe task for evaluating a specified language knowledge capability of the pre-trained language model;
S404: acquiring a probe question related to the probe task, wherein the probe question comprises at least one sentence for evaluating the specified language knowledge capability, and the sentence comprises a test text for evaluating the specified language knowledge capability;
S406: generating, using the pre-trained language model, a set of vectors associated with the sentence, the set comprising context vectors respectively corresponding to the texts of the sentence; and
S408: evaluating the specified language knowledge capability from the context vector corresponding to the test text using a preset probe model, the probe model being a computational model that requires no parameter updates.
Specifically, referring to FIG. 3 and FIG. 5, when a worker evaluates a pre-trained language model through the pre-trained language model evaluation system 100, the pre-trained language model to be evaluated may be received through the pre-trained language model input interface 110 and evaluated by the probe module 120. The process by which the probe module 120 evaluates the pre-trained language model is illustrated with reference to FIG. 5.
First, the probe module 120 determines the corresponding probe task according to the language knowledge capability to be evaluated that is input by the worker. For example, a worker may request evaluation of one or more of the syntactic knowledge capability, semantic knowledge capability, factual knowledge capability, and reasoning ability of a pre-trained language model, and the probe module 120 then evaluates the requested capabilities one by one. For each specified language knowledge capability, the probe module 120 first determines a probe task for the capability currently being evaluated (i.e., the specified language knowledge capability) (S402).
The probe module 120 then acquires the probe questions associated with the probe task (S404). As described above and shown in FIG. 5, if the probe module 120 evaluates the factual knowledge capability of the pre-trained language model, it acquires the fill-in-the-blank questions associated with the probe task; if it evaluates the syntactic knowledge capability or reasoning ability, it acquires the choice questions associated with the probe task; and if it evaluates the semantic knowledge capability, it acquires the vector-similarity questions associated with the probe task. As shown in Tables 1 and 2 above, a probe question includes one or more sentences related to the language knowledge capability to be evaluated, and the sentences contain either a MASK token [MASK] hiding the word at the corresponding position or a polysemous word with different senses (e.g., "appeal"), i.e., the test text described above.
Referring to FIG. 5, the probe module 120 encodes the sentence of the probe question using the pre-trained language model, thereby generating a set of vectors associated with the sentence, the set including the context vectors corresponding to the texts in the sentence (S406).
Then, as further shown in FIG. 5, the probe module 120 evaluates the language knowledge capability of the pre-trained language model from the context vector corresponding to the MASK token [MASK] or to the polysemous word, using the probe model corresponding to the respective probe task (S408). Moreover, the parameters of the probe model never need to be updated, so the probe model does not need to be trained. The specific process of evaluating each language knowledge capability with the probe model is described in detail below.
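As a minimal sketch of S402 through S408 (function names and data layout are assumptions of this description; the actual problem-solving functions are specified by formulas (1) to (9) below), the probe module's control flow might read:

    def evaluate_capability(capability, datasets, solvers, encode):
        """S402: select the probe task for the requested capability;
        S404-S408: run each probe question through the frozen pre-trained model
        and the parameter-free problem-solving function, then report accuracy.
        `encode` is assumed to return one context vector per token of a sentence."""
        questions = datasets[capability]      # S404: probe questions of this task
        solve = solvers[capability]           # fixed-parameter problem-solving function
        correct = 0
        for q in questions:
            vectors = [encode(s) for s in q["sentences"]]  # S406: no fine-tuning
            prediction = solve(vectors, q)                 # S408: context vectors only
            correct += int(prediction == q["answer"])
        return correct / len(questions)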
As described in the background section, the existing benchmarks for evaluating a pre-trained language model test its fine-tuning capability rather than its general language understanding capability. With existing probes, because the probe classifier must be additionally trained during evaluation, it cannot be judged whether the knowledge is encoded in the pre-trained language model, resides in the probe classifier, or arises from the joint action of the two, so current probe tasks cannot test the knowledge of the pre-trained language model.
In view of this, in the technical solution of this embodiment, when a probe task for evaluating a language knowledge capability of a pre-trained language model is constructed, a probe question related to the probe task is first constructed, and the sentence of the probe question includes a test text (for example, a MASK token or a polysemous word) for evaluating that language knowledge capability. The sentence of the probe question is then input to the pre-trained language model to obtain a set of vectors corresponding to the sentence, and the language knowledge capability is evaluated by the probe model from the context vector corresponding to the test text. Since the language knowledge capability of the pre-trained language model is evaluated from the context vector corresponding to the test text, the pre-trained language model does not need to be trained (i.e., fine-tuned) during the evaluation, and the probe model does not need to be additionally trained, so the evaluation of the pre-trained language model is not interfered with. This solves the technical problem that prior-art methods for evaluating a pre-trained language model must fine-tune the pre-trained language model or additionally train a probe classifier, thereby interfering with the knowledge evaluation result of the pre-trained language model.
Optionally, the probe model includes a problem-solving function corresponding to the probe question, which takes the context vector corresponding to the test text as its input parameter and whose parameters are fixed. The operation of evaluating the specified language knowledge capability from the context vector corresponding to the test text using the preset probe model then comprises: inputting the context vector corresponding to the test text into the problem-solving function for computation; and evaluating the specified language knowledge capability according to the computation result of the problem-solving function.
Specifically, according to the method of this embodiment, in carrying out the syntactic knowledge probe task, the inference probe task, and the factual knowledge probe task, the probe module 120 may compute, from the context vector corresponding to the MASK token and through the problem-solving function of the probe model, the probability that the word hidden by the MASK token is each word in a preset lexicon. The probe module 120 then determines the prediction result for the corresponding probe question from the computed probabilities and evaluates the syntactic knowledge capability, reasoning ability, and factual knowledge capability of the pre-trained language model from the prediction results.
In addition, in carrying out the semantic knowledge probe task, the probe module 120 may compute, through the problem-solving function of the probe model, the similarity (e.g., distance or cosine similarity) between the context vectors of the polysemous word. The probe module 120 then evaluates the semantic knowledge capability of the pre-trained language model from the computed similarities.
The specific problem-solving functions and their evaluation processes are described in detail below.
Therefore, according to the method of this embodiment, the language knowledge capability of the pre-trained language model can be evaluated from the context vector corresponding to the test text through a relatively simple problem-solving function with fixed parameters. Because the probe model is built from a simple problem-solving function, it does not need to be trained during the evaluation of the pre-trained language model, thereby avoiding the interference with the evaluation that training a probe classifier would cause.
Optionally, the operation of evaluating the specified language knowledge capability according to the computation result of the problem-solving function comprises: judging, according to a preset reference answer corresponding to the probe question, whether the computation result of the problem-solving function is correct, to obtain a judgment result for the probe question; determining the accuracy of the pre-trained language model on the probe task according to the judgment results for the plurality of probe questions related to the probe task; and evaluating the specified language knowledge capability of the pre-trained language model according to that accuracy.
Specifically, referring to Table 1 and Table 2 and the related descriptions, the technical solution of the present embodiment constructs a corresponding probe data set for each probe task, and each probe data set contains a large number of questions and the corresponding reference answers.
Thus, for example, when the probe module 120 evaluates the syntactic knowledge capability of the pre-trained language model, multiple choice questions can be selected from the syntactic probe data set. Each choice question is then predicted with the probe model, and it is judged whether the prediction for each question is correct. The probe module 120 can thereby determine the accuracy of the pre-trained language model on the syntactic knowledge probe task and evaluate the syntactic knowledge capability of the pre-trained language model according to that accuracy.
Similarly, when evaluating the reasoning ability, factual knowledge capability, and semantic knowledge capability of the pre-trained language model, the probe module 120 evaluates the model with the questions in the data set of each probe task, thereby determining the model's prediction accuracy under each probe task and judging each language knowledge capability of the pre-trained language model accordingly. For example, the higher the accuracy, the stronger the corresponding language knowledge capability of the pre-trained language model; conversely, the lower the accuracy, the weaker the corresponding capability.
Optionally, the probe question includes one sentence, and the test text contained in the sentence is a MASK token indicating that the word at the position of the MASK token is hidden. The operation of inputting the context vector corresponding to the test text into the problem-solving function and evaluating the specified language knowledge capability according to its computation result then comprises: determining, using the problem-solving function, the predicted word corresponding to the MASK token from the context vector corresponding to the MASK token; judging whether the predicted word is the word masked by the MASK token; and evaluating the specified language knowledge capability of the pre-trained language model according to the judgment result.
Specifically, referring to Table 1, the questions in the data sets of the syntactic knowledge probe task, the inference probe task, and the factual knowledge probe task each include a MASK token [MASK] indicating that the word at the corresponding position is hidden, requiring the pre-trained language model to predict the hidden word from the context. Therefore, when the probe module 120 evaluates the syntactic knowledge capability, reasoning ability, or factual knowledge capability of the pre-trained language model, the sentence of the corresponding probe question, containing the MASK token [MASK], is input to the pre-trained language model, which generates the set of vectors corresponding to the sentence.
The probe module 120 then determines the predicted word corresponding to the MASK token [MASK] from the context vector corresponding to the MASK token, using the problem-solving function corresponding to the probe task, and judges, according to the preset reference answer for the probe question, whether the word predicted by the pre-trained language model is correct.
The probe module 120 then evaluates the language knowledge capability of the pre-trained language model according to the judgment results. For example, as described above, the probe module 120 predicts a plurality of probe questions of the probe task corresponding to the specified language knowledge capability using the pre-trained language model, determines the model's prediction accuracy over these questions from the judgment result of each question, and evaluates the language knowledge capability of the pre-trained language model according to the determined accuracy.
Optionally, the operation of determining the predicted word corresponding to the MASK token from the context vector corresponding to the MASK token using the problem-solving function comprises: determining, using the problem-solving function, the probability of the context vector corresponding to the MASK token with respect to each word in a preset lexicon; and determining the predicted word from a plurality of candidate words corresponding to the MASK token according to the determined probabilities, the candidate words being words in the preset lexicon.
Specifically, referring to Table 1, the questions of the syntactic probe task and of the inference probe task are both choice questions, so their probe models are handled in a similar manner. This embodiment therefore proposes to predict choice questions in the following way.
For example, for the sentence "By 1980, they had two [MASK] …" in a question of the syntactic probe task, the probe module 120 first generates the set of vectors corresponding to the sentence using the pre-trained language model, as follows:
[h_1, h_2, h_3, …, h_T] = f_enc(w_1, w_2, w_3, …, w_T)    (1)
where w_1 to w_T denote the text (token) sequence contained in the sentence and f_enc denotes the encoding operation of the pre-trained language model.
Then, from the vector set obtained by formula (1), the probe module 120 extracts the context vector h_target corresponding to the position of the MASK token [MASK].
The probe module 120 then passes the context vector h_target through a linear layer according to formula (2), transforming it into another feature space:
h_dense = W_1 h_target + b_1    (2)
where W_1 is a weight matrix and b_1 is a bias vector.
Then, the probe module 120 processes the vector h_dense with the GELU activation function shown in formula (3), so as to increase the non-linear expressive capability of the probe model; that is, all elements of h_dense that are less than 0 take the value 0:
h_act = max(0, h_dense)    (3)
Then, the probe module 120 normalizes the vector h_act using the following formula, which helps obtain a Gaussian-distributed embedding representation with a mean of 0 and a variance of 1 and thereby helps constrain the embedding space:
h_hidden = LayerNorm(h_act)    (4)
where the LayerNorm function is defined with reference to the following equations. First, the mean of the elements of the vector h_act is computed:
μ = (1/d_model) Σ_{i=1}^{d_model} h_i    (5)
where d_model is the dimension of the vector h_act, equal to the number of elements of the context vector, and h_i is the i-th element of h_act. Next, the variance of the elements of h_act is computed:
σ² = (1/d_model) Σ_{i=1}^{d_model} (h_i - μ)²    (6)
Finally, the normalized vector is obtained:
LayerNorm(h_act) = γ ⊙ (h_act - μ) / √(σ² + ε) + β    (7)
where ε is a small constant for numerical stability, and the parameters γ and β are trained in advance together with the network of the pre-trained language model.
The probe module 120 then processes the vector h_hidden with the softmax function, thereby obtaining the probability corresponding to each word in the lexicon:
[p_1, p_2, p_3, …, p_V] = softmax(W_2 h_hidden + b_2)    (8)
where p_1 to p_V denote the probabilities that the word hidden by the MASK token [MASK] is each word in the lexicon, W_2 is a transformation matrix such that the number of elements of the transformed vector equals the number of words in the vocabulary, and b_2 is a bias vector.
Thus, in this manner, the probe module 120 uses the problem-solving function (i.e., formulas (2) through (8)) to compute, from the context vector h_target corresponding to the MASK token [MASK], the probabilities p_1 to p_V that the hidden word is each word in the lexicon.
Then, referring to Table 1, since the question provides the candidates daughter/-s, the probe module 120 selects the candidate with the higher probability value from the two candidates as the word predicted by the pre-trained language model for the MASK token [MASK] (i.e., the predicted word).
Although the above is illustrated with the question "By 1980, they had two [MASK] …" as an example, other choice questions, and other probe tasks constructed in the form of choice questions, are equally applicable in the present invention.
Therefore, in this way, no probe classifier needs to be additionally trained: the language knowledge capability of the pre-trained language model can be evaluated merely by applying a linear transformation and a statistical function to the context vector corresponding to the MASK token. The interference caused by additionally training a probe classifier is thus eliminated, and the language knowledge capability of the pre-trained language model is evaluated in a way that objectively reflects it.
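Formulas (2) through (8) coincide with the masked-language-model head that BERT-style checkpoints already ship with, so the choice-question solver can be exercised with an off-the-shelf frozen model. The following sketch uses the HuggingFace transformers library as one possible implementation (an assumption of this description; the patent does not name a library) and presumes each candidate is a single token of the model's vocabulary, as the data-set construction above ensures:

    import torch
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
    model.eval()  # frozen: no fine-tuning of the model, no training of the probe

    def choose_candidate(masked_sentence, candidates):
        """Score each candidate at the [MASK] position with the pre-trained
        MLM head (formulas (1)-(8)) and return the most probable candidate."""
        inputs = tokenizer(masked_sentence, return_tensors="pt")
        mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
        with torch.no_grad():                        # no parameter updates anywhere
            logits = model(**inputs).logits          # encoder plus MLM head
        probs = logits[0, mask_pos].softmax(dim=-1)  # p_1 ... p_V of formula (8)
        cand_ids = [tokenizer.convert_tokens_to_ids(c) for c in candidates]
        return candidates[int(torch.argmax(probs[torch.tensor(cand_ids)]))]

    # e.g. choose_candidate("By 1980, they had two [MASK] ...", ["daughter", "daughters"])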
Optionally, the specified language knowledge capability is the syntactic knowledge capability, and the words masked by the MASK token are used to evaluate the syntactic knowledge capability of the pre-trained language model; or the specified language knowledge capability is the reasoning ability, and the words masked by the MASK token are used to evaluate the reasoning ability of the pre-trained language model. As shown in Table 1 above, the questions of the syntactic knowledge probe task and of the inference probe task are both in the form of choice questions, so the above-described method for evaluating the language knowledge capability of the pre-trained language model applies.
Optionally, the operation of determining the predicted word corresponding to the MASK token from the context vector corresponding to the MASK token using the problem-solving function comprises: determining, using the problem-solving function, the probability of the context vector corresponding to the MASK token with respect to each word in the preset lexicon; and determining the predicted word directly from the preset lexicon according to the determined probabilities.
Referring to Table 1, the questions of the factual knowledge probe task are fill-in-the-blank questions, so this embodiment proposes to directly compute, according to formulas (2) through (8) above, the probabilities p_1 to p_V that the word hidden by the MASK token [MASK] is each word in the lexicon, and to predict the word with the highest probability value in the lexicon as the word masked by the MASK token [MASK].
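The fill-in-the-blank solver is the same computation with the argmax taken over the whole lexicon instead of a candidate list (continuing the sketch above):

    def fill_in_blank(masked_sentence):
        """Predict the masked word without candidates: argmax over the lexicon."""
        inputs = tokenizer(masked_sentence, return_tensors="pt")
        mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
        with torch.no_grad():
            logits = model(**inputs).logits
        return tokenizer.convert_ids_to_tokens(int(logits[0, mask_pos].argmax()))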
Therefore, in this way as well, no probe classifier needs to be additionally trained: the language knowledge capability of the pre-trained language model can be evaluated merely by applying a linear transformation and a statistical function to the context vector corresponding to the MASK token. The interference caused by additionally training a probe classifier is thus eliminated, and the language knowledge capability of the pre-trained language model is evaluated in a way that objectively reflects it.
Optionally, the specified language knowledge capability is the factual knowledge capability, and the words masked by the MASK token are used to evaluate the factual knowledge capability of the pre-trained language model.
As shown in Table 1 above, the questions of the factual knowledge probe task are in the fill-in-the-blank form, so the above-described method for evaluating the language knowledge capability of the pre-trained language model applies.
Further, although in the present embodiment the questions of the syntactic knowledge probe task and of the inference probe task take the choice-question form and the questions of the factual knowledge probe task are fill-in-the-blank questions, the specific question forms are not fixed. For example, the questions of the syntactic knowledge probe task could also take the fill-in-the-blank form, and the questions of the factual knowledge probe task could also take the choice-question form, which is not elaborated here.
Optionally, the specified language knowledge capability is the semantic knowledge capability, the probe question includes a plurality of sentences, and the test texts are the occurrences of a polysemous word set in the plurality of sentences. The operation of generating a set of vectors associated with the sentences using the pre-trained language model then comprises generating a plurality of vector sets respectively associated with the plurality of sentences. And the operation of inputting the context vectors corresponding to the test texts into the problem-solving function and evaluating the specified language knowledge capability according to its computation result comprises: extracting the polysemous-word vectors corresponding to the polysemous word from the plurality of vector sets; computing the similarity between the extracted polysemous-word vectors using the problem-solving function; determining, according to the similarities between the extracted vectors, the sentences in which the senses of the polysemous word are similar; and judging whether the determined sentences are correct, and evaluating the semantic knowledge capability of the pre-trained language model according to the judgment result.
Specifically, referring to Table 1, the semantic knowledge probe task in the present embodiment provides a plurality of sentences (e.g., sentences a to c shown in Table 1) each containing the same polysemous word (e.g., "appeal"), and the pre-trained language model predicts in which sentences the semantics of the polysemous word are most similar.
To address such a question, the probe module 120 first generates the set of vectors corresponding to each sentence using the pre-trained language model. Taking sentences a to c shown in Table 1 above as an example, the probe module 120 first generates, according to formula (1), the vector sets respectively corresponding to sentences a to c. It then extracts, from the vector set corresponding to sentence a, the context vector h_a corresponding to the polysemous word "appeal" in sentence a; extracts, from the vector set corresponding to sentence b, the context vector h_b corresponding to the polysemous word "appeal" in sentence b; and extracts, from the vector set corresponding to sentence c, the context vector h_c corresponding to the polysemous word "appeal" in sentence c.
Then, the probe module 120 computes the cosine similarities cos(a, b), cos(b, c), and cos(a, c) between these vectors using the following formula:
cos(a, b) = (h_a · h_b) / (‖h_a‖ ‖h_b‖)    (9)
Then, the probe module 120 determines, according to the computed similarities, the two sentences among sentences a to c in which the semantics of the polysemous word "appeal" are most similar. For example, if cos(a, b) is the highest, the pre-trained language model predicts that the semantics of "appeal" in sentence a and sentence b are the closest; if cos(b, c) is the highest, that its semantics in sentence b and sentence c are the closest; and if cos(a, c) is the highest, that its semantics in sentence a and sentence c are the closest.
The probe module 120 then judges, according to the reference answer, whether the prediction of the pre-trained language model is accurate, and evaluates the semantic knowledge capability of the pre-trained language model according to the judgment result. Specifically, for example, the probe module 120 may predict a plurality of questions in the semantic knowledge probe task using the pre-trained language model and evaluate its semantic knowledge capability according to its prediction accuracy.
Therefore, in this way, the probe task can be realized with only relatively simple operations, so that the interference caused by additionally training a probe classifier is eliminated while the language knowledge capability of the pre-trained language model is evaluated in a way that objectively reflects it. Furthermore, although the above is described in terms of cosine similarity, other measures reflecting similarity may be used; for example, the distance between the context vectors h_a, h_b, and h_c may be used instead of the cosine similarity described above.
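A sketch of this semantic probe, reusing the frozen model and tokenizer from the choice-question sketch above and assuming the target word maps to a single token of the vocabulary:

    import torch
    import torch.nn.functional as F

    def most_similar_pair(sentences, target):
        """Return the indices of the two sentences in which `target` is used in
        the most similar sense, by cosine similarity of its context vectors."""
        target_id = tokenizer.convert_tokens_to_ids(target)
        vecs = []
        for s in sentences:
            inputs = tokenizer(s, return_tensors="pt")
            pos = (inputs.input_ids == target_id).nonzero()[0, 1]
            with torch.no_grad():
                # encoder hidden states: the context vectors of formula (1)
                h = model(**inputs, output_hidden_states=True).hidden_states[-1]
            vecs.append(h[0, pos])
        pairs = [(0, 1), (1, 2), (0, 2)]
        sims = [F.cosine_similarity(vecs[i], vecs[j], dim=0) for i, j in pairs]  # formula (9)
        return pairs[int(torch.stack(sims).argmax())]

    # e.g. most_similar_pair([sentence_a, sentence_b, sentence_c], "appeal") -> (0, 1)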
Further, referring to fig. 2, according to a second aspect of the present embodiment, a storage medium is provided. The storage medium includes a stored program, and when the program runs, a processor executes any of the methods described above.
Therefore, in the technical solution of this embodiment, a probe task for evaluating the language knowledge capability of a pre-trained language model is carried out as follows. First, a probe question related to the probe task is constructed, and the sentence of the probe question contains a test text (for example, a mask identifier or a polysemous word) for evaluating the language knowledge capability. The sentence is then input into the pre-trained language model to obtain the vector set corresponding to the sentence. Finally, the probe model evaluates the language knowledge capability according to the context vector corresponding to the test text. Because the evaluation relies only on this context vector, the pre-trained language model does not need to be trained (i.e., fine-tuned) and the probe model does not need to be trained additionally, so the evaluation is free of such interference. This solves the technical problem in the prior art that the knowledge evaluation result of a pre-trained language model is disturbed because the model must be fine-tuned or a probe classifier must be additionally trained.
It should be noted that, for simplicity of description, the above method embodiments are described as a series of acts, but those skilled in the art will recognize that the present invention is not limited by the order of the acts described, since some steps may be performed in other orders or concurrently. Further, those skilled in the art will also appreciate that the embodiments described in the specification are preferred embodiments, and that the acts and modules referred to are not necessarily required by the invention.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware alone, although the former is in many cases the preferable implementation. Based on this understanding, the technical solution of the present invention may be embodied as a software product stored in a storage medium (e.g., ROM/RAM, magnetic disk, or optical disc) and including instructions that enable a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the methods of the embodiments of the present invention.
Example 2
Fig. 6 shows an evaluation apparatus 600 for a pre-trained language model according to the first aspect of the present embodiment; the apparatus 600 corresponds to the method according to the first aspect of Embodiment 1. Referring to fig. 6, the apparatus 600 includes: a probe task determination module 610 for determining a probe task for evaluating a specified language knowledge capability of the pre-trained language model; a probe question acquisition module 620 for acquiring a probe question related to the probe task, wherein the probe question includes at least one sentence for evaluating the specified language knowledge capability and the sentence contains a test text for evaluating the specified language knowledge capability; a vector set generation module 630 for generating, with the pre-trained language model, a set of vectors associated with the sentence, the set including the context vectors corresponding to the text of the sentence; and a capability evaluation module 640 for evaluating the specified language knowledge capability according to the context vector corresponding to the test text by using a preset probe model, where neither the pre-trained language model nor the probe model needs to be trained for this evaluation.
Optionally, the probe model includes a problem solving function corresponding to the probe question; the problem solving function takes the context vector corresponding to the test text as its input parameter, and the parameters of the problem solving function are fixed. The capability evaluation module 640 then comprises: a function calculation submodule for inputting the context vector corresponding to the test text into the problem solving function for calculation; and a capability evaluation submodule for evaluating the specified language knowledge capability according to the calculation result of the problem solving function.
Optionally, the capability evaluation module comprises: an answer judgment unit for judging, according to a preset reference answer corresponding to the probe question, whether the calculation result of the problem solving function is correct, obtaining a judgment result for that probe question; an accuracy determination unit for determining the accuracy of the pre-trained language model on the probe task according to the judgment results of the plurality of probe questions related to the probe task; and a first capability evaluation unit for evaluating the specified language knowledge capability of the pre-trained language model according to the accuracy.
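As a minimal sketch of these three units (the names probe_questions, solve, and the attribute answer are assumptions introduced here for illustration), the evaluation reduces to an accuracy loop of the following shape:

def evaluate_probe_task(probe_questions, solve):
    # Judge each calculation result of the problem solving function against
    # its preset reference answer, then return the accuracy on the task.
    correct = sum(1 for q in probe_questions if solve(q) == q.answer)
    return correct / len(probe_questions)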
Optionally, the probe question includes one sentence, and the test text contained in the sentence is a mask identifier indicating that the word at the position of the mask identifier is hidden. The function calculation submodule then comprises a word prediction unit for determining, using the problem solving function, the predicted word corresponding to the mask identifier according to the context vector corresponding to the mask identifier. The capability evaluation submodule comprises: a word judgment unit for judging whether the predicted word is the word hidden by the mask identifier; and a first capability evaluation unit for evaluating the specified language knowledge capability of the pre-trained language model according to the judgment result.
Optionally, the word prediction unit comprises: a first probability determination subunit for determining, using the problem solving function, the probability of the context vector corresponding to the mask identifier with respect to each word in a preset lexicon; and a first word prediction subunit for determining, according to the determined probabilities, the predicted word from a plurality of candidate words corresponding to the mask identifier, the plurality of candidate words being words in the preset lexicon.
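For illustration, a minimal sketch of such a word prediction unit, assuming a HuggingFace-style masked language model (the model name and the single-token candidate words are assumptions, not part of the embodiment):

import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
mlm.eval()

def predict_from_candidates(sentence, candidates):
    # Score every word of the preset lexicon at the [MASK] position, then
    # restrict the choice to the given candidate words; parameters are fixed.
    enc = tokenizer(sentence, return_tensors="pt")
    mask_pos = (enc.input_ids[0] == tokenizer.mask_token_id).nonzero()[0, 0]
    with torch.no_grad():
        logits = mlm(**enc).logits[0, mask_pos]
    cand_ids = [tokenizer.convert_tokens_to_ids(c) for c in candidates]
    best = max(cand_ids, key=lambda i: logits[i].item())
    return tokenizer.convert_ids_to_tokens(best)

For a syntactic probe one might call, say, predict_from_candidates("The keys to the cabinet [MASK] on the table.", ["is", "are"]) and judge whether the returned word is the one hidden by the mask identifier.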
Optionally, the specified language knowledge capability is syntactic knowledge capability, and the word hidden by the mask identifier is used to evaluate the syntactic knowledge capability of the pre-trained language model; or the specified language knowledge capability is reasoning capability, and the word hidden by the mask identifier is used to evaluate the reasoning capability of the pre-trained language model.
Optionally, the word prediction unit comprises: a second probability determination subunit for determining, using the problem solving function, the probability of the context vector corresponding to the mask identifier with respect to each word in the preset lexicon; and a second word prediction subunit for determining the predicted word directly from the preset lexicon according to the determined probabilities.
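A hedged sketch of this second variant (reusing the mlm and tokenizer objects assumed above) takes the highest-probability word of the entire preset lexicon rather than of a candidate list:

import torch

def predict_from_lexicon(sentence):
    # The predicted word is the argmax over the whole lexicon.
    enc = tokenizer(sentence, return_tensors="pt")
    mask_pos = (enc.input_ids[0] == tokenizer.mask_token_id).nonzero()[0, 0]
    with torch.no_grad():
        logits = mlm(**enc).logits[0, mask_pos]
    return tokenizer.convert_ids_to_tokens(int(logits.argmax()))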
Optionally, the specified language knowledge capability is factual knowledge capability, and the word hidden by the mask identifier is used to evaluate the factual knowledge capability of the pre-trained language model.
Optionally, when the specified language knowledge capability is semantic knowledge capability, the probe question includes a plurality of sentences and the test text is a polysemous word that appears in each of the plurality of sentences; the vector set generation module 630 then includes a vector set generation submodule for generating, with the pre-trained language model, a plurality of vector sets respectively associated with the plurality of sentences. The function calculation submodule comprises: a polysemous word vector extraction unit for extracting, from the plurality of vector sets, the context vectors corresponding to the polysemous word; and a similarity calculation unit for calculating the similarity between the extracted polysemous word vectors using the problem solving function. The capability evaluation submodule comprises: a sentence determination unit for determining, according to the similarities between the extracted polysemous word vectors, the sentences in which the senses of the polysemous word are most alike; and a second capability evaluation unit for judging whether the determined sentences are correct and evaluating the semantic knowledge capability of the pre-trained language model according to the judgment result.
Therefore, in the technical solution of this embodiment, a probe task for evaluating the language knowledge capability of a pre-trained language model is carried out as follows. First, a probe question related to the probe task is constructed, and the sentence of the probe question contains a test text (for example, a mask identifier or a polysemous word) for evaluating the language knowledge capability. The sentence is then input into the pre-trained language model to obtain the vector set corresponding to the sentence. Finally, the probe model evaluates the language knowledge capability according to the context vector corresponding to the test text. Because the evaluation relies only on this context vector, the pre-trained language model does not need to be trained (i.e., fine-tuned) and the probe model does not need to be trained additionally, so the evaluation is free of such interference. This solves the technical problem in the prior art that the knowledge evaluation result of a pre-trained language model is disturbed because the model must be fine-tuned or a probe classifier must be additionally trained.
Example 3
Fig. 7 shows an evaluation apparatus 700 for a pre-trained language model according to the first aspect of the present embodiment; the apparatus 700 corresponds to the method according to the first aspect of Embodiment 1. Referring to fig. 7, the apparatus 700 includes: a processor 710; and a memory 720 coupled to the processor and providing the processor with instructions for the following processing steps: determining a probe task for evaluating a specified language knowledge capability of the pre-trained language model; acquiring a probe question related to the probe task, wherein the probe question includes at least one sentence for evaluating the specified language knowledge capability and the sentence contains a test text for evaluating the specified language knowledge capability; generating, with the pre-trained language model, a set of vectors associated with the sentence, the set including the context vectors corresponding to the text of the sentence; and evaluating the specified language knowledge capability according to the context vector corresponding to the test text by using a preset probe model, where neither the pre-trained language model nor the probe model needs to be trained for this evaluation.
Optionally, the probe model includes a problem solving function corresponding to the probe question; the problem solving function takes the context vector corresponding to the test text as its input parameter, and the parameters of the problem solving function are fixed. The operation of evaluating the specified language knowledge capability according to the context vector corresponding to the test text by using the preset probe model then comprises the following steps: inputting the context vector corresponding to the test text into the problem solving function for calculation; and evaluating the specified language knowledge capability according to the calculation result of the problem solving function.
Optionally, the operation of evaluating the specified language knowledge capability according to the calculation result of the problem solving function comprises: judging, according to a preset reference answer corresponding to the probe question, whether the calculation result of the problem solving function is correct, obtaining a judgment result for that probe question; determining the accuracy of the pre-trained language model on the probe task according to the judgment results of the plurality of probe questions related to the probe task; and evaluating the specified language knowledge capability of the pre-trained language model according to the accuracy.
Optionally, the probe question includes one sentence, and the test text contained in the sentence is a mask identifier indicating that the word at the position of the mask identifier is hidden. The operation of inputting the context vector corresponding to the test text into the problem solving function for calculation and evaluating the specified language knowledge capability according to the calculation result then comprises: determining, using the problem solving function, the predicted word corresponding to the mask identifier according to the context vector corresponding to the mask identifier; judging whether the predicted word is the word hidden by the mask identifier; and evaluating the specified language knowledge capability of the pre-trained language model according to the judgment result.
Optionally, the operation of determining, using the problem solving function, the predicted word corresponding to the mask identifier according to the context vector corresponding to the mask identifier comprises: determining, using the problem solving function, the probability of the context vector corresponding to the mask identifier with respect to each word in a preset lexicon; and determining, according to the determined probabilities, the predicted word from a plurality of candidate words corresponding to the mask identifier, the plurality of candidate words being words in the preset lexicon.
Optionally, the specified language knowledge capability is syntactic knowledge capability, and the word hidden by the mask identifier is used to evaluate the syntactic knowledge capability of the pre-trained language model; or the specified language knowledge capability is reasoning capability, and the word hidden by the mask identifier is used to evaluate the reasoning capability of the pre-trained language model.
Optionally, the operation of determining, using the problem solving function, the predicted word corresponding to the mask identifier according to the context vector corresponding to the mask identifier comprises: determining, using the problem solving function, the probability of the context vector corresponding to the mask identifier with respect to each word in the preset lexicon; and determining the predicted word directly from the preset lexicon according to the determined probabilities.
Optionally, the specified language knowledge capability is factual knowledge capability, and the word hidden by the mask identifier is used to evaluate the factual knowledge capability of the pre-trained language model.
Optionally, when the specified language knowledge capability is semantic knowledge capability, the probe question includes a plurality of sentences and the test text is a polysemous word that appears in each of the plurality of sentences. The operation of generating a set of vectors associated with the sentences using the pre-trained language model then comprises: generating, with the pre-trained language model, a plurality of vector sets respectively associated with the plurality of sentences. The operation of inputting the context vectors corresponding to the test text into the problem solving function for calculation and evaluating the specified language knowledge capability according to the calculation result comprises: extracting, from the plurality of vector sets, the context vectors corresponding to the polysemous word; calculating the similarity between the extracted polysemous word vectors using the problem solving function; determining, according to these similarities, the sentences in which the senses of the polysemous word are most alike; and judging whether the determined sentences are correct, and evaluating the semantic knowledge capability of the pre-trained language model according to the judgment result.
Therefore, in the technical solution of this embodiment, a probe task for evaluating the language knowledge capability of a pre-trained language model is carried out as follows. First, a probe question related to the probe task is constructed, and the sentence of the probe question contains a test text (for example, a mask identifier or a polysemous word) for evaluating the language knowledge capability. The sentence is then input into the pre-trained language model to obtain the vector set corresponding to the sentence. Finally, the probe model evaluates the language knowledge capability according to the context vector corresponding to the test text. Because the evaluation relies only on this context vector, the pre-trained language model does not need to be trained (i.e., fine-tuned) and the probe model does not need to be trained additionally, so the evaluation is free of such interference. This solves the technical problem in the prior art that the knowledge evaluation result of a pre-trained language model is disturbed because the model must be fine-tuned or a probe classifier must be additionally trained.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied as a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods of the embodiments of the present invention. The foregoing storage medium includes media capable of storing program code, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various modifications and improvements without departing from the principle of the present invention, and such modifications and improvements shall also fall within the protection scope of the present invention.

Claims (10)

1. An evaluation method for a pre-trained language model is characterized by comprising the following steps:
determining a probe task for evaluating the specified language knowledge capability of the pre-training language model;
acquiring a probe question related to the probe task, wherein the probe question comprises at least one sentence for evaluating the specified language knowledge capability, and the sentence contains a test text for evaluating the specified language knowledge capability;
generating a set of vectors associated with the sentence using the pre-trained language model, the set of vectors including context vectors corresponding to text of the sentence; and
and evaluating the specified language knowledge capability according to the context vector corresponding to the test text by using a preset probe model, wherein neither the pre-trained language model nor the probe model needs to be trained for the evaluation of the specified language knowledge capability.
2. The method of claim 1, wherein the probe model comprises a problem solving function corresponding to the probe question, wherein the problem solving function takes the context vector corresponding to the test text as its input parameter and the parameters of the problem solving function are fixed, and wherein
the operation of evaluating the specified language knowledge capability according to the context vector corresponding to the test text by using a preset probe model comprises the following steps:
inputting the context vector corresponding to the test text into the problem solving function for calculation; and
and evaluating the specified language knowledge capability according to the calculation result of the problem solving function.
3. The method according to claim 2, wherein the operation of evaluating the specified language knowledge capability according to the calculation result of the problem solving function comprises:
judging whether the calculation result of the problem solving function is correct or not according to a preset reference answer corresponding to the probe problem to obtain a judgment result corresponding to the probe problem;
determining the accuracy of the pre-training language model relative to the probe task according to the judgment results corresponding to the plurality of probe problems related to the probe task; and
and evaluating the specified language knowledge capability of the pre-training language model according to the accuracy.
4. The method according to claim 2 or 3, wherein the probe question includes one sentence, the test text included in the sentence is a mask identifier, and the mask identifier indicates that the word at its position is hidden, and wherein the operation of inputting the context vector corresponding to the test text into the problem solving function for calculation and evaluating the specified language knowledge capability according to the calculation result of the problem solving function comprises:
determining, using the problem solving function, a predicted word corresponding to the mask identifier according to the context vector corresponding to the mask identifier;
judging whether the predicted word is the word hidden by the mask identifier; and
evaluating the specified language knowledge capability of the pre-trained language model according to the judgment result.
5. The method of claim 4, wherein the operation of determining, using the problem solving function, a predicted word corresponding to the mask identifier according to the context vector corresponding to the mask identifier comprises:
determining, using the problem solving function, the probability of the context vector corresponding to the mask identifier with respect to each word in a preset lexicon; and
determining the predicted word from a plurality of candidate words corresponding to the mask identifier according to the determined probabilities, wherein the plurality of candidate words are words in the preset lexicon, and wherein
the specified language knowledge capability is syntactic knowledge capability, and the word hidden by the mask identifier is used to evaluate the syntactic knowledge capability of the pre-trained language model; or
the specified language knowledge capability is reasoning capability, and the word hidden by the mask identifier is used to evaluate the reasoning capability of the pre-trained language model.
6. The method of claim 4, wherein the operation of determining, using the problem solving function, a predicted word corresponding to the mask identifier according to the context vector corresponding to the mask identifier comprises:
determining, using the problem solving function, the probability of the context vector corresponding to the mask identifier with respect to each word in a preset lexicon; and
determining the predicted word directly from the preset lexicon according to the determined probabilities, and wherein
the specified language knowledge capability is factual knowledge capability, and the word hidden by the mask identifier is used to evaluate the factual knowledge capability of the pre-trained language model.
7. The method of claim 2 or 3, wherein the specified language knowledge capability is semantic knowledge capability, the probe question includes a plurality of sentences, and the test text is a polysemous word that appears in each of the plurality of sentences, and wherein,
an operation of generating a set of vectors associated with the sentence using the pre-trained language model, comprising: generating a plurality of sets of vectors respectively associated with the plurality of sentences using the pre-trained language model, and wherein
the operation of inputting the context vectors corresponding to the test text into the problem solving function for calculation and evaluating the specified language knowledge capability according to the calculation result of the problem solving function comprises the following steps:
extracting polysemous word vectors which are contained in the vector sets and correspond to the polysemous words;
calculating the similarity between the extracted polysemous word vectors by using the problem solving function;
determining sentences with similar word senses of the polysemous words according to the similarity between the extracted polysemous word vectors; and
and judging whether the determined sentences with similar senses of the polysemous word are correct, and evaluating the semantic knowledge capability of the pre-trained language model according to the judgment result.
8. A storage medium comprising a stored program, wherein the method of any one of claims 1 to 7 is performed by a processor when the program is run.
9. An evaluation device for a pre-trained language model, comprising:
the probe task determination module is used for determining a probe task for evaluating the specified language knowledge capability of the pre-training language model;
the probe question acquisition module is used for acquiring a probe question related to the probe task, wherein the probe question comprises at least one sentence for evaluating the specified language knowledge capability, and the sentence comprises a test text for evaluating the specified language knowledge capability;
a vector set generation module to generate a vector set associated with the sentence using the pre-trained language model, the vector set including context vectors corresponding to text of the sentence; and
and the capability evaluation module is configured to evaluate the specified language knowledge capability according to the context vector corresponding to the test text by using a preset probe model, wherein neither the pre-trained language model nor the probe model needs to be trained for the evaluation of the specified language knowledge capability.
10. An evaluation device for a pre-trained language model, comprising:
a processor; and
a memory coupled to the processor for providing instructions to the processor for processing the following processing steps:
determining a probe task for evaluating the specified language knowledge capability of the pre-training language model;
acquiring a probe question related to the probe task, wherein the probe question comprises at least one sentence for evaluating the specified language knowledge capability, and the sentence contains a test text for evaluating the specified language knowledge capability;
generating a set of vectors associated with the sentence using the pre-trained language model, the set of vectors including context vectors corresponding to text of the sentence; and
and evaluating the specified language knowledge capability according to the context vector corresponding to the test text by using a preset probe model, wherein neither the pre-trained language model nor the probe model needs to be trained for the evaluation of the specified language knowledge capability.
CN202110852575.6A 2021-07-27 2021-07-27 Method and device for evaluating pre-training language model and storage medium Active CN113673702B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110852575.6A CN113673702B (en) 2021-07-27 2021-07-27 Method and device for evaluating pre-training language model and storage medium

Publications (2)

Publication Number Publication Date
CN113673702A true CN113673702A (en) 2021-11-19
CN113673702B CN113673702B (en) 2022-07-29

Family

ID=78540508

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110852575.6A Active CN113673702B (en) 2021-07-27 2021-07-27 Method and device for evaluating pre-training language model and storage medium

Country Status (1)

Country Link
CN (1) CN113673702B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120150532A1 (en) * 2010-12-08 2012-06-14 At&T Intellectual Property I, L.P. System and method for feature-rich continuous space language models
CN109885810A * 2019-01-17 2019-06-14 平安城市建设科技(深圳)有限公司 Man-machine interrogation method, apparatus, device and storage medium based on semantic parsing
CN111639163A (en) * 2020-04-29 2020-09-08 深圳壹账通智能科技有限公司 Problem generation model training method, problem generation method and related equipment
CN111737994A (en) * 2020-05-29 2020-10-02 北京百度网讯科技有限公司 Method, device and equipment for obtaining word vector based on language model and storage medium
CN112205985A (en) * 2020-09-29 2021-01-12 北京航空航天大学 Visual continuous attention training device, training and testing system and method thereof
CN112528598A (en) * 2020-12-07 2021-03-19 上海交通大学 Automatic text abstract evaluation method based on pre-training language model and information theory
CN112699216A (en) * 2020-12-28 2021-04-23 平安科技(深圳)有限公司 End-to-end language model pre-training method, system, device and storage medium

Also Published As

Publication number Publication date
CN113673702B (en) 2022-07-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant