CN117009471A - Interpreted text generation model training method, interpreted text generation method and device thereof - Google Patents

Interpreted text generation model training method, interpreted text generation method and device thereof Download PDF

Info

Publication number
CN117009471A
CN117009471A (Application No. CN202211187946.4A)
Authority
CN
China
Prior art keywords
information
text
loss
interpretation
interpreted
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211187946.4A
Other languages
Chinese (zh)
Inventor
李沁桐 (Qintong Li)
闭玮 (Wei Bi)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202211187946.4A priority Critical patent/CN117009471A/en
Publication of CN117009471A publication Critical patent/CN117009471A/en
Pending legal-status Critical Current

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Machine Translation (AREA)

Abstract

Embodiments of the present disclosure provide an interpretation text generation model training method, an interpretation text generation method, an apparatus, a device, and a computer-readable storage medium. The method provided by the embodiments of the disclosure further optimizes the candidate interpretation text generated from a data sample: by jointly optimizing an information bottleneck objective, built from information compression and information relevance, together with a language modeling objective, it distills from the candidate interpretation text the information related to the data sample, ignores the information unrelated to the data sample, and trains the model parameters generatively, thereby determining an interpretation text generation model capable of generating higher-quality interpretation text. The method of the embodiments of the disclosure jointly optimizes the two tasks of removing redundant, irrelevant information from the interpretation text and retaining the information relevant to the data sample, and, by exploiting the characteristics of interpretation text itself, can obtain higher-quality interpretation text without any high-quality interpretation text labels and without label supervision.

Description

Interpreted text generation model training method, interpreted text generation method and device thereof
Technical Field
The present disclosure relates to the field of artificial intelligence, and more particularly, to an interpreted text generation model training method, an interpreted text generation method, and apparatuses, devices, and storage media thereof.
Background
In natural language processing, a field of artificial intelligence, large-scale pre-trained language models (Pretrained Language Model, PLM) based on deep learning algorithms can learn rich language modeling information from massive training corpora, and the large-scale parameters contained in a PLM store this language modeling knowledge, which allows PLMs to achieve excellent performance on a variety of downstream natural language processing tasks.
However, how to accurately explain, in a human-readable manner, the decision process behind the "black box" behavior of a deep language model remains an open question. For example, in the question answering field of natural language processing, for the question "What is the weather likely to be later?", how do we explain that "rainy" is a reasonable answer? Text interpretation in natural language processing can be summarized into three categories: (1) a word, phrase, or sentence selected from the data instance; (2) a combination rule with a distinct structure; (3) free text of an open-ended character. Among these, interpretation in free-text form is more expressive and usually more readable. Free-text interpretation in natural language processing is commonly defined as a fluent text description that helps humans understand the cause of a model decision, i.e., it helps users understand "why this output is given for this input".
Therefore, in order to reveal the underlying causes of the behavior of neural language models, which have "black box" properties, an efficient and accurate interpretation text generation method is needed, so that natural language interpretation text can be generated in a human-readable manner for any given data sample.
Disclosure of Invention
To solve the above-described problems, the present disclosure further optimizes the rough candidate interpretation text generated based on a data sample, distilling from the candidate interpretation text the information related to the data sample and ignoring the information unrelated to the data sample, thereby generating higher-quality interpretation text.
Embodiments of the present disclosure provide an interpreted text generation model training method, an interpreted text generation method, an apparatus, a device, and a computer-readable storage medium.
An embodiment of the present disclosure provides an interpretation text generation model training method, comprising: acquiring a data sample, and obtaining a candidate interpretation text based on the data sample; performing information compression processing based on the candidate interpretation text to determine an information compression representation corresponding to the candidate interpretation text, and determining an information compression loss based on the candidate interpretation text and the information compression representation; determining an information relevance loss based on the information relevance between the information compression representation and the data sample; obtaining a final interpretation text by decoding the information compression representation, and determining a language modeling loss based on the candidate interpretation text and the final interpretation text; and determining the interpretation text generation model based on the information compression loss, the information relevance loss, and the language modeling loss.
An embodiment of the present disclosure provides an interpretation text generation method, comprising: acquiring a data instance; and generating a final interpretation text by an interpretation text generation model based on the data instance, wherein the interpretation text generation model is trained according to the interpretation text generation model training method described above.
An embodiment of the present disclosure provides an interpretation text generation model training apparatus, comprising: a data acquisition module configured to acquire a data sample and obtain a candidate interpretation text based on the data sample; an information compression module configured to perform information compression processing based on the candidate interpretation text to determine an information compression representation corresponding to the candidate interpretation text, and to determine an information compression loss based on the candidate interpretation text and the information compression representation; an information relevance module configured to determine an information relevance loss based on the information relevance between the information compression representation and the data sample; an information decoding module configured to obtain a final interpretation text by decoding the information compression representation, and to determine a language modeling loss based on the candidate interpretation text and the final interpretation text; and a model determination module configured to determine the interpretation text generation model based on the information compression loss, the information relevance loss, and the language modeling loss.
An embodiment of the present disclosure provides an interpretation text generation apparatus, comprising: one or more processors; and one or more memories, wherein the one or more memories store a computer-executable program that, when executed by the one or more processors, performs the interpretation text generation model training method and/or the interpretation text generation method described above.
Embodiments of the present disclosure provide a computer readable storage medium having stored thereon computer executable instructions for implementing an interpreted text generation model training method and/or an interpreted text generation method as described above when executed by a processor.
Embodiments of the present disclosure provide a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the interpretation text generation model training method and/or the interpretation text generation method according to embodiments of the present disclosure.
Compared with conventional interpretation text generation methods, the method provided by the embodiments of the present disclosure avoids the dependence of conventional supervised training on large-scale manually annotated data: it requires no high-quality interpretation text labels and can generate higher-quality interpretation text without annotation supervision, thereby addressing problems such as the high cost of manual annotation and the difficulty of ensuring consistent quality.
The method provided by the embodiments of the disclosure further optimizes the candidate interpretation text generated from the data sample: by jointly optimizing an information bottleneck objective, built from information compression and information relevance, together with a language modeling objective, it distills from the candidate interpretation text the information related to the data sample, ignores the information unrelated to the data sample, and trains the model parameters generatively, thereby determining an interpretation text generation model capable of generating higher-quality interpretation text. The method of the embodiments of the disclosure jointly optimizes the two tasks of removing redundant, irrelevant information from the interpretation text and retaining the information relevant to the data sample, and, by exploiting the characteristics of interpretation text itself, can obtain higher-quality interpretation text without any high-quality interpretation text labels and without label supervision.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings that are used in the description of the embodiments will be briefly described below. It should be apparent that the drawings in the following description are only some exemplary embodiments of the present disclosure, and that other drawings may be obtained from these drawings by those of ordinary skill in the art without undue effort.
FIG. 1 is a schematic diagram illustrating a scenario in which an input data sample is processed and an interpretation text generation result is returned, according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram illustrating supervised learning based on annotated data sets according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram illustrating unsupervised learning based on a pre-trained language model according to an embodiment of the present disclosure;
FIG. 4 is a flow chart illustrating an interpretation text generation model training method in accordance with an embodiment of the present disclosure;
FIG. 5 is a schematic flow diagram illustrating an interpretation text generation model training method in accordance with an embodiment of the disclosure;
FIG. 6 is a flowchart illustrating an interpretation text generation method according to an embodiment of the present disclosure;
FIG. 7 is a schematic diagram illustrating an interpreted text generation model training apparatus according to an embodiment of the present disclosure;
FIG. 8 shows a schematic diagram of an interpreted text generating device according to an embodiment of the present disclosure;
FIG. 9 illustrates a schematic diagram of an architecture of an exemplary computing device, according to an embodiment of the present disclosure; and
FIG. 10 shows a schematic diagram of a storage medium according to an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the present disclosure more apparent, exemplary embodiments according to the present disclosure will be described in detail with reference to the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present disclosure and not all of the embodiments of the present disclosure, and that the present disclosure is not limited by the example embodiments described herein.
In the present specification and drawings, steps and elements that are substantially the same or similar are denoted by the same or similar reference numerals, and repeated descriptions of these steps and elements are omitted. Meanwhile, in the description of the present disclosure, the terms "first", "second", and the like are used merely to distinguish the descriptions and are not to be construed as indicating or implying relative importance or order.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. The terminology used herein is for the purpose of describing embodiments of the present disclosure only and is not intended to be limiting.
For purposes of describing the present disclosure, the following presents concepts related to the present disclosure.
The interpretation text generation method of the present disclosure may be based on artificial intelligence (Artificial Intelligence, AI). Artificial intelligence is the theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. For example, an artificial-intelligence-based interpretation text generation method is able to generate a fluent interpretation text description in a manner similar to how a human would explain the meaning of a sentence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the interpretation text generation method of the present disclosure can jointly optimize the tasks of removing redundant, irrelevant information from the interpretation text and retaining the information relevant to the data sample, thereby generating higher-quality interpretation text.
The interpretation text generation method of the present disclosure may be based on natural language processing (Natural Language Processing, NLP). Natural language processing is an important direction in the fields of computer science and artificial intelligence. It studies theories and methods that enable effective communication between humans and computers in natural language. Natural language processing is a science that integrates linguistics, computer science, and mathematics; research in this field involves natural language, i.e., the language people use every day, so it is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question answering, knowledge graph techniques, and the like. The interpretation text generation method of the present disclosure may in particular be based on text generation (Text Generation) technology in natural language processing. Text generation is the process of constructing phrases, sentences, and short texts in natural language, and can be considered the inverse of machine analysis of natural language. Its importance lies in the fact that human-machine dialogue cannot truly be achieved until computers can freely generate natural language; its complexity lies in the fact that it involves knowledge of grammar, semantics, text structure, vocabulary, and the conversational context, and these kinds of knowledge interact and are closely related to one another. The interpretation text generation method of the present disclosure aims to use text generation techniques to generate comprehensive and precise natural language interpretation text in a human-readable manner for any given data sample.
Furthermore, the interpretation text generation method of the present disclosure may be based on mutual information (Mutual Information) from information theory. Mutual information is an information measure in information theory: it can be seen as the amount of information that one random variable contains about another random variable, or as the reduction in the uncertainty of one random variable due to knowing the other. It is also a measure of the divergence between the joint distribution of two random variables and the product of their marginal distributions.
The interpretation text generation method of the present disclosure may also be based on the information bottleneck (Information Bottleneck, IB) principle. The information bottleneck principle is a widely adopted training objective and theoretical analysis method for deep neural networks. It is a theory proposed by computer scientists and neuroscientists to explain how machine learning works. The theory holds that deep neural networks learn through a process called an "information bottleneck": like squeezing information through a bottleneck, the network discards irrelevant, noisy input data and retains only the most relevant feature data. The information bottleneck principle has been shown to address the applicability problem of network size well and, compared with other methods, better characterizes the generalization performance of deep learning networks. In the interpretation text generation method of the present disclosure, the information bottleneck principle can be used to find a balance between task-relevant (generalization) information and task-irrelevant (compression) information, thereby removing redundant, irrelevant information from the interpretation text while retaining the information relevant to the data sample.
In view of the foregoing, embodiments of the present disclosure relate to techniques of artificial intelligence, natural language processing, and the like, and embodiments of the present disclosure will be further described below with reference to the accompanying drawings.
Fig. 1 is a schematic diagram illustrating a scenario in which an input data sample is processed and an interpretation text generation result is returned according to an embodiment of the present disclosure. Fig. 2 is a schematic diagram illustrating supervised learning based on annotated data sets according to an embodiment of the present disclosure. Fig. 3 is a schematic diagram illustrating unsupervised learning based on a pre-trained language model according to an embodiment of the present disclosure.
As shown in fig. 1, the information to be processed (e.g., a data sample, prompt text, etc.) may be sent by a user terminal to a server, so that the server generates interpretation text based on the data sample. The server may then return the generated interpretation text to the user terminal over the network for display to the user.
Alternatively, the user terminal may specifically include, but is not limited to, a smart phone, a tablet computer, a laptop computer, a desktop computer, a vehicle-mounted terminal, a wearable device, and the like. The user terminal may also be a client on which a browser or various applications, including system applications and third-party applications, are installed. Alternatively, the network may be an Internet-of-Things network based on the Internet and/or a telecommunication network; it may be a wired or wireless network, for example a local area network (LAN), a metropolitan area network (MAN), a wide area network (WAN), a cellular data communication network, or any other electronic network capable of exchanging information. The user terminal and the server may be connected directly or indirectly through wired or wireless communication, which is not limited in this application. The server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs, big data, and artificial intelligence platforms.
In the server shown in fig. 1, the received data sample input information may be used to generate a data sample output and, while generating the output text, to generate a user-readable interpretation as supporting information. For example, in a typical application scenario such as an intelligent authoring system, the underlying logic and supporting information for authoring each sentence (e.g., the reference source of a generated sentence, the role of a generated sentence, etc.) can be generated at the same time, helping the user learn authoring from the interpretation text; the user can also purposefully modify machine-generated sentences based on the interpretation text.
Currently, for interpretation text generation, a new kind of task has been introduced, building on the excellent text generation performance of large-scale pre-trained language models (PLMs) trained on massive data: the prompt-based generation task, which can generate high-quality text based on prompt text without further training the language model. The rise of prompt-based generation has prompted many efforts to generate interpretation text via prompting, i.e., appending prompt text (e.g., "because") to the data sample to be interpreted as the input to the PLM so that it generates interpretation text. Currently, methods for generating interpretation text for data samples mainly fall into two categories: supervised learning based on annotated data sets (as shown in fig. 2) and unsupervised learning based on a pre-trained language model (as shown in fig. 3).
As shown in fig. 2, the first type of method for generating interpretation text for a data sample is supervised learning based on annotated data sets. Human-written interpretation text is collected as training data through manual annotation of data samples, and a language model is trained to generate interpretation text for a given data sample. However, this approach relies on high-quality annotated data, is limited by the data scale and by annotation errors, and is difficult to generalize to different tasks and domains.
As shown in FIG. 3, the second type of method for generating interpretation text for data samples is unsupervised learning based on a large-scale pre-trained language model. Thanks to the generation capability of large-scale pre-trained language models, candidate interpretation texts of relatively good quality can be obtained without any training and without human annotation, by combining a data sample with special prompt text as input and guiding the language model to generate a series of short sentences. Then, to further improve the interpretation quality, human scoring data for interpretation texts of different quality can be collected in advance, an automatic scorer can be trained based on a natural-language-understanding language model (such as RoBERTa or T5), and the multiple candidate interpretation texts generated by the language model can simply be filtered and screened by the automatic scorer to select the highest-scoring interpretation text. However, such a retrieval-style method cannot cope with the situation in which the candidate interpretation texts are all of poor quality.
As mentioned above, current work is still limited by the need for high-quality data and interpretation guidance during training. Although the first type of method can learn the characteristics of interpretation text from training data, it relies on high-quality annotated data, is limited by problems such as the difficulty of scaling up the data and the possibility of annotation errors, and is difficult to generalize to different tasks and domains. Although the interpretation text generated by the large-scale pre-trained language models used in the second type of method is mostly fluent and natural, interpretation sentences generated directly from prompt text still suffer from noise, logical inconsistency, redundant expression, and difficulty of quality control, and cannot fully explain the logic behind the data sample, so there is still a gap between them and properly formed natural language interpretation text. Although large-scale pre-trained language models relieve the dependence on manually annotated data, interpretation text generated directly by a language model without special training is difficult to adopt as-is, and there is still some distance to high-quality interpretations that satisfy humans.
In the embodiments of the present disclosure, it is assumed that a qualified interpretation text should consider three components: the data sample input, the data sample output, and a description of their relationship. In addition to being grammatical and factual, good interpretation text should also support the statements in the sample input and output and contain sufficient information to describe their relationship; thus one important property of a qualified interpretation text is sufficiency, i.e., it contains enough information to adequately describe the mapping from input to output. In addition, in embodiments of the present disclosure, another property of a qualified interpretation text is compactness: a data instance can be interpreted in different ways, even along different logical chains, but not all candidate interpretation texts have high confidence or likelihood; that is, interpretation text needs to be generated selectively for the specific data sample input and data sample output. In general, the interpretation text should be coherent, concise, and selective, rather than listing all possible supporting evidence.
Based on this insight, the present disclosure provides an interpretation text generation method which, from the point of view of information theory, proposes an interpretation text regeneration framework based on the information bottleneck, in which the rough candidate interpretation text generated based on a given data sample is further optimized to generate interpretation text of higher quality.
Compared with conventional interpretation text generation methods, the method provided by the embodiments of the present disclosure avoids the dependence of conventional supervised training on large-scale manually annotated data: it requires no high-quality interpretation text labels and can generate higher-quality interpretation text without annotation supervision, thereby addressing problems such as the high cost of manual annotation and the difficulty of ensuring consistent quality.
Specifically, the method provided by the embodiments of the disclosure further optimizes the candidate interpretation text generated from the data sample: by jointly optimizing an information bottleneck objective, built from information compression and information relevance, together with a language modeling objective, it distills from the candidate interpretation text the information related to the data sample, ignores the information unrelated to the data sample, and trains the model parameters generatively, thereby determining an interpretation text generation model capable of generating higher-quality interpretation text. The method of the embodiments of the disclosure jointly optimizes the two tasks of removing redundant, irrelevant information from the interpretation text and retaining the information relevant to the data sample, and, by exploiting the characteristics of interpretation text itself, can obtain higher-quality interpretation text without any high-quality interpretation text labels and without label supervision.
Fig. 4 is a flowchart illustrating an interpreted text generation model training method 400 according to an embodiment of the present disclosure. Fig. 5 is a schematic flow diagram illustrating an interpretation text generation model training method according to an embodiment of the disclosure. As described above, text interpretations in natural language processing can be summarized into three categories, where free-text-form interpretations are more expressive and generally more readable, and thus free-text-form interpreted text generation is primarily considered in embodiments of the present disclosure to generate a fluent text description that aids humans in understanding the cause of model decisions.
In step 401, a data sample may be obtained and candidate interpretation text is obtained based on the data sample.
In an embodiment of the present disclosure, as shown in fig. 5, the input of the interpretation text generation model to be trained of the present disclosure may include a data sample, while the output may include the higher-quality interpretation text after model optimization. Optionally, the input of the interpretation text generation model to be trained may further include prompt text for prompting a large-scale pre-trained language model (PLM) to generate a preliminary candidate interpretation text based on the data sample.
According to an embodiment of the present disclosure, the data sample may be the concatenation of the word sequences of a data sample input and a data sample output. Alternatively, the data sample input and data sample output may be a sentence pair (e.g., a question-answer pair) having a particular relationship, and the interpretation text generation model of the present disclosure is used to generate interpretation text describing that particular relationship.
As an example, in the present disclosure, any one data sample (e.g., a question-answer pair) may be denoted by y ∈ Y, which may be the concatenation of the word sequences of a data sample input m (e.g., a question) and a data sample output n (e.g., an answer), i.e., y = [m; n], where [·;·] denotes the sequence concatenation operation. For example, for the question-answer pair "What is the weather likely to be later?" and "rainy", the data sample input m of the data sample is the question "What is the weather likely to be later?" and the data sample output n of the data sample is "rainy"; the sentences in the question-answer pair are represented as word sequences and concatenated within the data sample. Further, in the present disclosure, the interpretation text generated by the interpretation text generation model of the present disclosure to describe the particular relationship between the data sample input m and the data sample output n in the data sample y is denoted by t.
As described above, the excellent text generation performance of existing large-scale pre-trained language models has introduced the prompt-based generation task, in which high-quality text can be generated without training the language model. The rise of prompt-based generation has prompted many efforts to generate interpretation text via prompting, i.e., appending prompt text to the data sample to be interpreted as the input to the PLM so that it generates interpretation text. In embodiments of the present disclosure, the data sample may be analyzed based on the PLM to generate a preliminary, rough candidate interpretation text, and the final, higher-quality interpretation text may be generated by further optimizing the candidate interpretation text.
According to an embodiment of the present disclosure, obtaining candidate interpretation text based on the data sample may include: generating a candidate interpretation text by a pre-trained language model based on the data sample, wherein the pre-trained language model may utilize at least one decoding method to obtain the candidate interpretation text, and the at least one decoding method may include at least one of greedy decoding and random sampling decoding.
As shown in fig. 5, the PLM may generate a preliminary candidate interpretation text based on the data sample and the prompt text. Alternatively, the PLM may generate multiple candidate interpretation texts based on the data sample and the prompt text, and the final interpretation text may be generated based on these multiple candidates, which avoids the quality of the final interpretation text being impaired by a single poorly generated candidate. The embodiments of the present disclosure are described using the processing of a single candidate interpretation text as an example; the case of multiple candidate interpretation texts can be derived analogously.
Alternatively, the candidate interpretation text to be optimized that is output by the PLM may be denoted by x ∈ X, and in this disclosure it may be obtained by prompting the PLM with prompt text (e.g., the prompt word "because") appended to the data sample y. For example, in the PLM, for the input consisting of the data sample y and the prompt text "because" (i.e., [y; because]), multiple candidate interpretation texts x may be generated by multiple decoding methods, for example by one pass of greedy decoding and four passes of random sampling decoding, giving five preliminary candidate interpretation texts x. The objective of the present disclosure is to generate a final interpretation text t of higher quality than the candidate interpretation text x, which may be viewed as a compressed version of x, for interpreting and supporting the data sample y. The relationship between the data sample y, the candidate interpretation text x, and the final interpretation text t will be described in detail later.
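By way of illustration of this candidate-generation step, the following is a minimal sketch using the Hugging Face transformers library with GPT-2 as the prompting PLM; the model name, prompt wording, and decoding hyperparameters are assumptions made for illustration and are not prescribed by the present disclosure.

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Assumed setup: a frozen GPT-2 serves as the prompting PLM.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
plm = GPT2LMHeadModel.from_pretrained("gpt2")
plm.eval()

def generate_candidates(question: str, answer: str, num_sampled: int = 4):
    """Generate 1 greedy and num_sampled sampled candidate explanations x
    for the data sample y = [m; n] followed by the prompt word."""
    y_with_prompt = f"{question} {answer} because"
    inputs = tokenizer(y_with_prompt, return_tensors="pt")

    # One greedily decoded candidate.
    greedy = plm.generate(**inputs, max_new_tokens=40, do_sample=False)

    # Several randomly sampled candidates.
    sampled = plm.generate(
        **inputs,
        max_new_tokens=40,
        do_sample=True,
        top_p=0.9,                      # assumed nucleus-sampling setting
        num_return_sequences=num_sampled,
    )

    prompt_len = inputs["input_ids"].shape[1]
    outputs = [greedy[0]] + list(sampled)
    return [tokenizer.decode(o[prompt_len:], skip_special_tokens=True) for o in outputs]

candidates = generate_candidates("What is the weather likely to be later?", "Rainy.")
```

Each of the five returned strings corresponds to one preliminary candidate interpretation text x to be optimized by the framework described below.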
Of course, it should be understood that the question-answer pairs above are used merely as examples and not as limitations in the present disclosure; the methods of the present disclosure are not limited to any particular data set domain or format, and the preliminary candidate interpretation text can be optimized for any data as long as it is text-type data (for example, the sentence pair may also be a natural language inference sentence pair, etc.).
Next, in the interpretation text generation model training method of the present disclosure, the parameters of the interpretation text generation model may be trained generatively by further optimizing the candidate interpretation text. As shown in fig. 5, the processing of the candidate interpretation text may include two parts, information compression and information relevance, so as to generate an optimized information compression representation; here the information bottleneck principle is used to find a balance between task-relevant (generalization) information and task-irrelevant (compression) information.
Alternatively, in addition to the PLM described above, the interpretation text generation model of the present disclosure may include three main components, as shown in fig. 5: an interpretation encoder, a data sample encoder, and an interpretation decoder. The data sample encoder and the interpretation encoder may be used to encode the data sample and the candidate interpretation text generated by the large-scale pre-trained language model, respectively, into high-dimensional vector representations, and the interpretation decoder is used to decode the learned information compression representation into a discrete word sequence as the final interpretation text. In embodiments of the present disclosure, the parameters of the PLM, the data sample encoder, and the interpretation encoder in the trained interpretation text generation model are fixed and do not participate in parameter optimization during training, so that training is efficient and the language modeling capability of the pre-trained language model itself is not lost. Thus, the interpretation text generation model training of the present disclosure mainly involves three tasks: an information compression task, an information relevance task, and a language modeling task.
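The following structural sketch illustrates, under the assumptions of this description, which components are frozen and which are trained; the use of GPT-2 for all three components and the module names are illustrative assumptions only.

```python
import torch.nn as nn
from transformers import GPT2Model, GPT2LMHeadModel

class ExplanationRegenerator(nn.Module):
    """Structural sketch of the three main components described above."""

    def __init__(self):
        super().__init__()
        # Encoders map the candidate explanation x and the data sample y to
        # high-dimensional vector sequences; both are kept frozen during training.
        self.explanation_encoder = GPT2Model.from_pretrained("gpt2")
        self.sample_encoder = GPT2Model.from_pretrained("gpt2")
        for enc in (self.explanation_encoder, self.sample_encoder):
            for p in enc.parameters():
                p.requires_grad = False

        # The information compression module (sketched further below) and the
        # explanation decoder are the parts whose parameters are optimized.
        self.explanation_decoder = GPT2LMHeadModel.from_pretrained("gpt2")
```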
Accordingly, in steps 402 and 403, a balance may be found, based on the information bottleneck principle, between the task-irrelevant (compression) information and the task-relevant (generalization) information addressed by the information compression task and the information relevance task, respectively. Alternatively, assume a variable y (e.g., a related question-answer pair, which could also be a sample of another text task, such as a sentence pair of a natural language inference task) and a variable x corresponding to the variable y (e.g., the preliminary candidate interpretation text for the question-answer pair). The present disclosure aims to learn a random mapping p_θ(t|x) so as to derive a maximally compressed representation variable t from x, while maximizing the random mapping p_φ(y|t) so that t contains as much information related to y as possible. In this disclosure, t represents the ideal interpretation text of the data sample y. Thus, given the candidate interpretation text variable x, the data sample variable y, and the distributions X, Y, and T of the compressed representation variable t, the goal of the information bottleneck is to minimize the following objective:

I(X; T) − β·I(Y; T)   (1)

where I(·;·) denotes the mutual information between two distributions and β denotes a Lagrange multiplier. In embodiments of the present disclosure, the goal of the information bottleneck is to prevent T from containing too much redundant information about X (i.e., to keep I(X; T) small), while at the same time encouraging T to extract enough relevant information from X to predict Y (i.e., to keep I(Y; T) large).
In step 402, an information compression process may be performed based on the candidate interpreted text to determine an information compression representation corresponding to the candidate interpreted text, and an information compression loss may be determined based on the candidate interpreted text and the information compression representation.
According to an embodiment of the present disclosure, performing information compression processing based on the candidate interpretation text to determine an information compression representation corresponding to the candidate interpretation text may include: encoding the candidate interpretation text to generate a candidate interpretation text vector; and performing vector dimension reduction on the generated candidate interpretation text vector to generate a corresponding compressed representation vector.
As described above, the candidate interpretation text may be encoded by the interpretation encoder in the interpretation text generation model so as to map it into a high-dimensional feature space, i.e., to generate a high-dimensional vector representation of the candidate interpretation text (the candidate interpretation text vector). For example, the pre-trained language model GPT-2 (or another similar pre-trained language model) may be used as the interpretation encoder to encode the word sequence x into a d-dimensional vector sequence, any one vector of which is denoted x_i. Next, vector dimension reduction may be performed on the generated candidate interpretation text vectors so as to compress the information in the candidate interpretation text and generate the corresponding compressed representation vectors (e.g., for any vector x_i in the vector sequence x, the corresponding compressed representation vector is t_i), such that the generated compressed representation vector retains less of the information of the candidate interpretation text vector. The loss associated with this information compression processing, from the candidate interpretation text vector to the generated compressed representation vector, is the information compression loss of the present disclosure.
According to an embodiment of the disclosure, the candidate interpretation text vector may follow an interpretation candidate distribution, and the compressed representation vector may follow a compressed representation distribution. Optionally, the interpretation candidate distribution may correspond to the distribution X of the candidate interpretation text variable x described above, and the compressed representation distribution corresponds to the distribution of the compressed representation vector, i.e., the distribution T of the compressed representation variable t.
According to an embodiment of the present disclosure, determining the information compression loss based on the candidate interpretation text and the information compression representation may include: determining the mutual information between the candidate interpretation text vector and the compressed representation vector based on the interpretation candidate distribution and the compressed representation distribution; and determining the information compression loss based on the mutual information between the candidate interpretation text vector and the compressed representation vector, where the information compression loss may be used to measure the correlation between the information compression representation and the candidate interpretation text.
As described above, the information compression loss in the present disclosure may be represented by the mutual information I(X; T) between the candidate interpretation text and the information compression representation. This mutual information represents the reduction in the uncertainty of the information compression representation when the candidate interpretation text is known, and can therefore be used to measure the correlation between the information compression representation and the candidate interpretation text: the stronger the correlation between the information compression representation and the candidate interpretation text, the larger the mutual information I(X; T). In the interpretation text generation model training method of the present disclosure, however, it is desired that the correlation between the candidate interpretation text and the information compression representation be as small as possible (i.e., that the mutual information I(X; T) be as small as possible), so that the information compression representation contains less of the redundant information in the candidate interpretation text (in other words, the information unrelated to the data sample).
Alternatively, the mutual information I(X; T) between the candidate interpretation text and the information compression representation may be handled using the mutual information formula from information theory. For any two random variables a and b (following the distributions A and B, respectively), their mutual information I(A; B) can be expressed as follows:

I(A; B) = E_{p(a,b)}[ log( p(a, b) / ( p(a)·p(b) ) ) ]   (2)

Thus, I(X; T) can be expanded based on formula (2) above to yield:

I(X; T) = E_x[ E_{p_θ(t|x)}[ log( p_θ(t|x) / p_θ(t) ) ] ]   (3)

where E_x[·] denotes the mathematical expectation with respect to the variable x, p_θ(t|x) denotes the mapping from the interpretation candidate distribution X to the compressed representation distribution T, and θ denotes the parameters to be learned in the information compression processing. By minimizing this mutual information term, the compressed representation vector t can be made to contain as little as possible of the information in the candidate interpretation text vector x.
Specifically, for any vector x_i in the candidate interpretation text vector sequence x, the corresponding compressed representation vector t_i can be obtained by performing vector dimension reduction. In the present disclosure, let t_i ~ p_θ(t|x) follow an isotropic Gaussian distribution N(μ, σ²·I); that is, for the compressed representation vector t_i, the mean vector μ and the variance vector σ² can be calculated as follows:

μ = W_1·t_i + b_1   (4)

log(σ²) = W_2·t_i + b_2   (5)
alternatively, in embodiments of the present disclosure, a heavy parameter trick may be used to map p from the map θ (t|x) sample compression representation variable t for back propagation of parameters that optimize the interpretation text generation model.
Alternatively, consider the term p_θ(t) in formula (3) above, which marginalizes over the candidate interpretation text vectors x' that are mapped to the same compressed representation vector t_i. Since p_θ(t) is difficult to obtain directly, because it is difficult to traverse all candidate interpretation text vectors x', in embodiments of the present disclosure a variational distribution r_ψ(t) = N(μ', σ'²) may be employed in place of p_θ(t), where μ' and σ'² are the parameters of the Gaussian distribution to be learned. Thus, based on the variational distribution r_ψ(t), an upper bound on I(X; T) can be determined, namely:

I(X; T) ≤ E_x[ KL( p_θ(t|x) ‖ r_ψ(t) ) ]   (6)

where KL(·‖·) denotes the Kullback-Leibler divergence, and θ and ψ denote the parameter sets that need to be optimized in the information compression stage of the interpretation text generation model training process of the present disclosure; for example, θ includes, but is not limited to, parameters such as W_1, b_1, W_2, and b_2 in formulas (4) and (5) above, and ψ includes, but is not limited to, parameters such as μ' and σ'² described above.
Therefore, by converting the minimization of the mutual information term I(X; T) into the minimization of its upper bound above, the compressed representation vector t can likewise be made to contain as little as possible of the information in the candidate interpretation text vector x, so as to remove the redundant information in the candidate interpretation text that is unrelated to the data sample.
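To make the information compression step concrete, the following is a minimal PyTorch-style sketch of how formulas (4) through (6) could be realized under the assumptions above (a Gaussian p_θ(t|x), a learnable Gaussian prior r_ψ(t), and reparameterized sampling); the dimensions and the choice of applying the mean and variance heads directly to the encoder output are illustrative assumptions.

```python
import torch
import torch.nn as nn

class InformationCompression(nn.Module):
    """Sketch of p_theta(t|x) and the KL upper bound on I(X; T) (formula (6))."""

    def __init__(self, d_in: int = 768, d_t: int = 256):
        super().__init__()
        self.W1 = nn.Linear(d_in, d_t)   # mean head, cf. formula (4)
        self.W2 = nn.Linear(d_in, d_t)   # log-variance head, cf. formula (5)
        # Learnable parameters mu' and log sigma'^2 of the variational prior r_psi(t).
        self.prior_mu = nn.Parameter(torch.zeros(d_t))
        self.prior_logvar = nn.Parameter(torch.zeros(d_t))

    def forward(self, x_states: torch.Tensor):
        # x_states: interpretation-encoder output for the candidate text, (seq_len, d_in)
        mu = self.W1(x_states)
        logvar = self.W2(x_states)

        # Reparameterization trick: t = mu + sigma * eps with eps ~ N(0, I).
        eps = torch.randn_like(mu)
        t = mu + torch.exp(0.5 * logvar) * eps

        # KL( N(mu, sigma^2) || N(mu', sigma'^2) ): the upper bound of formula (6).
        kl = 0.5 * (
            self.prior_logvar - logvar
            + (logvar.exp() + (mu - self.prior_mu) ** 2) / self.prior_logvar.exp()
            - 1.0
        ).sum(dim=-1)

        return t, kl.mean()   # compressed representation and information compression loss
```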
In step 403, an information relevance loss may be determined based on the information relevance between the information compression representation and the data sample.
According to an embodiment of the present disclosure, determining the information relevance loss based on the information relevance between the information compression representation and the data sample may include: encoding the data sample to generate a data sample vector; and determining the information relevance loss based on the mutual information between the compressed representation vector and the data sample vector, the information relevance loss being used to measure the correlation between the information compression representation and the data sample.
As described above, the data sample may be encoded by the data sample encoder in the interpretation text generation model so as to map it into a high-dimensional feature space, i.e., to generate a high-dimensional vector representation of the data sample (the data sample vector). For example, similarly to the interpretation encoder, the pre-trained language model GPT-2 (or another similar pre-trained language model) may be used as the data sample encoder to encode the word sequence y into a d-dimensional vector sequence, any one vector of which is denoted y_i. Next, information relevance processing may be performed on the compressed representation vector and the data sample vector to determine the correlation between them, and by maximizing this correlation the generated compressed representation vector is made to carry more of the information related to the data sample vector. The loss defined between the data sample vector and the generated compressed representation vector is the information relevance loss of the present disclosure.
Alternatively, the data sample vector may follow the distribution Y of the data sample variable y described above. As described above, the information relevance loss in the present disclosure may be represented by the mutual information I(Y; T) between the information compression representation and the data sample. This mutual information represents the reduction in the uncertainty of the information compression representation when the data sample is known, and can be used to measure the correlation between the information compression representation and the data sample: the stronger the correlation between the information compression representation and the data sample, the larger the mutual information I(Y; T). In the interpretation text generation model training method of the present disclosure, it is desired that the correlation between the information compression representation and the data sample be as large as possible (i.e., that the mutual information I(Y; T) be as large as possible), so that the information compression representation contains more of the information in the candidate interpretation text that is related to the data sample.
Alternatively, the mutual information I(Y; T) between the information compression representation and the data sample may be handled similarly, using the mutual information formula from information theory. For example, I(Y; T) can be expanded based on formula (2) above to yield:

I(Y; T) = E_{p(y,t)}[ log p_γ(y|t) ] − E_y[ log p(y) ]   (7)

where γ denotes the parameters to be learned by the information relevance processing part during model training, and p(y) is a constant independent of the parameter γ and can therefore be ignored during model training. Thus, by maximizing the mutual information term I(Y; T), the compressed representation vector t can be made to contain as much as possible of the information in the candidate interpretation text vector x that is related to the data sample vector y.
Optionally, in embodiments of the disclosure, it is difficult to traverse all candidate interpretation text vectors x in order to maximize the term E[ log p_γ(y|t) ] directly, so the interpretation text generation model training method of the present disclosure employs a variational distribution q_φ(y|t) to approximate p_γ(y|t). Alternatively, in embodiments of the present disclosure, a language model may be used as q_φ to predict the data sample vector y based on the compressed representation vector t. Thus, a lower bound on I(Y; T) can be determined in the manner described above:

I(Y; T) ≥ E_x[ E_{p_θ(t|x)}[ log q_φ(y|t) ] ] − E_y[ log p(y) ]   (8)

where γ and φ denote the parameter sets that need to be optimized in the information relevance stage of the interpretation text generation model training process of the present disclosure.
Therefore, by converting the maximization of the mutual information term I(Y; T) into the maximization of its lower bound above, the compressed representation vector t can likewise be made to contain as much as possible of the information in the candidate interpretation text vector x that is related to the data sample, so as to increase the correlation between the generated interpretation text and the original data sample.
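As an illustration of how the lower bound in formula (8) could be estimated in practice, the following sketch scores the data sample y under a language model conditioned on the compressed representation t; feeding t as prepended input embeddings is an assumption made for illustration (it presumes t has the language model's embedding size), and the function and variable names are hypothetical.

```python
import torch
import torch.nn.functional as F

def information_relevance_loss(decoder, t, y_embeds, y_ids):
    """Negative of the bound in formula (8): -log q_phi(y | t).

    decoder:  a causal language model playing the role of q_phi(y|t)
    t:        compressed representation vectors, shape (t_len, d_model)
    y_embeds: input embeddings of the data sample y, shape (y_len, d_model)
    y_ids:    token ids of the data sample y, shape (y_len,)
    """
    # Condition the language model on t by prepending it to the input embeddings.
    inputs = torch.cat([t, y_embeds], dim=0).unsqueeze(0)       # (1, t_len + y_len, d)
    logits = decoder(inputs_embeds=inputs).logits[0]            # (t_len + y_len, vocab)

    # The position just before each token of y predicts that token.
    t_len = t.shape[0]
    pred = logits[t_len - 1 : t_len - 1 + y_ids.shape[0]]       # (y_len, vocab)
    return F.cross_entropy(pred, y_ids)
```

Minimizing this value maximizes log q_φ(y|t), i.e., it pushes up the lower bound on I(Y; T).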
As described above, steps 402 and 403 handle the information compression task and the information relevance task based on the information bottleneck principle: the generated information compression representation needs, on the one hand, to contain as much as possible of the information in the candidate interpretation text that is related to the original data sample and, on the other hand, to contain as little as possible of the information in the candidate interpretation text overall. By optimizing according to formula (1), the two tasks of removing redundant, irrelevant information from the interpretation text and retaining the information relevant to the data sample are completed jointly, and a balance is found between task-relevant (generalization) information and task-irrelevant (compression) information.
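In terms of the quantities computed in the sketches above, the information bottleneck objective of formula (1) reduces to a simple combination; the variable names are hypothetical and correspond to the return values of the earlier sketches.

```python
# Formula (1) instantiated with the bounds of formulas (6) and (8):
# minimize the KL upper bound on I(X; T) plus beta times -log q_phi(y|t),
# which is equivalent to maximizing the lower bound on I(Y; T).
ib_loss = compression_loss + beta * relevance_loss
```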
In addition to the information bottleneck objective described above, the interpretation text generation model training method of the present disclosure also uses a language modeling objective as another optimization objective, so that the final interpretation text output by the interpretation text generation model can, to some extent, retain a pattern (e.g., writing style, etc.) similar to that of the original candidate interpretation text.
In step 404, a final interpretation text may be obtained by performing a decoding process on the information compression representation, and language modeling loss may be determined based on the candidate interpretation text and the final interpretation text.
According to embodiments of the present disclosure, the final interpretation text may be used to interpret the relationship between the data sample input and the data sample output. As described above, the interpretation text generation model of the present disclosure is used to interpret an input sentence pair, and the final interpretation text output thereof may be descriptive text of a specific relationship between the input sentence pair (including data sample input and data sample output).
According to an embodiment of the present disclosure, the interpretation text generation model may comprise an interpretation decoder for obtaining a final interpretation text by performing a decoding process on the information compression representation. Alternatively, the decoding process of the compressed representation of information may be performed by an interpretation decoder in the interpretation text generation model of the present disclosure, as shown in FIG. 5, to generate final interpretation text from the compressed representation vector.
According to an embodiment of the present disclosure, determining a language modeling penalty based on the candidate interpretation text and the final interpretation text may include: determining the language modeling penalty for measuring similarity between the final interpretation text and the candidate interpretation text based on cross entropy penalty of the candidate interpretation text and the final interpretation text.
Optionally, in order to generate the final interpretation text after the information bottleneck optimization, the interpretation text generation model training method of the present disclosure may optimize a pre-trained language model (e.g., GPT-2) by a cross-entropy loss and use it as the interpretation decoder in the interpretation text generation model.
Specifically, a lossy version of the candidate interpretation text, i.e., the final interpretation text t̃ obtained after the information screening, may be generated from the compressed representation vector sequence t, i.e.:

p_δ(t̃ | t) = LM(t̃ | t) (9)

where δ represents the parameters of the interpretation decoder that need to be optimized at the information decoding stage in the interpretation text generation model training process of the present disclosure, LM represents the processing of the interpretation decoder, which converts the compressed representation vector sequence t into the final interpretation text t̃, x represents the candidate interpretation text vector sequence, and L_LM represents the language modeling loss.
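As a sketch, a plausible reconstruction of the cross-entropy language modeling loss of equation (10), computed over the candidate interpretation text tokens x_i and conditioned on the compressed representation vector sequence t, is:

L_{LM} = -\sum_{i} \log p_\delta(x_i \mid x_{<i}, t)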
Alternatively, the language modeling loss may be the information loss between the generated final interpretation text and the original candidate interpretation text. By minimizing this language modeling loss, the generated final interpretation text can be made as similar as possible to the candidate interpretation text once the information filtering of the candidate interpretation text is completed, so that the final interpretation text retains, to some extent, a style (e.g., writing style, etc.) similar to that of the original candidate interpretation text. For example, the language modeling loss may be the cross-entropy loss of the candidate interpretation text and the final interpretation text, as shown in equation (10) above.
Alternatively, in embodiments of the present disclosure, in order not to lose the language modeling capability of the pre-trained language model itself, an efficient prefix optimization approach may be employed: the compressed representation vector sequence t is input as a prefix to the interpretation decoder (i.e., into the prefix store of the pre-trained language model), and in each decoding step of the interpretation decoder the prefix store of the pre-trained language model is weighted and summed over for decoding the target interpretation text t̃.
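The following Python sketch illustrates one way such prefix conditioning could be wired up; it is an illustration under assumptions rather than the implementation of the present disclosure, and the class name PrefixProjector, the tensor shapes, the prompt string, and the use of the Hugging Face transformers GPT-2 interface are all assumptions.

import torch
import torch.nn as nn
from transformers import GPT2LMHeadModel, GPT2Tokenizer

class PrefixProjector(nn.Module):
    """Maps compressed representation vectors to per-layer key/value prefixes for GPT-2."""
    def __init__(self, repr_dim, n_layer, n_head, head_dim):
        super().__init__()
        self.n_layer, self.n_head, self.head_dim = n_layer, n_head, head_dim
        self.proj = nn.Linear(repr_dim, n_layer * 2 * n_head * head_dim)

    def forward(self, compressed_repr):                      # (batch, prefix_len, repr_dim)
        b, p, _ = compressed_repr.shape
        kv = self.proj(compressed_repr)                      # project to all layers' keys and values
        kv = kv.view(b, p, self.n_layer, 2, self.n_head, self.head_dim)
        kv = kv.permute(2, 3, 0, 4, 1, 5)                    # (n_layer, 2, batch, n_head, prefix_len, head_dim)
        return tuple((layer[0], layer[1]) for layer in kv)   # GPT-2 past_key_values format

decoder = GPT2LMHeadModel.from_pretrained("gpt2")            # interpretation decoder (pre-trained LM)
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
cfg = decoder.config
projector = PrefixProjector(cfg.n_embd, cfg.n_layer, cfg.n_head, cfg.n_embd // cfg.n_head)

compressed_repr = torch.randn(1, 8, cfg.n_embd)              # stand-in for the compressed representation sequence t
prefix_kv = projector(compressed_repr)

inputs = tokenizer("The answer follows because", return_tensors="pt")
out = decoder(input_ids=inputs["input_ids"], past_key_values=prefix_kv, labels=inputs["input_ids"])
# out.loss is the cross-entropy language modeling loss; during training, gradients can be
# restricted to the prefix projector so that the pre-trained decoder weights stay intact.

Keeping the pre-trained decoder frozen and learning only the prefix projection is what preserves the language modeling capability that the prefix approach described above is meant to protect.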
It should be appreciated that, although the various encoders and decoders included in the interpreted text generation model of the present disclosure are each described by taking the GPT-2 language model as an example, the present disclosure is not limited to a particular language model based on deep neural networks, and these encoders and decoders may therefore employ other efficient language model structures (e.g., GPT-3 or OPT, etc.). Accordingly, the language model structure adopted by the encoders and decoders can be expanded or simplified according to memory footprint constraints and language modeling performance requirements in practical applications.
As described above, according to embodiments of the present disclosure, the interpreted text generation model may include undetermined parameters related to the information compression loss, the information-related loss, and the language modeling loss. That is, in the interpreted text generation model training process, model parameters (e.g., parameters involved in the processing section corresponding to gray filling in fig. 5) may be optimized based on information loss in the information compression task, the information-related task, and the language modeling task, thereby determining the interpreted text generation model.
Next, in step 405, the interpreted text generation model may be determined based on the information compression loss, the information-related loss, and the language modeling loss.
According to an embodiment of the present disclosure, determining the interpreted text generation model based on the information compression loss, the information-related loss, and the language modeling loss may include: determining the undetermined parameters of the interpreted text generation model based on joint optimization of the information compression loss, the information-related loss, and the language modeling loss to determine the interpreted text generation model. Alternatively, the entire interpreted text generation model may be co-optimized in conjunction with the loss functions of the various stages involved in the interpreted text generation model training process of the present disclosure, such that the finally determined interpreted text generation model is capable of simultaneously optimizing the above-described information bottleneck objectives and language modeling objectives.
According to an embodiment of the present disclosure, determining the interpreted text generation model based on the information compression loss, the information-related loss, and the language modeling loss may include: establishing a joint loss function of the interpreted text generation model based on the information compression loss, the information-related loss, and the language modeling loss, wherein the joint loss function may be positively correlated with the information compression loss and the language modeling loss, and the joint loss function may be negatively correlated with the information-related loss; and determining the undetermined parameters of the interpreted text generation model by optimizing the joint loss function to determine the interpreted text generation model.
Alternatively, the joint optimization of the information compression loss, the information-related loss, and the language modeling loss may be based on a joint loss function of the information compression loss, the information-related loss, and the language modeling loss. That is, parameters to be optimized in the interpretation text generation model are determined by simultaneously optimizing a plurality of targets (for example, the above-described information bottleneck target and language modeling target).
Alternatively, because the interpretation text generation model training method of the present disclosure aims to generate higher-quality interpretation text, which needs to contain less of the redundant information and more of the data-sample-related information in the original candidate interpretation text, and because the generated final interpretation text also needs to remain similar to the candidate interpretation text to some extent, in embodiments of the present disclosure the established joint loss function may be positively correlated with the information compression loss and the language modeling loss, and negatively correlated with the information-related loss.
For example, based on the information bottleneck loss function and language modeling loss of formulas (1) and (10), the joint loss function of the interpretation text generation model of the present disclosure can be expressed as:
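A plausible reconstruction of this joint loss, in which the weighting coefficients β and λ are assumptions (they may simply be set to 1), is:

L = I(X;T) - β I(Y;T) + λ L_LM (11)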
Wherein the joint loss function involves three optimization terms, corresponding respectively to the information compression task (I(X;T)), the information-related task (I(Y;T)), and the language modeling task (L_LM).
Of course, it should be understood that the representation of the joint loss function shown in equation (11) above is used in this disclosure by way of example only and not limitation, and that other representations that can achieve similar optimization objectives can be used in the interpretation text generation model training method of this disclosure as well.
Thus, by performing the optimization process on the above-described joint loss function, the undetermined parameters of the interpretation text generation model of the present disclosure can be determined, thereby obtaining the finally trained interpretation text generation model. It should be appreciated that, during the training of the interpretation text generation model, equation (11) above may be optimized over a large number of data samples and their corresponding candidate interpretation texts.
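As a sketch of one joint optimization step over these three terms (the weighting coefficients beta and lamb, the function name, and the assumption that the three loss terms are computed elsewhere as differentiable tensors are all illustrative):

import torch

def joint_training_step(compression_loss, relevance_lower_bound, lm_loss, optimizer,
                        beta=1.0, lamb=1.0):
    # compression_loss approximates I(X;T) and is minimized; relevance_lower_bound
    # approximates I(Y;T) and is maximized, so it enters with a negative sign;
    # lm_loss is the cross-entropy language modeling loss and is minimized.
    joint_loss = compression_loss - beta * relevance_lower_bound + lamb * lm_loss
    optimizer.zero_grad()
    joint_loss.backward()
    optimizer.step()
    return joint_loss.detach()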
As described above, with the interpretation text generation model training method of the present disclosure, the tasks of removing redundant, irrelevant information from the interpretation text and retaining data-sample-related information can be optimized and completed simultaneously. From an information-theoretic perspective, the method provides an interpretation text regeneration framework based on the information bottleneck: for a rough candidate interpretation text preliminarily generated by a pre-trained language model for a given data sample, information related to the data sample is extracted from the candidate interpretation text and information unrelated to the data sample is ignored by simultaneously optimizing the information bottleneck objective and the language modeling objective, and the model parameters are trained in a generative manner, so that higher-quality interpretation text can be generated.
It can be seen that the interpretation text generation model training method avoids the dependence of traditional supervised training on large-scale manually annotated data and does not require any high-quality interpretation text annotations. Instead, a preliminary interpretation text is obtained by inference with a pre-trained language model based on a data sample and a prompt text, and, according to the characteristics of interpretation text, the information bottleneck theory is used to filter out irrelevant information and retain relevant information, so that higher-quality interpretation text is obtained without supervision, solving problems such as the high cost of manual annotation and the difficulty of ensuring consistent quality.
The interpretation text generation model training method also overcomes the difficulty, faced by methods that directly classify and select the inference results of a pre-trained language model, of searching among candidate interpretation texts of poor quality. Training the model in a generative manner keeps it from being limited to a single candidate interpretation text, and the parameters stored in the encoders and decoder used by the interpretation text generation model learn, through optimization over a large amount of data, how to add, delete, modify, and check content in the candidate interpretation texts, so that the resulting interpretation text generation model is more flexible and generalizable.
Fig. 6 is a flowchart illustrating an interpretation text generation method 600, according to an embodiment of the present disclosure. Wherein, as shown in fig. 6, in step 601, a data instance may be acquired; in step 602, final interpreted text may be generated by an interpreted text generation model based on the data instance; wherein the interpreted text generation model may be trained in accordance with the method as described with reference to steps 201-205 of fig. 2.
As described above, the interpretation text generation method 600 is a specific application of the interpretation text generation model produced by the interpretation text generation model training method 400. In this method, any given data instance and its corresponding generated candidate interpretation text can be encoded into high-dimensional vector representations by the data instance encoder and the interpretation encoder, respectively, and a higher-quality final interpretation text can then be generated using the interpretation decoder. Generating interpretation text with the interpretation text generation method 600 is very important for trustworthy artificial intelligence: it not only helps people understand, in a readable manner, the reasons behind machine decisions, but also helps people augment data according to the interpretation logic and content, helps machines learn better from the data, and improves the interpretability and other performance of machines in the text generation field.
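As a minimal inference sketch (the function and parameter names below, such as interp_encoder and compressor, are illustrative assumptions rather than identifiers defined in this disclosure):

def generate_interpretation(data_instance, plm, data_encoder, interp_encoder,
                            compressor, interp_decoder):
    # 1. Prompt the pre-trained language model to produce a rough candidate interpretation text.
    candidate = plm.generate_candidate(data_instance)
    # 2. Encode the candidate into high-dimensional candidate interpretation text vectors.
    x = interp_encoder.encode(candidate)
    # 3. Encode the data instance; its vectors drive the information-related objective during training.
    y = data_encoder.encode(data_instance)
    # 4. Compress away the information in the candidate that is unrelated to the data instance.
    t = compressor(x)
    # 5. Decode the compressed representation into the final, higher-quality interpretation text.
    return interp_decoder.decode(t)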
It should be appreciated that the interpretation text generation method 600 of the present disclosure does not restrict the field or format of the input data samples: as long as the data is text-type data, the method of the present disclosure may be employed to optimize its preliminary candidate interpretation text. The interpretation text generation method can be regarded as an information-bottleneck-enhanced text generation framework and can also be applied to other unsupervised text generation fields, such as summary generation. In general, the interpretation text generation method of the present disclosure is a relatively general and generalizable framework that can be applied to projects and product applications of various text generation types, including intelligent writing systems, intelligent dialogue systems, intelligent question-answering systems, and the like. While generating text, it can also produce user-readable interpretations as supporting information, which can improve confidence in the model, allow the model's performance to be further improved according to the content of the interpretation text, and improve the user experience. For example, in a typical application scenario such as an intelligent authoring system, the interpretation text generation method 600 of the present disclosure may, while generating each sentence, also generate the logic behind writing that sentence together with supporting information (e.g., the reference source of a generated sentence pattern, the intended effect of the generated sentence content, etc.), so as to help users learn authoring from the interpretation logic; users may also purposefully modify the sentences generated by the machine, which effectively improves the reliability and interactivity of the intelligent authoring system. Since the quality of the interpretations also reflects the generation performance of the system, developers can continue to improve the generation performance of the authoring system according to the interpretation logic.
Fig. 7 is a schematic diagram illustrating an interpreted text generation model training apparatus 700 according to an embodiment of the present disclosure.
According to an embodiment of the present disclosure, the interpreted text generation model training apparatus 700 may include a data acquisition module 701, an information compression module 702, an information correlation module 703, an information decoding module 704, and a model determination module 705.
The data acquisition module 701 may be configured to acquire a data sample and obtain candidate interpretation text based on the data sample. Alternatively, the data acquisition module 701 may perform the operations described above with reference to step 201.
In embodiments of the present disclosure, the input of the interpretation text generation model to be trained of the present disclosure may include data samples, while the output may include higher quality interpretation text after model optimization. Wherein, optionally, the input of the interpretation text generation model to be trained of the present disclosure may further comprise prompt text for prompting a large scale pre-training language model (PLM) to generate preliminary candidate interpretation text based on the data sample. Alternatively, the data samples may include word sequence concatenation of data sample inputs and data sample outputs. Alternatively, the data sample input and data sample output may be sentence pairs (e.g., question-answer pairs) having a particular relationship, and the interpretation text generation model of the present disclosure is used to generate the interpretation text describing the particular relationship.
Optionally, in embodiments of the present disclosure, the data samples may be analyzed based on the PLM to generate preliminary, rough candidate interpretation text, and a higher-quality final interpretation text may be generated by further optimizing the candidate interpretation text. For example, the PLM may generate a plurality of candidate interpretation texts based on the data sample and the hint text, and the final interpretation text may then be generated based on the plurality of candidate interpretation texts, so as to avoid the problem that the quality of the final interpretation text is impaired when a single candidate interpretation text is poorly generated. The embodiments of the present disclosure are described taking the processing of a single candidate interpretation text as an example; the case of multiple candidate interpretation texts can be derived similarly.
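As a sketch of this candidate generation step (the model name "gpt2", the example question-answer pair, the hint text wording, and the decoding settings are all assumptions), a pre-trained language model can be prompted with the data sample plus a hint text and decoded with greedy decoding or random-sampling decoding:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
plm = AutoModelForCausalLM.from_pretrained("gpt2")

data_sample = "Question: Can birds fly? Answer: Most birds can fly."  # word sequence concatenation of input and output
prompt_text = " The reason is that"                                    # hint text prompting an explanation
inputs = tokenizer(data_sample + prompt_text, return_tensors="pt")
prompt_len = inputs["input_ids"].shape[1]

greedy = plm.generate(**inputs, max_new_tokens=40, do_sample=False)    # greedy decoding
sampled = plm.generate(**inputs, max_new_tokens=40, do_sample=True,
                       top_p=0.9, num_return_sequences=3)              # random sampling decoding

candidate_texts = [tokenizer.decode(seq[prompt_len:], skip_special_tokens=True)
                   for seq in list(greedy) + list(sampled)]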
Next, in the interpretation text generation model training method of the present disclosure, the parameters in the interpretation text generation model may be trained in a generative manner by further optimizing the candidate interpretation text. For the candidate interpretation text, two parts of processing may be included, namely information compression and information correlation, which are performed by the information compression module 702 and the information correlation module 703, respectively, to generate an optimized information compression representation; generating this representation requires finding a balance between task-related (generalized) information and task-independent (compressed) information using the information bottleneck principle.
The information compression module 702 may be configured to perform an information compression process based on the candidate interpretation text to determine an information compression representation corresponding to the candidate interpretation text, and to determine an information compression loss based on the candidate interpretation text and the information compression representation. Alternatively, the information compression module 702 may perform the operations described above with reference to step 202.
Alternatively, in the information compression module 702, the candidate interpretation text may be encoded by an interpretation encoder in the interpretation text generation model to map it to a high-dimensional feature space, i.e., to generate a high-dimensional vector representation of the candidate interpretation text (the candidate interpretation text vector). Next, vector dimension reduction may be performed on the generated candidate interpretation text vectors to compress the information in the candidate interpretation text and generate corresponding compressed representation vectors, such that less of the information in the candidate interpretation text vectors is carried over into the generated compressed representation vectors. The information loss of this information compression processing, from the candidate interpretation text vector to the generated compressed representation vector, is the information compression loss in the present disclosure.
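As a sketch of this dimension reduction step (the layer form, the dimensionalities, and the class name are assumptions; the actual compression processing of the disclosure may differ):

import torch
import torch.nn as nn

class InformationCompressor(nn.Module):
    """Projects candidate interpretation text vectors down to compressed representation vectors."""
    def __init__(self, text_dim=768, compressed_dim=128):
        super().__init__()
        self.down = nn.Linear(text_dim, compressed_dim)      # vector dimension reduction
        self.act = nn.Tanh()

    def forward(self, candidate_text_vectors):               # (batch, seq_len, text_dim)
        return self.act(self.down(candidate_text_vectors))   # (batch, seq_len, compressed_dim)

compressor = InformationCompressor()
x = torch.randn(2, 20, 768)   # candidate interpretation text vectors from the interpretation encoder
t = compressor(x)             # compressed representation vectors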
Alternatively, the information compression loss in the present disclosure may be represented by the mutual information between the candidate interpretation text and the information compression representation. This mutual information represents the reduction in uncertainty of the information compression representation when the candidate interpretation text is known, and can therefore be used to measure the correlation between the information compression representation and the candidate interpretation text: the stronger that correlation, the greater the reduction in uncertainty, and the larger the mutual information between the candidate interpretation text and the information compression representation. In the interpretation text generation model training method of the present disclosure, however, it is desirable that the correlation between the candidate interpretation text and the information compression representation be as small as possible (i.e., that the mutual information between them be as small as possible), so that the information compression representation contains less of the redundant information (or, stated differently, the information not related to the data sample) in the candidate interpretation text.
The information correlation module 703 may be configured to determine an information correlation loss based on an information correlation between the information compressed representation and the data sample. Alternatively, the information correlation module 703 may perform the operations described above with reference to step 203.
Alternatively, the data samples may be encoded by a data sample encoder in the interpretation text generation model to map them to a high-dimensional feature space, i.e., to generate a high-dimensional vector representation of the data samples (i.e., a data sample vector). Next, information correlation processing may be performed on the compressed representation vector and the data sample vector to determine the correlation between them; by maximizing this correlation, the generated compressed representation vector is made to carry more information related to the data sample vector. The information loss between the data sample vector and the generated compressed representation vector is the information-related loss in the present disclosure.
Alternatively, the information-related loss in the present disclosure may be represented by the mutual information between the information compression representation and the data sample. This mutual information represents the reduction in uncertainty of the information compression representation when the data sample is known, and can therefore be used to measure the correlation between the information compression representation and the data sample: the stronger that correlation, the greater the reduction in uncertainty, and the larger the mutual information between the information compression representation and the data sample. In the interpretation text generation model training method of the present disclosure, it is desirable that the correlation between the information compression representation and the data sample be as large as possible (i.e., that the mutual information between them be as large as possible), so that the information compression representation can contain more of the information in the candidate interpretation text that is related to the data sample.
As described above, the information compression task and the information-related task are processed based on the information bottleneck principle in the information compression module 702 and the information correlation module 703: the generated information compression representation needs, on the one hand, to contain as much as possible of the information in the candidate interpretation text that is related to the original data sample and, on the other hand, to contain as little as possible of the remaining information in the candidate interpretation text. Through such optimization processing, the tasks of removing redundant, irrelevant information from the interpretation text and retaining data-sample-related information can be optimized and completed simultaneously, finding a balance between task-related (generalized) information and task-independent (compressed) information.
In addition to the information bottleneck objectives described above, the interpretation text generation model training method of the present disclosure also takes the language modeling objective as another optimization objective, so that the final interpretation text output by the interpretation text generation model can, to some extent, retain a style (e.g., writing style, etc.) similar to that of the original candidate interpretation text.
The information decoding module 704 may be configured to obtain a final interpretation text by performing a decoding process on the information compression representation, and determine a language modeling penalty based on the candidate interpretation text and the final interpretation text. Alternatively, the information decoding module 704 may perform the operations described above with reference to step 204.
As described above, the interpretation text generation model of the present disclosure is used to interpret an input sentence pair, and the final interpretation text output thereof may be descriptive text of a specific relationship between the input sentence pair (including data sample input and data sample output). Alternatively, the decoding process of the compressed representation of information may be performed in the information decoding module 704 by an interpretation decoder in the interpretation text generation model of the present disclosure to generate final interpretation text from the compressed representation vector.
Alternatively, the language modeling loss may be the information loss between the generated final interpretation text and the original candidate interpretation text. By minimizing this language modeling loss, the generated final interpretation text can be made as similar as possible to the candidate interpretation text once the information filtering of the candidate interpretation text is completed, so that the final interpretation text retains, to some extent, a style (e.g., writing style, etc.) similar to that of the original candidate interpretation text. For example, the language modeling loss may be the cross-entropy loss of the candidate interpretation text and the final interpretation text.
As described above, according to embodiments of the present disclosure, the interpreted text generation model may include undetermined parameters related to the information compression loss, the information-related loss, and the language modeling loss. That is, in the interpreted text generation model training process, model parameters may be optimized based on information loss in the information compression task, the information-related task, and the language modeling task, thereby determining the interpreted text generation model.
The model determination module 705 may be configured to determine the interpreted text generation model based on the information compression loss, the information-related loss, and the language modeling loss. Alternatively, the model determination module 705 may perform the operations described above with reference to step 205.
Alternatively, the entire interpreted text generation model may be co-optimized in conjunction with the loss functions of the various stages involved in the interpreted text generation model training process of the present disclosure, such that the finally determined interpreted text generation model is capable of simultaneously optimizing the above-described information bottleneck objectives and language modeling objectives. For example, the joint optimization of the information compression loss, the information-related loss, and the language modeling loss may be based on a joint loss function of the information compression loss, the information-related loss, and the language modeling loss. That is, parameters to be optimized in the interpretation text generation model are determined by simultaneously optimizing a plurality of targets (for example, the above-described information bottleneck target and language modeling target).
Alternatively, because the interpretation text generation model training method of the present disclosure aims to generate higher-quality interpretation text, which needs to contain less of the redundant information and more of the data-sample-related information in the original candidate interpretation text, and because the generated final interpretation text also needs to remain similar to the candidate interpretation text to some extent, in embodiments of the present disclosure the established joint loss function may be positively correlated with the information compression loss and the language modeling loss, and negatively correlated with the information-related loss.
Thus, by performing the optimization process on the above-described joint loss function, the undetermined parameters of the interpretation text generation model of the present disclosure can be determined, thereby obtaining the finally trained interpretation text generation model. It should be appreciated that, during the training of the interpretation text generation model, the above joint loss function may be optimized over a large number of data samples and their corresponding candidate interpretation texts.
As described above, with the interpretation text generation model training apparatus of the present disclosure, the tasks of removing redundant, irrelevant information from the interpretation text and retaining data-sample-related information can be optimized and completed simultaneously, and an interpretation text generation model capable of generating higher-quality interpretation text can be trained.
According to still another aspect of the present disclosure, there is also provided an interpretation text generating apparatus. Fig. 8 shows a schematic diagram of an interpretation text generating device 2000 according to an embodiment of the present disclosure.
As shown in fig. 8, the interpreted text generating device 2000 may include one or more processors 2010, and one or more memories 2020. Wherein said memory 2020 has stored therein computer readable code which, when executed by said one or more processors 2010, can perform an interpretation text generation method as described above.
The processor in embodiments of the present disclosure may be an integrated circuit chip having signal processing capabilities. The processor may be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. The methods, steps, and logical blocks disclosed in the embodiments of the present application may be implemented or executed. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like, and may be of the X86 architecture or the ARM architecture.
In general, the various example embodiments of the disclosure may be implemented in hardware or special purpose circuits, software, firmware, logic, or any combination thereof. Some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device. While aspects of the embodiments of the present disclosure are illustrated or described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that the blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
For example, a method or apparatus according to embodiments of the present disclosure may also be implemented by means of the architecture of computing device 3000 shown in fig. 9. As shown in fig. 9, computing device 3000 may include a bus 3010, one or more CPUs 3020, a Read Only Memory (ROM) 3030, a Random Access Memory (RAM) 3040, a communication port 3050 connected to a network, an input/output component 3060, a hard disk 3070, and the like. A storage device in the computing device 3000, such as the ROM 3030 or the hard disk 3070, may store various data or files used in the processing and/or communication of the interpretation text generation method provided by the present disclosure, as well as program instructions executed by the CPU. The computing device 3000 may also include a user interface 3080. Of course, the architecture shown in fig. 9 is merely exemplary, and one or more components of the computing device shown in fig. 9 may be omitted as needed when implementing different devices.
According to yet another aspect of the present disclosure, a computer-readable storage medium is also provided. Fig. 10 shows a schematic diagram 4000 of a storage medium according to the present disclosure.
As shown in fig. 10, the computer storage medium 4020 has stored thereon computer readable instructions 4010. When the computer-readable instructions 4010 are executed by the processor, an interpretation text generation method according to an embodiment of the present disclosure described with reference to the above figures can be performed. The computer readable storage medium in embodiments of the present disclosure may be volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The non-volatile memory may be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory. Volatile memory can be Random Access Memory (RAM), which acts as external cache memory. By way of example, and not limitation, many forms of RAM are available, such as Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), Synchronous Dynamic Random Access Memory (SDRAM), Double Data Rate Synchronous Dynamic Random Access Memory (DDR SDRAM), Enhanced Synchronous Dynamic Random Access Memory (ESDRAM), Synchronous Link Dynamic Random Access Memory (SLDRAM), and Direct Memory Bus Random Access Memory (DRRAM). It should be noted that the memory of the methods described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
Embodiments of the present disclosure also provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. A processor of a computer device reads the computer instructions from a computer-readable storage medium, and the processor executes the computer instructions, so that the computer device performs an interpretation text generating method according to an embodiment of the present disclosure.
Embodiments of the present disclosure provide an interpreted text generation model training method, an interpreted text generation method, an apparatus, a device, and a computer-readable storage medium.
Compared with traditional interpretation text generation methods, the method provided by the embodiments of the present disclosure avoids the dependence of traditional supervised training on large-scale manually annotated data, does not require any high-quality interpretation text annotations, and can generate higher-quality interpretation text without annotation supervision, thereby solving problems such as the high cost of manual annotation and the difficulty of ensuring consistent quality.
The method provided by the embodiments of the present disclosure further optimizes the candidate interpretation text generated based on the data sample: by simultaneously optimizing the information bottleneck objective, which is based on information compression and information correlation, and the language modeling objective, it refines the information related to the data sample from the candidate interpretation text, ignores the information not related to the data sample, and trains the model parameters in a generative manner, thereby determining an interpretation text generation model capable of generating higher-quality interpretation text. The method of the embodiments of the present disclosure can simultaneously optimize and complete the tasks of removing redundant, irrelevant information from the interpretation text and retaining data-sample-related information and, according to the characteristics of the interpretation text, can obtain higher-quality interpretation text without any high-quality interpretation text annotations or label supervision.
It is noted that the flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises at least one executable instruction for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In general, the various example embodiments of the disclosure may be implemented in hardware or special purpose circuits, software, firmware, logic, or any combination thereof. Some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device. While aspects of the embodiments of the present disclosure are illustrated or described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that the blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
The exemplary embodiments of the present disclosure described in detail above are illustrative only and are not limiting. Those skilled in the art will understand that various modifications and combinations of these embodiments or features thereof may be made without departing from the principles and spirit of the disclosure, and such modifications should fall within the scope of the disclosure.

Claims (15)

1. An interpreted text generation model training method, comprising:
acquiring a data sample, and acquiring a candidate interpretation text based on the data sample;
performing information compression processing based on the candidate interpretation text to determine an information compression representation corresponding to the candidate interpretation text, and determining an information compression loss based on the candidate interpretation text and the information compression representation;
determining an information-dependent loss based on information-dependent between the information compression representation and the data sample;
obtaining a final interpretation text by decoding the information compression representation, and determining language modeling loss based on the candidate interpretation text and the final interpretation text; and
determining the interpreted text generation model based on the information compression loss, the information-related loss, and the language modeling loss.
2. The method of claim 1, wherein the interpreted text generation model includes undetermined parameters related to the information compression loss, the information-related loss, and the language modeling loss;
wherein determining the interpreted text generation model based on the information compression loss, the information-related loss, and the language modeling loss comprises:
determining the undetermined parameters of the interpreted text generation model based on joint optimization of the information compression loss, the information-related loss, and the language modeling loss to determine the interpreted text generation model.
3. The method of claim 2, wherein performing information compression processing based on the candidate interpretation text to determine an information compression representation corresponding to the candidate interpretation text comprises:
encoding the candidate interpretation text to generate a candidate interpretation text vector;
vector dimension reduction is performed on the generated candidate interpreted text vector to generate a corresponding compressed representation vector.
4. A method according to claim 3, wherein the candidate interpreted text vector obeys an interpreted candidate distribution, and the compressed representation vector obeys the compressed representation distribution;
Wherein determining an information compression loss based on the candidate interpreted text and the information compression representation comprises:
determining mutual information between the candidate interpreted text vector and the compressed representation vector based on the interpreted candidate distribution and the compressed representation distribution;
the information compression loss is determined based on mutual information between the candidate interpreted text vector and the compressed representation vector, the information compression loss being used to measure a correlation between the information compression representation and the candidate interpreted text.
5. The method of claim 3, wherein determining an information-bearing loss based on an information-bearing between the information-compressed representation and the data sample comprises:
encoding the data samples to generate data sample vectors; and
an information correlation loss is determined based on mutual information between the compressed representation vector and the data sample vector, the information correlation loss being used to measure a correlation between the information compressed representation and the data sample.
6. The method of claim 2, wherein the interpreted text generation model comprises an interpreted decoder for obtaining a final interpreted text by decoding the information compression representation;
Wherein determining a language modeling penalty based on the candidate interpreted text and the final interpreted text comprises:
determining the language modeling penalty for measuring similarity between the final interpretation text and the candidate interpretation text based on cross entropy penalty of the candidate interpretation text and the final interpretation text.
7. The method of claim 2, wherein determining the interpreted text generation model based on the information compression loss, the information-related loss, and the language modeling loss comprises:
establishing a joint loss function of the interpreted text generation model based on the information compression loss, the information related loss, and the language modeling loss, wherein the joint loss function is positively correlated with the information compression loss and the language modeling loss, and the joint loss function is negatively correlated with the information related loss; and
determining the undetermined parameters of the interpreted text generation model by optimizing the joint loss function to determine the interpreted text generation model.
8. The method of claim 1, wherein the data sample comprises a word sequence concatenation of a data sample input and a data sample output, the final interpretation text being used to interpret a relationship between the data sample input and the data sample output.
9. The method of claim 8, wherein obtaining candidate interpretation text based on the data sample comprises:
generating a candidate interpretation text by a pre-training language model based on the data sample, wherein the pre-training language model utilizes at least one decoding method to obtain the candidate interpretation text, and the at least one decoding method comprises at least one of greedy decoding and random sampling decoding.
10. An interpretation text generation method, comprising:
acquiring a data instance; and
generating a final interpretation text by an interpretation text generation model based on the data instance;
wherein the interpreted text generation model is trained in accordance with the method of claim 1.
11. An interpreted text generation model training apparatus comprising:
a data acquisition module configured to acquire a data sample and obtain a candidate interpretation text based on the data sample;
an information compression module configured to perform information compression processing based on the candidate interpretation text to determine an information compression representation corresponding to the candidate interpretation text, and to determine an information compression loss based on the candidate interpretation text and the information compression representation;
An information correlation module configured to determine an information correlation loss based on information correlation between the information compression representation and the data sample;
an information decoding module configured to obtain a final interpretation text by performing decoding processing on the information compression representation, and determine a language modeling loss based on the candidate interpretation text and the final interpretation text; and
a model determination module configured to determine the interpreted text generation model based on the information compression loss, the information-related loss, and the language modeling loss.
12. The apparatus of claim 11, wherein the interpreted text generation model includes undetermined parameters related to the information compression loss, the information-related loss, and the language modeling loss;
wherein determining the interpreted text generation model based on the information compression loss, the information-related loss, and the language modeling loss comprises:
determining the undetermined parameters of the interpreted text generation model based on joint optimization of the information compression loss, the information-related loss, and the language modeling loss to determine the interpreted text generation model.
13. An interpreted text generating apparatus comprising:
one or more processors; and
one or more memories in which a computer executable program is stored which, when executed by the processor, performs the method of any of claims 1-10.
14. A computer program product stored on a computer readable storage medium and comprising computer instructions which, when executed by a processor, cause a computer device to perform the method of any of claims 1-10.
15. A computer readable storage medium having stored thereon computer executable instructions for implementing the method of any of claims 1-10 when executed by a processor.
CN202211187946.4A 2022-09-26 2022-09-26 Interpreted text generation model training method, interpreted text generation method and device thereof Pending CN117009471A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211187946.4A CN117009471A (en) 2022-09-26 2022-09-26 Interpreted text generation model training method, interpreted text generation method and device thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211187946.4A CN117009471A (en) 2022-09-26 2022-09-26 Interpreted text generation model training method, interpreted text generation method and device thereof

Publications (1)

Publication Number Publication Date
CN117009471A true CN117009471A (en) 2023-11-07

Family

ID=88566084

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211187946.4A Pending CN117009471A (en) 2022-09-26 2022-09-26 Interpreted text generation model training method, interpreted text generation method and device thereof

Country Status (1)

Country Link
CN (1) CN117009471A (en)

Similar Documents

Publication Publication Date Title
CN111382584A (en) Text translation method and device, readable storage medium and computer equipment
CN112509555B (en) Dialect voice recognition method, device, medium and electronic equipment
KR101842362B1 (en) An apparatus for generating paragraph based on artificial neural network and method thereof
CN113127624B (en) Question-answer model training method and device
US11475225B2 (en) Method, system, electronic device and storage medium for clarification question generation
JP2021125217A (en) Latent question reformulation and information accumulation for multi-hop machine reading
CN110597966A (en) Automatic question answering method and device
CN111241789A (en) Text generation method and device
CN111292768A (en) Method and device for hiding lost packet, storage medium and computer equipment
CN114360502A (en) Processing method of voice recognition model, voice recognition method and device
CN113761841B (en) Method for converting text data into acoustic features
CN114492661A (en) Text data classification method and device, computer equipment and storage medium
CN110990531B (en) Text emotion recognition method and device
CN112257432A (en) Self-adaptive intention identification method and device and electronic equipment
WO2023226239A1 (en) Object emotion analysis method and apparatus and electronic device
CN116663523A (en) Semantic text similarity calculation method for multi-angle enhanced network
CN116958343A (en) Facial animation generation method, device, equipment, medium and program product
CN116432705A (en) Text generation model construction method, text generation device, equipment and medium
CN114239575B (en) Statement analysis model construction method, statement analysis method, device, medium and computing equipment
CN114743539A (en) Speech synthesis method, apparatus, device and storage medium
CN117009471A (en) Interpreted text generation model training method, interpreted text generation method and device thereof
CN112818688B (en) Text processing method, device, equipment and storage medium
CN111291576B (en) Method, device, equipment and medium for determining internal representation information quantity of neural network
CN111310847B (en) Method and device for training element classification model
CN113761943A (en) Method for generating judicial dialogues, method and device for training models, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination