CN116187324B - Method, system and medium for generating cross-language abstract for long text of source language - Google Patents


Info

Publication number
CN116187324B
CN116187324B CN202310474999.2A
Authority
CN
China
Prior art keywords
source language
language
abstract
learning model
long text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310474999.2A
Other languages
Chinese (zh)
Other versions
CN116187324A (en)
Inventor
陈雨龙
白雪峰
张岳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Westlake University
Original Assignee
Westlake University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Westlake University filed Critical Westlake University
Priority to CN202310474999.2A priority Critical patent/CN116187324B/en
Publication of CN116187324A publication Critical patent/CN116187324A/en
Application granted granted Critical
Publication of CN116187324B publication Critical patent/CN116187324B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)

Abstract

The application relates to a method, a system and a medium for generating a cross-language abstract for a long text of a source language. The method comprises: receiving, by a processor, the long text of the source language to be summarized, input by a user; generating a source language abstract for display using a trained first learning model based on the received long text of the source language; and generating a target language abstract using a trained second learning model based on the long text of the source language in combination with the generated source language abstract. The method enables the pre-generated source language abstract data and the long text of the source language to support each other, so that a target language abstract can be obtained that meets the user's requirements more accurately, is more relevant to the long text of the source language, and has higher fidelity.

Description

Method, system and medium for generating cross-language abstract for long text of source language
Technical Field
The present application relates to the field of natural language processing technologies, and in particular, to a method, system, and medium for generating a cross-language abstract for a long text in a source language.
Background
Given a long text in a source language, a cross-language summarization system aims at generating a concise summary text in a specified target language that expresses the important content of the original text. Existing cross-language summarization systems mainly fall into two types: the pipeline method and the end-to-end method.
The pipeline method decomposes cross-language summarization into two independent subtasks, monolingual summarization and machine translation, and trains two models accordingly, namely a monolingual summarization model and a machine translation model. Given a source language text to be summarized as input, the pipeline method first uses a source language monolingual summarization model to generate a source language abstract, and then uses a machine translation model to translate the generated source language abstract to obtain the target language abstract text. Such pipeline methods are typically affected by error propagation; for example, if the summarization model generates an abstract containing errors, that erroneous abstract is fed into the translation model as input text, the errors are then preserved in the translated abstract, and they may induce translation errors at other positions. In addition, since the translation model takes only the source language abstract as input, it cannot directly access the input information of the source language text to be summarized, so the finally translated sentences may be inconsistent with the meaning of the source language text to be summarized.
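The pipeline structure described above can be sketched as follows. This is a minimal illustration with stub functions standing in for trained models; all function names and behaviors here are hypothetical placeholders, not part of the patent's implementation:

```python
# Minimal sketch of the pipeline method: summarize first, then translate.
# summarize() and translate() are stubs standing in for a trained monolingual
# summarization model and a machine translation model.

def summarize(src_text: str) -> str:
    # Stub: a real model would generate an abstractive summary.
    # Here we simply keep the first sentence.
    return src_text.split(".")[0] + "."

def translate(src_summary: str) -> str:
    # Stub: a real model would translate into the target language.
    # Note it sees ONLY the summary, not the original long text, so any
    # error in the summary propagates unchecked (the error-propagation
    # problem described above).
    return "[target-language] " + src_summary

def pipeline_cross_lingual_summary(src_text: str) -> str:
    src_summary = summarize(src_text)   # subtask 1: monolingual summary
    return translate(src_summary)       # subtask 2: translate the summary

print(pipeline_cross_lingual_summary(
    "The meeting approved the budget. Further details followed."))
# -> [target-language] The meeting approved the budget.
```

The translation step's input signature (`str -> str`, summary only) is the crux: the original source text is simply unavailable at that stage.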
The existing end-to-end method trains a model with the source language text to be summarized (input) and the target language abstract (output) as training samples. The end-to-end system obtained by training is a black-box model: in the inference process, the trained end-to-end system takes the source language text to be summarized as its single input and directly outputs the target language abstract, and the intermediate process by which the system performs the cross-language summarization task cannot be understood, controlled, or intervened in. Therefore, even if the system's model may exploit source language abstract data to some extent during multi-task training, the inference performance of the system is often poor, because the source language abstract corresponding to the text to be summarized is not utilized when the model is applied.
Disclosure of Invention
The present application has been made to solve the above-mentioned problems in the prior art. The present application is intended to provide a method, system and medium for generating a cross-language abstract for a long text in a source language. Given the long text in the source language, the method is capable of providing, as desired by the user, a cross-language abstract that is smoother, more accurate, more relevant to the long text in the source language, more faithful, and conforming to the language style desired by the user, and of improving the understandability and interpretability of the cross-language summarization process for the user (especially for ordinary users who are not algorithm experts).
According to a first aspect of the present application, there is provided a method of generating a cross-language abstract for a long text in a source language, comprising: receiving, by a processor, the long text of the source language for which an abstract is to be generated, input by a user; generating, by the processor, a source language abstract for display using a trained first learning model based on the received long text of the source language; and generating, by the processor, a target language abstract using a trained second learning model based on the long text of the source language in combination with the generated source language abstract.
According to a second aspect of the present application, there is provided a system for generating a cross-language abstract for a long text in a source language. The system comprises an interface configured to acquire the long text of the source language to be summarized, input by the user. The system further includes a processor configured to perform the steps of the method of generating a cross-language abstract for a long text in a source language according to various embodiments of the application. The system also includes a display configured to display the generated target language abstract, or to display the generated source language abstract and the generated target language abstract side by side for comparison.
According to a third aspect of the present application, there is provided a non-transitory computer-readable storage medium having stored thereon computer-executable instructions which, when executed by a processor, implement the steps of a method of generating a cross-language summary for long text in a source language according to various embodiments of the present application.
According to the method, system and medium for generating a cross-language abstract for a long text of a source language, when a cross-language abstract is generated for the long text of the source language, a monolingual summarization model is first used to generate a source language abstract, which is then used together with the long text of the source language as the input of the cross-language summarization model, so that the pre-generated source language abstract data and the long text of the source language support each other. Compared with the pipeline or end-to-end cross-language summarization methods of the prior art, a target language abstract can thus be obtained that better meets the user's requirements, is more relevant to the long text of the source language, and has higher fidelity. Further, the source language abstract is generated for display, so that the user can obtain the source language abstract as intermediate information, and the understandability and interpretability of the cross-language summarization processing result, i.e., the target language abstract, are improved accordingly.
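The two-stage data flow just described can be sketched as follows, again with hypothetical stub functions; the point of the sketch is the signature of the second stage, which receives both the source long text and the pre-generated source language abstract (unlike the pipeline's translation step):

```python
# Sketch of the two-stage approach: the second model is conditioned on BOTH
# the source long text and the pre-generated source language abstract.
# Both model functions are illustrative stubs, not the patent's networks.

def first_model_summarize(src_text: str) -> str:
    # Stub monolingual summarizer (keeps the first sentence).
    return src_text.split(".")[0] + "."

def second_model_cross_lingual(src_text: str, src_summary: str) -> str:
    # Stub cross-lingual summarizer: a real model would attend jointly to
    # both inputs so they can support and correct each other.
    return f"[target summary grounded in {len(src_text)}-char source] " + src_summary

def generate_cross_lingual_summary(src_text: str) -> tuple:
    src_summary = first_model_summarize(src_text)   # shown to the user
    tgt_summary = second_model_cross_lingual(src_text, src_summary)
    return src_summary, tgt_summary                 # both are displayable

src, tgt = generate_cross_lingual_summary("Hello world. More text follows.")
print(src)   # intermediate, user-visible source language abstract
print(tgt)
```

Returning the intermediate source language abstract alongside the final result is what makes the process inspectable and correctable by the user.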
The foregoing description is only an overview of the technical solutions of the present application. In order that the technical means of the present application may be more clearly understood and implemented in accordance with the content of the specification, and that the above and other objects, features and advantages of the present application may become more readily apparent, specific embodiments are set forth below.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application, as claimed.
Drawings
In the drawings, which are not necessarily drawn to scale, like numerals may describe similar components in different views. The accompanying drawings illustrate various embodiments by way of example in general and not by way of limitation, and together with the description and claims serve to explain the disclosed embodiments. Wherever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts. Such embodiments are illustrative and not intended to be exhaustive or exclusive of the present apparatus or method.
FIG. 1 illustrates a flow diagram of a method of generating a cross-language summary for long text in a source language in accordance with an embodiment of the present application.
FIG. 2 illustrates another flow diagram of a method of generating a cross-language summary for long text in a source language in accordance with an embodiment of the application.
FIG. 3 is a flow chart of verification of the source language abstract against the target language abstract according to an embodiment of the application.
FIG. 4 illustrates a schematic diagram of generating a source language abstract using a trained first learning model, according to an embodiment of the application.
Fig. 5 shows a schematic diagram of a second learning model employing a word-level concatenation method according to an embodiment of the present application.
Fig. 6 shows a schematic diagram of a second learning model employing a vector level stitching method according to an embodiment of the present application.
Fig. 7 shows a typical structural schematic of an encoder-decoder model in the first learning model and the second learning model according to an embodiment of the present application.
FIG. 8 shows a schematic diagram of performance analysis of a method according to an embodiment of the application on different data sets.
FIG. 9 illustrates a partial block diagram of a system for generating a cross-language summary for long text in a source language in accordance with an embodiment of the application.
Detailed Description
The present application will be described in detail below with reference to the drawings and detailed description to enable those skilled in the art to better understand the technical aspects of the present application. Embodiments of the present application will be described in further detail below with reference to the drawings and specific examples, but not by way of limitation.
The terms "first," "second," and the like, as used herein, do not denote any order, quantity, or importance, but rather are used to distinguish one element from another. The word "comprising" or "comprises" and the like means that the elements preceding the word encompass the elements recited after the word, and do not exclude the possibility of also encompassing other elements. The order in which the steps of the methods described in connection with the figures are performed is not intended to be limiting. As long as the logical relationship between the steps is not affected, several steps may be integrated into a single step, a single step may be decomposed into multiple steps, or the execution order of the steps may be exchanged according to specific requirements.
It should also be understood that the term "and/or" in the present application merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, A and/or B may indicate: A exists alone, A and B exist together, or B exists alone. In the present application, the character "/" generally indicates that the associated objects before and after it are in an "or" relationship.
FIG. 1 illustrates a flow diagram of a method of generating a cross-language summary for long text in a source language in accordance with an embodiment of the present application. It should be noted that the term "long text" is not used herein to define the length of the text to be summarized; it is a concept opposed to the summary text, which is shorter in length. The specific length may be, for example, hundreds of words, thousands of words, or longer, and the length of the text does not affect the applicability of the method according to the embodiments of the present application, so the present application is not limited in this respect.
In the case that a cross-language abstract is to be generated for a long text in the source language, as shown in fig. 1, the long text in the source language for which the abstract is to be generated, input by the user, may first be received by the processor in step 101. Taking a cross-language summarization task on the non-public dataset DialogSumX (dialogues of daily scenes, about 140 words per corpus item) as an example: in the normal case, a piece of source language text to be summarized is given, and a piece of target language abstract text is required to be generated. In practice, the long text of the source language may be a dialogue, for example a dialogue of a daily scene with a length of several tens to several hundreds of words, a dialogue of a conference scene with a length of several thousands of words, or the like. In other embodiments, the long text of the source language may be a non-dialogue passage, or may include multiple passages of the same type or of different types; the present application is not limited to a specific form of the long text of the source language.
Next, in step 102, source language summary text may be generated by the processor for display using the trained first learning model based on the received long text of the source language. In some embodiments, the first learning model is a monolingual summarization model; in particular, it may be based on different pre-trained language models, such as an encoder model (e.g., BERT), a decoder model (e.g., GPT), or an encoder-decoder model (e.g., BART), and the present application is not limited in this respect. In some embodiments, the first learning model may be trained on an automatic summarization training dataset in the same language as the source language. For example, in the case where the source language is English, the first learning model may be trained on a monolingual summarization dataset such as the SAMSum corpus (a manually annotated dialogue dataset for abstractive summarization), the CNN/DailyMail dataset (a partially extractive news corpus), the NYT Annotated Corpus (a partially extractive corpus), or Newsroom (an extractive + abstractive corpus). In other embodiments, AMI (a long meeting summarization dataset), XSum (an extremely abstractive news summarization dataset), DialogSum (a dialogue summarization dataset of real scenes), and the like may further be included. The present application is not particularly limited here, as long as the trained first learning model can complete the task of automatically generating summary text in the same language based on the long text of the source language.
On this basis, in step 103, the processor further generates a target language abstract using the trained second learning model based on the long text of the source language in combination with the generated source language abstract. In some embodiments, the second learning model is a cross-language summarization model; in particular, a multilingual large-scale language model such as mBART50 or mT5 may be used, and the present application is not limited in this respect. However, in the training process of the second learning model, the input of the model is not only the long text of the source language to be summarized, but the long text of the source language together with the source language abstract generated in step 102. When training the second learning model, a cross-language automatic summarization training dataset may be selected that covers the language direction of the future cross-language summarization application or a similar language direction, where the language direction refers to from which source language to which target language, such as English-Chinese, English-French, and the like. In addition, in the training process of the second learning model, a separator (which may, for example, be referred to as a second separator) may be used to separate the pre-generated source language abstract and the long text of the source language input to the second learning model, so that the second learning model effectively learns, during training, the association and interaction relations between the different parts of the text separated by the second separator. In this way, the trained second learning model can effectively and accurately use the two parts of the input text according to the learned patterns, and comprehensively consider them in association with each other to give a more accurate target language abstract.
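The separator-based input construction described above might look as follows. The token string `<sep2>` is an arbitrary illustrative choice, since the application deliberately leaves the separator's concrete form open:

```python
# Sketch of building the second learning model's input: the pre-generated
# source language abstract and the source long text, delimited by a
# dedicated separator token so the model can tell the two parts apart.
# "<sep2>" is a hypothetical placeholder for the "second separator".

SEP2 = "<sep2>"

def build_second_model_input(src_summary: str, src_text: str) -> str:
    # The same separator must be used at training and inference time,
    # so the model's learned interaction patterns transfer.
    return f"{src_summary} {SEP2} {src_text}"

example = build_second_model_input(
    "Budget approved.",
    "A: Shall we approve the budget? B: Yes.")
print(example)
# -> Budget approved. <sep2> A: Shall we approve the budget? B: Yes.
```

In practice the separator would typically be registered as a special token in the model's tokenizer so it is never split into subwords; that detail is omitted here.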
According to the method for generating the cross-language abstract for the long text of the source language, disclosed by the embodiment of the application, the information of the source language abstract which is generated in advance before the generation of the target language abstract can be fully utilized, and the information and the long text of the source language are used as the second learning model for executing the task of the cross-language abstract, so that the target language abstract generated by the second learning model is more accurate, is more relevant to the long text of the source language and has higher fidelity, and the requirements of users can be better met.
FIG. 2 illustrates another flow diagram of a method of generating a cross-language summary for long text in a source language in accordance with an embodiment of the application.
Unlike the summarization task on the DialogSumX dataset described above, in some embodiments the user may also give a query task in the target language (i.e., a Query) and specify the source language text to be summarized, requiring the generation of target language abstract text related to the target language query. In such an application scenario, as shown in fig. 2, the long text of the source language to be summarized, input by the user, may be received by the processor in step 201, and the query task of the target language, input by the user, may be received by the processor in step 202 and translated into a query task of the source language, where the tool adopted for the translation is not limited by the present application.
Next, in step 203, the source language summary text may further be generated by the processor for display using the trained first learning model based on the received long text in the source language and the query task in the source language. By way of example only, the query task may include a topic, a scope, a question to be answered, and the like. Therefore, when training the first learning model, in order to enable the trained first learning model to accurately identify the query task and accurately complete query-task-based summarization, the query task needs to be input into the first learning model together with the text during training, and the query task and the long text of the source language need to be spaced and identified by a special mark such as a separator (which may, for example, be referred to as a third separator), so that the first learning model can learn, during training, the association and interaction relations between the different parts separated by the separator. In this way, when the query task and the source language text are input together into the trained first learning model, with the same separator as used during training inserted between them, the first learning model can accurately output a source language abstract matching the query task.
Then, in step 204, the processor may further generate the target language abstract using the trained second learning model based on the long text of the source language, the query task of the target language, and the generated source language abstract. Similar to step 203, in the training process of the second learning model, the long text of the source language, the query task of the target language, and the pre-generated source language abstract input to the second learning model also need to be separated by different separators (which may, for example, be referred to as a fourth separator and a fifth separator, respectively), so that the second learning model effectively learns, during training, the association and interaction relations between the different parts of the text separated by the respective separators. In this way, the trained second learning model can effectively and accurately use the three parts of the input text according to the learned patterns, and comprehensively consider them in association with each other to give a more accurate target language abstract with a higher degree of matching with the query task. The remaining operations in step 204 are similar to step 103 of fig. 1, except that the query task in the target language is added, and are not repeated here.
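The three-part input for the query-conditioned case can be sketched in the same spirit; `<sep4>` and `<sep5>` are hypothetical names for the fourth and fifth separators, and the ordering of the three parts is an illustrative assumption, since the patent fixes only that the parts are distinctly delimited:

```python
# Sketch of the three-part input for step 204: source long text, target
# language query task, and pre-generated source language abstract, delimited
# by two distinct separators so the model can distinguish all three parts.

SEP4, SEP5 = "<sep4>", "<sep5>"  # must not duplicate each other

def build_query_input(src_text: str, tgt_query: str, src_summary: str) -> str:
    return f"{src_text} {SEP4} {tgt_query} {SEP5} {src_summary}"

print(build_query_input(
    "A: Let's review the minutes. B: Agreed.",
    "What was decided about the minutes?",
    "The minutes were reviewed."))
```

Using two different separator strings is what lets the trained model learn a distinct role for each delimited segment.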
The forms adopted by the second separator, the third separator, the fourth separator, the fifth separator, and the like may be set according to specific needs, and the present application is not limited in this respect, as long as these separators do not duplicate one another, so that the first learning model or the second learning model can distinguish the text parts with different meanings.
In some embodiments, in the case of displaying the generated source language abstract, a user's correction operation on the displayed source language abstract may also be received, and the correction operation may be based at least on a deviation between the source language abstract and the query task in the source language. By way of example only, such deviations include, but are not limited to, mismatches in content, language style, and the like between the source language abstract and the query task. For example, if the query task requires the generation of a relatively formal meeting report while the generated source language abstract is rather in a spoken-language style, the wording of the source language abstract may be manually modified by the user so that its language style matches the query task; such cases are not listed one by one here. After the source language abstract has been corrected as required, the target language abstract may be further generated using the trained second learning model based on the long text of the source language and the corrected source language abstract. Therefore, through manual intervention on the pre-generated source language abstract, errors in the source language abstract can be identified and corrected at an earlier stage, so that the controllability of the finally generated target language abstract is improved, and the second learning model is helped to generate a target language abstract that is more accurate and matches the query task to a higher degree.
In some embodiments, the target language abstract may be verified based on a user indication. FIG. 3 is a flow chart of verification of the source language abstract against the target language abstract according to an embodiment of the application.
As shown in fig. 3, in step 301, the generated target language abstract is displayed.
In step 302, an interactive operation of the user on the displayed target language abstract is received. In step 303, it is determined whether the received interactive operation of the user indicates verification; in the case that it does (yes in step 303), the generated source language abstract is displayed side by side with the generated target language abstract for comparison.
In some embodiments, in the case where the generated source language abstract is displayed side by side with the generated target language abstract, user correction operations on the displayed source language abstract may further be received, including but not limited to correcting errors present in the words and sentences of the source language abstract, adjusting the word order, and correcting mismatches in content, language style, and the like between the source language abstract and the query task. Then, based on the long text of the source language and the corrected source language abstract, an updated target language abstract is generated using the trained second learning model, and the corrected source language abstract and the updated target language abstract may again be displayed side by side; if necessary, this correction process may be executed repeatedly and iteratively until the user is satisfied. In the side-by-side display, the user can also see more clearly the immediate change of the target language abstract caused by the modification of the source language abstract, and whether the modification achieves the intended effect, which helps to obtain a more accurate target language abstract as soon as possible.
The manner of operation of the first learning model and the second learning model will be specifically described with reference to fig. 4 and 5.
FIG. 4 illustrates a schematic diagram of generating a source language abstract using a trained first learning model, according to an embodiment of the application.
In an embodiment of the present application, the trained first learning Model may be any single language abstract Model capable of generating a source language abstract based on long text of the source language, only one of which is shown in fig. 4, i.e., the first learning Model 40 is implemented based on an Encoder-Decoder Model.
As shown in fig. 4, the trained first learning model 40 includes an encoding unit 401 and a decoding unit 402. To generate the source language abstract (denoted $Y^{src}$), the received long text of the source language to be abstracted, $X^{src} = x_1, x_2, x_3, \ldots, x_n$ (where the superscript $src$ indicates the source language and the number of characters is $n$), is taken as the input text of the encoding unit 401, which encodes it and produces the output $H$. The above procedure can be expressed as formula (1):

$$H = \mathrm{Encoder}(X^{src}) \tag{1}$$

where $\mathrm{Encoder}(\cdot)$ denotes the encoding operation of the encoding unit 401, and the output $H$ is the final hidden-layer representation of $X^{src}$ after encoding by the encoding unit 401.

Further, the output $H$ of the encoding unit 401 and the string already output by the decoding unit 402 itself, $bos, y_1, y_2, \ldots$, are together taken as the input of the decoding unit 402, until the decoding unit 402 outputs the decoding terminator, where $bos$ is the decoding start symbol and $eos$ is the decoding terminator. The string finally output by the decoding unit 402, from the first character $y_1$ up to the character preceding the decoding terminator $eos$, i.e., $y_1, y_2, \ldots, y_m$, is the source language abstract $Y^{src}$. In other embodiments, the source language abstract $Y^{src}$ may even be provided entirely manually; the present application is not limited in this respect, as long as the source language abstract $Y^{src}$ can be obtained.
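The autoregressive decoding procedure just described (the decoder consuming the encoder output together with its own previously emitted characters, from $bos$ until $eos$) can be sketched with a toy next-token function; the "model" here is a hypothetical stand-in, not the patent's trained network:

```python
# Toy greedy decoding loop mirroring the procedure around formula (1):
# start from bos, repeatedly predict the next token given the encoder
# output H and the prefix emitted so far, and stop at eos.

BOS, EOS = "<bos>", "<eos>"

def toy_next_token(H, prefix):
    # Stub predictor: echo H's tokens one by one, then emit eos.
    i = len(prefix) - 1          # tokens emitted so far (prefix includes bos)
    return H[i] if i < len(H) else EOS

def greedy_decode(H, max_len=50):
    prefix = [BOS]
    while len(prefix) <= max_len:
        tok = toy_next_token(H, prefix)
        if tok == EOS:
            break                # eos itself is not part of the abstract
        prefix.append(tok)
    return prefix[1:]            # y_1 ... y_m, the source language abstract

print(greedy_decode(["budget", "approved"]))
# -> ['budget', 'approved']
```

A real decoder would score a whole vocabulary at each step (and often use beam search rather than this greedy loop), but the bos/eos bookkeeping is the same.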
Fig. 5 shows a schematic diagram of a second learning model employing a word-level concatenation method according to an embodiment of the present application. As shown in FIG. 5, the source language abstract is obtained using the method of FIG. 4 as an exampleThereafter, the target language digest +/may be further generated using the trained second learning model 50>. Similar to the first learning model 40, the second learning model 50 includes an encoding section 501 and a decoding section 502, wherein the encoding section 501 is +.>And Source language Abstract->Both are used as inputs, just as examples, for example long text of the source language can be +.>Is->Spliced to form a spliced text, and the spliced text is used as an input of the encoding part 501, and the encoding part 501 encodes the spliced text to obtain an output +. >. The above process can be expressed as formula (2-1) and formula (2-2):
formula (2-1)
Formula (2-2)
Wherein, the liquid crystal display device comprises a liquid crystal display device,Sthe text that is formed by the concatenation is represented,indicates the operation of encoding by the encoding section 501, outputting +.>Text after encoding for the encoding section 501SThe final hidden layer representation.
Further, the decoding section 502 outputs the output of the encoding section 501And the output string of the decoding section 502 itself +.>,/>,/>,…,/>As an input, until the decoding section 502 outputs a decoding terminator, wherein,bosin order to decode the start symbol,eosfor decoding the terminator, the slave character which is finally outputted by the decoding section 502 +.>To a decoding terminatoreosPreceding character string->,/>,…,/>Namely, the target language abstract->Wherein->The text length (number of words in a string) representing the target language abstract. The form of the separator 503 in fig. 5 is not limited as long as the encoder 501 can recognize and can be used to distinguish the source language abstract +.>And long text of the source language to be abstracted +.>And (3) obtaining the product.
The above-described method of splicing at the word level, with a separator in between, can be implemented without changing the model structure of the second learning model 50 at all, simply by splicing the source language abstract obtained in advance with the long text of the source language. This alone greatly improves the quality of the generated target language abstract, including a higher degree of matching between the target language abstract and the query task; this had not been attempted in the prior art before the present application, and the performance improvement brought by the splicing is unexpected.
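By way of illustration only, the word-level splicing of formula (2-1) can be sketched as follows; the separator token and helper function names are illustrative and not taken from the patent:

```python
# Splice the long text D, a separator, and the source language abstract
# Y^src into the single token sequence S fed to the encoding section.
SEP = "<sep>"

def build_spliced_input(source_long_text: str, source_summary: str) -> list:
    """Formula (2-1): S = [D ; sep ; Y^src], at the word level."""
    return source_long_text.split() + [SEP] + source_summary.split()

s = build_spliced_input("Bob is going to the bank today", "Bob visits the bank")
print(s)
```

Because the splice happens entirely in the input sequence, the encoder itself needs no structural change, matching the observation above.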
In other embodiments, when a pre-generated source language abstract Y^src is utilized, a vector-level splicing method may be used instead. Fig. 6 shows a schematic diagram of a second learning model employing a vector-level splicing method according to an embodiment of the present application. As shown in Fig. 6, in the second learning model 60, the long text D of the source language is encoded by the first encoding part 601 to obtain a first implicit feature vector H_D, and the source language abstract Y^src is encoded by the second encoding part 601' to obtain a second implicit feature vector H_Y. Then, the fusion section 603 performs fusion processing at the feature-vector level on the first implicit feature vector H_D output by the encoding section 601 and the second implicit feature vector H_Y output by the encoding section 601', for example concatenating H_D and H_Y in series to obtain a fused feature vector H_f. The above procedure can be represented by formula (3-1) to formula (3-3):
H_D = Encode_1(D)    formula (3-1)
H_Y = Encode_2(Y^src)    formula (3-2)
H_f = Fuse(H_D, H_Y)    formula (3-3)
wherein Encode_1(·) denotes the encoding operation of the encoding section 601, Encode_2(·) denotes the encoding operation of the encoding section 601', and Fuse(·) denotes the fusion processing of H_D and H_Y at the feature-vector level. The fusion of H_D and H_Y at the feature-vector level may be performed by any suitable method: in addition to concatenating H_D and H_Y in series, the two vectors may be unified in dimension and then added, or a more complex attentional feature fusion may be employed, which is not shown. By way of example only, in the case of the series splicing approach, formula (3-3) may also be expressed as formula (3-3'):
H_f = [H_D ; H_Y]    formula (3-3')
wherein [· ; ·] denotes that H_D and H_Y are spliced in series.
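For illustration only, the two fusion variants above can be sketched as follows; the encoder functions are dummy placeholders standing in for the encoding sections 601 and 601', not real models:

```python
def encode_long_text(text: str) -> list:
    # placeholder for H_D = Encode_1(D): a toy fixed-size embedding
    return [float(len(w)) for w in text.split()[:4]]

def encode_summary(text: str) -> list:
    # placeholder for H_Y = Encode_2(Y^src)
    return [float(len(w)) for w in text.split()[:2]]

def fuse_concat(h_d: list, h_y: list) -> list:
    """Formula (3-3'): series splicing, H_f = [H_D ; H_Y]."""
    return h_d + h_y

def fuse_add(h_d: list, h_y: list) -> list:
    """Alternative fusion: unify the two dimensions (zero-pad), then add."""
    n = max(len(h_d), len(h_y))
    pad = lambda v: v + [0.0] * (n - len(v))
    return [a + b for a, b in zip(pad(h_d), pad(h_y))]

h_d = encode_long_text("Bob is going to the bank")
h_y = encode_summary("Bob visits bank")
print(fuse_concat(h_d, h_y))
```

Note that series splicing preserves both vectors in full (dimension len(H_D)+len(H_Y)), while addition requires a shared dimension first.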
In addition, it is noted that Encode_1 and Encode_2 can use different model structures and/or different model parameters, so that the encoding section 601 and the encoding section 601' can each be optimally selected as the model more suitable for the long text of the source language and for the source language abstract, respectively. Experiments prove that the vector-level splicing method shown in Fig. 6, like the word-level splicing method shown in Fig. 5, can greatly improve the quality of the generated target language abstract; that is, a large performance improvement can be realized by only slightly modifying the encoding part of the second learning model, which had not been attempted in the prior art before the present application, and the effect is unexpected.
The first decoding section 602 in Fig. 6 may employ a model structure similar to the decoding section 402 in Fig. 4 and the decoding section 502 in Fig. 5, taking the fused feature vector H_f and its own output as input to generate the target language abstract Y^tgt, wherein the target language abstract has text length n and comprises the words y_1, y_2, …, y_n.
Fig. 7 exemplarily shows a typical structure of an encoder-decoder model in the case where the first learning model and the second learning model employ the encoder-decoder model according to an embodiment of the present application. As shown in Fig. 7, in the encoder-decoder model 70, the encoding section 701 includes a group of encoders 701a-701m connected in series, and the decoding section 702 includes a group of decoders 702a-702n connected in series. Each encoder 701a-701m includes at least an encoding self-attention layer and an encoding feedforward neural network layer connected in sequence, and each decoder 702a-702n includes at least a decoding self-attention layer, an encoding-decoding attention layer and a decoding feedforward neural network layer connected in sequence. In some embodiments, the individual encoders and decoders may also be adjusted based on the model structure shown in Fig. 7; for example, an additional attention structure may be added to each encoder to better encode the long text of the source language, a decoding control module may be added to each decoder (e.g., to constrain the decoded generation of the target language abstract at the word level according to the source language abstract), and the like, which is not limited by the present application.
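The self-attention and encoding-decoding attention layers named above are all built from the same scaled dot-product attention operation, which can be sketched minimally as follows; the toy vectors and the single-head, projection-free formulation are simplifications for illustration, not the patent's implementation:

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of scores
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention(queries, keys, values):
    """For each query vector, return a mix of the value vectors weighted
    by softmax(q·k / sqrt(d)) over the keys."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, values))
                    for j in range(len(values[0]))])
    return out

# In a decoder's encoding-decoding attention layer, the queries come from
# the decoder state while keys and values come from the encoder output H.
H = [[1.0, 0.0], [0.0, 1.0]]
dec_state = [[1.0, 0.0]]
print(attention(dec_state, H, H))
```

In self-attention layers, queries, keys and values all come from the same sequence; in the encoding-decoding attention layer they are split between decoder and encoder as in the example call.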
In some embodiments, both the first learning model and the second learning model may be trained by minimizing negative log-likelihood.
Taking the first learning model as an example, when training is performed on a first training data set consisting of long texts of the source language to be abstracted and the corresponding source language monolingual abstracts, the model parameters θ1 of the first learning model may first be initialized; then, with the long text D as input and the abstract Y^src as the true value of the output, the parameters θ1 are optimized by minimizing the negative log-likelihood. Specifically, when predicting the i-th word of the source language abstract, the training target of the first learning model is shown in formula (4):
L(θ1) = - Σ_{i=1}^{m} log p(y_i^src | y_{<i}^src, D; θ1)    formula (4)
wherein p(y_i^src | y_{<i}^src, D; θ1) is the conditional probability distribution of the i-th word y_i^src of the source language abstract, given the previously generated words y_{<i}^src and the long text D, under the model parameters θ1, and m is the text length of the source language abstract.
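By way of illustration only, the quantity minimized in formula (4) can be computed as follows for a toy next-word model; `word_probs` is a stand-in for the model's conditional distribution p(y_i | y_<i, D; θ1), not a real network:

```python
import math

def nll(reference: list, word_probs: list) -> float:
    """Negative log-likelihood of a reference summary: the sum of
    -log p(y_i) over its words. Training minimizes this quantity,
    e.g. by gradient descent as noted below."""
    return -sum(math.log(word_probs[i][w]) for i, w in enumerate(reference))

# Toy: a 2-word reference summary with per-step predicted distributions.
probs = [{"bob": 0.5, "the": 0.5}, {"visits": 0.25, "bank": 0.75}]
loss = nll(["bob", "visits"], probs)
print(round(loss, 4))
```

The loss is zero only when every reference word is predicted with probability 1; lower probability on any reference word increases it.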
In some embodiments, the training target may be solved by using a gradient descent method, or other suitable solving methods may be used, which the present application is not limited to.
Similarly, for the second learning model, training may be performed on a second training data set consisting of long texts of the source language to be abstracted, the corresponding source language monolingual abstracts, and the target language abstracts. First, the model parameters θ2 of the second learning model are initialized; then the long text D and the source language monolingual abstract Y^src are together used as the input of the second learning model, with the corresponding target language abstract Y^tgt as the true value of the model output. The model parameters θ2 can similarly be optimized by minimizing the negative log-likelihood; that is, when predicting the i-th word of the target language abstract, the training target of the second learning model is shown in formula (5):
L(θ2) = - Σ_{i=1}^{n} log p(y_i^tgt | y_{<i}^tgt, D, Y^src; θ2)    formula (5)
wherein p(y_i^tgt | y_{<i}^tgt, D, Y^src; θ2) is the conditional probability distribution of the i-th word y_i^tgt of the target language abstract, given the previously generated words y_{<i}^tgt, the long text D and the source language abstract Y^src, under the model parameters θ2, and n is the text length of the target language abstract.
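The two training targets in formulas (4) and (5) prepare the two stages of inference: the first model maps D to Y^src, and the second maps (D, Y^src) to Y^tgt. This 2-Step pipeline can be sketched as follows, with placeholder functions standing in for the trained models (the truncation and string formatting are illustrative only):

```python
def first_model_summarize(source_long_text: str) -> str:
    # stand-in for the trained first learning model (monolingual abstract);
    # a real model would generate, not truncate
    return " ".join(source_long_text.split()[:5])

def second_model_cross_summarize(source_long_text: str, source_summary: str) -> str:
    # stand-in for the trained second learning model; note it receives BOTH
    # the long text (context for disambiguation) and the summary (guidance)
    return f"[target-language rendering of: {source_summary}]"

def two_step(source_long_text: str):
    y_src = first_model_summarize(source_long_text)
    y_tgt = second_model_cross_summarize(source_long_text, y_src)
    return y_src, y_tgt

y_src, y_tgt = two_step("Bob is going to the bank to deposit his salary today")
print(y_src)
print(y_tgt)
```

Exposing y_src as an intermediate value is what makes the user correction step described later possible: it can be inspected and edited before the second call.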
In some embodiments, the training target may be solved by using a gradient descent method, or other suitable solving methods may be used, which the present application is not limited to.
In other embodiments, the first learning model and the second learning model may also be trained using other loss functions, such as perplexity, cross-entropy loss functions, and the like, as the application is not limited in this regard.
It should be noted that although the first learning model and the second learning model may each employ a pre-training model structure similar to the encoder-decoder model 70, the parameters of the pre-training model and its pre-training targets may be set according to the respective needs. In particular, the pre-training model employed by the first learning model uses minimizing the negative log-likelihood of sentence reconstruction as its training target during pre-training (self-supervised), while the pre-training model employed by the second learning model uses minimizing the negative log-likelihood of multilingual translation as its training target during pre-training (supervised). This enables the first learning model to be more accurately adapted to the source language abstract task, and the second learning model to be more adapted to the cross-language abstract task.
Fig. 8 shows a schematic diagram of a performance analysis of the method according to an embodiment of the application on different data sets. In the method column of Fig. 8, 2-Step represents the method of generating a cross-language abstract according to an embodiment of the application, with the second learning model using the word-level concatenation method. Two non-public data sets were used in the experiments: DialogSumX (daily dialogue, 140 words long) and QMSumX (conference dialogue, 3k-7k words long). For the DialogSumX data set, the experimental mode is to select a long text of the source language to be summarized from the data set, and a model or system adopting the 2-Step method or another method is required to generate a target language abstract. For the QMSumX data set, the experimental mode is to perform a cross-language abstract task on the data set based on a query in the target language given by the user, namely: given a long text in the source language to be summarized and a query task in the target language, a model or system adopting the 2-Step method or another method is required to generate, in the target language, a target language abstract related to the query task. Thus, on the QMSumX data set, the present application employs the method shown in Fig. 2, namely: first translating the received query task in the target language input by the user into a query task in the source language, then taking the query task and the long text in the source language as inputs to generate a source language abstract, and then combining the long text in the source language and the query task in the target language with the generated source language abstract to generate the target language abstract.
Other cross-language abstract methods used in Fig. 8 for performance comparison with the method of the application include: S-T, a pipeline method that first performs source language abstracting and then translates the source language abstract into the target language abstract; E2E, an end-to-end method that generates the target language abstract directly from the long text of the source language; E2M, a variant of the end-to-end method in which the model sequentially generates the source language abstract and the target language abstract using the same decoder at decoding time; and T-S-T, a method that first translates the query task in the target language into a query task in the source language, then takes the query task and the long text in the source language as input to perform source language abstracting, and then translates the generated source language abstract to generate the target language abstract.
In addition, the experiment of Fig. 8 includes 3 cross-language abstract language directions, namely: English as the source language, with Chinese, French, and Ukrainian as the target languages. The experiments used ROUGE and BERTScore as automatic assessment indices, giving F1 scores for ROUGE-1 (R1), ROUGE-2 (R2) and ROUGE-L (RL), as well as BERTScore (BS). The calculation methods of these automatic evaluation indices are well known to those skilled in the art and are not described in detail here. As can be seen from the experimental results shown in Fig. 8, the 2-Step method according to the embodiment of the application is significantly better than the other cross-language abstract methods under different task scenes and language scenes, and obtains the best evaluation scores overall.
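As a rough illustration of one of the metrics mentioned above, a minimal unigram-overlap ROUGE-1 F1 can be sketched as follows; real evaluations use the official ROUGE and BERTScore tooling (with stemming, tokenization and multi-reference handling), so this is a simplified sketch only:

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """F1 over unigram overlap between candidate and reference summaries."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

score = rouge1_f1("bob visits the bank", "bob goes to the bank")
print(round(score, 3))
```

ROUGE-2 and ROUGE-L follow the same precision/recall/F1 pattern over bigrams and longest common subsequences, respectively.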
Specifically, in the S-T pipeline method, because the context of the long text of the source language to be abstracted is lacking, ambiguity is often unavoidable when translating the source language abstract into the target language; for example, in "Bob is going to the bank", whether "bank" means a financial institution or a river bank cannot be distinguished from the source language abstract alone. In contrast, the 2-Step method according to the embodiment of the application directly abstracts the long text of the source language instead of translating the source language abstract, so that, on one hand, the problem of error propagation from the source language abstract can be effectively avoided; on the other hand, for the second learning model, the long text of the source language to be abstracted actually provides rich context information for the source language abstract, which helps the second learning model better understand the semantics of the words in the source language abstract when performing the cross-language abstract task. In particular, when a word is ambiguous between the source language and the target language, this helps the second learning model select the more appropriate word sense; for example, for the word "bank", the second learning model can fully distinguish, through the context of the long text of the source language, whether the intended sense is the financial institution, the river bank, or another more appropriate sense.
In addition, compared with end-to-end methods such as E2E and E2M, the 2-Step method according to the embodiment of the present application additionally uses the source language abstract as input to the second learning model when executing the cross-language abstract task. Since the source language abstract and the target language abstract to be finally generated are closely related in content and semantics, the source language abstract actually provides explicit guiding information for the second learning model when it executes the cross-language abstract task; its functions include, but are not limited to, telling the second learning model which content is the important information. Taking the query-based cross-language abstract task as an example, the source language abstract can help the second learning model better understand the query task, efficiently exclude irrelevant texts and sentences according to the query intention, and more reasonably select and organize content to generate the target language abstract, so that performance indices such as the relevance and the degree of matching between the generated target language abstract and the query task can be greatly improved. Therefore, the method according to the embodiment of the application makes more effective use of the monolingual abstract data aligned with the target language abstract, avoids data waste, and improves the reasoning effect of the model and the system.
The method according to embodiments of the present application additionally allows the user to make necessary modifications to the source language abstract serving as an intermediate representation, including but not limited to correcting errors in the source language abstract, adding or deleting part of the content, adjusting parts that do not match the query task, and the like; the extent of modification is not limited, and the content of the source language abstract may even be completely replaced if necessary. In other embodiments, topic words, phrases, sentences, etc. may be used independently or additionally as the source language abstract; that is, the generated source language abstract may be replaced by such abstract information, which is not necessarily complete, or such abstract information may be added as a supplement to the generated source language abstract, the specific manner of which is not limited by the present application. In this way, the controllability of the finally generated target language abstract can be improved, and the purpose of an interpretable cross-language abstract can be achieved by checking and adjusting the source language abstract serving as the guiding information, particularly when the generated target language abstract is wrong. For example, by checking the source language abstract against the target language abstract, it can be found whether the two have the same deviation, so as to judge whether the deviation is generated in the source language abstract link or in the cross-language abstract link, which is conducive to further improvement of the model and system performance.
It should be noted that the application scope of the method according to the embodiment of the present application is not limited to the 3 cross-language abstract language directions shown in Fig. 8; the method is applicable to various cross-language abstract language directions, among which the performance improvement is particularly significant when the source language includes various languages of the Germanic family, and correspondingly, the target language includes a language of the Sino-Tibetan, Altaic, Indo-Aryan, Iranian, Latin, Slavic, Baltic, Dravidian, Semitic, or Hamitic families. For these cross-language abstract language directions, as mapping relations between ambiguous words exist between the source language and the target language, the method according to the embodiment of the application can provide the source language abstract with context information that helps word-sense disambiguation and selection, and therefore performs better than other methods. For similar reasons, for long texts in the source language containing many specialized words, the method according to the embodiment of the present application can also help the disambiguation of the specialized words by providing rich context for the source language abstract, thus greatly improving the performance of the model and the system.
In addition, the method according to the embodiment of the application is particularly suitable for scenarios in which the long text of the source language to be abstracted is long, for example reaching the level of thousands of words or more. Because the method of the application provides the source language abstract as guiding information, it realizes alignment with the target language abstract at the abstract level, changes the task mode the second learning model needs to be trained for, and reduces the degree of multi-task compounding, thereby achieving better performance at reasoning time.
In addition, the method according to the embodiment of the present application is also particularly suitable for scenarios in which the source language and the target language belong to different script systems; for example, the source language is a Latin script (e.g., English) and the target language is a Mongolian script (e.g., Mongolian), or the source language is a Cyrillic script (e.g., Russian) and the target language uses Han ideographs (e.g., modern Chinese characters), and so on, which are not listed here one by one. As the task of cross-language abstracting between different script systems places a higher requirement on the multi-task compounding degree of the model, the method of the application takes the pre-generated and, where necessary, corrected source language abstract as guiding information, so that the multi-task compounding degree of the second learning model can be greatly reduced, and the trained second learning model can have better performance.
There is further provided, in accordance with an embodiment of the present application, a system for generating a cross-language summary for long text in a source language, and fig. 9 is a block diagram illustrating a portion of the system for generating a cross-language summary for long text in a source language, in accordance with an embodiment of the present application.
The system 90 in fig. 9 may be a special purpose computer or a general purpose computer. As shown in fig. 9, the system 90 may include an interface 901, a processor 902, and a display 903.
In some embodiments, the interface 901 may be configured to obtain the long text of the source language for which a summary is to be generated, as input by a user. In some embodiments, the interface 901 may include a network adapter, a cable connector, a serial connector, a USB connector, a parallel connector, a high-speed data transmission adapter (such as fiber optic, USB 3.0, a Thunderbolt interface, etc.), a wireless network adapter (such as a WiFi adapter), a telecommunications (3G, 4G/LTE, etc.) adapter, and the like, as the application is not limited in this respect. The system 90 may transmit the obtained long text of the source language for which the summary is to be generated, as input by the user, to the processor 902, the display 903, and the like via the interface 901. In some embodiments, the interface 901 may also receive, for example, the trained first and/or second learning model from, for example, a learning network training device (not shown), to which the present application is not limited.
In some embodiments, the processor 902 may be configured to perform the steps of the method of generating a cross-language summary for long text in a source language according to various embodiments of the present application. In some embodiments, the processor 902 may be a processing device including one or more general-purpose processing devices, such as a microprocessor, a central processing unit (CPU), a graphics processing unit (GPU), or the like. More specifically, the processor may be a complex instruction set computing (CISC) microprocessor, a reduced instruction set computing (RISC) microprocessor, a very long instruction word (VLIW) microprocessor, a processor running other instruction sets, or a processor running a combination of instruction sets. The processor may also be one or more special-purpose processing devices such as an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), a system on a chip (SoC), or the like.
In some embodiments, the display 903 may be configured to display the generated target language abstract, or to display the generated source language abstract and the generated target language abstract in contrast. In some embodiments, the display 903 may include a liquid crystal display (LCD), a light-emitting diode display (LED), a plasma display, or any other type of display, and provides a graphical user interface (GUI) presented on the display for user input and image/data display.
In other embodiments, the system 90 may further include an input device (not shown), such as a keyboard, a mouse, a trackball, etc., and/or a device capable of converting speech into electrical signals, such as a microphone, in order to enable a user to input the long text in the source language for which the summary is to be generated, to input a query task in the target language, to perform interactive operations on the displayed target language abstract (including instructing verification of the target language abstract, etc.), to perform correction operations on the displayed source language abstract, and the like, thereby enabling the system 90 to support the various user operations described above. In other embodiments, these input devices may also be integrated with the display 903, for example by performing the various operations on the surface of the display 903 implemented as a touch screen, which is not a limitation of the present application.
In other embodiments, the system 90 may also include a memory (not shown) for storing trained first and second learning models, associated data, etc., such as parameter configurations of the first and second learning models, learned feature maps, and data, etc., such as data received, used, or generated while executing a computer program. In some embodiments, the memory may also store computer-executable instructions, such as one or more processing programs, to implement the steps of the method of generating a cross-language summary for long text in a source language in accordance with various embodiments of the present application. In some embodiments, the processor 902 may be communicatively coupled to a memory and configured to execute computer-executable instructions stored thereon to perform a method of generating a cross-language summary for long text in a source language, such as in accordance with embodiments of the present application.
In some embodiments, the system 90 may further include a learning network training unit (not shown) for training the first learning model and the second learning model, in which case the interface 901 may be further configured to obtain data such as training data sets required for training the first learning model and the second learning model, which is not described herein.
Embodiments of the present application also provide a non-transitory computer-readable storage medium having stored thereon computer-executable instructions that, when executed by a processor, implement the steps of the method of generating a cross-language summary for long text in a source language according to the various embodiments of the present application.
In some embodiments, the non-transitory computer readable medium described above may be a medium such as Read Only Memory (ROM), random Access Memory (RAM), phase change random access memory (PRAM), static Random Access Memory (SRAM), dynamic Random Access Memory (DRAM), electrically Erasable Programmable Read Only Memory (EEPROM), other types of Random Access Memory (RAM), flash memory disk or other forms of flash memory, cache, registers, static memory, compact disk read only memory (CD-ROM), digital Versatile Disk (DVD) or other optical storage, magnetic cassettes or other magnetic storage devices, or any other possible non-transitory medium which is used to store information or instructions that can be accessed by a computer device.
The above description is intended to be illustrative and not restrictive. For example, the above-described examples (or one or more aspects thereof) may be used in combination with each other. Other embodiments may be used by those of ordinary skill in the art in view of the above description. Moreover, in the foregoing detailed description, various features may be grouped together to simplify the present application. This should not be interpreted as intending that an unclaimed disclosed feature is essential to any claim. Rather, the inventive subject matter may lie in less than all features of a single disclosed embodiment. Thus, the claims are hereby incorporated into the detailed description as examples or embodiments, with each claim standing on its own as a separate embodiment, and it is contemplated that these embodiments may be combined with each other in various combinations or permutations. The scope of the application should be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.

Claims (12)

1. A method of generating a cross-language summary for a long text in a source language, comprising, by a processor:
receiving a long text of a source language to be summarized, which is input by a user;
Generating a source language abstract for display using the trained first learning model based on the received long text of the source language;
splicing the long text of the source language with the generated source language abstract, inputting the spliced text into a coding part of a trained second learning model, generating and displaying a target language abstract by using a decoding part of the trained second learning model, or coding the long text of the source language by using a first coding part of the trained second learning model to obtain a first implicit feature vector, coding the generated source language abstract by using a second coding part of the trained second learning model to obtain a second implicit feature vector, connecting the first implicit feature vector and the second implicit feature vector in series to obtain a fused feature vector, and generating and displaying the target language abstract by using a first decoding part of the trained second learning model based on the fused feature vector;
receiving interactive operation of a user on the displayed target language abstract;
in a case where the received interactive operation of the user instructs verification, displaying the generated source language abstract in contrast with the generated target language abstract;
Receiving a user correction operation on the displayed source language abstract under the condition that the generated source language abstract is displayed in contrast to the generated target language abstract;
generating an updated target language abstract by using a trained second learning model based on the long text of the source language and the corrected source language abstract; and is also provided with
And displaying the updated target language abstract in comparison with the modified source language abstract.
2. The method of claim 1, further comprising,
receiving a query task in a target language input by a user by a processor, and translating the query task in the target language into a query task in a source language;
generating, based on the received long text in the source language, a source language summary for display using the trained first learning model further comprises: generating a source language abstract for display by using a trained first learning model based on the received long text of the source language and the query task of the source language;
generating a target language abstract based on the long text of the source language in combination with the generated source language abstract using a trained second learning model further comprises: and generating a target language abstract by utilizing a trained second learning model based on the long text of the source language and the query task of the target language in combination with the generated source language abstract.
3. The method as recited in claim 2, further comprising:
in the case of displaying the generated source language abstract, receiving a user's corrective action on the displayed source language abstract, the corrective action being based at least on the deviation between the source language abstract and the query task of the source language;
and generating a target language abstract by using a trained second learning model based on the long text of the source language and the corrected source language abstract.
4. The method of claim 1 or 2, wherein the source language comprises various languages of the Germanic family, and the target language comprises a language of the Sino-Tibetan, Altaic, Indo-Aryan, Iranian, Latin, Slavic, Baltic, Dravidian, Semitic, or Hamitic families.
5. The method of claim 4, wherein the source language comprises english and the target language comprises chinese.
6. The method of claim 1 or 2, wherein the source language and the target language each belong to different script systems.
7. The method according to claim 1 or 2, wherein the first learning model and the second learning model are each implemented using an encoder-decoder model.
8. The method of claim 7, wherein in the encoder-decoder model, a series connected set of encoders and a series connected set of decoders are included, wherein each encoder includes at least a coded self-attention layer and a coded feedforward neural network layer connected in sequence, and each decoder includes at least a decoded self-attention layer, a coded-decoded attention layer, and a decoded feedforward neural network layer connected in sequence.
9. The method of claim 1 or 2, further comprising, when the first learning model and the second learning model employ a pre-training model:
the pre-training model of the first learning model takes the negative log likelihood of minimizing sentence reconstruction as a training target during pre-training;
the pre-training model of the second learning model takes the negative log likelihood of the minimum multilingual translation as a training target during pre-training.
10. The method of claim 1 or 2, wherein the first learning model and the second learning model are each trained by minimizing a negative log-likelihood objective.
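Claims 9 and 10 both use negative log-likelihood (NLL) minimization as the training objective, whether the reference sequence is a reconstructed sentence or a translation. The computation itself can be shown on a toy two-token vocabulary; the per-step distributions below are invented numbers, not model outputs.

```python
import math

def sequence_nll(step_distributions, target_ids):
    """Negative log-likelihood of a reference token sequence: the sum over
    decoding steps of -log p(target token) under the model's distribution."""
    return -sum(math.log(dist[t]) for dist, t in zip(step_distributions, target_ids))

# Toy 2-token vocabulary; the reference sequence is [0, 1].
dists = [[0.7, 0.3], [0.2, 0.8]]
loss = sequence_nll(dists, [0, 1])          # -(ln 0.7 + ln 0.8)

# A model that assigns higher probability to the reference achieves lower NLL,
# which is what gradient-based minimization drives toward.
better = sequence_nll([[0.9, 0.1], [0.1, 0.9]], [0, 1])
```

Training minimizes this quantity averaged over the corpus, with the distributions produced by the encoder-decoder model at each decoding step.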
11. A system for generating a cross-language summary for a long text in a source language, comprising:
an interface configured to: acquire a long text in the source language to be summarized, input by a user;
a processor configured to: performing a method of generating a cross-language summary for a long text in a source language according to any one of claims 1-10; and
a display configured to: display the generated target language abstract, or display the generated source language abstract and the generated target language abstract side by side for comparison.
12. A non-transitory computer-readable storage medium having stored thereon computer-executable instructions that, when executed by a processor, implement a method of generating a cross-language summary for long text in a source language according to any of claims 1-10.
CN202310474999.2A 2023-04-28 2023-04-28 Method, system and medium for generating cross-language abstract for long text of source language Active CN116187324B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310474999.2A CN116187324B (en) 2023-04-28 2023-04-28 Method, system and medium for generating cross-language abstract for long text of source language

Publications (2)

Publication Number Publication Date
CN116187324A CN116187324A (en) 2023-05-30
CN116187324B true CN116187324B (en) 2023-08-22

Family

ID=86434923

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310474999.2A Active CN116187324B (en) 2023-04-28 2023-04-28 Method, system and medium for generating cross-language abstract for long text of source language

Country Status (1)

Country Link
CN (1) CN116187324B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116955590B (en) * 2023-09-20 2023-12-08 成都明途科技有限公司 Training data screening method, model training method and text generation method

Citations (7)

Publication number Priority date Publication date Assignee Title
CN112541343A (en) * 2020-12-03 2021-03-23 昆明理工大学 Semi-supervised counterstudy cross-language abstract generation method based on word alignment
CN112732902A (en) * 2021-01-31 2021-04-30 云知声智能科技股份有限公司 Cross-language abstract generation method and device, electronic equipment and computer readable medium
CN112836040A (en) * 2021-01-31 2021-05-25 云知声智能科技股份有限公司 Multi-language abstract generation method and device, electronic equipment and computer readable medium
CN114647723A (en) * 2022-04-18 2022-06-21 北京理工大学 Few-sample abstract generation method based on pre-training soft prompt
CN115034238A (en) * 2022-06-30 2022-09-09 昆明理工大学 Chinese cross-language abstraction method integrating source language word-level information
CN115438174A (en) * 2021-06-01 2022-12-06 株式会社Ntt都科摩 Text processing device, method, apparatus, and computer-readable storage medium
CN115455175A (en) * 2022-08-11 2022-12-09 北京智谱华章科技有限公司 Cross-language abstract generation method and device based on multi-language model

Non-Patent Citations (1)

Title
Abdelbasset Boukdir; "Character-level Arabic text generation from sign language video using encoder–decoder model"; Displays, Vol. 76; full text *

Also Published As

Publication number Publication date
CN116187324A (en) 2023-05-30

Similar Documents

Publication Publication Date Title
US20200410396A1 (en) Implicit bridging of machine learning tasks
CN112765345A (en) Method and system for automatic text abstract generation incorporating a pre-trained model
JP7413630B2 (en) Summary generation model training method, apparatus, device and storage medium
CN110222350A (en) Method for incorporating predefined bilingual translation pairs into a neural machine translation model
US20230123328A1 (en) Generating cascaded text formatting for electronic documents and displays
CN112446221B (en) Translation evaluation method, device, system and computer storage medium
CN116187324B (en) Method, system and medium for generating cross-language abstract for long text of source language
CN115759119B (en) Financial text emotion analysis method, system, medium and equipment
CN115114940A (en) Machine translation style migration method and system based on curriculum pre-training
CN113268996A (en) Method for expanding corpus, training method for translation model and product
Cripwell et al. Context-aware document simplification
CN112765968A (en) Grammar error correction method and training method and product for grammar error correction model
CN111666774B (en) Machine translation method and device based on document context
US20230153550A1 (en) Machine Translation Method and Apparatus, Device and Storage Medium
Zhang et al. Mind the gap: Machine translation by minimizing the semantic gap in embedding space
CA3152958A1 (en) Automatic preprocessing for black box translation
Hirasawa et al. Pre-trained word embedding and language model improve multimodal machine translation: A case study in Multi30K
CN116644180A (en) Training method and training system for text matching model and text label determining method
Chimalamarri et al. Linguistically enhanced word segmentation for better neural machine translation of low resource agglutinative languages
CN115936020A (en) Text translation method, device, electronic equipment and medium
CN113591493B (en) Translation model training method and translation model device
Nath et al. A study on approaches to neural machine translation
CN113609157A (en) Language conversion model training method, language conversion device, language conversion equipment and medium
CN113673247A (en) Entity identification method, device, medium and electronic equipment based on deep learning
CN113515957B (en) Regular expression description generation method based on BART model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant