CN114625866A - Method, device, equipment and medium for training abstract generation model - Google Patents


Info

Publication number
CN114625866A
CN114625866A (application CN202210238223.6A)
Authority
CN
China
Prior art keywords
abstract
sample
correct
training
text
Prior art date
Legal status
Pending
Application number
CN202210238223.6A
Other languages
Chinese (zh)
Inventor
刘维
吴焕钦
牟文晶
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority claimed from CN202210238223.6A
Publication of CN114625866A
Legal status: Pending


Classifications

    • G (Physics) › G06 (Computing; Calculating or Counting) › G06F (Electric Digital Data Processing)
    • G06F16/345: Summarisation for human users
    • G06F16/3344: Query execution using natural language analysis
    • G06F16/3346: Query execution using probabilistic model
    • G06F40/211: Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G06F40/216: Parsing using statistical methods
    • G06N20/00: Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Machine Translation (AREA)

Abstract

The application provides a method, an apparatus, a device, and a medium for training a summary generation model. It relates to the technical field of artificial intelligence and can be applied to various scenarios including, but not limited to, cloud technology, artificial intelligence, intelligent transportation, and assisted driving. The method performs multiple rounds of iterative training on a summary generation model to be trained, based on a training sample set. In one round of iteration, a training sample is input into an encoding sub-model to obtain encoded text features, correct summary features, and incorrect summary features; a first loss value is determined based on the similarities of the correct and incorrect summary features to the text features, and the parameters of the encoding sub-model are adjusted based on the first loss value; the text features are then input into a decoding sub-model to obtain a predicted summary, and the parameters of the encoding and decoding sub-models are adjusted based on a second loss value determined from the predicted summary and the correct sample summary. Embodiments of the application can improve the factual consistency of the summary generation model.

Description

Method, device, equipment and medium for training abstract generation model
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a method, an apparatus, a device, and a medium for training a summary generation model.
Background
With the continuous development of natural language processing technology, abstract generation models based on natural language processing technology are widely applied. The abstract generation model aims to acquire key information of a text and generate a short abstract containing the key information.
The summary generation model generally adopts a sequence-to-sequence architecture, which comprises an encoder that encodes the text into vectors and a decoder that extracts semantic information from the encoded vectors to generate a text summary.
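The encoder-decoder flow described above can be sketched as follows; the stub sub-models here are illustrative stand-ins (not from the patent) that only show the data flow, not a real neural model:

```python
def generate_summary(text, encode, decode):
    """Minimal view of the sequence-to-sequence pipeline: `encode` maps
    text to a feature representation, `decode` maps it back to a short
    summary. Both are stand-ins for learned sub-models."""
    features = encode(text)
    return decode(features)

# Stub sub-models (illustrative only) to show the data flow:
encode = lambda t: t.split()              # "vector" of tokens
decode = lambda v: " ".join(v[:3])        # keep the first few tokens
assert generate_summary("key info and much more detail", encode, decode) == "key info and"
```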
Although a summary generation model makes it convenient to quickly generate a summary of a text, the generated summary often fails to match the facts in the text. For example, the text may state that "character A directed a movie in 2010, with character B in the lead role", yet the summary generation model may generate "character B directed the movie".
Therefore, how to ensure that the summary generated by the summary generation model is factually consistent with the corresponding text is a problem to be solved.
Disclosure of Invention
The embodiments of the application provide a method and an apparatus for training a summary generation model, an electronic device, and a storage medium, which are used to achieve factual consistency between the summary generated by the summary generation model and the corresponding text.
In one aspect, an embodiment of the present application provides a method for training a summary generation model, including:
performing multiple rounds of iterative training on the summary generation model to be trained based on a training sample set, and outputting the trained summary generation model, wherein each training sample comprises: a sample text, and a corresponding correct sample summary and incorrect sample summary; wherein, in one round of iteration, the following operations are performed:
inputting an obtained training sample into the encoding sub-model to obtain encoded text features, correct summary features, and incorrect summary features;
determining a first loss value based on the similarities of the correct summary features and the incorrect summary features to the text features, and adjusting the parameters of the encoding sub-model based on the first loss value;
inputting the text features into the decoding sub-model to obtain a predicted summary, determining a second loss value based on the predicted summary and the correct sample summary, and adjusting the parameters of at least one of the encoding sub-model and the decoding sub-model based on the second loss value.
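The first-loss computation in the steps above can be sketched as follows. The patent text does not give a concrete formula, so the cosine similarity and the margin form below are assumptions chosen for illustration:

```python
import math

def cosine(u, v):
    # Cosine similarity between two feature vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def first_loss(text_feat, correct_feat, wrong_feat, margin=0.5):
    """Margin-based contrastive loss: push the correct summary encoding
    toward the text encoding and the wrong one away from it.
    (Hypothetical form; the patent only states that the loss depends on
    the two similarities.)"""
    sim_correct = cosine(text_feat, correct_feat)
    sim_wrong = cosine(text_feat, wrong_feat)
    return max(0.0, margin - sim_correct + sim_wrong)

# A correct summary aligned with the text yields a smaller loss than a
# misaligned one:
text_f, good_f, bad_f = [1.0, 0.0], [0.9, 0.1], [0.0, 1.0]
assert first_loss(text_f, good_f, bad_f) < first_loss(text_f, bad_f, good_f)
```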
In one aspect, an embodiment of the present application provides an apparatus for training a summary generation model, where the summary generation model comprises an encoding sub-model and a decoding sub-model. The apparatus comprises:
a training module, configured to perform multiple rounds of iterative training on the summary generation model to be trained based on a training sample set and to output the trained summary generation model, where each training sample comprises: a sample text, and a corresponding correct sample summary and incorrect sample summary; in one round of iteration, the following operations are performed:
inputting an obtained training sample into the encoding sub-model to obtain encoded text features, correct summary features, and incorrect summary features;
determining a first loss value based on the similarities of the correct summary features and the incorrect summary features to the text features, and adjusting the parameters of the encoding sub-model based on the first loss value;
inputting the text features into the decoding sub-model to obtain a predicted summary, determining a second loss value based on the predicted summary and the correct sample summary, and adjusting the parameters of at least one of the encoding sub-model and the decoding sub-model based on the second loss value.
In a possible embodiment, when determining the second loss value based on the predicted summary and the correct sample summary, the training module is further configured to:
determine a correct prediction probability value based on the predicted summary and the correct sample summary, and determine an incorrect prediction probability value based on the predicted summary and the incorrect sample summary;
and determine the second loss value based on the correct prediction probability value and the incorrect prediction probability value.
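A second loss combining a correct and an incorrect prediction probability could plausibly take an unlikelihood-style form; the exact combination below is an assumption for illustration, not the patent's formula:

```python
import math

def second_loss(p_correct, p_incorrect, eps=1e-9):
    """One plausible realisation of the second loss: reward probability
    assigned to the correct summary and penalise probability assigned to
    the incorrect one (unlikelihood-style term). Hypothetical form."""
    return -math.log(p_correct + eps) - math.log(1.0 - p_incorrect + eps)

# The loss falls as the model prefers the correct summary:
assert second_loss(0.9, 0.1) < second_loss(0.5, 0.5)
```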
In a possible embodiment, the apparatus further comprises an obtaining module, configured to obtain the incorrect sample summary in each training sample by:
searching for a keyword set in the correct sample summary of the training sample;
for each of a plurality of keywords in the keyword set, performing the following operation: if the replacement probability of the keyword reaches a preset probability, replacing the keyword with a corresponding replacement word, wherein the replacement word is selected from the sample text of the training sample;
and taking the correct sample summary after replacement as the incorrect sample summary of the training sample.
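The negative-sample construction above can be sketched as follows; whitespace tokenisation and the `replace_prob` interface are simplifications introduced here for illustration:

```python
import random

def build_error_summary(correct_summary, keywords, sample_text_words,
                        replace_prob=0.5, rng=None):
    """Sketch of the negative-sample construction: each keyword in the
    correct summary is replaced, with some probability, by a word drawn
    from the sample text. Whitespace tokenisation is a simplification."""
    rng = rng or random.Random(0)  # seeded for reproducibility
    out = []
    for w in correct_summary.split():
        if w in keywords and rng.random() < replace_prob:
            out.append(rng.choice(sample_text_words))  # swap in a text word
        else:
            out.append(w)
    return " ".join(out)

summary = "character A directed the movie"
text_words = ["character", "B", "starred", "2010"]
corrupted = build_error_summary(summary, {"A"}, text_words, replace_prob=1.0)
assert corrupted != summary              # the keyword was replaced
assert "directed the movie" in corrupted # the rest is untouched
```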
In a possible embodiment, when searching for the keyword set in the correct sample summary of the training sample, the obtaining module is further configured to:
obtain, from the sample text of the training sample, context sentences corresponding to at least one summary sentence in the correct sample summary;
iteratively perform the following multiple times for the at least one summary sentence: cut the at least one summary sentence into word segments, splice at least one word segment into a summary segment, and take each word segment as a new summary sentence;
for each of the obtained plurality of summary segments, determine an evaluation value of the summary segment based on the segment and at least one context sentence corresponding to the correct sample summary;
and select a target summary segment from the plurality of summary segments based on their evaluation values, and form the keyword set from the words in the target summary segment.
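The iterative cutting-and-splicing step is described only abstractly in the text; as a rough stand-in, the sketch below enumerates contiguous word spans of a summary sentence as candidate summary segments (an assumed simplification, not the patent's exact procedure):

```python
def candidate_segments(summary_sentence, min_len=2):
    """Hypothetical reading of the cutting step: enumerate contiguous
    word spans of the summary sentence as candidate summary segments.
    The patent describes recursive cutting and splicing; plain span
    enumeration is used here as a stand-in."""
    words = summary_sentence.split()
    segs = []
    for i in range(len(words)):
        for j in range(i + min_len, len(words) + 1):
            segs.append(" ".join(words[i:j]))
    return segs

segs = candidate_segments("character A directed the movie")
assert "character A" in segs
assert "directed the movie" in segs
```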
In a possible embodiment, when determining the evaluation value of a summary segment based on the segment and the at least one context sentence corresponding to the correct sample summary, the obtaining module is further configured to:
for each of the at least one context sentence, perform the following operation: input the context sentence and the summary segment into an autoregressive language model to obtain a first language prediction probability;
obtain a relevance value of the summary segment based on the obtained first language prediction probabilities;
input the summary segment into the autoregressive language model to obtain a second language prediction probability, and obtain a compression-ratio value of the summary segment based on the second language prediction probability;
and take the relevance value and the compression-ratio value of the summary segment as the evaluation value of the summary segment.
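The evaluation step might look as follows, with `lm_logprob` standing in for an autoregressive language model's scoring function (a hypothetical interface); averaging over context sentences for relevance and length-normalising for the compression-ratio value are assumptions:

```python
import math

def evaluate_segment(segment, context_sentences, lm_logprob):
    """Sketch of the evaluation step. `lm_logprob(text)` is a stand-in
    for an autoregressive language model returning the log-probability
    of `text` (hypothetical interface). Relevance averages the segment's
    score conditioned on each context sentence; the compression-ratio
    value here normalises the unconditional score by segment length."""
    relevance = sum(
        lm_logprob(ctx + " " + segment) for ctx in context_sentences
    ) / len(context_sentences)
    compression = lm_logprob(segment) / len(segment.split())
    return relevance, compression

# Toy LM (illustrative): longer strings get lower log-probability.
toy_lm = lambda s: -0.1 * len(s.split())
rel, comp = evaluate_segment("directed the movie",
                             ["A directed a movie in 2010"], toy_lm)
assert math.isclose(comp, -0.1)
```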
In a possible embodiment, when selecting the target summary segment from the plurality of summary segments based on their evaluation values, the obtaining module is further configured to:
select, from the plurality of summary segments, candidate summary segments whose compression-ratio values satisfy a preset condition;
and take the candidate summary segment with the largest relevance value among the candidates as the target summary segment.
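The final selection step can be sketched directly; reading the preset condition as a minimum compression-ratio threshold is an assumption:

```python
def select_target_segment(segments, evaluations, min_compression):
    """Selection sketch: keep candidates whose compression-ratio value
    meets the preset condition (assumed here to be a minimum threshold),
    then return the candidate with the highest relevance value."""
    candidates = [
        (seg, rel) for seg, (rel, comp) in zip(segments, evaluations)
        if comp >= min_compression
    ]
    return max(candidates, key=lambda x: x[1])[0]

segs = ["character A", "directed the movie", "the"]
evals = [(0.8, 0.6), (0.9, 0.7), (0.2, 0.1)]  # (relevance, compression)
assert select_target_segment(segs, evals, min_compression=0.5) == "directed the movie"
```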
In one aspect, an embodiment of the present application provides an electronic device, which includes a processor and a memory, where the memory stores program code, and when the program code is executed by the processor, the processor is caused to perform the steps of any one of the above methods for training a digest generation model.
In one aspect, an embodiment of the present application provides a computer storage medium storing computer instructions, which when executed on a computer, cause the computer to perform any of the above-mentioned steps of the method for training a digest generation model.
In one aspect, embodiments of the present application provide a computer program product, which includes computer instructions stored in a computer-readable storage medium; when the processor of the computer device reads the computer instructions from the computer readable storage medium, the processor executes the computer instructions, so that the computer device executes the steps of any one of the above methods for training the abstract generation model.
By adopting the above technical solution, the embodiments of the application achieve at least the following technical effects:
In the solution of the embodiments, a training sample consisting of a sample text, a correct sample summary, and an incorrect sample summary is used for contrastive learning of the encoding sub-model of the summary generation model. Specifically, a training sample is input into the encoding sub-model to obtain encoded text features, correct summary features, and incorrect summary features; a first loss value is then determined based on the similarities of the correct and incorrect summary features to the text features, and the parameters of the encoding sub-model are adjusted based on the first loss value, so that the encoding of the correct sample summary becomes more similar to the encoding of the sample text while the encoding of the incorrect sample summary becomes less similar to it. Further, the text features are input into the decoding sub-model to obtain a predicted summary, a second loss value is determined based on the predicted summary and the correct sample summary, and the parameters of at least one of the encoding and decoding sub-models are adjusted based on the second loss value. As a result, the trained encoding sub-model can correctly encode the factual information in a text, reducing factual errors, and the trained decoding sub-model outputs a text summary that is factually consistent with the text.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a schematic view of an application scenario of a method for training a summary generation model according to an embodiment of the present application;
FIG. 2 is a flowchart of a method for training a summary generation model according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a training sample provided in an embodiment of the present application;
FIG. 4 is a logic diagram illustrating contrastive learning for an encoder according to an embodiment of the present application;
FIG. 5 is a logic diagram illustrating contrastive learning for a decoder according to an embodiment of the present application;
FIG. 6 is a flowchart of a method for constructing a summary of an error sample according to an embodiment of the present application;
FIG. 7 is a logic diagram illustrating a method for constructing a summary of error samples according to an embodiment of the present disclosure;
FIG. 8 is a sample diagram of a constructed error sample summary according to an embodiment of the present application;
FIG. 9 is a logic diagram of a method for training a summary generation model according to an embodiment of the present application;
fig. 10 is a schematic view of an application scenario of a digest generation model according to an embodiment of the present application;
fig. 11 is a block diagram illustrating an apparatus for training a summary generation model according to an embodiment of the present disclosure;
fig. 12 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
fig. 13 is a schematic structural diagram of another electronic device in this embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application clearer, the present application will be described in further detail with reference to the accompanying drawings, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
To facilitate better understanding of the technical solutions of the present application for those skilled in the art, the following terms related to the present application are introduced.
Summary generation model: a model adopting a sequence-to-sequence architecture, also called an encoder-decoder architecture, in which the encoder encodes the original text into a vector and the decoder extracts information and semantics from the vector to generate a text summary.
Contrastive Learning: a self-supervised learning method that learns the general features of a data set by letting the model learn, without labels, which data points are similar and which are different.
Autoregressive language model: a model that predicts the next likely word from the preceding context, or the previous likely word from the following context; that is, a left-to-right or right-to-left language modeling task.
The word "exemplary" is used hereinafter to mean "serving as an example, embodiment, or illustration. Any embodiment described as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
The terms "first" and "second" are used herein for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature, and in the description of embodiments of the application, unless stated otherwise, "plurality" means two or more.
The embodiment of the application relates to the technical field of Artificial Intelligence (AI), and is designed based on a Natural Language Processing (NLP) technology in the AI.
Artificial intelligence is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
The artificial intelligence technology is a comprehensive subject and relates to the field of extensive technology, namely the technology of a hardware level and the technology of a software level. The artificial intelligence base technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning, automatic driving, intelligent traffic and the like.
Machine learning is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It specializes in studying how computers simulate or implement human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve their performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; deep learning, in turn, is a technology for realizing machine learning. Machine learning generally includes deep learning, reinforcement learning, transfer learning, inductive learning, and the like, and deep learning includes techniques such as MobileNet, convolutional neural networks (CNN), deep belief networks, recurrent neural networks, autoencoders, and generative adversarial networks.
Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies various theories and methods for effective communication between humans and computers using natural language. Natural language processing is a science integrating linguistics, computer science, and mathematics; research in this field involves natural language, the language people use daily, so it is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question answering, knowledge graphs, and the like.
The abstract generation model and the autoregressive language model in the embodiment of the application are constructed based on machine learning and natural language processing technology.
The following briefly introduces the design concept of the embodiments of the present application:
in the related art, in order to ensure the fact that the abstract generated by the abstract generation model is consistent with the corresponding text, an additional processing module, such as a pre-processing module or a post-processing module, is usually added on the basis of the abstract generation model to improve the encoding process or the decoding process, respectively. The preprocessing module inputs supplementary features for the model, and improves the encoding process, such as additionally adding an encoder to encode the results of relationship extraction and named entity recognition, adding a knowledge graph model to extract text knowledge for encoding, or adding a text implication model to supplement features for the abstract model. The post-processing module is mainly divided into two schemes of correction and sequencing, and the decoding result is improved. The correction scheme considers the fact consistency abstract as a text error correction problem, and additionally designs a sequence-to-sequence text error correction model for the generated abstract result to correct the fact errors in the abstract. The sorting scheme is that a plurality of abstract result candidates are generated through a beam search decoding or entity candidate traversing replacement mode, then a fact consistency sorting model is additionally designed to sort the abstract candidates, and the abstract candidate with the highest score is selected as a final result.
However, the above method needs to add additional processing modules to implement the fact consistency of the abstract generating model, and in general, these additional processing modules are also deep natural language processing models with large parameter quantity, require additional training data, and are not trained end to end with the abstract generating model; this mainly causes two problems: firstly, errors of the modules can be accumulated in the summary result, for example, in some schemes with preprocessing and post-processing modules, the model prediction link is very long, and errors of all the modules can be accumulated to cause the quality of the summary result to be reduced; secondly, the burden of online reasoning is increased by the additional modules, the time delay of online reasoning is improved, and the modules of some schemes are even the same as the abstract model in size, so that the scheme is not beneficial to model deployment and landing.
In view of this, embodiments of the present application provide a method, an apparatus, a device, and a medium for training a summary generation model, in which a training sample composed of a sample text, a correct sample summary, and an incorrect sample summary is used for contrastive learning of the encoding sub-model of the summary generation model, so that the encoding of the correct sample summary becomes more similar to the encoding of the sample text and the encoding of the incorrect sample summary becomes less similar to it. The trained encoder can thus correctly encode the factual information in the text and reduce factual errors, achieving factual consistency of the summary generation model. The embodiments of the application require no additional processing module; factual consistency is achieved by improving the training process of the encoder.
The preferred embodiments of the present application will be described in conjunction with the drawings of the specification, it should be understood that the preferred embodiments described herein are for purposes of illustration and explanation only and are not intended to limit the present application, and features of the embodiments and examples of the present application may be combined with each other without conflict.
Fig. 1 is a schematic view of an application scenario in the embodiment of the present application. The application scenario diagram includes a plurality of terminal devices 100 and a server 200. The terminal device 100 and the server 200 can communicate with each other through a communication network. Alternatively, the communication network may be a wired network or a wireless network. The terminal device 100 and the server 200 may be directly or indirectly connected through wired or wireless communication, and the application is not limited herein.
In the embodiment of the present application, the terminal device 100 is an electronic device used by a user, and the electronic device includes, but is not limited to, a personal computer, a mobile phone, a tablet computer, a notebook, an electronic book reader, an intelligent voice interaction device, an intelligent household appliance, a vehicle-mounted terminal, and the like; the terminal device 100 may install various applications such as a browser-type application, a video application, an information-type application, and the like. The server 200 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a web service, cloud communication, a middleware service, a domain name service, a security service, a CDN (Content Delivery Network), a big data and artificial intelligence platform.
The training operation of the abstract generation model in the embodiment of the application can be executed by terminal equipment or a server; the trained abstract generation model can be deployed on the terminal equipment and can also be deployed on the server.
When the summary generation model is deployed on the terminal device and its training is performed by the server, the terminal device receives the trained summary generation model sent by the server and installs it locally, so that when the terminal device obtains a text for which a summary is needed, it can input the text into the summary generation model to obtain the text summary.
When the server trains the summary generation model, it performs multiple rounds of iterative training on the model to be trained based on a training sample set, where each training sample comprises: a sample text, and a corresponding correct sample summary and incorrect sample summary. In one round of iteration, the following operations are performed: an obtained training sample is input into the encoding sub-model to obtain encoded text features, correct summary features, and incorrect summary features; a first loss value is determined based on the similarities of the correct and incorrect summary features to the text features, and the parameters of the encoding sub-model are adjusted based on the first loss value; the text features are input into the decoding sub-model to obtain a predicted summary, a second loss value is determined based on the predicted summary and the correct sample summary, and the parameters of at least one of the encoding and decoding sub-models are adjusted based on the second loss value.
The trained abstract generation model of the embodiment of the application can be used in various abstract generation scenes, such as commodity copywriting generation, automatic comment generation, medical/legal report generation, news abstract generation, information flow article title generation, sports report generation and the like, so as to filter redundant information from texts and improve the reading experience and reading efficiency of users.
It should be noted that fig. 1 is an exemplary description of an application scenario of the method for training a summary generation model in the present application, and an application scenario to which the method in the present embodiment may be applied is not limited to this. The embodiment of the application can be applied to various scenes, including but not limited to cloud technology, artificial intelligence, intelligent traffic, driving assistance and the like.
The following describes a specific implementation of the method for training a summary generation model according to an embodiment of the present application.
The summary generation model in the embodiment of the present application may be any sequence-to-sequence model, and may also be referred to as a model of an encoder-decoder structure, for example, a BART (Bidirectional and Auto-Regressive Transformers) model, and the like.
Fig. 2 is a schematic diagram illustrating a method for training a digest generation model according to an embodiment of the present disclosure, where the method may be performed by a terminal device or a server. The abstract generation model comprises an encoding sub-model and a decoding sub-model, and as shown in fig. 2, the training method may comprise the following steps:
step S201, a training sample set is obtained, each training sample including: sample text, and corresponding correct sample summary and incorrect sample summary.
The sample text in each training sample may be an article of various contents, including but not limited to information flow content, news content, commodity content, medical content, legal content, and the like.
When the sample text is news content, the corresponding correct sample abstract may be a news abstract, the fact information in the news abstract is consistent with the news content, and the incorrect sample abstract may be similar to the content of the correct sample abstract, but there is a factual error.
Alternatively, the erroneous sample digest may be generated based on the corresponding correct sample digest, for example: some keywords in the correct sample summary are replaced with other words, which may be similar words to the keywords, for example, similar words to the keywords may be selected from the sample text. The construction process of the error sample summary will be further described in the following embodiments of the present application.
Illustratively, as shown in fig. 3, the content of the sample text is "A man-eating leopard may have killed 15 people in Nepal in a 15-month span...". The correct sample digest is "A 4-year-old boy is the latest victim of a man-eating leopard... after the deaths of 15 people in the past 15 months."; the erroneous sample digest is "A 4-year-old boy is the latest victim of a sloth leopard... after the deaths of 15 people in the past 20 months.". Specifically, "man-eating" is replaced with "sloth" and "15" in "15 months" is replaced with "20"; other keywords (for example, "leopards" replaced with "boars") are substituted in the same way.
Step S202, performing multiple rounds of iterative training on the abstract generating model to be trained based on the training sample set, wherein in the process of one round of iteration, the following steps S2021-S2023 are executed:
step S2021, inputting the obtained training samples into a coding sub-model to obtain coded text features, correct abstract features and error abstract features.
The coding sub-model can be an encoder, configured to encode the sample text, the correct sample abstract and the error sample abstract respectively, obtaining a hidden-layer representation of each after encoding. The hidden representation of the sample text is used as the text feature, the hidden representation of the correct sample abstract as the correct abstract feature, and the hidden representation of the error sample abstract as the error abstract feature; each feature can be expressed as a vector.
Step S2022, determining a first loss value based on the similarity between the correct abstract feature and the text feature and the similarity between the error abstract feature and the text feature, and performing parameter adjustment on the encoding submodel based on the first loss value.
The similarity between the correct abstract feature and the text feature is a similarity between vectors, such as a cosine similarity, and the similarity between the error abstract feature and the text feature may likewise be a cosine similarity. The similarity between vectors is not limited to cosine similarity; other similarity measures may also be adopted, and cosine similarity is described below as an example.
Specifically, when calculating these similarities, the text feature, the correct abstract feature and the error abstract feature may first be normalized. Assuming that the normalized text feature is represented as H1, the normalized correct abstract feature as H2 and the normalized error abstract feature as H3, the first loss value may be calculated by using the contrast learning loss function of the following equation (1) to implement contrast learning:
LEncoder = -log( exp(cos(H1, H2)/γ) / ( exp(cos(H1, H2)/γ) + exp(cos(H1, H3)/γ) ) )   (1)
where γ is a temperature coefficient for control of contrast learning.
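As an illustrative sketch only (the publication provides no reference code, and all function and variable names here are assumptions), a contrastive loss of this form — cosine similarity over normalized features with a temperature, one positive (correct abstract) and one negative (error abstract) — can be computed as follows:

```python
import math

def cosine(u, v):
    # Cosine similarity between two feature vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def encoder_contrastive_loss(h1, h2, h3, gamma=0.1):
    # h1: text feature, h2: correct abstract feature, h3: error abstract feature.
    # The loss falls as the correct abstract moves closer to the text and
    # the error abstract moves further away from it.
    pos = math.exp(cosine(h1, h2) / gamma)
    neg = math.exp(cosine(h1, h3) / gamma)
    return -math.log(pos / (pos + neg))
```

With a small temperature, even a modest similarity gap between the correct and error abstracts drives the loss close to zero, which sharpens the contrast signal during training.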
The following takes the coding sub-model as an example of the encoder, and the comparative learning process of the encoder is exemplarily described with reference to fig. 4.
As shown in fig. 4, a sample text, a correct sample abstract and an error sample abstract are input into an encoder to obtain encoded text features, correct abstract features and error abstract features, then, similarities of the text features and the correct abstract features are calculated, and the similarities of the text features and the error abstract features are calculated, contrast learning is performed based on the two similarities, that is, the two similarities are substituted into a contrast learning loss function to obtain a first loss value, and parameters of the encoder are adjusted based on the first loss value.
In the embodiment of the application, the coding submodel is compared and learned on the basis of the correct sample abstract and the wrong sample abstract of the sample text, so that the coding submodel can enable the coding of the correct sample abstract and the coding of the sample text to be more similar and enable the coding of the wrong sample abstract and the coding of the sample text to be more dissimilar under the condition of giving the sample text. Therefore, the trained coding sub-model can correctly code the fact information in the text, and the fact errors are reduced.
Step S2023, inputting the text features into the decoding submodel to obtain a prediction abstract, determining a second loss value based on the prediction abstract and the correct sample abstract, and performing parameter adjustment on at least one of the encoding submodel and the decoding submodel based on the second loss value.
After the decoding submodel (e.g., a decoder) outputs the prediction digest, a second loss value between the prediction digest and the correct sample digest may be calculated by using a loss function of the decoding submodel (e.g., a cross-entropy loss function); parameters of the decoding submodel are adjusted based on the second loss value, and parameters of the encoding submodel may also be adjusted.
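For illustration, a token-level cross-entropy between the decoder's predicted distributions and the correct sample digest might be sketched as follows (the names and data layout are assumptions, not taken from the present disclosure):

```python
import math

def cross_entropy_loss(predicted_distributions, target_ids):
    # predicted_distributions: one probability distribution over the
    # vocabulary per decoding step; target_ids: token ids of the correct
    # sample digest. Returns the mean negative log-likelihood.
    total = 0.0
    for dist, target in zip(predicted_distributions, target_ids):
        total += -math.log(dist[target])
    return total / len(target_ids)
```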
Step S203, when the preset convergence condition is satisfied, the trained abstract generation model is obtained.
For example, the preset convergence condition may be that the first loss value is smaller than a first set value, and the second loss value is smaller than a second set value; the first set value and the second set value may be set as needed, and are not limited herein.
In the scheme of the embodiment of the application, the coding submodel of the abstract generation model is compared and learned by adopting the training sample formed by the sample text, the correct sample abstract and the error sample abstract, so that the coding of the correct abstract is more similar to the coding of the text and the coding of the error abstract is more dissimilar to the coding of the text under the condition of the given text of the trained coding submodel. Therefore, the encoder can correctly encode the fact information in the text, reduce the fact error and further enable the trained decoder to output the text abstract with the fact consistency with the text.
In some embodiments, to further improve the fact consistency of the digest generation model, the decoder may also perform comparison learning, so that the loss of the prediction digest and the correct digest is minimum, and the loss of the prediction digest and the error digest is maximum.
At this time, the determining the second loss value based on the prediction digest and the correct sample digest in the above step S2023 may include the following steps a1-a 2:
step a1, determining a correct prediction probability value based on the prediction digest and the correct sample digest, and determining an incorrect prediction probability value based on the prediction digest and the incorrect sample digest.
The correct loss value between the prediction abstract and the correct sample abstract can be calculated through the loss function of the decoding sub-model and converted into a correct prediction probability value; it can be understood that the smaller the correct loss value, the larger the correct prediction probability value. Likewise, the error loss value between the prediction abstract and the error sample abstract is calculated through the loss function of the decoder and converted into an error prediction probability value; the larger the error loss value, the smaller the error prediction probability value.
Step a2, determining a second loss value based on the correct prediction probability value and the incorrect prediction probability value.
In this step, the correct prediction probability value and the incorrect prediction probability value may be substituted into a comparative learning loss function of the decoding submodel, and a second loss value may be calculated.
The following takes the decoding submodel as a decoder as an example, and the contrast learning process of the decoder is exemplarily described with reference to fig. 5.
As shown in fig. 5, after the encoded text features are input into the decoder, a prediction digest is obtained, an erroneous prediction probability value is calculated based on the prediction digest and the erroneous sample digest, and a correct prediction probability value is calculated based on the prediction digest and the correct sample digest. The contrast learning is realized by setting a contrast learning loss function of the following formula (2), so that the probability of decoding a correct abstract by the trained decoder is maximized, and the probability of decoding an incorrect abstract is minimized.
LDecoder=max(P2-P1+η,0) (2)
Where P1 is the correct prediction probability value of the decoder, P2 is the error prediction probability value of the decoder, and η is the contrast learning margin parameter. Through such training, the decoder acquires the ability to perceive factual errors and distinguish error-prone, factually inconsistent results, improving the probability of generating a correct digest that is factually consistent with the text.
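Given the stated goal — maximize the probability of decoding the correct abstract and minimize that of the incorrect one — the decoder's contrast loss can be sketched as a hinge that penalizes the case where the error summary is not at least η less probable than the correct one (this margin form is an interpretation of the description above, with illustrative names):

```python
def decoder_contrastive_loss(p_correct, p_wrong, eta=0.5):
    # Zero loss once the correct summary's probability exceeds the wrong
    # summary's probability by at least the margin eta; otherwise the
    # loss grows with the size of the violation.
    return max(p_wrong - p_correct + eta, 0.0)
```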
The following embodiments of the present application describe the process of constructing the error sample summary in the training sample.
As shown in fig. 6, the process of generating the error sample summary in each training sample may include the following steps S601-S603:
step S601, searching a keyword set from the correct sample abstract in the training sample.
In this step, the correct sample abstract may be iteratively cut for multiple times, each cut obtaining one abstract segment, and finally obtaining multiple abstract segments, and then selecting the most suitable abstract segment from the multiple abstract segments, and using multiple words in the selected abstract segment as a keyword set.
Alternatively, step S601 may include the following steps B1-B4:
and step B1, obtaining the context sentence corresponding to at least one abstract sentence in the correct sample abstract from the sample text in the training sample.
The correct sample summary may include one or more summary statements, and each summary statement may correspond to a context statement in the sample text. Specifically, the context statement corresponding to each statement in the correct sample summary can be searched from the sample text based on a text summarization evaluation index such as ROUGE (Recall-Oriented Understudy for Gisting Evaluation): for each abstract statement in the correct sample abstract, the ROUGE index between that abstract statement and each text statement in the sample text is calculated, and the text statement with the maximum ROUGE value is taken as the context statement of the abstract statement.
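This lookup can be sketched as follows, using a simplified unigram-overlap score as a stand-in for a full ROUGE implementation (the functions below are illustrative, not part of the disclosure):

```python
def rouge1_f(candidate, reference):
    # Simplified ROUGE-1 F1: unigram overlap between two sentences.
    cand, ref = candidate.split(), reference.split()
    overlap = len(set(cand) & set(ref))
    if not cand or not ref or not overlap:
        return 0.0
    p, r = overlap / len(cand), overlap / len(ref)
    return 2 * p * r / (p + r)

def find_context_sentence(summary_sentence, text_sentences):
    # Return the sample-text sentence with the highest ROUGE score
    # against the given summary sentence.
    return max(text_sentences, key=lambda s: rouge1_f(summary_sentence, s))
```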
Step B2, for at least one abstract statement, iteratively executing the following operations for a plurality of times: respectively cutting at least one abstract statement to obtain word segments corresponding to the at least one abstract statement, splicing the at least one word segment into abstract segments, and taking each word segment as a new abstract statement;
the clipping process may be random clipping, for example, simple rule processing such as removing all stop words, randomly deleting some phrases and clauses, and the like. And respectively carrying out multiple times of iterative clipping processing on at least one summary statement of the correct sample summary, and combining the obtained at least one clipping statement into a summary segment after each time of clipping processing is finished. If the correct sample digest comprises a digest statement, a digest segment may be obtained each time the clipping process is completed. The number of times of the clipping processing may be determined according to the specific situation.
For example, assume the correct sample abstract includes the abstract sentence "Reading The Autobiography of Franklin, I understand how excellent people have lived their lives". Clipping the sentence a first time may yield the abstract segment "The Autobiography of Franklin, understand, excellent people, lived their lives"; clipping that segment again may yield a second abstract segment "The Autobiography of Franklin, excellent people"; and so on.
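The iterative clipping above can be sketched as follows; the stop-word list, drop probability and function names are assumptions for illustration only:

```python
import random

STOP_WORDS = {"the", "a", "an", "of", "to", "and"}  # illustrative stop-word list

def clip_sentence(sentence, drop_prob=0.3, rng=None):
    # One clipping pass: remove stop words, then randomly drop some tokens
    # (keeping at least one token so the segment never becomes empty).
    rng = rng or random.Random(0)
    kept = [w for w in sentence.split() if w.lower() not in STOP_WORDS]
    kept = [w for w in kept if rng.random() > drop_prob] or kept[:1]
    return " ".join(kept)

def generate_segments(summary_sentence, n_iters=3, rng=None):
    # Iterative clipping: each pass clips the previous pass's output,
    # yielding progressively shorter summary segments.
    rng = rng or random.Random(0)
    segments, current = [], summary_sentence
    for _ in range(n_iters):
        current = clip_sentence(current, rng=rng)
        segments.append(current)
    return segments
```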
Step B3, for the obtained plurality of summary segments, performing the following operations respectively: and determining an evaluation value of the summary segment based on the summary segment and at least one context statement corresponding to the correct sample summary.
The evaluation value of one summary segment may include a correlation value indicating a degree of correlation between the summary segment and each context sentence, and a compression rate value indicating a length of the summary segment.
Optionally, the determining the evaluation value of a summary segment based on a summary segment and at least one context statement corresponding to the correct sample summary in the step B3 may include the following steps B31-B34:
step B31, for at least one context statement, respectively performing the following operations: a context statement and a summary segment are input into an autoregressive language model to obtain a first language prediction probability.
After a text is input into the autoregressive language model, the prediction probability of each word in the text can be obtained, and then the prediction probabilities of all the words are respectively logarithmized and then summed, so that the language prediction probability of the text is obtained.
In order to determine the relevance of a summary segment to a context statement, the summary segment is spliced with the context statement, and the result is input into the autoregressive language model to obtain a first language prediction probability; the first language prediction probability is used as the similarity value between the summary segment and that context statement.
And step B32, obtaining a correlation value of the abstract fragment based on the obtained at least one first language prediction probability.
If the correct sample abstract corresponds to a single context statement, one first language prediction probability is obtained and used as the relevance value of the abstract fragment; if the correct sample abstract corresponds to a plurality of context statements, a plurality of first language prediction probabilities are obtained and averaged to serve as the relevance value of the abstract fragment.
And step B33, inputting the abstract segment into the autoregressive language model to obtain a second language prediction probability, and obtaining a compression ratio value of the abstract segment based on the second language prediction probability.
Here, minimizing the language prediction probability of a text fragment is equivalent to maximizing its compression rate given the text. Therefore, the compression ratio value of the abstract fragment can be obtained by subtracting the second language prediction probability from 1; for example, if the second language prediction probability is 0.4, the compression ratio value is 1 - 0.4 = 0.6.
Step B34, the correlation value and the compression ratio value of a summary segment are used as the evaluation value of a summary segment.
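Steps B31-B34 can be sketched as below; the representation of language-model scores as per-token log probabilities follows the description above, while the function names and data layout are assumptions:

```python
import math

def language_log_prob(token_log_probs):
    # "Language prediction probability" of a text: the sum of the log
    # prediction probabilities of its tokens, as described above.
    return sum(token_log_probs)

def evaluate_segment(context_scores, segment_log_probs):
    # context_scores: one list of token log-probabilities per
    # (context statement + segment) concatenation; the relevance value
    # is their mean. The compression ratio value is 1 - P for the
    # segment alone, mirroring the "0.4 -> 1 - 0.4 = 0.6" example.
    relevance = sum(language_log_prob(s) for s in context_scores) / len(context_scores)
    compression = 1.0 - math.exp(language_log_prob(segment_log_probs))
    return relevance, compression
```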
And step B4, selecting a target abstract section from the plurality of abstract sections based on the evaluation value of each of the plurality of abstract sections, and combining a plurality of words in the target abstract section into a keyword set.
In this step, a target digest segment whose evaluation value satisfies the condition may be selected from the plurality of digest segments. For example: when the evaluation value of one digest segment includes a correlation value and a compression ratio value, the correlation value can be maximized in the case where the compression ratio value satisfies a preset condition.
Alternatively, the selecting of the target digest segment from the plurality of digest segments based on the evaluation values of each of the plurality of digest segments in the above step B4 may include the following steps C1-C2:
and C1, selecting a plurality of candidate abstract segments with compression ratio values meeting preset conditions from the plurality of abstract segments.
Wherein, the preset condition may be: after the plurality of abstract fragments are ranked from large to small by compression ratio value, the fragment ranks among the top k.
And C2, taking the candidate summary segment with the largest correlation value in the plurality of candidate summary segments as the target summary segment.
Considering that the higher the compression rate of a summary segment, the fewer keywords it contains and the smaller the disturbance to the correct sample summary, the target summary segment (i.e., the keyword set to be replaced) should be selected to maximize relevance under as high a compression rate as possible. In this way, the constructed error sample summary is more similar to, and harder to distinguish from, the correct sample summary, and such error sample summaries provide greater value for the learning of the summary generation model.
Based on the above, the first k summary segments with the largest compression ratio value are selected from the plurality of summary segments as candidate summary segments, and then the target summary segment with the largest correlation value is selected from the k candidate summary segments. In this way, the correlation of the target summary segment can be maximized in the case that the compression rate is as high as possible.
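Steps C1-C2 above amount to a two-stage selection, sketched below with a hypothetical data layout of (text, relevance value, compression ratio value) triples:

```python
def select_target_segment(segments, k=3):
    # segments: (text, relevance_value, compression_value) triples.
    # Step C1: keep the k segments with the largest compression values;
    # step C2: among those candidates, return the most relevant one.
    candidates = sorted(segments, key=lambda s: s[2], reverse=True)[:k]
    return max(candidates, key=lambda s: s[1])[0]
```

Note that a segment with the highest relevance overall can still be excluded if its compression ratio value does not rank among the top k, which is exactly the trade-off described above.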
Step S602, for a plurality of keywords in the keyword set, respectively performing the following operations: if the replacement probability of one keyword reaches the preset probability, replacing one keyword with a corresponding replacement word; wherein the replacement word is selected from sample text in the training sample.
For each keyword, a numerical value between 0 and 1 can be randomly generated; if the generated value reaches the preset probability, the keyword needs to be replaced, otherwise no replacement is performed. The preset probability may be, for example, 0.1 or 0.2, and is not limited herein.
After the replacement probability of a keyword is determined to reach the preset probability, a replacement word most similar to the word vector of the keyword can be selected from the sample text, and the keyword is replaced by the replacement word.
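A minimal sketch of this replacement step, assuming simple static word vectors (the vectors, vocabulary and function names below are illustrative assumptions):

```python
import math
import random

def nearest_word(keyword, candidates, vectors):
    # Return the candidate word whose vector has the highest cosine
    # similarity to the keyword's vector.
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        return dot / (math.sqrt(sum(a * a for a in u)) *
                      math.sqrt(sum(b * b for b in v)))
    return max(candidates, key=lambda w: cos(vectors[w], vectors[keyword]))

def corrupt_summary(tokens, keywords, text_vocab, vectors,
                    replace_prob=0.2, rng=None):
    # For each keyword in the summary, draw a random number in [0, 1);
    # replace the keyword with its nearest sample-text word when the
    # draw falls below replace_prob, otherwise keep it unchanged.
    rng = rng or random.Random(0)
    out = []
    for tok in tokens:
        if tok in keywords and rng.random() < replace_prob:
            out.append(nearest_word(tok, text_vocab, vectors))
        else:
            out.append(tok)
    return out
```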
And step S603, taking the replaced correct sample summary as an incorrect sample summary in the training sample.
The following describes the construction process of the error sample summary in the above embodiment of the present application with reference to fig. 7.
As shown in fig. 7, when constructing an incorrect sample abstract based on a sample text and a corresponding correct sample abstract, the context sentence(s) corresponding to the correct sample abstract is searched from the sample text, and meanwhile, a plurality of abstract segments are generated based on the correct sample abstract; inputting the abstract segments and each context statement into an autoregressive language model aiming at each abstract segment to obtain an evaluation value of the abstract segment; selecting a target abstract segment from the plurality of abstract segments based on the evaluation value of each of the plurality of abstract segments to obtain a keyword set; and replacing each keyword in the keyword set according to the replacement probability, specifically replacing the keyword with a word which is the closest to the word vector of the keyword in the sample text, and replacing the correct sample abstract to obtain an incorrect sample abstract.
Illustratively, as shown in fig. 8, the error sample summaries constructed by the method of the above embodiment cover various error types; in fig. 8, the correct summary is the correct sample summary and the negative summary is the error sample summary. For example: sample 1 contains a noun error, sample 2 a numeric error, sample 3 an adjective error, sample 4 an entity error, sample 5 a phrase error, sample 6 a verb error, and so on. The constructed error sample abstracts are diversified, covering various types of errors, and are more comprehensive and efficient than manually designed error sample abstracts. In addition, the embodiment of the application can construct error sample abstracts that are quite similar to the correct sample abstract; the more similar and harder to distinguish an error sample abstract is, the greater value it provides for the learning of the abstract generation model.
Fig. 9 is a logic diagram illustrating a method for training a digest generation model according to an embodiment of the present application.
As shown in fig. 9, after obtaining a sample text and a corresponding correct sample abstract, constructing an incorrect sample abstract based on the sample text and the correct sample abstract, and taking the sample text, the correct sample abstract, and the incorrect sample abstract as a training sample; performing comparison learning on the encoder and the decoder based on the obtained training sample set to obtain a trained abstract generation model; and when the trained abstract generation model is used on line, inputting the on-line text into the abstract generation model to obtain a text abstract corresponding to the on-line text.
The method for training the abstract generation model in the embodiment of the application can be applied to various abstract generation scenes, such as commodity copywriting generation, automatic comment generation, medical/legal report generation, news abstract generation, information flow article title generation, sports report generation and the like, so as to filter redundant information from texts and improve the reading experience and reading efficiency of users.
For example, taking information flow article title generation as an example, as shown in fig. 10, after the information flow article "A rational healer of words, may you find a little consolation for life in my words... Below I share with everybody four books that I feel have been rather helpful to me..." is input into the trained abstract generation model, the information flow article title "Four recommended books, well suited for everyone to read: enrich yourself, and you will never lose out" is obtained.
In the following, abstract generation models trained by the method of the embodiment of the present application (performing only encoder contrast learning, performing only decoder contrast learning, and performing combined encoder-decoder contrast learning) are compared with an unmodified abstract generation model (the basic scheme) on two common public data sets. As can be seen from tables 1 and 2 below, the abstract generation model trained by the method of the embodiment of the present application is improved on four common factual consistency metrics. Table 1 below shows the comparison results on public data set 1:
TABLE 1
(Table 1 is provided as an image in the original publication.)
Table 2 below is a comparison on public data set 2:
TABLE 2
(Table 2 is provided as an image in the original publication.)
It can be seen that the comparative learning optimization scheme at the encoder and decoder outperforms the base scheme in almost all metrics on both public data sets, and the codec combination optimization scheme can further improve the fact consistency of the digests.
In addition, the encoder contrast learning scheme of the embodiment of the present application is compared with the prior art (which adds, on the basis of the digest generation model, a preprocessing module for encoding optimization or a post-processing module for decoding optimization); the fair-comparison results on public data set 3 are shown in table 3 below:
TABLE 3
(Table 3 is provided as an image in the original publication.)
It can be seen that both the scheme of the embodiment of the present application and the existing decoding scheme are superior to the existing encoding scheme, but the encoding scheme (i.e. encoder contrast learning) of the embodiment of the present application does not need to introduce an additional processing module and does not increase the burden of on-line inference compared with the existing decoding scheme.
The method embodiment of the present application is based on the same inventive concept, and the embodiment of the present application further provides a device for training a summary generation model, and the principle of the device for solving the problem is similar to the method of the embodiment, so that the implementation of the device can refer to the implementation of the method, and repeated details are not repeated.
As shown in fig. 11, an embodiment of the present application provides an apparatus for training a digest generation model, where the digest generation model includes an encoding sub-model and a decoding sub-model, and the apparatus includes:
the training module 111 is configured to perform multiple rounds of iterative training on a to-be-trained digest generation model based on a training sample set, and output a trained digest generation model, where each training sample includes: sample text, and corresponding correct sample summary and incorrect sample summary; wherein, in a round of iteration process, the following operations are executed:
inputting the obtained training sample into a coding sub-model to obtain coded text features, correct abstract features and error abstract features;
determining a first loss value based on the similarity of the correct abstract characteristic and the error abstract characteristic with the text characteristic, and carrying out parameter adjustment on the coding sub-model based on the first loss value;
inputting the text features into the decoding submodel to obtain a prediction abstract, determining a second loss value based on the prediction abstract and the correct sample abstract, and performing parameter adjustment on at least one of the encoding submodel and the decoding submodel based on the second loss value.
In the scheme of the embodiment of the application, the coding submodel of the abstract generation model is contrastively learned by adopting the training sample consisting of the sample text, the correct sample abstract and the error sample abstract, so that the coding of the correct abstract is more similar to the coding of the text and the coding of the error abstract is more dissimilar to the coding of the text under the condition of the given text of the trained coding submodel. Therefore, the encoder can correctly encode the fact information in the text, reduce the fact error and further enable the trained decoder to output the text abstract with the fact consistency with the text.
In one possible embodiment, when determining the second loss value based on the prediction digest and the correct sample digest, the training module is further configured to:
determining a correct prediction probability value based on the prediction summary and the correct sample summary, and determining an incorrect prediction probability value based on the prediction summary and the incorrect sample summary;
a second loss value is determined based on the correct prediction probability value and the incorrect prediction probability value.
In a possible embodiment, the method further comprises an obtaining module, configured to obtain a summary of the error samples in each training sample by:
searching a keyword set from a correct sample abstract in a training sample;
aiming at a plurality of keywords in the keyword set, respectively executing the following operations: if the replacement probability of one keyword reaches the preset probability, replacing one keyword with a corresponding replacement word; wherein the replacement word is selected from sample texts in the training sample;
and taking the replaced correct sample abstract as an error sample abstract in the training sample.
In a possible embodiment, when searching for the keyword set in the correct sample abstract of the training sample, the obtaining module is further configured to:
obtain, from the sample text in the training sample, context sentences corresponding to at least one abstract sentence in the correct sample abstract;
iteratively perform the following multiple times for the at least one abstract sentence: cut each abstract sentence into word segments, splice the word segments into abstract segments, and take each word segment as a new abstract sentence;
perform the following for each of the obtained plurality of abstract segments: determine an evaluation value of the abstract segment based on the segment and the at least one context sentence corresponding to the correct sample abstract; and
select a target abstract segment from the plurality of abstract segments based on their respective evaluation values, and form the keyword set from the words in the target abstract segment.
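The exact cutting rule is not specified above; the sketch below assumes each abstract sentence is repeatedly bisected, with every intermediate piece collected as a candidate abstract segment. The binary cut and the minimum-length stopping rule are assumptions made only to keep the illustration concrete.

```python
def candidate_segments(sentence, min_tokens=2):
    """Collect candidate abstract segments by repeatedly cutting a
    sentence (a list of tokens) in half; each piece then becomes a new
    sentence to cut, as described in the iteration above."""
    segments, queue = [], [sentence]
    while queue:
        seg = queue.pop()
        segments.append(seg)
        if len(seg) >= 2 * min_tokens:  # still long enough to cut again
            mid = len(seg) // 2
            queue.extend([seg[:mid], seg[mid:]])
    return segments

sent = ["contrastive", "learning", "improves", "factual", "consistency"]
segs = candidate_segments(sent)
```

For this five-token sentence the procedure yields the full sentence plus its two halves, giving segments at several granularities for the evaluation step that follows.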
In a possible embodiment, when determining the evaluation value of an abstract segment based on the segment and the at least one context sentence corresponding to the correct sample abstract, the obtaining module is further configured to:
perform the following for each of the at least one context sentence: input the context sentence and the abstract segment into an autoregressive language model to obtain a first language prediction probability;
obtain a relevance value of the abstract segment based on the obtained at least one first language prediction probability;
input the abstract segment alone into the autoregressive language model to obtain a second language prediction probability, and obtain a compression ratio value of the abstract segment based on the second language prediction probability; and
take the relevance value and the compression ratio value of the abstract segment as the evaluation value of the abstract segment.
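The scoring above can be sketched as follows. The `lm_prob` interface is a stand-in for any autoregressive language model exposing sequence probabilities, and the mapping from the second (unconditional) probability to a compression ratio value is an assumption; here per-token negative log-probability is used, so a segment the model finds more predictable scores as more compressible. A toy language model is included only so the sketch runs.

```python
import math

def evaluate_segment(segment, context_sentences, lm_prob):
    """Score one abstract segment.
    lm_prob(tokens, condition=None) is an assumed interface returning the
    probability of `tokens` under an autoregressive LM, optionally
    conditioned on a context sentence."""
    # relevance: average of the first language prediction probabilities
    # over the context sentences
    relevance = sum(lm_prob(segment, condition=c)
                    for c in context_sentences) / len(context_sentences)
    # compression ratio: derived from the unconditional (second)
    # probability; per-token negative log-probability is one plausible mapping
    compression = -math.log(lm_prob(segment)) / len(segment)
    return relevance, compression

def toy_lm_prob(tokens, condition=None):
    """Toy stand-in LM: shorter sequences are more probable, and a
    context doubles the probability. Illustration only."""
    base = 0.5 ** len(tokens)
    return min(1.0, base * (2.0 if condition else 1.0))

rel, comp = evaluate_segment(["model", "training"],
                             [["ctx", "one"], ["ctx", "two"]],
                             toy_lm_prob)
```

Under the toy model both context sentences give the segment probability 0.5, so the relevance value is 0.5, while the unconditional probability 0.25 maps to a per-token compression value of ln 2.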
In a possible embodiment, when selecting the target abstract segment from the plurality of abstract segments based on their respective evaluation values, the obtaining module is further configured to:
select, from the plurality of abstract segments, a plurality of candidate abstract segments whose compression ratio values satisfy a preset condition; and
take the candidate abstract segment with the largest relevance value among the candidates as the target abstract segment.
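Selecting the target segment is then a filter-and-argmax. In the sketch below the preset condition is assumed to be an upper bound on the compression ratio value; the application does not state the condition's exact form.

```python
def select_target(scored, max_compression):
    """scored: list of (segment, relevance, compression) tuples.
    Keep the segments whose compression value satisfies the (assumed)
    preset condition, then take the one with the largest relevance."""
    candidates = [s for s in scored if s[2] <= max_compression]
    best = max(candidates, key=lambda s: s[1])
    return best[0]

scored = [
    (["a", "b"], 0.9, 1.5),   # most relevant, but fails the compression bound
    (["c"],      0.6, 0.4),
    (["d", "e"], 0.8, 0.7),
]
target = select_target(scored, max_compression=1.0)
```

The first segment is excluded by the compression condition despite its high relevance, and the target becomes the most relevant of the remaining candidates.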
For convenience of description, the above parts are divided by function and described as separate modules (or units). Of course, when implementing the present application, the functionality of the various modules (or units) may be implemented in one or more pieces of software or hardware.
As for the apparatus in the above embodiments, the specific implementation of each module has been described in detail in the method embodiments and is not elaborated here.
Having described the method and apparatus for training an abstract generation model according to exemplary embodiments of the present application, an electronic device according to another exemplary embodiment of the present application is described next.
Those skilled in the art will appreciate that embodiments of the present application may be provided as a method, a system, or a computer program product.
As will be appreciated by one skilled in the art, aspects of the present application may be embodied as a system, a method, or a program product. Accordingly, aspects of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, which may all generally be referred to herein as a "circuit," a "module," or a "system."
Based on the same inventive concept as the method embodiments of the present application, an embodiment of the present application further provides an electronic device. Since the principle by which the electronic device solves the problem is similar to that of the method embodiments, the implementation of the electronic device may refer to the implementation of the method, and repeated details are not described again.
Referring to fig. 12, an electronic device 1200 may include at least a processor 1201 and a memory 1202. The memory 1202 stores program code which, when executed by the processor 1201, causes the processor 1201 to perform the steps of any one of the above methods of training the abstract generation model.
In some possible embodiments, an electronic device according to the present application may include at least one processor, and at least one memory. The memory has stored therein program code which, when executed by the processor, causes the processor to perform the steps of the method of training a digest generation model according to various exemplary embodiments of the present application described above in the present specification. For example, a processor may perform the steps as shown in fig. 2.
In an exemplary embodiment, the present application further provides a storage medium, such as a memory 1202, including program code executable by the processor 1201 of the electronic device 1200 to perform the above-described method of training a digest generation model. Alternatively, the storage medium may be a non-transitory computer readable storage medium, for example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
The electronic apparatus 130 according to this embodiment of the present application is described below with reference to fig. 13. The electronic device 130 of fig. 13 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 13, the electronic device 130 is represented in the form of a general electronic device. The components of the electronic device 130 may include, but are not limited to: the at least one processing unit 131, the at least one memory unit 132, and a bus 133 connecting various system components (including the memory unit 132 and the processing unit 131).
Bus 133 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, and a processor or local bus using any of a variety of bus architectures.
The storage unit 132 may include readable media in the form of volatile memory, such as Random Access Memory (RAM)1321 and/or cache memory 1322, and may further include Read Only Memory (ROM) 1323.
Storage unit 132 may also include a program/utility 1325 having a set (at least one) of program modules 1324, such program modules 1324 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
The electronic device 130 may also communicate with one or more external devices 134 (e.g., a keyboard, a pointing device, etc.), with one or more devices that enable a user to interact with the electronic device 130, and/or with any device (e.g., a router, a modem, etc.) that enables the electronic device 130 to communicate with one or more other electronic devices. Such communication may occur via input/output (I/O) interfaces 135. Also, the electronic device 130 may communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) via the network adapter 136. As shown, the network adapter 136 communicates with the other modules of the electronic device 130 over the bus 133. It should be understood that, although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 130, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
Based on the same inventive concept as the above method embodiments, the embodiments of the present application further provide a computer program product or a computer program comprising computer instructions stored in a computer-readable storage medium. The processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the steps of any one of the above methods of training the abstract generation model.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The program product of embodiments of the present application may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a computing device. However, the program product of the present application is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with a command execution system, apparatus, or device.
A readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with a command execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
While preferred embodiments of the present application have been described, additional variations and modifications of those embodiments may occur to those skilled in the art once they learn of the basic inventive concept. Therefore, the appended claims are intended to be interpreted as covering the preferred embodiments and all variations and modifications that fall within the scope of the present application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. A method of training an abstract generation model, the model comprising an encoding submodel and a decoding submodel, the method comprising:
performing multiple rounds of iterative training on the abstract generation model to be trained based on a training sample set, and outputting the trained abstract generation model, wherein each training sample comprises: a sample text, and a corresponding correct sample abstract and incorrect sample abstract; and wherein, in one round of iteration, the following operations are performed:
inputting the obtained training sample into the encoding submodel to obtain encoded text features, correct abstract features, and incorrect abstract features;
determining a first loss value based on the similarity of the correct abstract features to the text features and the similarity of the incorrect abstract features to the text features, and adjusting parameters of the encoding submodel based on the first loss value; and
inputting the text features into the decoding submodel to obtain a predicted abstract, determining a second loss value based on the predicted abstract and the correct sample abstract, and adjusting parameters of at least one of the encoding submodel and the decoding submodel based on the second loss value.
2. The method of claim 1, wherein determining the second loss value based on the predicted abstract and the correct sample abstract comprises:
determining a correct prediction probability value based on the predicted abstract and the correct sample abstract, and determining an incorrect prediction probability value based on the predicted abstract and the incorrect sample abstract; and
determining the second loss value based on the correct prediction probability value and the incorrect prediction probability value.
3. The method according to claim 1 or 2, wherein the incorrect sample abstract in each training sample is obtained by:
searching for a keyword set in the correct sample abstract of the training sample;
performing the following operation for each of a plurality of keywords in the keyword set: if the replacement probability of a keyword reaches a preset probability, replacing the keyword with a corresponding replacement word, wherein the replacement word is selected from the sample text in the training sample; and
taking the correct sample abstract after replacement as the incorrect sample abstract in the training sample.
4. The method of claim 3, wherein searching for the keyword set in the correct sample abstract of the training sample comprises:
obtaining, from the sample text in the training sample, context sentences corresponding to at least one abstract sentence in the correct sample abstract;
iteratively performing the following multiple times for the at least one abstract sentence: cutting each abstract sentence into word segments, splicing the word segments into abstract segments, and taking each word segment as a new abstract sentence;
performing the following for each of the obtained plurality of abstract segments: determining an evaluation value of the abstract segment based on the segment and the at least one context sentence corresponding to the correct sample abstract; and
selecting a target abstract segment from the plurality of abstract segments based on their respective evaluation values, and forming the keyword set from the words in the target abstract segment.
5. The method of claim 4, wherein determining the evaluation value of the abstract segment based on the segment and the at least one context sentence corresponding to the correct sample abstract comprises:
performing the following for each of the at least one context sentence: inputting the context sentence and the abstract segment into an autoregressive language model to obtain a first language prediction probability;
obtaining a relevance value of the abstract segment based on the obtained at least one first language prediction probability;
inputting the abstract segment alone into the autoregressive language model to obtain a second language prediction probability, and obtaining a compression ratio value of the abstract segment based on the second language prediction probability; and
taking the relevance value and the compression ratio value of the abstract segment as the evaluation value of the abstract segment.
6. The method of claim 5, wherein selecting the target abstract segment from the plurality of abstract segments based on their respective evaluation values comprises:
selecting, from the plurality of abstract segments, a plurality of candidate abstract segments whose compression ratio values satisfy a preset condition; and
taking the candidate abstract segment with the largest relevance value among the plurality of candidate abstract segments as the target abstract segment.
7. An apparatus for training an abstract generation model, the model comprising an encoding submodel and a decoding submodel, the apparatus comprising:
a training module, configured to perform multiple rounds of iterative training on the abstract generation model to be trained based on a training sample set and to output the trained abstract generation model, wherein each training sample comprises: a sample text, and a corresponding correct sample abstract and incorrect sample abstract; and wherein, in one round of iteration, the following operations are performed:
inputting the obtained training sample into the encoding submodel to obtain encoded text features, correct abstract features, and incorrect abstract features;
determining a first loss value based on the similarity of the correct abstract features to the text features and the similarity of the incorrect abstract features to the text features, and adjusting parameters of the encoding submodel based on the first loss value; and
inputting the text features into the decoding submodel to obtain a predicted abstract, determining a second loss value based on the predicted abstract and the correct sample abstract, and adjusting parameters of at least one of the encoding submodel and the decoding submodel based on the second loss value.
8. An electronic device, comprising a processor and a memory, wherein the memory stores program code which, when executed by the processor, causes the processor to perform the steps of the method of any of claims 1 to 6.
9. A computer-readable storage medium, characterized in that it comprises program code for causing an electronic device to carry out the steps of the method according to any one of claims 1 to 6, when said program code is run on said electronic device.
10. A computer program product comprising computer instructions, the computer instructions being stored in a computer readable storage medium; when a processor of a computer device reads the computer instructions from the computer readable storage medium, the processor executes the computer instructions to cause the computer device to perform the steps of the method of any of claims 1-6.
CN202210238223.6A 2022-03-11 2022-03-11 Method, device, equipment and medium for training abstract generation model Pending CN114625866A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210238223.6A CN114625866A (en) 2022-03-11 2022-03-11 Method, device, equipment and medium for training abstract generation model

Publications (1)

Publication Number Publication Date
CN114625866A true CN114625866A (en) 2022-06-14

Family

ID=81902276

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210238223.6A Pending CN114625866A (en) 2022-03-11 2022-03-11 Method, device, equipment and medium for training abstract generation model

Country Status (1)

Country Link
CN (1) CN114625866A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115099240A (en) * 2022-06-17 2022-09-23 北京百度网讯科技有限公司 Text generation model training method and device and text generation method and device
CN115129819A (en) * 2022-07-14 2022-09-30 广州欢聚时代信息科技有限公司 Text abstract model production method and device, equipment and medium thereof
CN115982343A (en) * 2023-03-13 2023-04-18 阿里巴巴达摩院(杭州)科技有限公司 Abstract generation method, method and device for training abstract generation model
CN116383027A (en) * 2023-06-05 2023-07-04 阿里巴巴(中国)有限公司 Man-machine interaction data processing method and server
KR102652009B1 (en) * 2023-09-07 2024-03-27 아이보람 주식회사 Method and apparatus for providing a video-based an e-book applying native language acquisition principles to a user terminal using a neural network


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination