CN114818690A - Comment information generation method and device and storage medium

Comment information generation method and device and storage medium

Info

Publication number: CN114818690A
Application number: CN202110119102.5A
Authority: CN (China)
Prior art keywords: topic, participle, text, comment information, processed
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 王伟 (Wang Wei), 李丕绩 (Li Piji)
Current Assignee: Tencent Technology (Shenzhen) Co., Ltd.
Original Assignee: Tencent Technology (Shenzhen) Co., Ltd.
Application filed by Tencent Technology (Shenzhen) Co., Ltd.
Priority to CN202110119102.5A
Publication of CN114818690A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/20: Natural language analysis
    • G06F 40/279: Recognition of textual entities
    • G06F 40/289: Phrasal analysis, e.g. finite state techniques or chunking
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/30: Semantic analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/044: Recurrent networks, e.g. Hopfield networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks

Abstract

The application provides a comment information generation method, device, and storage medium, belongs to the technical field of computers, relates to artificial intelligence and natural language processing technologies, and is used for improving the diversity of comment information. A key participle set corresponding to a text to be processed is determined based on a first similarity between each participle of the text to be processed and a corresponding title, where the title represents the core content of the text to be processed; a target topic corresponding to the text to be processed and a corresponding target topic participle set are determined based on a second similarity between each participle of the text to be processed and each preset topic, where each target topic represents a recommended comment angle for the text to be processed and corresponds to at least one topic participle; and target comment information of the text to be processed is generated based on the key participle set and the target topic participle set. When comment information is generated, both the important information in the text to be processed and the recommended comment angles are considered, so that diversified and rich comment information can be generated.

Description

Comment information generation method and device and storage medium
Technical Field
The application relates to the technical field of computers, and in particular to a comment information generation method and device, and a storage medium.
Background
With the popularity of various reading websites, a user can comment on various texts while browsing a reading website, so as to express the user's viewpoint on a text and highlight its main information. To facilitate users' commenting on texts, automatic comment information generation technologies have been proposed.
An automatic comment information generation technique automatically generates comment information based on a given text. At present, comment information is mainly generated by a comment information generation framework with a sequence-to-sequence (seq2seq, S2S) structure based on a Long Short-Term Memory (LSTM) network. The framework mainly comprises an encoder and a decoder.
The process of generating comment information through this framework is as follows: the text participles contained in a text are input into the encoder, and the encoder encodes the text participles to obtain semantic vectors of the text participles; the decoder then applies an attention mechanism to the semantic vectors of the text participles to generate the participle semantic vectors in the comment information one by one, and converts these participle semantic vectors into participles to generate the comment information. Obviously, comment information generated only from the participles in the text is not rich enough. A minimal sketch of this related-art framework is given below.
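The sketch below illustrates such an LSTM-based sequence-to-sequence comment generator with attention. It is written for illustration only; the layer sizes, greedy decoding loop, and variable names are assumptions and do not correspond to any specific system described in the related art.

```python
# A minimal sketch of the related-art setup: an LSTM encoder reads the participle
# vectors of the text, and an LSTM decoder with attention emits comment participles
# one by one. Sizes and the greedy decoding loop are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Seq2SeqCommenter(nn.Module):
    def __init__(self, vocab_size, emb=128, hidden=256):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb)
        self.encoder = nn.LSTM(emb, hidden, batch_first=True)
        self.decoder = nn.LSTMCell(emb, hidden)
        self.out = nn.Linear(2 * hidden, vocab_size)

    def forward(self, text_ids, max_len=20, bos_id=1):
        enc_out, (h, c) = self.encoder(self.emb(text_ids))   # semantic vectors of text participles
        h, c = h[0], c[0]
        tok = torch.full((text_ids.size(0),), bos_id, dtype=torch.long)
        comment = []
        for _ in range(max_len):
            h, c = self.decoder(self.emb(tok), (h, c))
            # attention over the encoder's semantic vectors
            attn = F.softmax(torch.bmm(enc_out, h.unsqueeze(-1)).squeeze(-1), dim=-1)
            ctx = torch.bmm(attn.unsqueeze(1), enc_out).squeeze(1)
            logits = self.out(torch.cat([h, ctx], dim=-1))
            tok = logits.argmax(dim=-1)                        # next comment participle
            comment.append(tok)
        return torch.stack(comment, dim=1)
```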
Disclosure of Invention
The application provides a comment information generation method, device, and storage medium, which are used for improving the richness and diversity of comment information.
In a first aspect, an embodiment of the present application provides a method for generating comment information, where the method includes:
determining a key participle set corresponding to a text to be processed based on a first similarity between each participle contained in the text to be processed and a corresponding title, wherein the title represents the core content of the text to be processed;
determining a target topic corresponding to the text to be processed and a corresponding target topic participle set based on a second similarity between each participle contained in the text to be processed and each preset topic, wherein each target topic represents a recommended comment angle for the text to be processed, and each target topic corresponds to at least one topic participle;
and generating target comment information of the text to be processed based on the key participle set and the target topic participle set.
In a second aspect, the present application provides an apparatus for comment information generation, including:
a first determining unit, used for determining a key participle set corresponding to the text to be processed based on a first similarity between each participle contained in the text to be processed and a corresponding title, wherein the title represents the core content of the text to be processed;
a second determining unit, used for determining a target topic corresponding to the text to be processed and a corresponding target topic participle set based on a second similarity between each participle contained in the text to be processed and each preset topic, wherein each target topic represents a recommended comment angle for the text to be processed, and each target topic corresponds to at least one topic participle;
and a generating unit, used for generating target comment information of the text to be processed based on the key participle set and the target topic participle set.
In a possible implementation manner, the first determining unit is specifically configured to:
inputting each participle and the title contained in the text to be processed into a first prediction submodel of the trained comment information generation model, and determining the first similarity between each participle and the title;
determining a first selection probability corresponding to each participle based on the first similarity, wherein the first selection probability is used for representing the probability of each participle being selected as a key participle;
and selecting at least one participle from each participle contained in the text to be processed to form a key participle set based on the first selected probability corresponding to each participle.
In a possible implementation manner, the first determining unit is specifically configured to:
respectively determining second selected probabilities of the corresponding participles based on the first selected probabilities of the participles, wherein the difference between the value of each second selected probability and 0 or 1 is smaller than a preset value, and each second selected probability is used for representing the probability that the corresponding participle is selected as a key participle;
and, based on the second selected probability of each participle, screening out from the participles contained in the text to be processed those participles whose second selected probability differs from 1 by less than the preset value as key participles, to form the key participle set.
In a possible implementation manner, the first determining unit determines, based on the first selected probability of each participle, the second selected probability of the corresponding participle, specifically to:
determining a second selected probability of the corresponding participle in a Gumbel-Softmax distribution mode based on the first selected probability of each participle; or,
and determining a second selected probability of the corresponding participle in a Bernoulli distribution mode based on the first selected probability of each participle.
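The following is a minimal sketch of the two options above for turning the first selected probability of each participle into a near-binary second selected probability; the temperature value and tensor shapes are illustrative assumptions.

```python
# A minimal sketch of hardening the first selected probability of each participle
# into a near-binary second selected probability. The temperature tau and the
# example probabilities are illustrative assumptions, not values from the patent.
import torch
import torch.nn.functional as F

p_first = torch.tensor([0.9, 0.2, 0.7, 0.1])          # first selected probability per participle

# Option 1: Gumbel-Softmax over the two classes {not selected, selected};
# hard=True returns a one-hot sample while keeping gradients (straight-through).
logits = torch.stack([torch.log1p(-p_first + 1e-9), torch.log(p_first + 1e-9)], dim=-1)
p_second_gumbel = F.gumbel_softmax(logits, tau=0.5, hard=True)[:, 1]

# Option 2: Bernoulli sampling (non-differentiable; values are exactly 0 or 1).
p_second_bernoulli = torch.bernoulli(p_first)

key_participle_mask = p_second_gumbel > 0.5            # participles kept as key participles
```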
In a possible implementation manner, the second determining unit is specifically configured to:
inputting each participle contained in the text to be processed into a second prediction submodel of the trained comment information generation model;
respectively determining second similarity between each participle and each preset topic based on each preset topic in the second prediction submodel, and respectively determining topics associated with each participle based on each obtained second similarity;
respectively determining topic selection probability of each topic based on the obtained word segmentation number associated with each topic;
and determining a target topic corresponding to the text to be processed based on the topic selection probability of each topic.
In a possible implementation manner, the second predictor model is an MLP including a Softmax function, and the second predictor model is obtained by training as follows:
according to a first prediction training sample in the first prediction training sample data set, performing loop iteration training on the second predictor model, and outputting the trained second predictor model when a preset convergence condition is met, wherein the following operations are performed in the process of one loop iteration training:
selecting a first prediction training sample from a first prediction training sample data set, wherein the first prediction training sample comprises a historical text and at least one piece of corresponding first historical comment information, and the historical text comprises at least one historical word segmentation;
inputting at least one history word segmentation contained in the history text in the first prediction training sample into a second pre-constructed prediction submodel;
based on each preset topic in a pre-constructed second prediction submodel, respectively obtaining topics associated with each historical participle through a Softmax function;
respectively determining the predicted topic selection probability of each topic based on the obtained historical word segmentation number associated with each topic;
and constructing a first loss function based on the real topic selection probability and the predicted topic selection probability corresponding to each topic, and carrying out parameter adjustment on the second prediction submodel based on the first loss function, wherein the real topic selection probability is determined according to at least one piece of first historical comment information corresponding to the historical text.
In one possible implementation, the true topic selection probability is determined by:
inputting at least one piece of first historical comment information in a first prediction training sample into a topic perception submodel of a trained comment information generation model;
obtaining a first semantic vector of each piece of first historical comment information based on the topic perception submodel, and determining a first historical topic corresponding to the corresponding first historical comment information based on each obtained first semantic vector;
based on the obtained number of the first historical comment information associated with each first historical topic, respectively determining the historical topic selection probability of each first historical topic, and taking the historical topic selection probability as the true topic selection probability.
In one possible implementation, the topic perception submodel is trained by:
according to a second prediction training sample in the second prediction training sample data set, executing loop iterative training on the topic perception submodel, and outputting the trained topic perception submodel when a preset convergence condition is met; wherein the following operations are executed in a loop iteration training process:
selecting a second prediction training sample from a second prediction training sample data set, wherein the second prediction training sample comprises at least one piece of second historical comment information;
inputting each second historical comment information in a second prediction training sample into a pre-constructed topic perception submodel, and determining a second semantic vector corresponding to each second historical comment information;
determining a second historical topic corresponding to the corresponding second historical comment information based on a second semantic vector corresponding to each second historical comment information, and determining the posterior topic selection probability based on the second historical topic;
and constructing a second loss function based on the second semantic vector and the posterior topic selection probability, and carrying out parameter adjustment on the topic perception submodel based on the second loss function.
In one possible implementation, constructing the second loss function based on the second semantic vector and the posterior topic selection probability includes:
reconstructing corresponding predicted comment information based on the second semantic vector;
and constructing a second loss function based on the predicted comment information, the distance between the posterior topic selection probability and the corresponding prior topic selection probability, and the alignment of the second semantic vector with the topic vector of the corresponding topic.
In a possible implementation manner, the generating unit is specifically configured to:
inputting the key participle set and the target topic participle set into a comment information generation submodel of a trained comment information generation model, executing multiple rounds of comment participle set prediction through the comment information generation submodel, and generating target comment information based on the comment participle set output by the last round of prediction;
the process of word segmentation prediction of each round of comments is as follows:
determining a third selection probability of each key participle in the key participle set, a fourth selection probability of each topic participle in the target topic participle set and a fifth selection probability of each high-frequency participle in a preset high-frequency participle set through an attention mechanism according to the comment participle in the comment participle set predicted in the previous round;
and predicting the comment participle of the current turn from the key participle set, the target topic participle set or the high-frequency participle set according to a third selection probability, a fourth selection probability and a fifth selection probability, wherein the third selection probability, the fourth selection probability and the fifth selection probability are respectively used for representing the probability that each key participle, each topic participle and each high-frequency participle are selected as comment participles.
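The following is a minimal sketch of one round of comment participle prediction under the mixture described above; dot-product attention and the data layout are illustrative assumptions rather than the patent's exact formulation.

```python
# A minimal sketch of one prediction round: attention scores over the key participle
# set, the target topic participle set and a high-frequency participle set give the
# third, fourth and fifth selection probabilities, and the next comment participle is
# picked from the joint distribution. The scoring function is an illustrative choice.
import torch
import torch.nn.functional as F

def predict_next_participle(decoder_state, key_vecs, topic_vecs, freq_vecs,
                            key_words, topic_words, freq_words):
    def attn(vecs):                                   # dot-product attention scores
        return vecs @ decoder_state
    scores = torch.cat([attn(key_vecs), attn(topic_vecs), attn(freq_vecs)])
    probs = F.softmax(scores, dim=-1)                 # joint distribution over all candidates
    n_k, n_t = len(key_words), len(topic_words)
    third = probs[:n_k]                               # key participles
    fourth = probs[n_k:n_k + n_t]                     # topic participles
    fifth = probs[n_k + n_t:]                         # high-frequency participles
    words = key_words + topic_words + freq_words
    return words[int(probs.argmax())], third, fourth, fifth
```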
In a third aspect, an embodiment of the present application provides a comment information generation device, including: a memory and a processor, wherein the memory is configured to store computer instructions; and the processor is used for executing the computer instructions to realize the comment information generation method provided by the embodiment of the application.
In a fourth aspect, the present application provides a computer-readable storage medium, where computer instructions are stored, and when executed by a processor, the computer instructions implement the method for generating comment information provided in the present application.
The beneficial effects of this application are as follows:
The embodiment of the application provides a comment information generation method, device, and storage medium. In the embodiment of the application, in the process of generating comment information for a text to be processed, each participle and the title contained in the text to be processed are first obtained, where the title is used for representing the core content of the text to be processed; then, according to a first similarity between the title and each participle of the text to be processed, at least one participle is selected from the participles of the text to be processed to form a key participle set, which represents the important information of the text to be processed, that is, the important information is extracted from the text to be processed; meanwhile, based on a second similarity between each participle contained in the text to be processed and each preset topic, a target topic corresponding to the text to be processed and a corresponding target topic participle set are determined, where each target topic represents a recommended comment angle for the text to be processed and corresponds to at least one topic participle, that is, the comment angle of readers on the text to be processed is determined; target comment information of the text to be processed is then generated based on the key participle set and the target topic participle set.
In this scheme for generating comment information, attention is paid not only to the participles contained in the text to be processed, but also to the key participles in the text to be processed and the topic participles corresponding to the text to be processed; the target comment information is generated according to both the important information of the text to be processed and the comment angle of readers, so that the generated comment information is richer and more diversified.
Other features and advantages of the present application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the present application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic diagram of comment information generation in the related art;
FIG. 2 is a schematic diagram of a network structure of the LSTM;
FIG. 3 is a diagram of an application scenario;
fig. 4 is a schematic structural diagram of a comment information generation model in a training process according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a network structure of an MLP;
FIG. 6 is a flowchart of a method for training a first predictor model according to an embodiment of the present application;
fig. 7 is a schematic diagram of topic clustering according to historical comment information according to an embodiment of the present application;
FIG. 8 is a flowchart of a method for training a topic perception sub-model according to an embodiment of the present application;
FIG. 9 is a flowchart of a method for training a second predictor model according to an embodiment of the present application;
fig. 10 is a schematic diagram of a comment information generation model in a comment information generation process according to an embodiment of the present application;
FIG. 11 is a flowchart of a method for generating comment information according to an embodiment of the present application;
FIG. 12 is a flowchart of an overall method for comment information generation provided by an embodiment of the present application;
fig. 13 is a diagram illustrating a structure of a comment information generation apparatus according to an embodiment of the present application;
fig. 14 is a block diagram of a computing device according to an embodiment of the present application.
Detailed Description
In order to make the purpose, technical solution and advantages of the present application clearer, the technical solution in the embodiments of the present application will be described below completely and in detail with reference to the accompanying drawings in the embodiments of the present application. It is obvious that the described embodiments are only some embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first," "second," and the like in the description and in the claims of the present application and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein.
Some terms in the embodiments of the present application are explained below to facilitate understanding by those skilled in the art.
1. Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
The artificial intelligence technology is a comprehensive subject and relates to a wide range of fields, covering both hardware-level technologies and software-level technologies. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises computer vision technology, speech processing technology, natural language processing technology, machine learning/deep learning, and the like.
2. Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies various theories and methods that enable efficient communication between humans and computers using natural language. Natural language processing is a science integrating linguistics, computer science and mathematics. Research in this field therefore involves natural language, i.e. the language that people use every day, so it is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, robot question answering, knowledge graphs, and the like.
3. Machine learning is a multi-field interdisciplinary subject, involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It specializes in studying how a computer simulates or implements human learning behavior so as to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve its performance. Machine learning is the core of artificial intelligence, is the fundamental way to make computers intelligent, and is applied in all fields of artificial intelligence. Machine learning generally includes techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and the like.
4. The Long Short-Term Memory (LSTM) network is a type of recurrent neural network designed to solve the long-term dependence problem of the general Recurrent Neural Network (RNN). It is mainly used for handling and predicting problems related to sequences of events over time, such as text-related problems and time-series problems. Text-related problems mainly involve natural language processing, text processing and similar contexts, and further include dialogue systems, sentiment analysis, machine translation, and the like. Time-series problems are prediction problems such as predicting weather, temperature, etc. Meanwhile, the LSTM can be used as a complex nonlinear unit for constructing a larger deep neural network. In the embodiment of the present application, an LSTM is used to construct the comment information generation model, that is, the comment information generation model provided in the embodiment of the present application includes an LSTM.
5. The Transformer is a language sequence processing model based on an attention mechanism. The Transformer uses the attention mechanism to mine the correlation between participles in a text, so that the model learns context semantic vectors of the participles and the output quality of the model is improved; the attention mechanism can also be used to achieve fast parallelism, overcoming the drawback of slow training of a Recurrent Neural Network (RNN).
6. A multi-layer Perceptron (MLP), also called Artificial Neural Network (ANN), is a feedforward Artificial Neural Network model that maps multiple input data sets onto a single output data set, and is mainly used for classifying data.
7. Clustering analysis, also called cluster analysis, is a statistical analysis method for studying classification problems (of samples or indices), and is also an important algorithm in data mining. Cluster analysis operates on a set of patterns; generally, a pattern is a vector of measurements or a point in a multidimensional space. The content of cluster analysis is very rich, including the systematic clustering method, ordered sample clustering method, dynamic clustering method, fuzzy clustering method, graph-theory clustering method, cluster forecasting method, and the like.
Cluster analysis is based on similarity: there is more similarity between patterns in one cluster than between patterns that are not in the same cluster. There are many classification problems in the natural and social sciences. A class refers to a set of similar elements, and clustering refers to gathering similar elements together.
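As an illustration of how cluster analysis can be applied in this setting, the sketch below groups semantic vectors of historical comments into topics with k-means; k-means is used here only as one concrete clustering method and is not the specific algorithm prescribed by this application.

```python
# An illustrative sketch: cluster semantic vectors of historical comments into topics
# and derive a topic selection probability from the cluster sizes. The vector size,
# number of clusters, and use of KMeans are assumptions for illustration only.
import numpy as np
from sklearn.cluster import KMeans

comment_vectors = np.random.rand(1000, 64)      # semantic vectors of historical comments
kmeans = KMeans(n_clusters=10, n_init=10, random_state=0).fit(comment_vectors)
topic_of_comment = kmeans.labels_               # topic associated with each comment
topic_selection_probability = np.bincount(topic_of_comment) / len(topic_of_comment)
```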
8. The posterior topic selection probability may also be called the posterior topic distribution, and the prior topic selection probability may also be called the prior topic distribution.
The prior topic selection probability is independent of the test result or independent of random sampling, and reflects the probability obtained according to other related parameters before the statistical test is carried out.
The posterior topic selection probability is the real topic selection probability obtained according to the result after statistical test.
9. The Bernoulli distribution, also known as a two-point distribution or 0-1 distribution, is a discrete distribution with two possible outcomes: 1 indicates success, occurring with probability p (where 0 < p < 1), and 0 indicates failure, occurring with probability q = 1 - p. For example, mapped to the two outcomes of the Bernoulli distribution in the embodiment of the present application, 1 means that the participle is selected as a key participle, and 0 means that the participle is not selected as a key participle.
The following briefly introduces the design concept of the embodiments of the present application.
The embodiments of the present application are mainly concerned with how to automatically generate corresponding comment information for a given text. In the related art of automatically generating comment information, comment information is mainly generated by a comment generation framework with a sequence-to-sequence structure based on a long short-term memory network. The framework mainly comprises an encoder and a decoder. Fig. 1 exemplarily provides a schematic diagram of generating comment information in the related art.
In the related art, when comment information is generated, text participles are firstly input into an encoder, and the encoder performs encoding processing on the text participles to obtain semantic vectors corresponding to the text participles. And then, the decoder applies an attention mechanism to the semantic vectors of the text participles to generate comment participle semantic vectors in the comments one by one, converts the comment participle semantic vectors into specific comment participles, and further generates comment information according to the comment participles.
Obviously, in the related art for automatically generating comment information, the generation process mainly relies on the participles contained in the text; it considers neither the reader factor, namely the recommended comment angle corresponding to the text, nor the importance of each participle in the text, so the generated comment information is monotonous and dull.
Different readers have different comment angles on different texts, or even on the same text; each participle in the text also has its own importance, and key participles in the text can more intuitively reflect the core content, main ideas and the like that the text intends to express.
In view of this, the embodiments of the present application provide a method, an apparatus, and a storage medium for generating comment information by using artificial intelligence, machine learning, and natural language processing techniques. In the process of generating the comment information, determining a key participle set, a target topic corresponding to the text to be processed and a corresponding target topic participle set according to each participle of the text to be processed; and then, generating target comment information according to the target topic word segmentation set and the key word segmentation set. The target topic is used for representing a recommended comment angle of the text to be processed, and can also be called a comment angle of a predicted reader to the text to be processed, and the target topic word segmentation set contains at least one topic word segmentation corresponding to the comment angle, so that the topic word segmentation is important information used when comment information is generated for the corresponding comment angle. The key word segmentation set comprises at least one key word segmentation extracted from the text to be processed and represents important information extracted from the word segmentation set of the text to be processed; therefore, in the process of generating the target comment information, the comment angle corresponding to the text to be processed and the importance of each participle in the text to be processed are considered, and therefore the generated comment information is richer and more diverse.
The comment information generation method provided by the embodiment of the application can be divided into a training part of a comment information generation model and an application part of the comment information generation model. The comment information generation model mainly comprises a topic perception submodel, a first prediction submodel, a second prediction submodel and a comment generation submodel.
In the embodiment of the application, the comment information generation model is mainly used for automatically generating comment information according to the target topic participle set and the key participle set corresponding to the text to be processed.
The key participles in the key participle set are determined according to the importance scores or probabilities (namely the first selected probabilities in the embodiment of the application) output by each participle in the text to be processed based on the first prediction submodel;
the target topic word segmentation set is obtained from each topic stored in the topic perception submodel and the corresponding topic word segmentation set based on the topic selection probability of each topic corresponding to the text to be processed output by the second prediction submodel aiming at the text to be processed; each topic and the corresponding topic word segmentation set stored in the topic perception submodel are determined by the topic perception submodel based on a large amount of historical comment information.
In the embodiment of the application, given the historical comment information corresponding to any historical text, the trained topic perception submodel can output the true topic selection probability corresponding to that historical text, and this true topic selection probability can be used to assist in training the second prediction submodel.
Therefore, in the embodiment of the application, the topic perception submodel is trained first to obtain the trained topic perception submodel; then, based on the true topic selection probability corresponding to the historical text output by the trained topic perception submodel and the topic participle set corresponding to each topic, the training of the other submodels is assisted.
(I) The training part of the comment information generation model comprises the following steps:
1. the first predictor model:
Obtain a third prediction training sample data set used for training the first prediction submodel, where the third prediction training sample data set comprises historical texts, specifically the title information and body information corresponding to each historical text.
According to a third prediction training sample in a third prediction training sample data set, performing loop iteration training on the first prediction submodel, and outputting the trained first prediction submodel when a preset convergence condition is met, wherein the following operations are performed in the process of one loop iteration training:
inputting historical title vectors of the historical texts and historical word segmentation vectors contained in the historical texts into a first pre-constructed predictor model;
determining and outputting a first history selection probability that the history participles corresponding to each history participle vector are selected as history key participles through a pre-constructed first predictor model;
selecting historical participles based on the selected probability of each first history, and forming a historical key participle set;
in order to encourage fewer historical key participles in the historical key participle set, the embodiment of the present application designs an L1 norm loss function and performs parameter adjustment on the first predictor model based on the L1 norm loss function (a minimal sketch of this sparsity objective is given below).
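The following is a minimal sketch of such an L1-norm sparsity term, assuming it is applied directly to the first selected probabilities; the weighting factor is an illustrative assumption and not a value given by the patent.

```python
# A minimal sketch of the L1-norm objective that encourages the first predictor
# submodel to keep few historical key participles. lambda_l1 is an assumption.
import torch

def sparsity_loss(first_selected_probs, lambda_l1=0.1):
    # first_selected_probs: (num_participles,) selection probabilities in [0, 1]
    return lambda_l1 * first_selected_probs.abs().sum()    # L1 norm of the selection vector
```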
In the embodiment of the present application, the historical title vector of the historical text and each historical participle vector contained in the historical text are obtained by encoding the historical text with an encoder of an LSTM structure or an encoder of a Transformer structure.
Take as an example obtaining the historical title vector and each historical participle vector of a historical text with an encoder based on an LSTM structure. Fig. 2 illustratively provides a network architecture diagram of the LSTM. Referring to Fig. 2, the LSTM includes a plurality of neurons, and the operation of any one neuron of the LSTM is described below.
First, the context vector h_{t-1} output by the previous neuron and the historical participle vector x_t input to the current neuron are taken as input, and the forget gate produces f_t, the probability that the memory cell state C_{t-1} of the previous neuron is forgotten, where f_t = 1 denotes complete retention and f_t = 0 denotes complete discarding.
Second, with the context vector h_{t-1} output by the previous neuron and the historical participle vector x_t input to the current neuron as input, the input gate produces i_t, the probability that the candidate memory cell state C'_t of the current neuron is retained, and i_t is used to compute the retained candidate memory cell state C'_t.
The memory cell state of the current neuron is then updated from C_{t-1} and C'_t, and the updated memory cell state is C_t.
Finally, with the context vector h_{t-1} output by the previous neuron and the historical participle vector x_t input to the current neuron as input, the output gate produces o_t, the probability that the memory cell state C_t is preserved in the output; C_t is passed through an activation function and multiplied by o_t to obtain the context vector h_t output by the current neuron. This context vector is a historical content vector of the historical text, where the historical content vector comprises the historical title vector and the historical body vector of the historical text, and the corresponding historical participle vector can be obtained through each neuron.
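The gate computations above correspond to the standard LSTM cell update; the following is a minimal sketch of one cell step under that standard formulation. The weight shapes and dimensions are illustrative, not values from the patent.

```python
# A minimal sketch of one LSTM cell step, matching the gate description above.
# Variable names and sizes are illustrative; the standard LSTM formulation is assumed.
import torch

def lstm_cell_step(x_t, h_prev, c_prev, W, U, b):
    """x_t: (d_in,), h_prev/c_prev: (d_h,), W: (4*d_h, d_in), U: (4*d_h, d_h), b: (4*d_h,)."""
    z = W @ x_t + U @ h_prev + b
    f_t, i_t, o_t, g_t = z.chunk(4)              # forget gate, input gate, output gate, candidate
    f_t, i_t, o_t = torch.sigmoid(f_t), torch.sigmoid(i_t), torch.sigmoid(o_t)
    c_tilde = torch.tanh(g_t)                     # candidate memory cell state C'_t
    c_t = f_t * c_prev + i_t * c_tilde            # updated memory cell state C_t
    h_t = o_t * torch.tanh(c_t)                   # context vector h_t output by this step
    return h_t, c_t

# usage: encode a sequence of participle vectors into context vectors
d_in, d_h = 8, 16
W, U, b = torch.randn(4 * d_h, d_in), torch.randn(4 * d_h, d_h), torch.zeros(4 * d_h)
h, c = torch.zeros(d_h), torch.zeros(d_h)
for x_t in torch.randn(5, d_in):                  # 5 participle vectors
    h, c = lstm_cell_step(x_t, h, c, W, U, b)
```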
2. Topic perception submodel:
obtaining a second prediction training sample data set used for training the topic perception submodel, wherein the second prediction training sample data set comprises a large amount of second historical comment information, namely the second historical comment information is a second prediction training sample;
according to second historical comment information in a second prediction training sample data set, executing loop iteration training on the topic perception submodel, and outputting the trained topic perception submodel when preset convergence conditions are met; wherein the following operations are executed in a loop iteration training process:
inputting the second historical comment information into a pre-constructed topic perception submodel;
obtaining a second semantic vector of second historical comment information through a pre-constructed topic perception submodel; reconstructing based on the second semantic vector to obtain predicted comment information; obtaining posterior topic selection probability corresponding to the second historical comment information based on the second semantic vector; determining a topic vector corresponding to the second historical comment information based on the posterior topic selection probability of the second historical comment information;
and constructing a second loss function based on the predicted comment information, the posterior topic selection probability corresponding to the second historical comment information and the topic vector corresponding to the second historical comment information, and carrying out parameter adjustment on the topic perception submodel based on the second loss function (a minimal sketch of this objective is given below).
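The following is a minimal sketch of a topic perception submodel trained with the operations above, assuming a variational-autoencoder-style neural topic model over bag-of-words comment representations. The layer sizes, number of topics, prior, and loss weighting are illustrative assumptions rather than details specified by the patent.

```python
# A minimal sketch: encode a comment into a semantic vector, derive the posterior
# topic selection probability and a topic vector, reconstruct the comment, and
# combine reconstruction, distance to the prior, and semantic-topic alignment.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopicPerception(nn.Module):
    def __init__(self, vocab_size, hidden=256, num_topics=50):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(vocab_size, hidden), nn.ReLU())
        self.to_topics = nn.Linear(hidden, num_topics)
        self.topic_vectors = nn.Parameter(torch.randn(num_topics, hidden))
        self.decoder = nn.Linear(hidden, vocab_size)

    def forward(self, bow):                                  # bow: (batch, vocab_size) comment counts
        sem = self.encoder(bow)                               # second semantic vector
        post = F.softmax(self.to_topics(sem), dim=-1)         # posterior topic selection probability
        topic_vec = post @ self.topic_vectors                 # topic vector of the associated topic(s)
        recon = self.decoder(sem)                             # reconstructed (predicted) comment info
        return sem, post, topic_vec, recon

def second_loss(bow, sem, post, topic_vec, recon, prior):
    recon_loss = -(bow * F.log_softmax(recon, dim=-1)).sum(-1).mean()                    # reconstruction
    kl = (post * (torch.log(post + 1e-9) - torch.log(prior + 1e-9))).sum(-1).mean()      # distance to prior
    align = F.mse_loss(sem, topic_vec)                                                    # semantic-topic alignment
    return recon_loss + kl + align

# usage with a uniform prior topic distribution (an illustrative assumption)
model = TopicPerception(vocab_size=5000)
bow = torch.zeros(4, 5000); bow[:, :10] = 1.0
prior = torch.full((50,), 1.0 / 50)
loss = second_loss(bow, *model(bow), prior)
```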
3. The second predictor model:
obtaining a first prediction training sample data set used for training a second prediction submodel, wherein the first prediction training sample data set comprises a large number of historical texts and corresponding first historical comment information, and the historical texts and the corresponding first historical comment information are first prediction training samples;
according to a first prediction training sample in the first prediction training sample data set, performing loop iteration training on the second predictor model, and outputting the trained second predictor model when a preset convergence condition is met, wherein the following operations are performed in the process of one loop iteration training:
selecting a first prediction training sample from a first prediction training sample data set, wherein the first prediction training sample comprises a historical text and corresponding first historical comment information, and the historical text comprises at least one historical word segmentation;
inputting at least one history word segmentation contained in the history text in the first prediction training sample into a second pre-constructed prediction submodel;
determining topics associated with each historical word segmentation through a pre-constructed second predictor model;
respectively determining the predicted topic selection probability of each topic based on the historical word segmentation number associated with each topic, namely determining the predicted topic selection probability corresponding to the historical text;
constructing a first loss function based on the real topic selection probability and the predicted topic selection probability corresponding to each topic, that is, constructing the first loss function based on the real topic selection probability and the predicted topic selection probability corresponding to the historical text; and carrying out parameter adjustment on the second prediction submodel based on the first loss function, where the real topic selection probability is determined according to the first historical comment information corresponding to the historical text (a minimal sketch of this training step is given below).
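The following is a minimal sketch of this training step, assuming the second predictor submodel is an MLP with a Softmax over the preset topics; the dimensions and the cross-entropy distance used for the first loss function are illustrative assumptions.

```python
# A minimal sketch: an MLP with Softmax assigns each participle of a historical text
# to a topic, the predicted topic selection probability is derived from the number of
# participles per topic, and the first loss compares it with the true topic selection
# probability from the topic perception submodel. Sizes and the loss are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SecondPredictor(nn.Module):
    def __init__(self, emb_dim=128, hidden=256, num_topics=50):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(emb_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, num_topics))

    def forward(self, participle_vectors):                 # (num_participles, emb_dim)
        topic_probs = F.softmax(self.mlp(participle_vectors), dim=-1)
        assign = topic_probs.argmax(dim=-1)                 # topic associated with each participle
        counts = torch.bincount(assign, minlength=topic_probs.size(-1)).float()
        predicted = counts / counts.sum()                   # predicted topic selection probability
        soft_predicted = topic_probs.mean(dim=0)            # differentiable version used for training
        return predicted, soft_predicted

def first_loss(soft_predicted, true_topic_probs):
    # cross-entropy between true and predicted topic selection probabilities
    return -(true_topic_probs * torch.log(soft_predicted + 1e-9)).sum()
```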
4. Comment information generation submodel:
inputting a historical key word segmentation set corresponding to a historical text obtained based on a first prediction submodel and a historical topic word segmentation set corresponding to a historical text obtained based on a second prediction submodel into a pre-constructed comment information generation submodel;
generating a sub-model through pre-constructed comment information, and outputting comment information corresponding to the historical text;
and constructing a fourth loss function based on the comment information corresponding to the historical text, and performing parameter adjustment on the comment information generation submodel based on the fourth loss function.
In the embodiment of the application, when the comment information generation model is trained, the sub-models can also be jointly trained.
(II) The use part of the comment information generation model:
in the using process of the comment information generation model, a target topic word segmentation set and a key word segmentation set corresponding to the text to be processed are mainly determined. And the target topic word segmentation set is selected from each topic and the topic word segmentation set stored in the topic perception submodel based on the topic selection probability of each topic corresponding to the text to be processed, so that in the using process, the topic perception submodel can be removed, and only each topic and the corresponding topic word segmentation set obtained through the topic perception submodel are reserved.
In the embodiment of the application, after the text to be processed is determined, each participle and a corresponding title contained in the text to be processed are obtained, wherein the title is used for representing the core content of the text to be processed;
inputting each participle and the title of the text to be processed into the first prediction submodel of the trained comment information generation model, determining the first similarity between each participle contained in the text to be processed and the title through the first prediction submodel, and determining, based on the first similarity, the first selected probability of each participle being selected as a key participle; then selecting key participles from the participles contained in the text to be processed based on the first selected probability, to form a key participle set; meanwhile,
inputting each participle of the text to be processed into a second prediction submodel of the trained comment information generation model, determining a second similarity between each participle contained in the text to be processed and each preset topic through the second prediction submodel, and determining the topic selection probability of each topic corresponding to the text to be processed based on the second similarity; then based on topic selection probability of each topic corresponding to the text to be processed, determining a target topic and a corresponding target topic word segmentation set from each topic and a corresponding topic word segmentation set stored in the topic perception sub-model; each target topic represents a recommended comment angle aiming at the text to be processed, and each target topic corresponds to at least one topic participle;
and finally, generating target comment information of the text to be processed based on the key participle set and the target topic participle set (a minimal end-to-end sketch of this inference flow is given below).
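The following is a minimal end-to-end sketch of the inference flow described above; all function names are hypothetical placeholders for the trained submodels and are not APIs defined by the patent.

```python
# A minimal sketch of inference: key participles from the first predictor, target
# topics and their participle banks from the second predictor, then generation.
# first_predictor, second_predictor, topic_banks and generator are placeholders.
def generate_comment(participles, title, first_predictor, second_predictor,
                     topic_banks, generator, top_k_topics=1):
    # 1. key participle set from first similarities between participles and the title
    p_selected = first_predictor(participles, title)          # first selected probabilities
    key_set = [w for w, p in zip(participles, p_selected) if p > 0.5]

    # 2. target topics from second similarities between participles and preset topics
    topic_probs = second_predictor(participles)                # topic selection probabilities
    target_topics = sorted(range(len(topic_probs)),
                           key=lambda t: topic_probs[t], reverse=True)[:top_k_topics]
    topic_set = [w for t in target_topics for w in topic_banks[t]]

    # 3. target comment information from the two sets
    return generator(key_set, topic_set)
```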
Obviously, in the embodiment of the application, when the comment information is automatically generated through the comment information generation model, important information in the text to be processed and a comment angle corresponding to the text to be processed are considered, so that the generated target comment information is rich and diverse.
After introducing the design concept of the embodiment of the present application, some simple descriptions are provided below for application scenarios to which the technical solution of the embodiment of the present application can be applied, and it should be noted that the application scenarios described below are only used for describing the embodiment of the present application and are not limited. In a specific implementation process, the technical scheme provided by the embodiment of the application can be flexibly applied according to actual needs.
As shown in fig. 3, a schematic view of an application scenario provided in the embodiment of the present application is provided, where the application scenario includes a terminal device 30 (such as may include, but is not limited to, 30-1 or 30-2 illustrated in the figure) and a server 31;
the terminal device 30 is an electronic device used by a user, and various reading software and websites with a reading comment function are installed and operated in the terminal device 30. The terminal device 30 may be a personal computer, a mobile phone, a tablet computer, a notebook, an e-book reader, or other computer device; the terminal device 30 may also be a chat robot.
The server 31 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a Network service, cloud communication, a middleware service, a domain name service, a security service, a Content Delivery Network (CDN), a big data and artificial intelligence platform, and the like.
In one possible embodiment, the terminal device 30 and the server 31 may communicate with each other through a communication network, which is a wired network or a wireless network. The terminal device 30 and the server 31 may be directly or indirectly connected by wired or wireless communication. For example, the terminal device 30 may be indirectly connected to the server 31 through the wireless access point 32, or the terminal device 30 may be directly connected to the server 31 through the internet, which is not limited herein.
Because a large amount of training sample data is needed in the training process of the comment information generation model, in the embodiment of the present application, the training process of the comment information generation model is deployed in the server 31, and a large number of historical texts and the historical comment information corresponding to each historical text can be stored in the server 31 for training the comment information generation model. After the trained comment information generation model is obtained based on the training method in the embodiment of the application, the trained comment information generation model may be deployed on the terminal device 30, or on the server 31, or some submodels of the comment information generation model may be deployed on the terminal device 30 while the other submodels are deployed on the server 31.
Taking the terminal device 30 as a tablet computer for example:
when the comment information generation model is deployed on the terminal device 30 and a reader reads texts such as news or novel reading materials in the tablet computer, the tablet computer can acquire the text currently read by the reader and serve as the text to be processed; then acquiring a key word segmentation set and a target topic word segmentation set corresponding to the text to be processed; then, generating at least one piece of target comment information corresponding to the text to be processed based on the key word segmentation set and the target topic word segmentation set, and presenting the target comment information to a reader through a display page of the tablet computer, so that the reader can select the target comment information to directly publish the target comment information or publish the target comment information after re-editing;
when the comment information generation model is deployed on the server 31 and a reader reads texts such as news or novel reading materials in the tablet computer, the tablet computer can acquire the text currently read by the reader and serve as the text to be processed; then, the server 31 is notified of the text to be processed, and at this time, the server 31 receives the text to be processed sent by the tablet computer, and obtains a key word segmentation set and a target topic word segmentation set corresponding to the text to be processed; then, generating at least one piece of target comment information corresponding to the text to be processed based on the key word segmentation set and the target topic word segmentation set; then, the server 31 returns the generated target comment information to the tablet computer to be presented to the reader through the display page of the tablet computer, so that the reader can select the target comment information to be directly issued, or the target comment information is edited again and then issued;
When part of the submodels of the comment information generation model are deployed in the terminal device 30 and the other submodels are deployed in the server 31, for example, the first prediction submodel, the second prediction submodel and the comment information generation submodel are deployed in the terminal device 30 and the topic perception submodel is deployed in the server 31, the topic participle sets corresponding to the trained topics are stored in the server 31. In this case, when a reader reads texts such as news or novel reading materials on the tablet computer, the tablet computer can acquire the text currently read by the reader and use it as the text to be processed. The tablet computer then obtains a key word segmentation set corresponding to the text to be processed based on the first prediction submodel, determines the topic selection probability of each topic corresponding to the text to be processed based on the second prediction submodel, and transmits the topic selection probability of each topic to the server 31; the server 31 obtains a corresponding target topic word segmentation set based on the topic selection probability of each topic and feeds the target topic word segmentation set back to the tablet computer; the tablet computer then generates at least one piece of target comment information corresponding to the text to be processed based on the key word segmentation set and the target topic word segmentation set, and presents the target comment information to the reader through a display page of the tablet computer, so that the reader can select the target comment information to publish it directly, or re-edit it and then publish it. It should be noted that, in this embodiment of the application, the first prediction submodel, the second prediction submodel, and the topic perception submodel may also be deployed in the server 31, with the comment information generation submodel deployed in the terminal device 30, which is not described herein again.
Taking the terminal device 30 as a chat robot for example:
the chat robot obtains the reading material currently read by the reader, and the chat robot obtains comment information for the reading material, specifically, refer to an application scenario in which the terminal device 30 is a tablet computer. However, when the terminal device 30 is a chat robot, the chat robot directly outputs the obtained comment information in a voice output mode after obtaining the comment information, and the comment information is not modified, that is, the reader and the chat robot discuss the same reading.
In a possible implementation manner, the manner in which the chat robot acquires the reading material currently read by the reader may be: receiving, in a wireless manner, the reading material sent by a terminal device such as a mobile phone or a tablet computer, where the reading material is sent after the reader triggers a sending instruction on the terminal device; or, after receiving a keyword sent by the reader through handwriting input, voice input or the like, obtaining the reading material based on the keyword.
In a possible implementation manner, the text to be processed may be from a browser webpage, text information matched with query information input by a user, or a data source such as a public number article, which is not limited herein.
In a possible application scenario, in the embodiment of the application, a cloud storage technology may be adopted to store the training sample data set for training the comment information generation model, or a topic word segmentation set corresponding to each topic obtained through training; the distributed Cloud Storage system (hereinafter referred to as a Storage system) refers to a Storage system which integrates a large number of Storage devices (Storage devices are also referred to as Storage nodes) of different types in a network through application software or application interfaces to cooperatively work through functions of cluster application, grid technology, distributed Storage file system and the like, and provides data Storage and service access functions to the outside.
In a possible application scenario, in order to reduce communication delay, servers 31 may be deployed in different regions, or, in order to balance the load, different servers 31 may respectively serve the regions in which the corresponding terminal devices 30 are located. The plurality of servers 31 may share data through a blockchain, and the plurality of servers 31 form a data sharing system. For example, one terminal device 30 located at site a is communicatively connected to one server 31, and another terminal device 30 located at site b is communicatively connected to another server 31.
Each server 31 in the data sharing system has a node identifier corresponding to the server 31, and each server 31 in the data sharing system may store node identifiers of other servers 31 in the data sharing system, so that the generated block is broadcast to other servers 31 in the data sharing system according to the node identifiers of other servers 31. Each server 31 may maintain a node identifier list as shown in the following table, and store the name of the server 31 and the node identifier in the node identifier list correspondingly. The node identifier may be an Internet Protocol (IP) address and any other information that can be used to identify the node, and only the IP address is used as an example in table 1.
TABLE 1

Background server name    Node identification
Node 1                    119.113.131.174
Node 2                    118.116.189.143
Node N                    119.124.789.238
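For illustration only, the node identifier list in Table 1 can be thought of as a simple mapping from server name to node identifier; the following Python snippet is a minimal sketch of such a structure (the entries are taken directly from Table 1, and the dictionary form is an assumption, not part of the patent).

node_identifier_list = {
    "Node 1": "119.113.131.174",
    "Node 2": "118.116.189.143",
    "Node N": "119.124.789.238",
}

# broadcasting a generated block means iterating over the stored identifiers
for name, node_id in node_identifier_list.items():
    print(name, node_id)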
Based on the above application scenarios, the comment information generation method provided by the exemplary embodiments of the present application is described below with reference to the accompanying drawings. It should be noted that the above application scenarios are shown only to facilitate understanding of the spirit and principle of the present application, and the embodiments of the present application are not limited in this respect.
In the method for automatically generating comment information for a text to be processed provided by the embodiment of the application, a trained comment information generation model is used: each participle contained in the text to be processed is input into the trained comment information generation model, and the model finally outputs the target comment information for the text to be processed. Therefore, in the embodiment of the application, the comment information generation model is first trained, and corresponding target comment information is then generated for any text to be processed based on the trained comment information generation model.
The following embodiments are described with respect to a training process of the comment information generation model and a use process of the comment information generation model in the embodiments of the present application, respectively.
The first embodiment is as follows: the training process of the comment information generation model comprises the following steps:
referring to fig. 4, fig. 4 is a schematic structural diagram of a comment information generation model in the embodiment of the present application, which is exemplarily provided.
As can be seen from fig. 4, the comment information generation model mainly includes a topic perception submodel, a first prediction submodel, a second prediction submodel, and a comment information generation submodel. It should be noted that the submodels are associated with one another; the structural division of the submodels shown in fig. 4 is only intended to make the embodiment of the present application clearer and does not represent the actual division among the structures of the submodels.
The trained comment information generation model, when generating target comment information, mainly determines, through the first prediction submodel, the first selected probability of each participle contained in the text to be processed, where the first selected probability represents the probability that the participle is selected as a key participle, and then determines the key participle set based on the first selected probabilities; and
determining topic selection probability of each topic corresponding to the text to be processed through a second prediction submodel, then selecting a target topic and a corresponding target topic word segmentation set from each topic and a corresponding topic word segmentation set which are predetermined through a trained topic perception submodel based on the determined topic selection probability, wherein each target topic represents a recommendation comment angle aiming at the text to be processed, and each target topic corresponds to at least one topic word segmentation;
and finally, generating a sub-model through the comment information, and generating target comment information of the text to be processed based on the key word segmentation set and the target topic word segmentation set.
Therefore, the process of training the comment information generation model can be regarded as separately training each submodel and then combining the separately trained submodels.
In the embodiment of the application, when the second prediction submodel is trained, the probability of selecting the real topic output by the trained topic sensing submodel is required to assist training. Therefore, in the embodiment of the present application, before the second predictor model is trained, the topic perception sub-model should be trained.
The following describes a training process of each submodel in the review information generation model in the embodiment of the present application.
The first predictor model:
in one possible implementation, the first predictor model is an MLP, or an MLP followed by a Gumbel-Softmax distribution. Fig. 5 exemplarily provides a network structure diagram of the MLP.

As can be seen from fig. 5, the MLP consists of three layers: an input layer, a hidden layer and an output layer. The neurons of each layer are connected to all neurons of the adjacent layers, that is, the layers are fully connected, while neurons within the same layer are not connected.

Each neuron has weights for its inputs, a bias, and an activation function. When a third prediction training sample from the third prediction training sample data set used for training the first predictor model is provided to the MLP network, the first-layer neurons compute an output (multiplication by the weights, addition of the bias, and one application of the activation function); the output of the first layer is then used as the input of the second layer, and so on, layer by layer, until the output layer is computed and the target output result is obtained. Then, in the direction of reducing the error between the target output and the actual output, each connection weight is modified layer by layer, from the output layer back through the intermediate layers to the input layer, that is, the parameters of the first predictor model are adjusted. The MLP relies on this mechanism for computation and prediction.
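For illustration only (not part of the patent), the layer-by-layer forward computation described above can be sketched in Python as follows; the layer sizes, random weights and choice of activation functions are assumptions made purely for the example.

import numpy as np

def mlp_forward(x, layers):
    # each layer: multiply by the weight matrix, add the bias, apply the activation function
    h = x
    for w, b, activation in layers:
        h = activation(h @ w + b)
    return h

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
layers = [
    (rng.normal(size=(8, 16)), np.zeros(16), np.tanh),   # input layer -> hidden layer
    (rng.normal(size=(16, 1)), np.zeros(1), sigmoid),    # hidden layer -> output layer
]
x = rng.normal(size=(8,))                                 # one (assumed 8-dimensional) input vector
print(mlp_forward(x, layers))                             # a value in (0, 1), e.g. a selection probability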
Therefore, in the embodiment of the application, the MLP is used for predicting the similarity between each history word in the history text and the corresponding title of the history text, so as to predict the first history selection probability of each history word in the history text being selected as a key word;
when the first prediction submodel is an MLP followed by a Gumbel-Softmax distribution, the Gumbel-Softmax distribution is used in the training process in order to realize end-to-end training of the model; it pushes the first selected probability toward 0 or 1, that is, it predicts the second history selected probability that each history participle in the history text is selected as a key participle;
the Gumbel-Softmax distribution is:

ε_j = -log(-log(u_j)), u_j ~ Uniform(0, 1)

g_i = exp((log β_i + ε_i) / τ) / Σ_j exp((log β_j + ε_j) / τ)

where g_i represents the second selected probability, β_i is the first selected probability of each participle, ε_j is a random sample of Gumbel(0, 1), and τ represents the temperature.
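As a minimal sketch (not the patent's implementation), the relaxation above can be applied per participle as a two-way (keep/drop) Gumbel-Softmax; the temperature value and the example probabilities below are assumptions.

import numpy as np

rng = np.random.default_rng(0)

def gumbel_softmax_gate(beta_i, tau=0.1):
    # relax one keep/drop decision; a low temperature pushes the result toward 0 or 1
    logits = np.log(np.array([beta_i, 1.0 - beta_i]))   # [keep, drop]
    noise = -np.log(-np.log(rng.uniform(size=2)))       # Gumbel(0, 1) samples
    y = (logits + noise) / tau
    y = np.exp(y - y.max())
    return (y / y.sum())[0]                              # second selected probability of "keep"

for beta_i in (0.993, 0.40, 0.05):                       # assumed first selected probabilities
    print(round(gumbel_softmax_gate(beta_i), 4))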
In the embodiment of the application, before the first predictor model is trained, a third prediction training sample data set used when the first predictor model is trained is obtained, where the third prediction training sample data set includes historical texts, and specifically includes title information and text information corresponding to each historical text.
And then, according to a third prediction training sample in a third prediction training sample data set, performing loop iterative training on the first predictor model, and outputting the trained first predictor model when a preset convergence condition is met. Fig. 6 exemplarily provides a flowchart of a method for training a first predictor model in the embodiment of the present application, which is exemplified by one-cycle iterative training, wherein the following operations are performed during the one-cycle iterative training:
Step S600, inputting the history title vector h_te of the history text and each history participle vector h_i^e contained in the history text into the pre-constructed first predictor model;

where the history title vector h_te and each history participle vector h_i^e contained in the history text are obtained by inputting the title information and the body information of the history text into an encoder of LSTM structure or an encoder of Transformer structure.
Step S601, determining, through the pre-constructed first predictor model, the first history similarity between each history participle and each title participle in the history title;

each first history similarity is determined based on the distance between the corresponding history title participle vector in the history title and the history participle vector.
Step S602, determining and outputting a first history selection probability that the corresponding history participle is selected as the history key participle based on each first history similarity through a first pre-constructed prediction submodel.
Step S603, selecting history participles based on the respective first history selection probabilities, and forming a history key participle set.
Step S604, determining an L1 norm loss function Lsal based on the number of the history key participles in the history key participle set, and performing parameter adjustment on the first predictor model based on the L1 norm loss function.

In the embodiment of the present application, in order to encourage the first predictor model to close more gates, that is, to select fewer history key participles to form the history key participle set, an L1 norm term over all the gates, Lsal = Σ_i g_i, is added on the basis of the loss term. This L1 norm term is the loss function designed for the training process of the first predictor model, so parameter adjustment is performed on the first predictor model based on the L1 norm loss function.
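A hedged sketch of how such a gate-sparsity penalty could be attached to an MLP-based first predictor; the module layout, dimensions and the weight lambda_sal are assumptions for illustration, not the patent's actual implementation.

import torch
import torch.nn as nn

class FirstPredictor(nn.Module):
    # scores each participle against the title and emits a gate (first selected probability)
    def __init__(self, dim):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.Tanh(), nn.Linear(dim, 1))

    def forward(self, word_vecs, title_vec):
        pairs = torch.cat([word_vecs, title_vec.expand_as(word_vecs)], dim=-1)
        return torch.sigmoid(self.mlp(pairs).squeeze(-1))   # one gate per participle

dim = 32
model = FirstPredictor(dim)
word_vecs, title_vec = torch.randn(10, dim), torch.randn(dim)
gates = model(word_vecs, title_vec)

lambda_sal = 0.1                                             # assumed weight of the sparsity term
loss_sal = lambda_sal * gates.abs().sum()                    # L1 norm over all gates: favors fewer key participles
loss_sal.backward()                                          # gradient used for parameter adjustment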
Topic perception submodel:
The topic perception submodel is a model that constructs reader-perceived topics from all the historical comment information; it is mainly used to learn semantic vectors of the historical comment information and to mine the topics perceived by readers. To this end, in the embodiment of the application, a variational generative clustering algorithm is designed in the topic perception submodel, and this algorithm can be jointly trained with the whole comment information generation model in an end-to-end manner. Fig. 7 exemplarily provides a schematic diagram of topic clustering according to the historical comment information, where the black-filled part represents the historical comment information set of a first type of topic, and the white-filled part represents the historical comment information set of a second type of topic.
In the embodiment of the present application, before training the topic awareness submodel, a second prediction training sample data set for training the topic awareness submodel is obtained, where the second prediction training sample data set includes a large amount of second historical comment information, that is, the second historical comment information is a second prediction training sample.
And then, according to the second historical comment information in the second prediction training sample data set, loop iterative training is performed on the topic perception submodel, and the trained topic perception submodel is output when the preset convergence condition is met. Fig. 8 provides a flowchart of a method for training the topic perception submodel in the embodiment of the present application, taking one round of loop iterative training as an example, where the following operations are performed in one round of loop iterative training:
step S800, inputting second historical comment information into a pre-constructed topic perception submodel;
in the embodiment of the present application, each second historical comment information Y is represented using a bag-of-words vector.
Because the topic perception submodel is a generative model, the reconstruction processing is carried out on the second semantic vector of each second historical comment information to obtain the predicted comment information. Therefore, a second semantic vector of the second historical comment information is obtained through a pre-constructed topic perception sub-model.
Step S801, obtaining a second semantic vector of second historical comment information through a pre-constructed topic perception submodel;
in the embodiment of the application, after the second historical comment information Y is input into the pre-constructed topic perception submodel, a topic c corresponding to the second historical comment information is generated according to the prior topic category distribution p(c); and the second semantic vector z of the second historical comment information is generated conditionally according to the Gaussian distribution p(z|c).
Step S802, reconstructing the pre-constructed topic sensing submodel based on a second semantic vector of second historical comment information to obtain predicted comment information;
reconstructing the semantic vector based on the second historical comment information by conditional distribution p (y | z) through a pre-constructed topic perception submodel to obtain predicted comment information y;
according to the above process of generating the predictive comment information y, the joint probability p (y, z, c) is decomposed into:
p(y,z,c)=p(y|z)p(z|c)p(c)
in the embodiment of the application, after a second semantic vector of second historical comment information is determined, the posterior topic selection probability corresponding to the second historical comment information can be determined based on the second semantic vector; and determining a topic vector corresponding to the second historical comment information based on the posterior topic selection probability of the second historical comment information.
Step S803, determining, by a pre-constructed topic sensing submodel, a posterior topic selection probability q (c | z) and a corresponding topic vector μ corresponding to the second historical comment information based on the second semantic vector of the second historical comment information.
In the training process, based on the posterior topic selection probability q(c|z), a weighted topic vector is calculated as:

Σ_c q(c|z) · μ_c

where μ_c is the topic vector of topic c.
step S804, a second loss function is constructed based on the predicted comment information, the posterior topic selection probability and the corresponding topic vector, and parameter adjustment is carried out on the topic perception submodel based on the second loss function.
Using the Jensen inequality, the log-likelihood can be expressed as:

log p(y) ≥ E_{q(z,c|y)}[log p(y, z, c) - log q(z, c|y)] = L_ELBO

where L_ELBO is the evidence lower bound (ELBO), and q(z, c|y) is the posterior topic selection probability, which is factorized as follows:

q(z, c|y) = q(z|y) q(c|z)

Thus, L_ELBO can be rewritten as:

L_ELBO = E_{q(z|y)}[log p(y|z)] - E_{q(c|z)}[KL(q(z|y) || p(z|c))] - KL(q(c|z) || p(c))
The first term is the reconstruction term, corresponding to the predicted comment information obtained by the reconstruction processing, and it encourages the model to reconstruct the input. The second term aligns the second semantic vector z of the input second historical comment information Y with the topic vector of the corresponding topic c, where q(c|z) can be regarded as the clustering result of the input second historical comment information Y. The last term is used to narrow the distance between the posterior topic selection probability q(c|z) and the prior topic selection probability p(c).
Therefore, based on the predicted comment information obtained by reconstruction, the posterior topic selection probability corresponding to the second historical comment information, and the topic vector corresponding to the second historical comment information, the second loss function is constructed from the evidence lower bound L_ELBO described above (maximizing L_ELBO, or equivalently minimizing its negative), and the parameters of the topic perception submodel are adjusted based on the second loss function.
In order to prevent the posterior topic selection probability q(c|z) from collapsing, so that all the second historical comment information is clustered into its corresponding topic, the prior topic selection probability p(c) is set to be a uniform distribution p(c) = 1/k. p(z|c) is a parameterized diagonal Gaussian, as follows:

p(z|c) = N(z; μ_c, σ_c^2 I)
where μ_c is the Gaussian mean of topic c, which also serves as the topic vector of topic c. A parameterized diagonal Gaussian q(z|y) is employed:

μ = l_1(h), log σ = l_2(h)

q(z|y) = N(z; μ, σ^2 I)

where l_1 and l_2 are linear transformations, and h is obtained by encoding the second historical comment information through a comment information encoder, which is an MLP with a tanh activation function. Furthermore, an MLP classifier is used to predict the topic distribution q(c|z). p(y|z) is modeled by the comment information decoder, which is a single-layer MLP with a softmax activation function in the last layer.
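The parameterization just described can be sketched as follows; this is an illustrative reading of the text (vocabulary size, hidden dimension, number of topics k and module names are assumptions), not the patent's code.

import torch
import torch.nn as nn

class TopicPerception(nn.Module):
    def __init__(self, vocab, dim, k):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(vocab, dim), nn.Tanh())  # comment information encoder (MLP, tanh)
        self.l1 = nn.Linear(dim, dim)                                   # mu = l1(h)
        self.l2 = nn.Linear(dim, dim)                                   # log sigma = l2(h)
        self.classifier = nn.Linear(dim, k)                             # MLP classifier predicting q(c|z)
        self.decoder = nn.Linear(dim, vocab)                            # single-layer MLP decoder for p(y|z)
        self.topic_mu = nn.Parameter(torch.randn(k, dim))               # topic vectors mu_c

    def forward(self, bow):                                             # bow: bag-of-words vectors of comments
        h = self.encoder(bow)
        mu, log_sigma = self.l1(h), self.l2(h)
        z = mu + torch.exp(log_sigma) * torch.randn_like(mu)            # sample the second semantic vector
        q_c = torch.softmax(self.classifier(z), dim=-1)                 # posterior topic selection probability
        recon = torch.log_softmax(self.decoder(z), dim=-1)              # reconstruction term p(y|z)
        return z, q_c, recon

model = TopicPerception(vocab=5000, dim=64, k=10)
z, q_c, recon = model(torch.rand(4, 5000))
weighted_topic = q_c @ model.topic_mu                                   # weighted topic vector from q(c|z)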
After training, the topic vectors μ_1, μ_2, …, μ_k are learned only from the corpus of historical comment information in the training set. The topic vectors can be used to control the generation of rich and diverse comment information, that is, the topic vectors μ_1, μ_2, …, μ_k are applied in the automatic comment information generation process.
The second predictor model:
in the embodiment of the application, the second prediction submodel is an MLP including a Softmax function, and is mainly used for predicting topic selection probability, namely topic distribution, of each topic corresponding to the historical text in the training process.
In the embodiment of the present application, before training the second predictor model, a first prediction training sample data set for training the second predictor model is obtained, where the first prediction training sample data set includes a large number of historical texts and corresponding first historical comment information, and the historical texts and the corresponding first historical comment information are the first prediction training sample.
And then, according to a first prediction training sample in the first prediction training sample data set, executing loop iterative training on the second predictor model, and outputting the trained second predictor model when a preset convergence condition is met. Fig. 9 is a flowchart of a method for training a second predictor model in the embodiment of the present application, which is taken as an example of one-cycle iterative training, where the following operations are performed in the one-cycle iterative training process:
step S900, selecting a first prediction training sample from the first prediction training sample data set, wherein the first prediction training sample comprises a historical text and corresponding first historical comment information, and the historical text comprises at least one historical word segmentation.
Step S901, inputting at least one historical word segmentation contained in the historical text in the first prediction training sample into a second pre-constructed prediction submodel;
in the embodiment of the application, at least one historical participle included in the historical text in the first prediction training sample is input into the second pre-constructed predictor model, that is, a historical participle vector is input into the second pre-constructed predictor model, and the historical participle vector is obtained by inputting at least one historical participle included in the historical text into an encoder of an LSTM structure or an encoder of a transform structure.
Step S902, determining topics associated with each historical participle through a pre-constructed second prediction submodel;
in the embodiment of the application, the second prediction submodel includes each preset topic, and determines a second historical similarity between each historical participle and each preset topic, so as to determine topics associated with each historical participle based on the second historical similarity.
Step S903, respectively determining the predicted topic selection probability of each topic based on the historical word segmentation number associated with each topic;
in the embodiment of the application, the predicted topic selection probability of each topic is used for representing the predicted topic selection probability corresponding to the historical text.
In the embodiment of the present application, the second predictor model predicts the predicted topic selection probability corresponding to the historical text based on p(c|X) = MLP(h^e).
Step S904, constructing a first loss function Ltop = D_KL(q(c|z) || p(c|X)) based on the real topic selection probability and the predicted topic selection probability corresponding to each topic, and performing parameter adjustment on the second predictor model based on the first loss function.
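For illustration only, the KL-divergence form of the first loss function can be computed as below; the toy distributions are assumptions, and the use of torch.nn.functional.kl_div is just one way to evaluate D_KL(q(c|z) || p(c|X)).

import torch
import torch.nn.functional as F

q_c = torch.tensor([0.6, 0.3, 0.1])    # real topic selection probability from the trained topic perception submodel
p_c = torch.tensor([0.4, 0.4, 0.2])    # predicted topic selection probability from the second predictor model

# kl_div takes log-probabilities as input and probabilities as target, giving KL(q || p)
ltop = F.kl_div(p_c.log(), q_c, reduction="sum")
print(ltop)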
In the embodiment of the application, the real topic selection probability corresponding to each topic, namely the real topic selection probability corresponding to the historical text, is determined based on the first historical comment information corresponding to the historical text through the trained topic perception submodel.
Determining the probability of selecting the real topic corresponding to the historical text by the following method:
inputting first historical comment information corresponding to the historical text into the trained topic perception submodel;
obtaining a second semantic vector of the first historical comment information through the trained topic perception submodel;
in the embodiment of the application, after the first historical comment information is input into the trained topic perception submodel, a topic c corresponding to the first historical comment information is generated according to the prior topic category distribution p(c); and the first semantic vector of the first historical comment information is generated conditionally according to the Gaussian distribution p(z|c).

After the first semantic vector of the first historical comment information is determined, the posterior topic selection probability corresponding to the first historical comment information is determined based on the first semantic vector, and the determined posterior topic selection probability is the real topic selection probability corresponding to the historical text.
Comment information generation submodel:
in the embodiment of the present application, the comment information generation submodel is a decoder including an attention mechanism in an LSTM structure or a decoder including an attention mechanism in a transform structure.
In the training process, inputting a historical key word segmentation set obtained based on a first prediction submodel into a pre-constructed comment information generation submodel; and
selecting, based on the topic selection probability of each topic corresponding to the historical text output by the second prediction submodel, a historical topic participle set corresponding to the historical text from the topics and corresponding topic participle sets stored in the topic perception submodel, and inputting the historical topic participle set into the pre-constructed comment information generation submodel;
generating a submodel through pre-constructed comment information, and predicting and outputting comment information corresponding to a historical text based on a historical key word segmentation set, a historical topic word segmentation set and a pre-stored high-frequency word segmentation set;
and constructing a fourth loss function Lce based on the comment information corresponding to the historical text, and performing parameter adjustment on the comment information generation submodel based on the fourth loss function.
In this embodiment of the application, the fourth loss function may be determined according to the smoothness of the comment information corresponding to the history text.
In a possible implementation mode, the comment information generation model is integrally trained, namely the topic perception submodel, the first prediction submodel, the second prediction submodel and the comment information generation submodel are jointly trained.
In the joint training process, the overall objective to be optimized is:
L = λ_1 L_ELBO + λ_2 Lsal + λ_3 Lce + λ_4 Ltop

where λ_1, λ_2, λ_3 and λ_4 are 4 hyperparameters in the overall objective, mainly used for balancing the influence among the submodels.
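A trivial sketch of combining the four loss terms; the λ values shown are assumptions, since the patent does not fix them.

lambda_1, lambda_2, lambda_3, lambda_4 = 1.0, 0.1, 1.0, 0.5   # assumed hyperparameter values

def joint_objective(l_elbo, l_sal, l_ce, l_top):
    # weighted combination of the submodel terms used for joint (end-to-end) training
    return lambda_1 * l_elbo + lambda_2 * l_sal + lambda_3 * l_ce + lambda_4 * l_top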
It should be noted that, the corresponding preset convergence condition in each training process may be that the corresponding loss value reaches a set first threshold, or that the number of times of training reaches a set second threshold.
After the trained comment information generation model is obtained, the target comment information is automatically generated based on the trained comment information generation model. In the process of automatically generating the target comment information, the target topic corresponding to the text to be processed, the corresponding target topic participle set and the key participle set corresponding to the text to be processed are determined.
In the process of determining the target topic corresponding to the text to be processed and the corresponding target topic participle set, the topic selection probability of each topic corresponding to the text to be processed is first determined through the trained second prediction submodel, and then the target topic and the corresponding target topic participle set are selected from the topics and corresponding topic participle sets stored in the trained topic perception submodel. The topic perception submodel itself is not needed in this process; only the results output by the trained topic perception submodel are used. Therefore, when the trained comment information generation model is used to automatically generate comment information, the structure of the trained topic perception submodel can be removed from the comment information generation model. Referring to fig. 10, fig. 10 exemplarily provides a schematic diagram of the trained comment information generation model in the comment information generation process in the embodiment of the present application.
Example two: the use process of the comment information generation model comprises the following steps:
referring to fig. 11, fig. 11 exemplarily provides a comment information generation method in an embodiment of the present application, where the comment information generation method includes the following steps:
step S1101, determining a key word segmentation set corresponding to the text to be processed based on a first similarity between each segmentation word included in the text to be processed and the corresponding title.
After the text to be processed is determined, the title T = {t1, t2, …, tm} and the body B = {b1, b2, …, bn} of the text to be processed can be obtained, that is, the participle set X = [T, B] corresponding to the text to be processed, which contains each participle of the text to be processed.

In one possible implementation manner, each participle of X = [T, B] contained in the text to be processed and the determined title T = {t1, t2, …, tm} are input into the first prediction submodel of the trained comment information generation model; and the first predictor model, in combination with the title, determines the first selected probability corresponding to each participle.

In a possible implementation manner, this can also be described as inputting each participle vector contained in the text to be processed and the determined title vector into the first prediction submodel of the trained comment information generation model, and determining, based on the first predictor model in combination with the title vector, the first selected probability of the participle corresponding to each participle vector;

the participle vectors and the title vector are obtained by inputting each participle of X = [T, B] and the title T = {t1, t2, …, tm} of the text to be processed into the LSTM-structure encoder of the trained comment information generation model or the Transformer-structure encoder of the trained comment information generation model for encoding processing. Taking an encoder of bidirectional LSTM structure as an example, the title vector is the last hidden vector in the two directions of the LSTM structure.
In an embodiment of the present application, the first predictor model is an MLP, and the first selected probability can be determined by the following formula:

β_i = MLP(h_i^e, h_te)

where h_i^e is each participle vector and h_te is the title vector.
Specifically, a first distance between each word segmentation vector and the title vector can be determined based on the first predictor model; a first similarity between each word segmentation and the title can be determined based on the first distance; based on the first similarity, a first selected probability corresponding to each participle can be determined, and the first selected probability is used for representing the probability of each participle being selected as a key participle.
The smaller the distance between the participle vector and the title vector is, the higher the first similarity between the participle and the title is, and the higher the first selection probability that the participle is selected as a key participle is.
After the first selection probability corresponding to each participle is determined, at least one participle is selected from each participle contained in the text to be processed to form a key participle set based on the first selection probability corresponding to each participle.
In one possible implementation, a participle whose probability value of the corresponding first selected probability is greater than the third threshold is selected as the key participle, for example, the third threshold is 95%, and when the first selected probability of a participle t2 is 96%, the participle t2 is selected as the key participle.
In a possible implementation manner, when at least one participle is selected from the participles contained in the text to be processed to form the key participle set based on the first selected probability corresponding to each participle, after the first selected probabilities are determined, the second selected probability of each corresponding participle can be determined based on its first selected probability, where the difference between the value of each second selected probability and 0 or 1 is less than a preset value; that is, each first selected probability is pushed toward 0 or 1 to obtain the corresponding second selected probability. The second selected probability is therefore also used to represent the probability that the corresponding participle is selected as a key participle.

Each first selected probability is made to approach 0 or 1 to obtain the corresponding second selected probability mainly in the following two ways:
The first method is as follows: determining the second selected probability of the corresponding participle by means of a Gumbel-Softmax distribution; the formula corresponding to the Gumbel-Softmax distribution is as follows:

ε_j = -log(-log(u_j)), u_j ~ Uniform(0, 1)

g_i = exp((log β_i + ε_i) / τ) / Σ_j exp((log β_j + ε_j) / τ)

where g_i represents the second selected probability, β_i is the first selected probability of each participle, ε_j is a random sample of Gumbel(0, 1), and τ represents the temperature.
By adopting the first method, the determined second selected probability of each participle is closer to 0 or 1. In this case, the participles whose second selected probability is closer to 1 need to be screened out from the participles contained in the text to be processed as key participles, that is, the participles for which the difference between the value of the second selected probability and 1 is less than the preset value are screened out as key participles, and the key participle set is then formed based on the screened-out participles.

For example, if the first selected probability of a participle t2 is 99.3%, it can be pushed even closer to 1 by the Gumbel-Softmax distribution; after the first selected probability passes through the Gumbel-Softmax distribution, the obtained second selected probability may be 99.99%, which is closer to 1, so that the participle t2 is selected as a key participle.
The second method comprises the following steps: determining a second selected probability of the corresponding participle in a Bernoulli distribution mode;
In the second method, the obtained first selected probability is used to parameterize the Bernoulli distribution, and the binary gate of each participle can be obtained from the Bernoulli distribution; the specific formula is as follows:
g_i ~ Bernoulli(β_i)

where g_i represents the second selected probability and β_i represents the first selected probability.
In this case, the second selected probability is either 0 or 1, and the participles whose second selected probability is 1 are screened out from the participles contained in the text to be processed to serve as key participles, forming the key participle set.
For example, if the first selected probability of a participle t2 is 99.3%, it can be converted into 0 or 1 through the Bernoulli distribution; after the first selected probability passes through the Bernoulli distribution, the obtained second selected probability may be 1, so that the participle t2 is selected as a key participle.
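A minimal sketch of this second method (sampling binary gates from the first selected probabilities); the participles and probability values are assumptions for illustration.

import numpy as np

rng = np.random.default_rng(0)

participles = ["t1", "t2", "b1", "b2"]
first_probs = np.array([0.10, 0.993, 0.30, 0.85])      # assumed first selected probabilities beta_i

gates = rng.binomial(n=1, p=first_probs)                # g_i ~ Bernoulli(beta_i), each value is 0 or 1
key_participles = [w for w, g in zip(participles, gates) if g == 1]
print(key_participles)                                  # participles whose gate is 1 form the key participle set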
Step S1102, determining a target topic corresponding to the text to be processed and a corresponding target topic participle set based on a second similarity between each participle included in the text to be processed and each preset topic.
Each target topic represents a recommended comment angle aiming at the text to be processed, and each target topic corresponds to at least one topic participle.
After each participle vector contained in the text to be processed is obtained through an LSTM structure encoder of the trained comment information generation model or a Transformer structure encoder of the trained comment information generation model, each participle vector contained in the text to be processed is input into a second prediction submodel of the trained comment information generation model.
The second predictor model respectively determines a second similarity between each participle and each preset topic based on each participle vector and each preset topic vector; respectively determining topics c associated with the participles corresponding to the participle vectors based on the obtained second similarity; then respectively determining topic selection probability p (c | X) of each topic c based on the obtained number of the word segments associated with each topic c; namely, determining the topic selection probability p (c | X) of each topic corresponding to the text to be processed. At this time, the second predictor model outputs topic selection probabilities p (c | X) of the topics corresponding to the text to be processed.
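For illustration only, one way to realize the counting-based topic selection probability described above is sketched below; using cosine similarity as the second similarity and the toy dimensions are assumptions.

import numpy as np

def topic_selection_probability(word_vecs, topic_vecs):
    # associate each participle with its most similar topic, then normalize the counts
    w = word_vecs / np.linalg.norm(word_vecs, axis=1, keepdims=True)
    t = topic_vecs / np.linalg.norm(topic_vecs, axis=1, keepdims=True)
    sims = w @ t.T                                       # second similarity of every participle to every topic
    assigned = sims.argmax(axis=1)                       # topic c associated with each participle
    counts = np.bincount(assigned, minlength=topic_vecs.shape[0])
    return counts / counts.sum()                         # topic selection probability p(c|X)

rng = np.random.default_rng(0)
word_vecs = rng.normal(size=(12, 16))                    # assumed participle vectors of the text to be processed
topic_vecs = rng.normal(size=(4, 16))                    # assumed topic vectors mu_c
print(topic_selection_probability(word_vecs, topic_vecs))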
In the embodiment of the application, the second prediction submodel inputs the output topic selection probability of each topic into the topic perception submodel of the trained comment information generation model, in which each topic and the corresponding topic participle set or topic vector μ_1, μ_2, …, μ_k are stored. Therefore, the target topic and the corresponding target topic participle set can be selected, based on the topic selection probability of each topic, from the topics and corresponding topic participle sets stored in the topic perception submodel.
For example, suppose the topic selection probabilities of the topics corresponding to the text to be processed are: 50% for topic c1, 35% for topic c2, 10% for topic c3, and so on. In this case, for the topic c1 with the maximum topic selection probability, the topic vector μ_1 corresponding to topic c1 is determined from the topic perception submodel, in which each topic and the corresponding topic participle set or topic vector μ_1, μ_2, …, μ_k are stored. It should be noted that selecting the topic with the maximum topic selection probability is only an example; in the embodiment of the present application, multiple topics and the corresponding topic participle sets may also be selected for generating the comment information.
Step S1103, generating target comment information of the text to be processed based on the key word segmentation set and the target topic word segmentation set.
After the key word segmentation set and the target topic word segmentation set are determined, the key word segmentation set and the target topic word segmentation set are input into a comment information generation submodel of a trained comment information generation model. And enabling the comment information generation submodel to generate target comment information of the text to be processed according to the key word segmentation set, the target topic word segmentation set and the pre-stored high-frequency word segmentation set.
In the embodiment of the present application, the comment information generation submodel is a decoder including an attention mechanism in an LSTM structure or a decoder including an attention mechanism in a transform structure.
In a possible implementation mode, when the comment information generation submodel generates the target comment information of the text to be processed according to the key participle set, the target topic participle set and the pre-stored high-frequency participle set, multiple rounds of comment participle prediction need to be performed to obtain a comment participle sequence or comment participle set, and the target comment information is generated based on the comment participle sequence or comment participle set output in the last round of prediction;
the process of word segmentation prediction of each round of comments is as follows:
determining a third selection probability of each key participle in the key participle set, a fourth selection probability of each topic participle in the target topic participle set and a fifth selection probability of each high-frequency participle in a preset high-frequency participle set through an attention mechanism according to the comment participle in the comment participle sequence predicted in the previous round;
and predicting the comment participle of the current turn from the key participle set, the target topic participle set or the high-frequency participle set according to the weighted values of the third selection probability, the fourth selection probability and the fifth selection probability, wherein the third selection probability, the fourth selection probability and the fifth selection probability are respectively used for representing the probability that each key participle, each topic participle and each high-frequency participle are selected as comment participles.
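A hedged sketch of one round of comment participle prediction over the three candidate sets; the attention scoring, the mixing weights and the example participles are assumptions introduced only to make the flow concrete.

import numpy as np

rng = np.random.default_rng(0)
softmax = lambda x: np.exp(x - x.max()) / np.exp(x - x.max()).sum()

def predict_round(prev_state, key_set, topic_set, freq_set, embed, mix=(0.5, 0.3, 0.2)):
    candidates = key_set + topic_set + freq_set
    scores = np.array([prev_state @ embed[w] for w in candidates])       # attention of the previous state to each candidate
    third = softmax(scores[:len(key_set)])                               # third selection probability (key participles)
    fourth = softmax(scores[len(key_set):len(key_set) + len(topic_set)]) # fourth selection probability (topic participles)
    fifth = softmax(scores[len(key_set) + len(topic_set):])              # fifth selection probability (high-frequency participles)
    merged = np.concatenate([mix[0] * third, mix[1] * fourth, mix[2] * fifth])
    return candidates[int(merged.argmax())]                              # comment participle of the current round

key_set, topic_set, freq_set = ["plot", "hero"], ["ending", "twist"], ["the", "really"]
embed = {w: rng.normal(size=8) for w in key_set + topic_set + freq_set}  # assumed participle embeddings
print(predict_round(rng.normal(size=8), key_set, topic_set, freq_set, embed))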
Referring to fig. 12, fig. 12 is a flowchart of an overall method for generating comment information in an embodiment of the present application, where the overall method includes the following steps:
Step S1200, the encoder of LSTM structure performs encoding processing on each participle x_i in the text to be processed, to determine the participle vector h_i^e of each participle contained in the text to be processed and the title vector h_te corresponding to the text to be processed.

The title of the text to be processed is T = {t1, t2, …, tm} and the body is B = {b1, b2, …, bn}, so the participles x_i of the text to be processed form the set X = [T, B] = {t1, t2, …, tm, b1, b2, …, bn}.
Step S1201, inputting the title vector h_te and each participle vector h_i^e into the first prediction submodel, to obtain the corresponding first selected probability β_i output by the first prediction submodel for each participle vector;

where the first selected probability is determined based on the formula β_i = MLP(h_i^e, h_te).
Step S1202, parameterizing the Bernoulli distribution g_i ~ Bernoulli(β_i) with each first selected probability respectively, and determining the corresponding second selected probability g_i, where the second selected probability g_i is 0 or 1.
In step S1203, selecting the participle with the second selected probability of 1 to form a key participle set.
Step S1204, inputting each participle vector h_i^e into the second prediction submodel, to obtain the topic selection probability p(c|X) of each topic output by the second prediction submodel for the text to be processed;

where the topic selection probability of each topic is determined based on p(c|X) = MLP(h^e).
Step S1205, based on the topic selection probability p(c|X) of each topic, determining the target topic corresponding to the text to be processed and the corresponding target topic participle set from the topics and corresponding topic participle sets (topic vectors μ_1, μ_2, …, μ_k) determined by the trained topic perception submodel;
and step S1206, generating target comment information based on the key word segmentation sets and the target topic word segmentation sets.
Based on the same inventive concept, the present application further provides a comment information generation apparatus 1300. Fig. 13 exemplarily provides a schematic structural diagram of the apparatus, which includes:
a first determining unit 1301, configured to determine, based on a first similarity between each participle included in the to-be-processed text and a corresponding title, a key participle set corresponding to the to-be-processed text, where the title represents a core content of the to-be-processed text;
a second determining unit 1302, configured to determine, based on a second similarity between each participle included in the to-be-processed text and each preset topic, a target topic corresponding to the to-be-processed text and a corresponding target topic participle set, where each target topic represents one recommended comment angle for the to-be-processed text, and each target topic corresponds to at least one topic participle;
and the generating unit 1303 is configured to generate target comment information of the text to be processed based on the key word segmentation set and the target topic word segmentation set.
In a possible implementation manner, the first determining unit 1301 is specifically configured to:
inputting each participle and title contained in the text to be processed into a first prediction submodel of the trained comment information generation model, and determining first similarity of each participle and title;
determining a first selection probability corresponding to each participle based on the first similarity, wherein the first selection probability is used for representing the probability of each participle being selected as a key participle;
and selecting at least one participle from each participle contained in the text to be processed to form a key participle set based on the first selected probability corresponding to each participle.
In a possible implementation manner, when the first determining unit 1301 selects at least one participle from the participles included in the text to be processed to form a keyword set, based on the first selected probability corresponding to the participle:
respectively determining second selected probabilities of the corresponding participles based on the first selected probabilities of the participles, wherein the difference between the value of each second selected probability and 0 or 1 is smaller than a preset value, and each second selected probability is used for representing the probability that the corresponding participles are selected as key participles;
and screening out, from the participles contained in the text to be processed and based on the second selected probability of each participle, the participles for which the difference between the value of the second selected probability and 1 is less than the preset value as key participles, and forming the key participle set.
In a possible implementation manner, when the first determining unit 1301 determines the second selected probability of the corresponding participle based on the first selected probability of each participle, respectively:
determining a second selected probability of the corresponding participle by means of a Gumbel-Softmax distribution based on the first selected probability of each participle; or,
and determining a second selected probability of the corresponding participle in a Bernoulli distribution mode based on the first selected probability of each participle.
In a possible implementation manner, the second determining unit 1302 is specifically configured to:
inputting each participle contained in the text to be processed into a second prediction submodel of the trained comment information generation model;
respectively determining second similarity between each participle and each preset topic based on each preset topic in the second prediction submodel, and respectively determining topics associated with each participle based on each obtained second similarity;
respectively determining topic selection probability of each topic based on the obtained word segmentation number associated with each topic;
and determining a target topic corresponding to the text to be processed based on the topic selection probability of each topic.
In a possible implementation manner, the second predictor model is an MLP including a Softmax function, and the second predictor model is obtained by training as follows:
according to a first prediction training sample in the first prediction training sample data set, performing loop iteration training on the second predictor model, and outputting the trained second predictor model when a preset convergence condition is met, wherein the following operations are performed in the process of one loop iteration training:
selecting a first prediction training sample from a first prediction training sample data set, wherein the first prediction training sample comprises a historical text and at least one piece of corresponding first historical comment information, and the historical text comprises at least one historical word segmentation;
inputting at least one history word segmentation contained in the history text in the first prediction training sample into a second pre-constructed prediction submodel;
based on each preset topic in a pre-constructed second prediction submodel, respectively obtaining topics associated with each historical participle through a Softmax function;
respectively determining the selection probability of the predicted topic of each topic based on the obtained historical word segmentation number associated with each topic;
and constructing a first loss function based on the real topic selection probability and the predicted topic selection probability corresponding to each topic, and carrying out parameter adjustment on the second prediction submodel based on the first loss function, wherein the real topic selection probability is determined according to at least one piece of first historical comment information corresponding to the historical text.
In one possible implementation, the true topic selection probability is determined by:
inputting at least one piece of first historical comment information in a first prediction training sample into a topic perception submodel of a trained comment information generation model;
obtaining a first semantic vector of each piece of first historical comment information based on the topic perception submodel, and determining a first historical topic corresponding to the corresponding first historical comment information based on each obtained first semantic vector;
based on the obtained number of the first historical comment information associated with each first historical topic, respectively determining the historical topic selection probability of each first historical topic, and taking the historical topic selection probability as the true topic selection probability.
In one possible implementation, the topic-aware topic sub-model is trained by:
according to a second prediction training sample in the second prediction training sample data set, executing loop iterative training on the topic perception submodel, and outputting the trained topic perception submodel when a preset convergence condition is met; wherein the following operations are executed in one loop iteration training process:
selecting a second prediction training sample from a second prediction training sample data set, wherein the second prediction training sample comprises at least one piece of second historical comment information;
inputting each second historical comment information in a second prediction training sample into a pre-constructed topic perception submodel, and determining a second semantic vector corresponding to each second historical comment information;
determining a second historical topic corresponding to the corresponding second historical comment information based on a second semantic vector corresponding to each second historical comment information, and determining the posterior topic selection probability based on the second historical topic;
and constructing a second loss function based on the second semantic vector and the posterior topic selection probability, and carrying out parameter adjustment on the topic perception submodel based on the second loss function.
In one possible implementation, constructing the second loss function based on the second semantic vector and the posterior topic selection probability includes:
reconstructing corresponding predicted comment information based on the second semantic vector;
and constructing the second loss function based on the predicted comment information, the distance between the posterior topic selection probability and the corresponding prior topic selection probability, and the alignment of the second semantic vector with the topic vector of the corresponding topic.
In a possible implementation manner, the generating unit 1303 is specifically configured to:
inputting the key participle set and the target topic participle set into a comment information generation submodule of a trained comment information generation model, executing multi-round comment participle set prediction through the comment information generation submodule, and generating target comment information based on a comment participle set output by the last round of prediction;
the process of word segmentation prediction of each round of comments is as follows:
determining a third selection probability of each key participle in the key participle set, a fourth selection probability of each topic participle in the target topic participle set and a fifth selection probability of each high-frequency participle in a preset high-frequency participle set through an attention mechanism according to the comment participle in the comment participle set predicted in the previous round;
and predicting the comment participle of the current turn from the key participle set, the target topic participle set or the high-frequency participle set according to a third selection probability, a fourth selection probability and a fifth selection probability, wherein the third selection probability, the fourth selection probability and the fifth selection probability are respectively used for representing the probability that each key participle, each topic participle and each high-frequency participle are selected as comment participles.
For convenience of description, the above apparatus is described by dividing its functions into units (or modules). Of course, in practicing the present application, the functions of the units (or modules) may be implemented in one or more pieces of software or hardware.
After introducing the method and apparatus for comment information generation of the exemplary embodiment of the present application, a computing device for comment information generation of another exemplary embodiment of the present application is introduced next.
As will be appreciated by one skilled in the art, aspects of the present application may be embodied as a system, method or program product. Accordingly, various aspects of the present application may be embodied in the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, which may all generally be referred to herein as a "circuit," "module" or "system."
In one possible implementation, the comment information generation computing device provided by the embodiment of the present application may include at least a processor and a memory. Wherein the memory stores program code which, when executed by the processor, causes the processor to perform any of the steps of the review information generation methods of the various exemplary embodiments of the present application.
A comment information generation computing device 1400 according to this embodiment of the present application is described below with reference to fig. 14. The comment information generating computing device 1400 as shown in fig. 14 is merely an example, and should not bring any limitation to the function and the range of use of the embodiment of the present application.
As shown in fig. 14, the components of computing device 1400 may include, but are not limited to: the at least one processor 1401, the at least one memory 1402, and a bus 1403 connecting the different system components (including the memory 1402 and the processor 1401).
Bus 1403 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, a processor, or a local bus using any of a variety of bus architectures.
The memory 1402 may include readable media in the form of volatile memory, such as Random Access Memory (RAM)14021 and/or cache memory 14022, and may further include Read Only Memory (ROM) 14023.
Memory 1402 may also include a program/utility 14025 having a set (at least one) of program modules 14024, such program modules 14024 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Computing device 1400 can also communicate with one or more external devices 1404 (e.g., a keyboard, a pointing device, etc.), with one or more devices that enable a user to interact with computing device 1400, and/or with any device (e.g., a router, a modem, etc.) that enables computing device 1400 to communicate with one or more other computing devices. Such communication may occur via an input/output (I/O) interface 1405. Moreover, computing device 1400 may also communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) via the network adapter 1406. As shown in fig. 14, the network adapter 1406 communicates with the other modules of computing device 1400 over the bus 1403. It should be understood that although not shown in fig. 14, other hardware and/or software modules may be used in conjunction with computing device 1400, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
In some possible embodiments, various aspects of the comment information generation method provided by the present application may also be implemented in the form of a program product including program code for causing a computer device to perform the steps in the comment information generation method according to various exemplary embodiments of the present application described above in this specification when the program product is run on the computer device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The program product for comment information generation according to an embodiment of the present application may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be executed on a computing device.
A readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which readable program code is carried. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with a command execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, C++ or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages.
It should be noted that although several units or sub-units of the apparatus are mentioned in the above detailed description, such division is merely exemplary and not mandatory. Indeed, according to embodiments of the application, the features and functions of two or more of the units described above may be embodied in one unit. Conversely, the features and functions of one unit described above may be further divided into and embodied by a plurality of units.
Further, while the operations of the methods of the present application are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (15)

1. A method of comment information generation, the method comprising:
determining a key word segmentation set corresponding to the text to be processed based on first similarity between each word segmentation included in the text to be processed and a corresponding title, wherein the title represents core content of the text to be processed;
determining a target topic corresponding to the text to be processed and a corresponding target topic word segmentation set based on a second similarity between each word segmentation contained in the text to be processed and each preset topic, wherein each target topic represents a recommended comment angle for the text to be processed, and each target topic corresponds to at least one topic word segmentation;
and generating target comment information of the text to be processed based on the key word segmentation set and the target topic word segmentation set.
2. The method as claimed in claim 1, wherein the determining a set of key word segments corresponding to the text to be processed based on a first similarity between each word segment included in the text to be processed and the title comprises:
inputting each participle and the title contained in the text to be processed into a first prediction submodel of a trained comment information generation model, and determining first similarity of each participle and the title;
determining a first selected probability corresponding to each participle based on the first similarity, wherein the first selected probability is used for representing the probability of each participle being selected as a key participle;
and selecting at least one participle from the participles contained in the text to be processed to form the key participle set based on the first selected probability corresponding to the participles.
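By way of a non-limiting illustration of claim 2, the first prediction submodel can be sketched as a small scoring network: each participle vector is compared with the title vector to obtain the first similarity, which is then squashed into a first selected probability. The class name, the bilinear scorer and the vector shapes below are assumptions made for the sketch only, not the claimed implementation.

```python
import torch
import torch.nn as nn

class FirstPredictionSubmodel(nn.Module):
    """Illustrative sketch: participle-vs-title similarity -> first selected probability."""
    def __init__(self, dim: int):
        super().__init__()
        self.scorer = nn.Bilinear(dim, dim, 1)  # first similarity between a participle and the title

    def forward(self, participle_vecs: torch.Tensor, title_vec: torch.Tensor) -> torch.Tensor:
        # participle_vecs: (num_participles, dim); title_vec: (dim,)
        title = title_vec.unsqueeze(0).expand_as(participle_vecs)
        first_similarity = self.scorer(participle_vecs, title).squeeze(-1)
        return torch.sigmoid(first_similarity)  # probability of each participle being a key participle
```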
3. The method according to claim 2, wherein the selecting at least one participle from the participles included in the text to be processed to form the key participle set based on the first selected probability corresponding to the participle specifically comprises:
respectively determining second selected probabilities of the corresponding participles based on the first selected probabilities of the participles, wherein the difference between the value of each second selected probability and 0 or 1 is smaller than a preset value, and the second selected probabilities are used for representing the probability that the corresponding participles are selected as key participles;
and screening out the participles with the difference value between the value of the second selected probability and 1 being smaller than a preset value from the participles contained in the text to be processed as key participles based on the second selected probability of the participles, and forming a key participle set.
4. The method of claim 3, wherein determining a second selected probability for a respective participle based on the first selected probability for the respective participle comprises:
determining a second selected probability of the corresponding participle in a Gumbel-Softmax distribution mode based on the first selected probability of each participle; or
and determining a second selected probability of the corresponding participle in a Bernoulli distribution mode based on the first selected probability of each participle.
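The two sampling modes named in claim 4 can be sketched as follows, assuming the first selected probability p of each participle is already available. The Gumbel-Softmax relaxation keeps the selection differentiable while pushing the second selected probability close to 0 or 1 at a low temperature; the Bernoulli mode draws a hard 0/1 value. Function names and the temperature value are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def gumbel_softmax_selection(p: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    # Treat each participle as a two-way (select / drop) choice and relax it.
    logits = torch.stack([torch.log(p + 1e-9), torch.log(1.0 - p + 1e-9)], dim=-1)
    soft = F.gumbel_softmax(logits, tau=tau, hard=False)
    return soft[..., 0]  # second selected probability, near 0 or 1 for small tau

def bernoulli_selection(p: torch.Tensor) -> torch.Tensor:
    # Hard 0/1 selection; exactly 0 or 1, but not differentiable.
    return torch.bernoulli(p)

first_selected_prob = torch.tensor([0.9, 0.2, 0.7])
print(gumbel_softmax_selection(first_selected_prob))
print(bernoulli_selection(first_selected_prob))
```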
5. The method as claimed in claim 1, wherein the determining the target topic corresponding to the text to be processed based on the second similarity between each participle included in the text to be processed and each preset topic includes:
inputting each participle contained in the text to be processed into a second prediction submodel of the trained comment information generation model;
respectively determining second similarity between each participle and each preset topic based on each preset topic in the second predictor model, and respectively determining topics associated with each participle based on each obtained second similarity;
respectively determining topic selection probability of each topic based on the obtained word segmentation number associated with each topic;
and determining a target topic corresponding to the text to be processed based on the topic selection probability of each topic.
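A minimal sketch of the counting step in claim 5, under the assumption that participles and preset topics are represented as vectors and the second similarity is a dot product: each participle is associated with its most similar topic, and a topic's selection probability is simply the fraction of participles associated with it. All names are hypothetical.

```python
import torch

def topic_selection_probabilities(participle_vecs: torch.Tensor,
                                  topic_vecs: torch.Tensor) -> torch.Tensor:
    # participle_vecs: (num_participles, dim); topic_vecs: (num_topics, dim)
    second_similarity = participle_vecs @ topic_vecs.T        # (num_participles, num_topics)
    associated_topic = second_similarity.argmax(dim=-1)       # topic associated with each participle
    counts = torch.bincount(associated_topic, minlength=topic_vecs.size(0)).float()
    return counts / counts.sum()                              # topic selection probability per topic
```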
6. The method of claim 5, wherein the second predictor model is a multi-layered perceptron (MLP) that includes a Softmax function, the second predictor model being trained by:
according to a first prediction training sample in a first prediction training sample data set, performing loop iteration training on the second predictor model, and outputting the trained second predictor model when a preset convergence condition is met, wherein the following operations are performed in the process of one-time loop iteration training:
selecting a first prediction training sample from the first prediction training sample data set, wherein the first prediction training sample comprises a historical text and corresponding at least one piece of first historical comment information, and the historical text comprises at least one historical word segmentation;
inputting at least one historical word segmentation contained in the historical text in the first prediction training sample into a second pre-constructed prediction submodel;
based on each preset topic in the pre-constructed second prediction submodel, respectively obtaining topics associated with each historical participle through a Softmax function;
respectively determining the predicted topic selection probability of each topic based on the obtained historical word segmentation number associated with each topic;
and constructing a first loss function based on the real topic selection probability and the predicted topic selection probability corresponding to each topic, and carrying out parameter adjustment on the second prediction submodel based on the first loss function, wherein the real topic selection probability is determined according to at least one piece of first historical comment information corresponding to the historical text.
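Claim 6 does not fix the exact form of the first loss function; one plausible reading, sketched below under that assumption, is a divergence between the true topic selection distribution (derived from the historical comment information) and the predicted distribution output by the second prediction submodel.

```python
import torch
import torch.nn.functional as F

def first_loss(true_topic_prob: torch.Tensor, pred_topic_prob: torch.Tensor) -> torch.Tensor:
    # Both inputs: (num_topics,), non-negative and summing to 1.
    # KL(true || predicted), computed from the log of the predicted distribution.
    return F.kl_div(torch.log(pred_topic_prob + 1e-9), true_topic_prob, reduction="sum")
```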
7. The method of claim 6, wherein the true topic selection probability is determined by:
inputting at least one piece of first historical comment information in the first prediction training sample into a topic perception submodel of a trained comment information generation model;
obtaining a first semantic vector of each piece of first historical comment information based on the topic perception submodel, and determining a first historical topic corresponding to the corresponding first historical comment information based on each obtained first semantic vector;
based on the obtained number of the first historical comment information associated with each first historical topic, respectively determining the historical topic selection probability of each first historical topic, and taking the historical topic selection probability as the true topic selection probability.
8. The method of claim 7, wherein the topic perception submodel is trained by:
according to a second prediction training sample in a second prediction training sample data set, performing loop iterative training on the topic perception sub-model, and outputting the trained topic perception sub-model when a preset convergence condition is met; wherein the following operations are executed in a loop iteration training process:
selecting a second prediction training sample from the second prediction training sample data set, wherein the second prediction training sample comprises at least one piece of second historical comment information;
inputting each second historical comment information in the second prediction training sample into a pre-constructed topic perception submodel, and determining a second semantic vector corresponding to each second historical comment information;
determining a second historical topic corresponding to the corresponding second historical comment information based on the second semantic vector corresponding to each second historical comment information, and determining the posterior topic selection probability based on the second historical topic;
and constructing a second loss function based on the second semantic vector and the posterior topic selection probability, and carrying out parameter adjustment on the topic perception sub-model based on the second loss function.
9. The method of claim 8, wherein said constructing a second loss function based on said second semantic vector and said a posteriori topic selection probability comprises:
reconstructing corresponding predicted comment information based on the second semantic vector;
and constructing a second loss function based on the distance between the predicted comment information, the posterior topic selection probability and the corresponding prior topic selection probability, and aligning the second semantic vector to the topic vector of the corresponding topic.
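Claims 8 and 9 describe the second loss function only by its ingredients; the sketch below assumes a variational-topic-model style combination of a reconstruction term, a divergence between the posterior and prior topic selection probabilities, and an alignment term pulling the second semantic vector toward its topic vector. Every function, shape and weight here is an illustrative assumption rather than the claimed formula.

```python
import torch
import torch.nn.functional as F

def second_loss(recon_logits: torch.Tensor, target_token_ids: torch.Tensor,
                posterior: torch.Tensor, prior: torch.Tensor,
                semantic_vec: torch.Tensor, topic_vec: torch.Tensor) -> torch.Tensor:
    # recon_logits: (num_tokens, vocab_size), reconstruction of the comment from the semantic vector
    reconstruction = F.cross_entropy(recon_logits, target_token_ids)
    # divergence between posterior and prior topic selection probabilities, both (num_topics,)
    divergence = F.kl_div(torch.log(posterior + 1e-9), prior, reduction="sum")
    # align the second semantic vector with the topic vector of its corresponding topic
    alignment = 1.0 - F.cosine_similarity(semantic_vec, topic_vec, dim=-1)
    return reconstruction + divergence + alignment
```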
10. The method of any one of claims 1 to 9, wherein a comment information generation submodel is further included in the comment information generation model, and the generating of the target comment information of the text to be processed based on the set of key participles and the set of target topic participles comprises:
inputting the key participle set and the target topic participle set into a comment information generation submodel of a trained comment information generation model, executing multi-round comment participle set prediction through the comment information generation submodel, and generating the target comment information based on a comment participle set output by the last round of prediction;
the process of predicting the word segmentation of each turn of comments is as follows:
determining a third selection probability of each key participle in the key participle set, a fourth selection probability of each topic participle in the target topic participle set and a fifth selection probability of each high-frequency participle in a preset high-frequency participle set through an attention mechanism according to the comment participle in the comment participle set predicted in the previous round;
predicting the comment participle of the current turn from the key participle set, the target topic participle set or the high-frequency participle set according to the third selected probability, the fourth selected probability and the fifth selected probability, wherein the third selected probability, the fourth selected probability and the fifth selected probability are respectively used for representing the probability that each key participle, each topic participle and each high-frequency participle are selected as comment participles.
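A minimal sketch (an assumption, not the claimed submodel) of one prediction round in claim 10: the decoder state from the previous round attends over the key participles, the target topic participles and the high-frequency participles, the resulting attention weights serve as the third, fourth and fifth selected probabilities, and the comment participle of the current round is taken from whichever set holds the highest-weighted entry.

```python
import torch
import torch.nn.functional as F

def predict_round(prev_state: torch.Tensor,
                  key_vecs: torch.Tensor,
                  topic_vecs: torch.Tensor,
                  freq_vecs: torch.Tensor):
    # prev_state: (dim,); each *_vecs: (n_i, dim)
    scores = torch.cat([key_vecs @ prev_state,      # attention scores over key participles
                        topic_vecs @ prev_state,    # ... over target topic participles
                        freq_vecs @ prev_state])    # ... over high-frequency participles
    probs = F.softmax(scores, dim=-1)
    n_key, n_topic = key_vecs.size(0), topic_vecs.size(0)
    third_prob = probs[:n_key]
    fourth_prob = probs[n_key:n_key + n_topic]
    fifth_prob = probs[n_key + n_topic:]
    next_index = int(torch.argmax(probs))           # index of the predicted comment participle
    return third_prob, fourth_prob, fifth_prob, next_index
```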
11. An apparatus for comment information generation, the apparatus comprising:
the first determining unit is used for determining a key word segmentation set corresponding to the text to be processed based on first similarity between each word segmentation included in the text to be processed and a corresponding title, wherein the title represents core content of the text to be processed;
a second determining unit, configured to determine, based on a second similarity between each participle included in the to-be-processed text and each preset topic, a target topic corresponding to the to-be-processed text and a corresponding target topic participle set, where each target topic represents one recommended comment angle for the to-be-processed text, and each target topic corresponds to at least one topic participle;
and the generating unit is used for generating target comment information of the text to be processed based on the key word segmentation set and the target topic word segmentation set.
12. The apparatus of claim 11, wherein the first determining unit is specifically configured to:
inputting each participle and the title contained in the text to be processed into a first prediction submodel of a trained comment information generation model, and determining first similarity of each participle and the title;
determining a first selected probability corresponding to each participle based on the first similarity, wherein the first selected probability is used for representing the probability of each participle being selected as a key participle;
and selecting at least one participle from the participles contained in the text to be processed to form the key participle set based on the first selected probability corresponding to the participles.
13. The apparatus of claim 11, wherein the second determining unit is specifically configured to:
inputting each participle contained in the text to be processed into a second prediction submodel of the trained comment information generation model;
respectively determining second similarity between each participle and each preset topic based on each preset topic in the second predictor model, and respectively determining topics associated with each participle based on each obtained second similarity;
respectively determining topic selection probability of each topic based on the obtained word segmentation number associated with each topic;
and determining a target topic corresponding to the text to be processed based on the topic selection probability of each topic.
14. An apparatus for comment information generation, characterized by comprising: a memory and a processor, wherein the memory is configured to store computer instructions, and the processor is configured to execute the computer instructions to implement the method of any one of claims 1-10.
15. A computer-readable storage medium, wherein the computer-readable storage medium stores computer instructions which, when executed by a processor, implement the method of any one of claims 1-10.
CN202110119102.5A 2021-01-28 2021-01-28 Comment information generation method and device and storage medium Pending CN114818690A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110119102.5A CN114818690A (en) 2021-01-28 2021-01-28 Comment information generation method and device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110119102.5A CN114818690A (en) 2021-01-28 2021-01-28 Comment information generation method and device and storage medium

Publications (1)

Publication Number Publication Date
CN114818690A (en) 2022-07-29

Family

ID=82526388

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110119102.5A Pending CN114818690A (en) 2021-01-28 2021-01-28 Comment information generation method and device and storage medium

Country Status (1)

Country Link
CN (1) CN114818690A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117591948A (en) * 2024-01-19 2024-02-23 北京中科闻歌科技股份有限公司 Comment generation model training method and device, and information generation method and device

Similar Documents

Publication Publication Date Title
Wadawadagi et al. Sentiment analysis with deep neural networks: comparative study and performance assessment
Wang et al. Neural networks and deep learning
Lv et al. Aspect-level sentiment analysis using context and aspect memory network
CN116415654A (en) Data processing method and related equipment
CN110377913A (en) A kind of sentiment analysis method and device thereof, electronic equipment and storage medium
CN112948676A (en) Training method of text feature extraction model, and text recommendation method and device
Ngo et al. Adaptive anomaly detection for internet of things in hierarchical edge computing: A contextual-bandit approach
Yao et al. Non-deterministic and emotional chatting machine: learning emotional conversation generation using conditional variational autoencoders
Kumar et al. Deep learning-based frameworks for aspect-based sentiment analysis
Wang et al. Design of Deep Learning Mixed Language Short Text Sentiment Classification System Based on CNN Algorithm
Yuan et al. Deep learning from a statistical perspective
John et al. Stock market prediction based on deep hybrid RNN model and sentiment analysis
Xia An overview of deep learning
CN114818690A (en) Comment information generation method and device and storage medium
Verma et al. Synthesized feature learning model on news aggregator for chatbot
Kassawat et al. Incorporating joint embeddings into goal-oriented dialogues with multi-task learning
CN116663523A (en) Semantic text similarity calculation method for multi-angle enhanced network
Zulqarnain et al. An improved gated recurrent unit based on auto encoder for sentiment analysis
Song Distilling knowledge from user information for document level sentiment classification
CN112818658B (en) Training method, classifying method, device and storage medium for text classification model
Pedroza et al. Machine reading comprehension (lstm) review (state of art)
Afrae et al. Smart Sustainable Cities: A Chatbot Based on Question Answering System Passing by a Grammatical Correction for Serving Citizens
Cvejoski et al. Recurrent point review models
Yang et al. Service component recommendation based on LSTM
Im et al. Cross-active connection for image-text multimodal feature fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination