CN115658935A - Personalized comment generation method and device

Info

Publication number
CN115658935A
Authority
CN
China
Prior art keywords
modal
knowledge
data
comment
vector
Prior art date
Legal status
Granted
Application number
CN202211553822.3A
Other languages
Chinese (zh)
Other versions
CN115658935B (en)
Inventor
刘剑锋
王宝元
Current Assignee
Beijing Hongmian Xiaoice Technology Co Ltd
Original Assignee
Beijing Hongmian Xiaoice Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Hongmian Xiaoice Technology Co Ltd
Priority to CN202211553822.3A
Publication of CN115658935A
Application granted
Publication of CN115658935B
Legal status: Active
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides a personalized comment generation method and device. The method comprises the following steps: extracting multi-modal data from multi-modal user content and obtaining corresponding multi-modal knowledge; performing comment emotion discrimination based on the multi-modal knowledge, the multi-modal data, and the user personalized information to obtain an emotion distribution feature; embedding the emotion distribution feature into the multi-modal data to obtain multi-modal data fused with emotion features; vector-encoding the multi-modal knowledge and the user personalized information to obtain a corresponding multi-modal knowledge encoding vector and personalized information encoding vector; and inputting the emotion-fused multi-modal data, the multi-modal knowledge encoding vector, and the personalized information encoding vector into a comment generation model to obtain comment information generated by the model and corresponding to the multi-modal user content. The method makes full use of multi-modal information and improves the accuracy and robustness of comment generation, so that more personalized and knowledge-rich comments can be generated.

Description

Personalized comment generation method and device
Technical Field
The invention relates to the technical field of big data processing, and in particular to a personalized comment generation method and device. The invention further relates to an electronic device and a processor-readable storage medium.
Background
In recent years, with the rapid development of Internet technology, social applications have become increasingly widespread. Automatic comment technology is used in various social application scenarios, such as news comments, video comments, and commodity comments, to increase user activity and attract more customers. However, the prior art usually focuses on comments in a single modality, that is, the input side carries only one modality: news commenting considers only text, picture commenting considers only images, and so on. In the social domain, however, UGC (user-generated content) is often multi-modal, for example text plus pictures, and single-modality input may leave information incomplete and thereby weaken the relevance of the generated comments. Moreover, comment generation is personalized content generation, yet guiding the emotion polarity of a comment by combining commenter information with the content to be commented on has so far been little explored. Existing comment generation focuses on enhancement along a single dimension and does not combine multiple factors such as multi-modality, knowledge enhancement, and personalized emotion control, so accuracy and robustness are poor and it is difficult to generate comments that are more personalized and rich in knowledge. How to design a more effective personalized comment generation scheme that improves generation efficiency and precision is therefore an urgent problem to be solved.
Disclosure of Invention
Therefore, the invention provides a personalized comment generation method and device, aiming to overcome the defects of the prior art, in which personalized comment generation schemes are narrowly limited and poor in accuracy and robustness.
In a first aspect, the present invention provides a personalized comment generation method, comprising:
extracting multi-modal data from multi-modal user content to be commented on, and obtaining corresponding multi-modal knowledge based on the multi-modal data;
performing comment emotion discrimination based on the multi-modal knowledge, the multi-modal data, and the acquired user personalized information to obtain an emotion distribution feature; and embedding the emotion distribution feature into the multi-modal data to obtain multi-modal data fused with emotion features;
vector-encoding the multi-modal knowledge and the user personalized information to obtain a corresponding multi-modal knowledge encoding vector and personalized information encoding vector;
inputting the emotion-fused multi-modal data, the multi-modal knowledge encoding vector, and the personalized information encoding vector into a comment generation model to obtain comment information generated by the comment generation model and corresponding to the multi-modal user content; the comment generation model is trained on first multi-modal sample data and standard comment information corresponding to the first multi-modal sample data.
Further, obtaining the multi-modal knowledge based on the multi-modal data specifically includes:
obtaining original multi-modal knowledge based on the multi-modal data and the search engine corresponding to each modality; performing paragraph rearrangement on the original multi-modal knowledge with a multi-modal ranking model to obtain candidate multi-modal knowledge; and selecting from the candidate multi-modal knowledge the multi-modal knowledge whose relevance to the multi-modal data satisfies a preset relevance threshold;
the multi-modal ranking model is trained on multi-modal sample knowledge and ranking results corresponding to the multi-modal sample knowledge.
Further, extracting multi-modal data from the multi-modal user content to be commented on specifically includes:
extracting text data and picture data from the multi-modal user content;
obtaining the original multi-modal knowledge based on the multi-modal data and the corresponding search engines then specifically includes: inputting the text data into a text search engine to obtain text-associated knowledge retrieved by the text search engine; inputting the picture data into a picture search engine to obtain picture-associated knowledge retrieved by the picture search engine; and taking the text-associated knowledge and the picture-associated knowledge together as the original multi-modal knowledge.
Further, selecting from the candidate multi-modal knowledge the multi-modal knowledge whose relevance to the multi-modal data satisfies a preset relevance threshold specifically includes:
encoding the text data in the candidate multi-modal knowledge with the text encoding model of the multi-modal ranking model to obtain a first text vector; encoding the picture data in the candidate multi-modal knowledge with the picture encoding model of the multi-modal ranking model to obtain a first picture vector; and adding the first text vector and the first picture vector to obtain the vector of the candidate multi-modal knowledge;
encoding the text data in the multi-modal data with the text encoding model to obtain a second text vector; encoding the picture data in the multi-modal data with the picture encoding model to obtain a second picture vector; and adding the second text vector and the second picture vector to obtain the vector of the multi-modal data;
ranking the candidates by the relevance between the vector of the multi-modal data and the vectors of the candidate multi-modal knowledge, and taking as final multi-modal knowledge the candidates satisfying the preset relevance threshold.
Further, performing comment emotion discrimination based on the multi-modal knowledge, the multi-modal data, and the acquired user personalized information to obtain the emotion distribution feature specifically includes:
inputting the multi-modal knowledge, the multi-modal data, and the user personalized information into a comment emotion discrimination model to obtain the emotion distribution feature output by the comment emotion discrimination model; the comment emotion discrimination model is trained on second multi-modal sample data and comment emotion discrimination results corresponding to the second multi-modal sample data.
Further, vector-encoding the multi-modal knowledge and the user personalized information to obtain the corresponding multi-modal knowledge encoding vector and personalized information encoding vector specifically includes: inputting the user personalized information into a personalized encoding model to obtain the personalized information encoding vector; and inputting the multi-modal knowledge into a knowledge encoding model to obtain the multi-modal knowledge encoding vector; the dimensionality of the personalized information encoding vector is the same as that of the multi-modal knowledge encoding vector.
Further, embedding the emotion distribution feature into the multi-modal data to obtain the multi-modal data fused with emotion features specifically includes:
inputting the multi-modal data into an embedding model to obtain a corresponding multi-modal embedding vector;
and fusing the multi-modal embedding vector with the emotion distribution feature to obtain the multi-modal data fused with emotion features.
In a second aspect, the present invention further provides a personalized comment generation apparatus, comprising:
a data acquisition unit, configured to extract multi-modal data from multi-modal user content to be commented on and obtain corresponding multi-modal knowledge based on the multi-modal data;
an emotion analysis unit, configured to perform comment emotion discrimination based on the multi-modal knowledge, the multi-modal data, and the acquired user personalized information to obtain an emotion distribution feature, and to embed the emotion distribution feature into the multi-modal data to obtain multi-modal data fused with emotion features;
a vector encoding unit, configured to vector-encode the multi-modal knowledge and the user personalized information to obtain a corresponding multi-modal knowledge encoding vector and personalized information encoding vector;
a comment generation unit, configured to input the emotion-fused multi-modal data, the multi-modal knowledge encoding vector, and the personalized information encoding vector into a comment generation model and obtain comment information generated by the comment generation model and corresponding to the multi-modal user content; the comment generation model is trained on first multi-modal sample data and standard comment information corresponding to the first multi-modal sample data.
Further, the data acquisition unit is specifically configured to:
obtain original multi-modal knowledge based on the multi-modal data and the search engine corresponding to each modality; perform paragraph rearrangement on the original multi-modal knowledge with a multi-modal ranking model to obtain candidate multi-modal knowledge; and select from the candidate multi-modal knowledge the multi-modal knowledge whose relevance to the multi-modal data satisfies a preset relevance threshold;
the multi-modal ranking model is trained on multi-modal sample knowledge and ranking results corresponding to the multi-modal sample knowledge.
Further, the data acquisition unit is specifically configured to:
extract text data and picture data from the multi-modal user content;
obtaining the original multi-modal knowledge based on the multi-modal data and the corresponding search engines then specifically includes: inputting the text data into a text search engine to obtain text-associated knowledge retrieved by the text search engine; inputting the picture data into a picture search engine to obtain picture-associated knowledge retrieved by the picture search engine; and taking the text-associated knowledge and the picture-associated knowledge together as the original multi-modal knowledge.
Further, the data acquisition unit is specifically configured to:
encode the text data in the candidate multi-modal knowledge with the text encoding model of the multi-modal ranking model to obtain a first text vector; encode the picture data in the candidate multi-modal knowledge with the picture encoding model of the multi-modal ranking model to obtain a first picture vector; and add the first text vector and the first picture vector to obtain the vector of the candidate multi-modal knowledge;
encode the text data in the multi-modal data with the text encoding model to obtain a second text vector; encode the picture data in the multi-modal data with the picture encoding model to obtain a second picture vector; and add the second text vector and the second picture vector to obtain the vector of the multi-modal data;
rank the candidates by the relevance between the vector of the multi-modal data and the vectors of the candidate multi-modal knowledge, and take as final multi-modal knowledge the candidates satisfying the preset relevance threshold.
Further, the emotion analysis unit is specifically configured to:
input the multi-modal knowledge, the multi-modal data, and the user personalized information into a comment emotion discrimination model to obtain the emotion distribution feature output by the comment emotion discrimination model; the comment emotion discrimination model is trained on second multi-modal sample data and comment emotion discrimination results corresponding to the second multi-modal sample data.
Further, the vector encoding unit is specifically configured to: input the user personalized information into a personalized encoding model to obtain the personalized information encoding vector; and input the multi-modal knowledge into a knowledge encoding model to obtain the multi-modal knowledge encoding vector; the dimensionality of the personalized information encoding vector is the same as that of the multi-modal knowledge encoding vector.
Further, the emotion analysis unit is specifically configured to:
input the multi-modal data into an embedding model to obtain a corresponding multi-modal embedding vector;
and fuse the multi-modal embedding vector with the emotion distribution feature to obtain the multi-modal data fused with emotion features.
In a third aspect, the present invention also provides an electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the steps of any of the personalized comment generation methods described above when executing the computer program.
In a fourth aspect, the present invention further provides a processor-readable storage medium storing a computer program which, when executed by a processor, implements the steps of any of the personalized comment generation methods described above.
According to the personalized comment generation method, multi-modal data is extracted from the multi-modal user content to be commented on, and corresponding multi-modal knowledge is obtained based on the multi-modal data; comment emotion discrimination is then performed based on the multi-modal knowledge, the multi-modal data, and the acquired user personalized information to obtain an emotion distribution feature, which is embedded into the multi-modal data to yield multi-modal data fused with emotion features; the multi-modal knowledge and the user personalized information are further vector-encoded into a multi-modal knowledge encoding vector and a personalized information encoding vector; finally, the emotion-fused multi-modal data, the multi-modal knowledge encoding vector, and the personalized information encoding vector are input into the comment generation model, which outputs comment information corresponding to the multi-modal user content. The method makes full use of multi-modal information and improves the accuracy and robustness of comment generation, so that more personalized and knowledge-rich comments can be generated.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is the first schematic flowchart of a personalized comment generation method provided by an embodiment of the present invention;
Fig. 2 is the second schematic flowchart of the personalized comment generation method provided by an embodiment of the present invention;
Fig. 3 is the third schematic flowchart of the personalized comment generation method provided by an embodiment of the present invention;
Fig. 4 is a schematic diagram of the relevance analysis flow in the personalized comment generation method provided by an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a personalized comment generation apparatus provided by an embodiment of the present invention;
Fig. 6 is a schematic diagram of the physical structure of an electronic device provided by an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the drawings. The described embodiments are some, but not all, embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
An embodiment of the personalized comment generation method of the present invention is described in detail below. As shown in Fig. 1, a schematic flowchart of the personalized comment generation method provided by an embodiment of the present invention, the specific process includes the following steps:
step 101: multimodal data are extracted from multimodal user content to be reviewed and corresponding multimodal knowledge is obtained based on the multimodal data.
The multi-modal user content may be multi-modal UGC (user-generated content) or a multi-modal post in the social domain, and the multi-modal data may be the text data, picture data, and so on contained in that UGC. For example, the multi-modal UGC may be news content composed of pictures and text authored by a user: the text data may be the popular internet phrase "yyds", and the picture data may be a picture of the player being described as "yyds".
As shown in Fig. 2, in an implementation of the present invention, text data (the post text) and picture data (the post picture) are first extracted from the multi-modal user content. Different knowledge is then recalled based on each modality of the multi-modal data and its corresponding search engine, yielding the original multi-modal knowledge; paragraph rearrangement is performed on the original multi-modal knowledge with a multi-modal ranking model (a multi-modal knowledge rerank model) to obtain candidate multi-modal knowledge; and the multi-modal knowledge whose relevance to the multi-modal data satisfies a preset relevance threshold is selected from the candidates. The multi-modal ranking model relies on a cross-modal matching model trained on a large amount of social and news multi-modal data: the trained text encoding model (text encoder) and picture encoding model (image encoder) encode data of each modality into its own embedding (the corresponding text vector and picture vector), the embeddings are added to obtain the final embedding (the vector of the candidate multi-modal knowledge or the vector of the multi-modal data), and relevance scores are then computed for ranking. Note that in the training stage of the multi-modal ranking model, the image encoder and text encoder can be learned through cross-modal contrastive learning (in the batch similarity matrix, the distances of the positive sample pairs on the diagonal are pulled closer while the other, negative sample pairs are pushed apart), so that the encoded vector representations of the text and image modalities are mapped into the same semantic space.
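The contrastive objective described above can be sketched as follows. This is a minimal illustration in Python/PyTorch, not the patent's actual implementation; the embedding dimensionality and the temperature value are assumptions.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(text_emb: torch.Tensor, image_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """CLIP-style symmetric InfoNCE loss: in the batch similarity matrix,
    diagonal (matched) text-image pairs are pulled together and
    off-diagonal (mismatched) pairs are pushed apart."""
    text_emb = F.normalize(text_emb, dim=-1)
    image_emb = F.normalize(image_emb, dim=-1)
    logits = text_emb @ image_emb.t() / temperature   # (B, B) similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_t = F.cross_entropy(logits, targets)         # text -> image direction
    loss_i = F.cross_entropy(logits.t(), targets)     # image -> text direction
    return (loss_t + loss_i) / 2
```

Minimizing this loss over social and news text-image pairs maps both encoders into the shared semantic space used for the relevance ranking below.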
For example, multi-modal sample knowledge can be assembled from multi-modal data collected in the social domain and used to train the cross-modal matching model, yielding the multi-modal ranking model; in the knowledge rearrangement stage this model reranks using multi-modal information, making the ranking result more accurate.
Specifically, obtaining the original multi-modal knowledge based on the multi-modal data and the corresponding search engines proceeds as follows: the text data is input into a text search engine to obtain the text-associated knowledge it retrieves; the picture data is input into a picture search engine to obtain the picture-associated knowledge it retrieves; and the text-associated knowledge and the picture-associated knowledge together form the original multi-modal knowledge. Note that the text-associated knowledge is the multi-modal knowledge related to the text that is returned by text retrieval; likewise, the picture-associated knowledge is the multi-modal knowledge related to the picture that is returned by picture retrieval.
The text search engine (a full-text search engine) can be understood as a database built by extracting information, mainly web-page text, from websites across the internet; it retrieves the records matching the query conditions of the user's text data (the text-associated knowledge) and returns the results in a certain order. Similarly, the picture search engine is a database built by extracting information, mainly web pictures, from websites across the internet; it retrieves the records matching the query conditions of the user's picture data (the picture-associated knowledge) and returns the results in a certain order.
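The two-channel recall stage could be wired up as in the sketch below. `text_search` and `image_search` are hypothetical client objects standing in for the two search engines, each assumed to expose a `query(...)` method; the `KnowledgePassage` container and `top_k` parameter are likewise assumptions for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class KnowledgePassage:
    """One recalled knowledge paragraph; may carry both text and pictures."""
    text: str
    image_urls: list = field(default_factory=list)

def recall_raw_knowledge(post_text, post_image, text_search, image_search,
                         top_k: int = 20) -> list:
    """Recall original multi-modal knowledge from both retrieval channels."""
    text_knowledge = text_search.query(post_text, top_k=top_k)     # text-associated knowledge
    image_knowledge = image_search.query(post_image, top_k=top_k)  # picture-associated knowledge
    return text_knowledge + image_knowledge                        # original multi-modal knowledge
```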
In addition, selecting from the candidate multi-modal knowledge the multi-modal knowledge whose relevance to the multi-modal data satisfies the preset relevance threshold proceeds as follows: the text data in the candidate multi-modal knowledge is encoded with the text encoding model of the multi-modal ranking model to obtain a first text vector, and the picture data in the candidate multi-modal knowledge is encoded with the picture encoding model to obtain a first picture vector; the first text vector and the first picture vector are added to obtain the vector of the candidate multi-modal knowledge. The text data in the multi-modal data is encoded with the text encoding model to obtain a second text vector, and the picture data in the multi-modal data is encoded with the picture encoding model to obtain a second picture vector; the second text vector and the second picture vector are added to obtain the vector of the multi-modal data. The candidates are then ranked by the relevance between the vector of the multi-modal data and the vectors of the candidate multi-modal knowledge, and the final multi-modal knowledge satisfying the preset relevance threshold is obtained. That is, the recalled multi-modal knowledge paragraphs are ranked with the multi-modal ranking model, and the finally ranked paragraphs are returned.
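A minimal sketch of this encode-add-rank step follows. It assumes each item is a (text, image) pair, that both encoders return a single vector per item in the shared space trained above, and that `threshold=0.5` is an illustrative value, not the patent's.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def rank_knowledge(ugc, candidates, text_encoder, image_encoder,
                   threshold: float = 0.5) -> list:
    """Score each candidate knowledge passage against the UGC and keep
    those whose relevance meets the threshold, highest first."""
    def fuse(text, image):
        # Add the text vector and the picture vector to get one fused vector.
        return F.normalize(text_encoder(text) + image_encoder(image), dim=-1)

    ugc_emb = fuse(*ugc)
    scored = []
    for cand in candidates:
        score = (ugc_emb * fuse(*cand)).sum().item()  # cosine similarity
        if score >= threshold:
            scored.append((score, cand))
    return [cand for score, cand in sorted(scored, key=lambda p: p[0], reverse=True)]
```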
As shown in Fig. 4, in the training stage of the multi-modal ranking model, the image encoder and the text encoder are learned through cross-modal contrastive learning (the similarity matrix pulls the diagonal positive sample pairs closer and pushes the other, negative sample pairs apart), so that the encoded vector representations of the text modality (the text data in the multi-modal data and in the candidate multi-modal knowledge) and the image modality (the picture data in the multi-modal data and in the candidate multi-modal knowledge) can be mapped into the same semantic space.
The rearrangement process is illustrated below with a simple example:
Input multi-modal data (multi-modal UGC) is obtained whose content includes the text "Player C and team D agree" and a picture of Player C playing. Relevant data is then retrieved as knowledge through the corresponding search engine for the text data and the picture data respectively, and the retrieved data is divided into knowledge paragraphs, such as multi-modal knowledge K1, K2, and K3, each of which contains text data and picture data. The multi-modal UGC and each piece of multi-modal knowledge are represented by their text vectors and picture vectors, and the respective text and picture vectors are added to obtain the final vector representations: UGC_embedding (the vector obtained by encoding the text data and picture data in the multi-modal data) and K1_embedding, K2_embedding, and K3_embedding (the vectors obtained by encoding the text data and picture data in the candidate multi-modal knowledge). Finally, the knowledge paragraph with the highest relevance is obtained by computing and ranking the relevance between UGC_embedding and the candidate embeddings (K1_embedding, K2_embedding, and K3_embedding).
In the knowledge recall stage of this step, related multi-modal knowledge is recalled through text-data retrieval and picture-data retrieval respectively, enlarging the recall range of the knowledge. The multi-modal ranking model relies on a large amount of data from the social domain: using the cross-modal matching model trained on that multi-modal data, the trained text encoder (text encoding model) and image encoder (picture encoding model) encode the knowledge of each modality to obtain its embedding, the embeddings are added to obtain the final embedding, and the relevance scores between each candidate knowledge-paragraph embedding and the post embedding are computed for ranking, yielding the multi-modal knowledge sorted by relevance.
Note that the multi-modal knowledge described in the present invention includes, but is not limited to, knowledge corresponding to picture data, knowledge corresponding to text data, and knowledge corresponding to video data, and is not elaborated further here.
Before step 102 is executed, a preset commenter ID (user identification information) also needs to be obtained, and the corresponding user personalized information, such as "football fan, likes Player A, dislikes Player B", is obtained according to the commenter ID.
Step 102: comment emotion discrimination is performed based on the multi-modal knowledge, the multi-modal data, and the acquired user personalized information to obtain an emotion distribution feature; the emotion distribution feature is embedded into the multi-modal data to obtain multi-modal data fused with emotion features.
Specifically, as shown in Fig. 3, the multi-modal knowledge, the multi-modal data, and the user personalized information may be input into a comment emotion discrimination model to obtain the emotion distribution feature (an emotion distribution representation) that it outputs. The multi-modal data (the multi-modal UGC) is then input into an embedding model (an embedding layer) to obtain the corresponding multi-modal embedding vector, and the multi-modal embedding vector is fused with the emotion distribution feature to obtain the multi-modal data fused with emotion features. That is, the multi-modal UGC passes through the embedding layer, is directly added to the emotion distribution feature to obtain the emotion-fused multi-modal data, and is sent on to the subsequent generation model. The comment emotion discrimination model is trained on preset second multi-modal sample data and comment emotion discrimination results corresponding to the second multi-modal sample data.
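The embedding-plus-addition fusion could look like the sketch below. The source says the emotion distribution is "directly added" after the embedding layer; the linear projection here is an assumption to make the dimensions match, and all sizes are illustrative.

```python
import torch
import torch.nn as nn

class EmotionFusedEmbedding(nn.Module):
    """Embed the multi-modal input tokens and add a projected emotion
    distribution at every position, yielding the multi-modal data fused
    with emotion features that is fed to the comment generator."""
    def __init__(self, vocab_size: int, hidden_dim: int, num_emotions: int):
        super().__init__()
        self.token_embedding = nn.Embedding(vocab_size, hidden_dim)
        self.emotion_proj = nn.Linear(num_emotions, hidden_dim)  # assumed dim-matching step

    def forward(self, token_ids: torch.Tensor, emotion_dist: torch.Tensor) -> torch.Tensor:
        # token_ids: (B, L); emotion_dist: (B, num_emotions), output of the
        # comment emotion discrimination model.
        x = self.token_embedding(token_ids)               # (B, L, H)
        e = self.emotion_proj(emotion_dist).unsqueeze(1)  # (B, 1, H)
        return x + e                                      # broadcast add over positions
```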
To obtain the comment emotion discrimination model, an emotion classification model can first be trained on an open-source labeled dataset and then gradually improved on social-domain comment data through active learning, yielding a final emotion classifier for social comments. The collected comment data is emotion-labeled with this classifier, and the resulting labels are used to supervise training of the final comment emotion discrimination model over (multi-modal data + multi-modal knowledge + user personalized information).
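The pseudo-labeling step might be sketched as follows. The `predict_proba` method is an assumed scikit-learn-style interface, and the confidence filter is an added assumption, not something the source specifies.

```python
def build_training_labels(seed_classifier, social_comments, confidence: float = 0.9):
    """Use the emotion classifier (trained on open-source labels, refined by
    active learning) to pseudo-label collected social comments; the pairs
    then supervise the final comment emotion discrimination model."""
    labeled = []
    for comment in social_comments:
        probs = seed_classifier.predict_proba(comment)        # assumed API
        label = max(range(len(probs)), key=probs.__getitem__)  # argmax class
        if probs[label] >= confidence:                        # keep confident labels only
            labeled.append((comment, label))
    return labeled
```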
Step 103: the multi-modal knowledge and the user personalized information are vector-encoded to obtain the corresponding multi-modal knowledge encoding vector and personalized information encoding vector.
In this step, the user personalized information can be input into a personalized encoding model to obtain the personalized information encoding vector, and the multi-modal knowledge can be input into a knowledge encoding model to obtain the multi-modal knowledge encoding vector, where the dimensionality of the personalized information encoding vector is the same as that of the multi-modal knowledge encoding vector. Both encoding models can be obtained through pre-training.
Specifically, the user personalized information and the multi-modal knowledge are converted into a fixed number of vectors, for example 16 x 512-dimensional vectors, through a personalized information encoder (the personalized encoding model) and a multi-modal knowledge encoder (the knowledge encoding model) respectively; the backbone of both encoding models can be implemented with a Transformer. In addition, to output vectors of fixed dimensionality, a Perceiver Resampler is attached after the personalized encoding model and the knowledge encoding model.
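A minimal Perceiver Resampler sketch is shown below: a small set of learned latent queries cross-attends to a variable-length input and always returns a fixed 16 x 512 output. The layer count, head count, and the simplified attention-only block (no feed-forward sublayer) are assumptions for brevity.

```python
import torch
import torch.nn as nn

class PerceiverResampler(nn.Module):
    """Compress a variable-length sequence of encoder states into a fixed
    number of vectors (e.g. 16 x 512) via learned latent queries."""
    def __init__(self, dim: int = 512, num_latents: int = 16,
                 num_heads: int = 8, num_layers: int = 2):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(num_latents, dim))
        self.layers = nn.ModuleList(
            nn.MultiheadAttention(dim, num_heads, batch_first=True)
            for _ in range(num_layers)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, L, dim) with variable L (knowledge or persona encoder states)
        q = self.latents.unsqueeze(0).expand(x.size(0), -1, -1)
        for attn in self.layers:
            out, _ = attn(query=q, key=x, value=x)  # latents attend to the input
            q = q + out                             # residual update of the latents
        return q                                    # (B, num_latents, dim), fixed size
```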
Step 104: the emotion-fused multi-modal data, the multi-modal knowledge encoding vector, and the personalized information encoding vector are input into the comment generation model to obtain the comment information generated by the model and corresponding to the multi-modal user content; the comment generation model is trained on first multi-modal sample data and standard comment information corresponding to the first multi-modal sample data. The emotion-fused multi-modal data (the multi-modal post) serves as the input, while the other information, such as the multi-modal knowledge and the personalized information, is fused into the comment generation model through cross-attention to guide the generation of the final comment. In the comment generation stage, the model thus combines the multi-modal post, the multi-modal knowledge, the personalized information, and the emotion information to guide final comment generation.
Note that the comment generation model can be built with a gated xattn-dense structure that integrates the user personalized information and the multi-modal knowledge. The model repeatedly outputs a probability distribution over the next word in a predefined vocabulary, taking the most probable word as the output at each step; generation ends when the special token <eos> is output, yielding the comment information corresponding to the multi-modal user content.
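The greedy decoding loop just described can be sketched as follows. The `model.step(...)` interface, which is assumed to return next-token logits with the knowledge and persona vectors injected through the gated cross-attention layers, is hypothetical; only the loop structure reflects the source.

```python
import torch

@torch.no_grad()
def generate_comment(model, fused_inputs, knowledge_vecs, persona_vecs,
                     eos_id: int, max_len: int = 64) -> list:
    """Greedy decoding: at each step take the most probable next word from
    the predefined vocabulary, stopping when <eos> is produced."""
    generated = []
    for _ in range(max_len):
        logits = model.step(fused_inputs, generated,
                            knowledge=knowledge_vecs, persona=persona_vecs)
        next_id = int(torch.argmax(logits, dim=-1))  # max-probability word
        if next_id == eos_id:                        # <eos> ends generation
            break
        generated.append(next_id)
    return generated
```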
In this way, the method makes full use of multi-modal information throughout the comment generation pipeline: multiple modalities participate in knowledge retrieval, knowledge rearrangement, and final comment generation, so the final comment information is more relevant to the user-produced content. Retrieving over both text data and picture data brings back richer information; pictures in the news domain, in particular, often retrieve knowledge carrying a very large amount of information. Using comment emotion information to supervise the learning of the (multi-modal data + multi-modal knowledge + user personalized information) comment emotion discrimination model avoids the difficulty of having no directly supervised data. Finally, the generation process of the comment generation model blends in knowledge related to the multi-modal data, together with the emotion information and the user personalized information the user brings to that data, so comments that are more personalized and rich in knowledge can be generated.
In summary, according to the personalized comment generation method, multi-modal data is extracted from the multi-modal user content to be commented on and corresponding multi-modal knowledge is obtained; comment emotion discrimination is performed based on the multi-modal knowledge, the multi-modal data, and the acquired user personalized information to obtain an emotion distribution feature, which is embedded into the multi-modal data; the multi-modal knowledge and the user personalized information are vector-encoded; and the emotion-fused multi-modal data, the multi-modal knowledge encoding vector, and the personalized information encoding vector are input into the comment generation model to obtain the comment information corresponding to the multi-modal user content. The method makes full use of multi-modal information and improves the accuracy and robustness of comment generation, so that more personalized and knowledge-rich comments can be generated.
Corresponding to the personalized comment generation method above, the present invention also provides a personalized comment generation apparatus. Since the apparatus embodiment is similar to the method embodiment described above, it is described relatively simply; for relevant details, refer to the description of the method embodiment. The apparatus embodiment described below is merely illustrative. Fig. 5 shows a schematic structural diagram of the personalized comment generation apparatus provided by an embodiment of the present invention.
The personalized comment generation apparatus specifically comprises the following parts:
a data acquisition unit 501, configured to extract multi-modal data from multi-modal user content to be commented on and obtain corresponding multi-modal knowledge based on the multi-modal data;
an emotion analysis unit 502, configured to perform comment emotion discrimination based on the multi-modal knowledge, the multi-modal data, and the acquired user personalized information to obtain an emotion distribution feature, and to embed the emotion distribution feature into the multi-modal data to obtain multi-modal data fused with emotion features;
a vector encoding unit 503, configured to vector-encode the multi-modal knowledge and the user personalized information to obtain a corresponding multi-modal knowledge encoding vector and personalized information encoding vector;
a comment generation unit 504, configured to input the emotion-fused multi-modal data, the multi-modal knowledge encoding vector, and the personalized information encoding vector into a comment generation model and obtain comment information generated by the comment generation model and corresponding to the multi-modal user content; the comment generation model is trained on first multi-modal sample data and standard comment information corresponding to the first multi-modal sample data.
Further, the data acquisition unit is specifically configured to:
obtain original multi-modal knowledge based on the multi-modal data and the search engine corresponding to each modality; perform paragraph rearrangement on the original multi-modal knowledge with a multi-modal ranking model to obtain candidate multi-modal knowledge; and select from the candidate multi-modal knowledge the multi-modal knowledge whose relevance to the multi-modal data satisfies a preset relevance threshold;
the multi-modal ranking model is trained on multi-modal sample knowledge and ranking results corresponding to the multi-modal sample knowledge.
Further, the data acquisition unit is specifically configured to:
extract text data and picture data from the multi-modal user content;
obtaining the original multi-modal knowledge based on the multi-modal data and the corresponding search engines then specifically includes: inputting the text data into a text search engine to obtain text-associated knowledge retrieved by the text search engine; inputting the picture data into a picture search engine to obtain picture-associated knowledge retrieved by the picture search engine; and taking the text-associated knowledge and the picture-associated knowledge together as the original multi-modal knowledge.
Further, the data acquisition unit is specifically configured to:
encode the text data in the candidate multi-modal knowledge with the text encoding model of the multi-modal ranking model to obtain a first text vector; encode the picture data in the candidate multi-modal knowledge with the picture encoding model of the multi-modal ranking model to obtain a first picture vector; and add the first text vector and the first picture vector to obtain the vector of the candidate multi-modal knowledge;
encode the text data in the multi-modal data with the text encoding model to obtain a second text vector; encode the picture data in the multi-modal data with the picture encoding model to obtain a second picture vector; and add the second text vector and the second picture vector to obtain the vector of the multi-modal data;
rank the candidates by the relevance between the vector of the multi-modal data and the vectors of the candidate multi-modal knowledge, and take as final multi-modal knowledge the candidates satisfying the preset relevance threshold.
Further, the emotion analysis unit is specifically configured to:
input the multi-modal knowledge, the multi-modal data, and the user personalized information into a comment emotion discrimination model to obtain the emotion distribution feature output by the comment emotion discrimination model; the comment emotion discrimination model is trained on second multi-modal sample data and comment emotion discrimination results corresponding to the second multi-modal sample data.
Further, the vector encoding unit is specifically configured to: input the user personalized information into a personalized encoding model to obtain the personalized information encoding vector; and input the multi-modal knowledge into a knowledge encoding model to obtain the multi-modal knowledge encoding vector; the dimensionality of the personalized information encoding vector is the same as that of the multi-modal knowledge encoding vector.
Further, the emotion analysis unit is specifically configured to:
input the multi-modal data into an embedding model to obtain a corresponding multi-modal embedding vector;
and fuse the multi-modal embedding vector with the emotion distribution feature to obtain the multi-modal data fused with emotion features.
The personalized comment generation apparatus extracts multi-modal data from the multi-modal user content to be commented on and obtains corresponding multi-modal knowledge based on the multi-modal data; it then performs comment emotion discrimination based on the multi-modal knowledge, the multi-modal data, and the acquired user personalized information to obtain an emotion distribution feature, which is embedded into the multi-modal data to yield multi-modal data fused with emotion features; it further vector-encodes the multi-modal knowledge and the user personalized information into the corresponding encoding vectors; and finally it inputs the emotion-fused multi-modal data, the multi-modal knowledge encoding vector, and the personalized information encoding vector into the comment generation model to obtain the comment information corresponding to the multi-modal user content. The apparatus makes full use of multi-modal information and improves the accuracy and robustness of comment generation, so that more personalized and knowledge-rich comments can be generated.
Corresponding to the personalized comment generation method above, the present invention further provides an electronic device. Since the device embodiment is similar to the method embodiment described above, it is described relatively simply; for relevant details refer to the method embodiment, and the electronic device described below is merely exemplary. Fig. 6 is a schematic diagram of the physical structure of an electronic device provided by an embodiment of the present invention. The electronic device may include: a processor 601, a memory 602, and a communication bus 603, where the processor 601 and the memory 602 communicate with each other through the communication bus 603 and with the outside through a communication interface 604. The processor 601 may invoke logic instructions in the memory 602 to perform a personalized comment generation method comprising: extracting multi-modal data from multi-modal user content to be commented on, and obtaining corresponding multi-modal knowledge based on the multi-modal data; performing comment emotion discrimination based on the multi-modal knowledge, the multi-modal data, and the acquired user personalized information to obtain an emotion distribution feature; embedding the emotion distribution feature into the multi-modal data to obtain multi-modal data fused with emotion features; vector-encoding the multi-modal knowledge and the user personalized information to obtain a corresponding multi-modal knowledge encoding vector and personalized information encoding vector; and inputting the emotion-fused multi-modal data, the multi-modal knowledge encoding vector, and the personalized information encoding vector into a comment generation model to obtain comment information generated by the comment generation model and corresponding to the multi-modal user content; the comment generation model is trained on first multi-modal sample data and standard comment information corresponding to the first multi-modal sample data.
Furthermore, the logic instructions in the memory 602 may be implemented as software functional units and, when sold or used as an independent product, stored in a computer-readable storage medium. With this understanding, the technical solution of the present invention, or the part of it that contributes to the prior art, may be embodied as a software product stored in a storage medium and including several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a memory chip, a USB disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
In another aspect, an embodiment of the present invention further provides a computer program product comprising a computer program stored on a processor-readable storage medium; the computer program includes program instructions which, when executed by a computer, enable the computer to perform the personalized comment generation method provided by the above method embodiments, comprising: extracting multi-modal data from multi-modal user content to be commented on, and obtaining corresponding multi-modal knowledge based on the multi-modal data; performing comment emotion discrimination based on the multi-modal knowledge, the multi-modal data, and the acquired user personalized information to obtain an emotion distribution feature; embedding the emotion distribution feature into the multi-modal data to obtain multi-modal data fused with emotion features; vector-encoding the multi-modal knowledge and the user personalized information to obtain a corresponding multi-modal knowledge encoding vector and personalized information encoding vector; and inputting the emotion-fused multi-modal data, the multi-modal knowledge encoding vector, and the personalized information encoding vector into a comment generation model to obtain comment information generated by the comment generation model and corresponding to the multi-modal user content; the comment generation model is trained on first multi-modal sample data and standard comment information corresponding to the first multi-modal sample data.
In yet another aspect, an embodiment of the present invention further provides a processor-readable storage medium storing a computer program which, when executed by a processor, performs the personalized comment generation method provided by the above embodiments, comprising: extracting multi-modal data from multi-modal user content to be commented on, and obtaining corresponding multi-modal knowledge based on the multi-modal data; performing comment emotion discrimination based on the multi-modal knowledge, the multi-modal data, and the acquired user personalized information to obtain an emotion distribution feature; embedding the emotion distribution feature into the multi-modal data to obtain multi-modal data fused with emotion features; vector-encoding the multi-modal knowledge and the user personalized information to obtain a corresponding multi-modal knowledge encoding vector and personalized information encoding vector; and inputting the emotion-fused multi-modal data, the multi-modal knowledge encoding vector, and the personalized information encoding vector into a comment generation model to obtain comment information generated by the comment generation model and corresponding to the multi-modal user content; the comment generation model is trained on first multi-modal sample data and standard comment information corresponding to the first multi-modal sample data.
The processor-readable storage medium can be any available medium or data storage device accessible to a processor, including but not limited to magnetic storage (e.g., floppy disks, hard disks, magnetic tape, magneto-optical disks (MOs)), optical storage (e.g., CDs, DVDs, BDs, HVDs), and semiconductor storage (e.g., ROM, EPROM, EEPROM, non-volatile memory (NAND FLASH), solid-state disks (SSDs)).
The apparatus embodiments described above are merely illustrative; units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units, that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment, which those of ordinary skill in the art can understand and implement without creative effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment may be implemented by software plus a necessary general hardware platform, or by hardware. With this understanding, the above technical solutions may be embodied as a software product stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disk, including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute the methods described in the embodiments or in parts of the embodiments.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and such modifications or substitutions do not remove the corresponding technical solutions from the spirit and scope of the embodiments of the present invention.

Claims (10)

1. A personalized comment generation method, characterized by comprising the following steps:
extracting multi-modal data from multi-modal user content to be commented on, and obtaining corresponding multi-modal knowledge based on the multi-modal data;
performing comment emotion discrimination based on the multi-modal knowledge, the multi-modal data, and acquired user personalized information to obtain an emotion distribution feature; and embedding the emotion distribution feature into the multi-modal data to obtain multi-modal data fused with emotion features;
vector-encoding the multi-modal knowledge and the user personalized information to obtain a corresponding multi-modal knowledge encoding vector and personalized information encoding vector;
inputting the emotion-fused multi-modal data, the multi-modal knowledge encoding vector, and the personalized information encoding vector into a comment generation model to obtain comment information generated by the comment generation model and corresponding to the multi-modal user content; wherein the comment generation model is trained on first multi-modal sample data and standard comment information corresponding to the first multi-modal sample data.
2. The personalized comment generation method of claim 1, wherein obtaining the corresponding multi-modal knowledge based on the multi-modal data specifically comprises:
obtaining original multi-modal knowledge based on the multi-modal data and the search engine corresponding to each modality; performing paragraph rearrangement on the original multi-modal knowledge with a multi-modal ranking model to obtain candidate multi-modal knowledge; and selecting from the candidate multi-modal knowledge the multi-modal knowledge whose relevance to the multi-modal data satisfies a preset relevance threshold;
wherein the multi-modal ranking model is trained on multi-modal sample knowledge and ranking results corresponding to the multi-modal sample knowledge.
3. The personalized comment generation method of claim 2, wherein extracting multi-modal data from the multi-modal user content to be commented on specifically comprises:
extracting text data and picture data from the multi-modal user content;
wherein obtaining the original multi-modal knowledge based on the multi-modal data and the corresponding search engines specifically comprises: inputting the text data into a text search engine to obtain text-associated knowledge retrieved by the text search engine; inputting the picture data into a picture search engine to obtain picture-associated knowledge retrieved by the picture search engine; and taking the text-associated knowledge and the picture-associated knowledge together as the original multi-modal knowledge.
4. The personalized comment generation method according to claim 2, wherein obtaining, from the candidate multi-modal knowledge, the multi-modal knowledge whose relevance to the multi-modal data satisfies the preset relevance threshold specifically comprises:
coding text data in the candidate multi-modal knowledge based on a text coding model of the multi-modal ranking model to obtain a first text vector; coding picture data in the candidate multi-modal knowledge based on a picture coding model of the multi-modal ranking model to obtain a first picture vector; and adding the first text vector and the first picture vector to obtain a vector of the candidate multi-modal knowledge;
coding text data in the multi-modal data based on the text coding model to obtain a second text vector; coding picture data in the multi-modal data based on the picture coding model to obtain a second picture vector; and adding the second text vector and the second picture vector to obtain a vector of the multi-modal data;
and ranking the vector of the candidate multi-modal knowledge by relevance to the vector of the multi-modal data to obtain the final multi-modal knowledge satisfying the preset relevance threshold.
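The scoring step might look like the sketch below. The element-wise vector addition comes from the claim itself; cosine similarity and the 0.7 cutoff are assumptions, since the claim only speaks of relevance ranking against a preset threshold.

```python
import torch
import torch.nn.functional as F

def fuse(text_vec: torch.Tensor, pic_vec: torch.Tensor) -> torch.Tensor:
    # Per the claim: the text vector and the picture vector are simply added.
    return text_vec + pic_vec

def filter_by_relevance(
    data_vec: torch.Tensor,   # (D,)  fused vector of the multi-modal data
    cand_vecs: torch.Tensor,  # (N, D) fused vectors of the candidate knowledge
    threshold: float = 0.7,   # assumed value for the preset relevance threshold
) -> list:
    sims = F.cosine_similarity(cand_vecs, data_vec.unsqueeze(0), dim=-1)  # (N,)
    order = torch.argsort(sims, descending=True)
    # Indices of candidates meeting the threshold, most relevant first.
    return [int(i) for i in order if sims[i] >= threshold]
```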
5. The personalized comment generation method according to claim 1, wherein performing comment emotion discrimination based on the multi-modal knowledge, the multi-modal data, and the acquired user personalized information to obtain the emotion distribution characteristics specifically comprises:
inputting the multi-modal knowledge, the multi-modal data, and the user personalized information into a comment emotion discrimination model to obtain the emotion distribution characteristics output by the comment emotion discrimination model; wherein the comment emotion discrimination model is trained based on second multi-modal sample data and comment emotion discrimination results corresponding to the second multi-modal sample data.
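One plausible shape for such a discriminator, assuming all three inputs have already been reduced to fixed-width vectors; the patent fixes neither the architecture nor the emotion inventory, so both values below are invented.

```python
import torch
import torch.nn as nn

class EmotionDiscriminator(nn.Module):
    def __init__(self, dim: int = 256, n_emotions: int = 8):  # both values assumed
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(dim * 3, dim), nn.ReLU(), nn.Linear(dim, n_emotions)
        )

    def forward(self, knowledge, data, persona):
        # Each input: (B, dim). Output: (B, n_emotions), a probability
        # distribution, i.e. the "emotion distribution characteristics".
        joint = torch.cat([knowledge, data, persona], dim=-1)
        return torch.softmax(self.mlp(joint), dim=-1)
```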
6. The personalized comment generation method according to claim 1, wherein performing vector coding on the multi-modal knowledge and the user personalized information to obtain the corresponding multi-modal knowledge coding vector and personalized information coding vector specifically comprises:
inputting the user personalized information into a personalized coding model to obtain the personalized information coding vector; and inputting the multi-modal knowledge into a knowledge coding model to obtain the multi-modal knowledge coding vector; wherein the dimensionality of the personalized information coding vector is the same as the dimensionality of the multi-modal knowledge coding vector.
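The shared dimensionality is the one concrete constraint here. A toy sketch (the input widths 128 and 512, and the linear encoders themselves, are invented for illustration):

```python
import torch.nn as nn

SHARED_DIM = 256  # assumed common output width

persona_encoder = nn.Linear(128, SHARED_DIM)    # stand-in personalized coding model
knowledge_encoder = nn.Linear(512, SHARED_DIM)  # stand-in knowledge coding model

# The claim's requirement: both coding vectors share one dimensionality,
# so they can be consumed uniformly by the comment generation model.
assert persona_encoder.out_features == knowledge_encoder.out_features
```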
7. The personalized comment generation method according to claim 1, wherein embedding the emotion distribution characteristics into the multi-modal data to obtain the multi-modal data fused with the emotion characteristics specifically comprises:
inputting the multi-modal data into an embedding model to obtain corresponding multi-modal embedding vectors;
and fusing the multi-modal embedding vectors with the emotion distribution characteristics to obtain the multi-modal data fused with the emotion characteristics.
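The claim leaves the fusion operator open; broadcast addition over the sequence is one simple choice, sketched below under that assumption.

```python
import torch

def fuse_emotion(mm_embed: torch.Tensor, emo_feat: torch.Tensor) -> torch.Tensor:
    # mm_embed: (B, T, D) multi-modal embedding vectors.
    # emo_feat: (B, D) emotion distribution feature, projected to width D.
    # Addition is an assumed fusion operator; concatenation or gating
    # would satisfy the claim equally well.
    return mm_embed + emo_feat.unsqueeze(1)
```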
8. A personalized comment generation apparatus, characterized by comprising:
a data acquisition unit, configured to extract multi-modal data from multi-modal user content to be commented on and to obtain corresponding multi-modal knowledge based on the multi-modal data;
an emotion analysis unit, configured to perform comment emotion discrimination based on the multi-modal knowledge, the multi-modal data, and acquired user personalized information to obtain emotion distribution characteristics, and to embed the emotion distribution characteristics into the multi-modal data to obtain multi-modal data fused with the emotion characteristics;
a vector coding unit, configured to perform vector coding on the multi-modal knowledge and the user personalized information to obtain a corresponding multi-modal knowledge coding vector and a corresponding personalized information coding vector;
a comment generation unit, configured to input the multi-modal data fused with the emotion characteristics, the multi-modal knowledge coding vector, and the personalized information coding vector into a comment generation model to obtain comment information generated by the comment generation model and corresponding to the multi-modal user content; wherein the comment generation model is trained based on first multi-modal sample data and standard comment information corresponding to the first multi-modal sample data.
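Read as software, the four units compose as in the sketch below; the class and its field names are invented for illustration, not taken from the patent.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple

@dataclass
class PersonalizedCommentDevice:
    data_acquisition: Callable[[Any], Tuple[Any, Any]]    # content -> (data, knowledge)
    emotion_analysis: Callable[[Any, Any, Any], Any]      # -> emotion-fused data
    vector_coding: Callable[[Any, Any], Tuple[Any, Any]]  # -> (knowledge vec, persona vec)
    comment_generation: Callable[[Any, Any, Any], str]    # fused inputs -> comment text

    def run(self, user_content: Any, persona: Any) -> str:
        data, knowledge = self.data_acquisition(user_content)
        fused_data = self.emotion_analysis(knowledge, data, persona)
        k_vec, p_vec = self.vector_coding(knowledge, persona)
        return self.comment_generation(fused_data, k_vec, p_vec)
```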
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the personalized comment generation method according to any one of claims 1 to 7.
10. A processor-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the steps of the personalized comment generation method according to any one of claims 1 to 7.
CN202211553822.3A 2022-12-06 2022-12-06 Personalized comment generation method and device Active CN115658935B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211553822.3A CN115658935B (en) 2022-12-06 2022-12-06 Personalized comment generation method and device


Publications (2)

Publication Number Publication Date
CN115658935A 2023-01-31
CN115658935B 2023-05-02

Family

ID=85019689

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211553822.3A Active CN115658935B (en) 2022-12-06 2022-12-06 Personalized comment generation method and device

Country Status (1)

Country Link
CN (1) CN115658935B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160328384A1 (en) * 2015-05-04 2016-11-10 Sri International Exploiting multi-modal affect and semantics to assess the persuasiveness of a video
CN110287278A (en) * 2019-06-20 2019-09-27 北京百度网讯科技有限公司 Comment on generation method, device, server and storage medium
US20210192142A1 (en) * 2020-01-15 2021-06-24 Beijing Baidu Netcom Science Technology Co., Ltd. Multimodal content processing method, apparatus, device and storage medium
CN113238654A (en) * 2021-05-19 2021-08-10 宋睿华 Multi-modal based reactive response generation
CN114339450A (en) * 2022-03-11 2022-04-12 中国科学技术大学 Video comment generation method, system, device and storage medium
CN114693341A (en) * 2022-03-02 2022-07-01 合肥工业大学 Comment usefulness prediction method and system based on multi-mode information fusion
US20220284220A1 (en) * 2021-03-08 2022-09-08 Adobe Inc. Highlight Video Generated with Adaptable Multimodal Customization
CN115034227A (en) * 2022-06-28 2022-09-09 西安交通大学 Progressive multi-task emotion analysis method based on multi-mode mutual attention fusion
CN115329127A (en) * 2022-07-22 2022-11-11 华中科技大学 Multi-mode short video tag recommendation method integrating emotional information


Also Published As

Publication number Publication date
CN115658935B (en) 2023-05-02

Similar Documents

Publication Title
WO2021143800A1 (en) System and method for semantic analysis of multimedia data using attention-based fusion network
US10740678B2 (en) Concept hierarchies
WO2017092380A1 (en) Method for human-computer dialogue, neural network system and user equipment
CN110727779A (en) Question-answering method and system based on multi-model fusion
Wu et al. Learning of multimodal representations with random walks on the click graph
Wang et al. Deep cascaded cross-modal correlation learning for fine-grained sketch-based image retrieval
CN111524593B (en) Medical question-answering method and system based on context language model and knowledge embedding
Yusuf et al. An analysis of graph convolutional networks and recent datasets for visual question answering
CN115982403B (en) Multi-mode hash retrieval method and device
CN113157886B (en) Automatic question and answer generation method, system, terminal and readable storage medium
CN115062134B (en) Knowledge question-answering model training and knowledge question-answering method, device and computer equipment
CN112528136A (en) Viewpoint label generation method and device, electronic equipment and storage medium
CN113704434A (en) Knowledge base question and answer method, electronic equipment and readable storage medium
CN114792092B (en) Text theme extraction method and device based on semantic enhancement
CN115658935B (en) Personalized comment generation method and device
CN115618873A (en) Data processing method and device, computer equipment and storage medium
Wu et al. Incorporating semantic consistency for improved semi-supervised image captioning
CN116089618B (en) Drawing meaning network text classification model integrating ternary loss and label embedding
CN112085091B (en) Short text matching method, device, equipment and storage medium based on artificial intelligence
CN117235605B (en) Sensitive information classification method and device based on multi-mode attention fusion
CN115546355B (en) Text matching method and device
Zhou et al. Text-based visual question answering with knowledge base
CN117931858A (en) Data query method, device, computer equipment and storage medium
CN117453951A (en) Model training method, data retrieval device and electronic equipment
CN114153947A (en) Document processing method, device, equipment and storage medium

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant