CN114970494A - Comment generation method and device, electronic equipment and storage medium - Google Patents

Comment generation method and device, electronic equipment and storage medium

Info

Publication number
CN114970494A
Authority
CN
China
Prior art keywords: comment, target, description, information, candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110211022.2A
Other languages
Chinese (zh)
Inventor
陈小帅 (Chen Xiaoshuai)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Beijing Co Ltd
Original Assignee
Tencent Technology Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Beijing Co Ltd filed Critical Tencent Technology Beijing Co Ltd
Priority to CN202110211022.2A
Publication of CN114970494A

Classifications

    • G PHYSICS
        • G06 COMPUTING; CALCULATING OR COUNTING
            • G06F ELECTRIC DIGITAL DATA PROCESSING
                • G06F 40/00 Handling natural language data
                    • G06F 40/20 Natural language analysis
                        • G06F 40/205 Parsing
                        • G06F 40/279 Recognition of textual entities
                            • G06F 40/284 Lexical analysis, e.g. tokenisation or collocates
                            • G06F 40/289 Phrasal analysis, e.g. finite state techniques or chunking
            • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
                • G06Q 50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
                    • G06Q 50/01 Social networking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Computing Systems (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to the field of computer technology, and in particular to a comment generation method and apparatus, an electronic device, and a storage medium, aiming to improve the efficiency and accuracy of comment generation. The method comprises the following steps: in response to a comment request triggered by a target object for target multimedia content, performing word segmentation on first description information of the target multimedia content to obtain each word segment in the first description information; obtaining a target description feature for the target object based on second description information of the target object and a preset candidate object description feature set; and predicting, based on the target description feature and the word segments, comment information to be posted by the target object for the target multimedia content. By learning a personalized representation of the object while generating comments, and by using both the target multimedia content and the object's personalized description information when providing an automatic comment function for a user, comments are generated quickly and accurately and better match the personalized needs of the current object.

Description

Comment generation method and device, electronic equipment and storage medium
Technical Field
The application relates to the technical field of computers, in particular to a comment generation method and device, an electronic device and a storage medium.
Background
With the rapid development of Internet technology, various social application platforms have emerged, including instant messaging applications, content sharing platforms, and the like. These social application platforms all provide social functions: a user can post multimedia content such as articles, videos, and pictures on the platform, and the user's contacts on the platform can forward, like, and comment on one another's posts, interactively discussing the published content.
Taking video comments as an example, when video comments are generated for users in the related art, the comments are generally carried over from similar videos or generated based on the video content, and the generated comments are essentially the same for all users. As a result, the generated comments are not accurate enough, and the acceptance rate of the comments is not high.
Disclosure of Invention
The embodiments of the application provide a comment generation method and apparatus, an electronic device, and a storage medium, aiming to improve the efficiency and accuracy of comment generation.
The comment generation method provided by the embodiments of the application comprises the following steps:
in response to a comment request triggered by a target object for target multimedia content, performing word segmentation on first description information of the target multimedia content to obtain each word segment in the first description information;
obtaining a target description feature for the target object based on second description information of the target object and a preset candidate object description feature set; and
predicting, based on the target description feature and the word segments, comment information to be posted by the target object for the target multimedia content.
The comment generation apparatus provided by the embodiments of the application comprises:
a word segmentation processing unit, configured to, in response to a comment request triggered by a target object for target multimedia content, perform word segmentation on first description information of the target multimedia content to obtain each word segment in the first description information;
a feature extraction unit, configured to obtain a target description feature for the target object based on second description information of the target object and a preset candidate object description feature set; and
a prediction unit, configured to predict, based on the target description feature and the word segments, comment information to be posted by the target object for the target multimedia content.
Optionally, the word segmentation processing unit is specifically configured to:
input the first description information of the target multimedia content into a trained comment generation model; and
perform word segmentation and encoding on the first description information based on the encoding part of the trained comment generation model, to obtain each word segment in the first description information and a word vector for each word segment.
Optionally, the prediction unit is specifically configured to:
input the word vector of each word segment and the target description feature into the decoding part of the trained comment generation model, and perform decoding based on the decoding part to obtain the comment information output by the trained comment generation model;
where the trained comment generation model is obtained by training on a training sample data set, and each training sample in the training sample data set comprises an object portrait of a sample object, description information of sample multimedia content, and a real comment posted by the sample object for the sample multimedia content.
Optionally, if the comment information comprises a plurality of comment words, the prediction unit is specifically configured to:
generate each comment word in the comment information in turn by loop iteration, where the following operations are performed in each iteration:
inputting the comment word output in the previous iteration into the decoding part, where the comment word input to the decoding part in the first iteration is a preset start marker word; and
performing decoding based on the comment word output in the previous iteration, the word vectors of the word segments, and the target description feature, to generate the comment word output in the current iteration.
Optionally, the apparatus further comprises:
a model training unit, configured to obtain the trained comment generation model through training in the following manner:
selecting training samples from the training sample data set;
performing loop-iteration training on the untrained comment generation model using the training samples to obtain the trained comment generation model, where each training iteration comprises the following operations:
inputting the object portrait of the sample object in a training sample and the description information of the sample multimedia content into the untrained comment generation model, and obtaining the predicted comment output by the untrained comment generation model; and
adjusting the parameters of the untrained comment generation model based on the error between the predicted comment and the real comment in the corresponding training sample.
Optionally, the model training unit is specifically configured to:
selecting, from the training sample data set, training samples containing sample objects whose number of posted comments reaches a third preset threshold, or training samples containing sample multimedia content whose number of received comments reaches a fourth preset threshold.
An electronic device provided by an embodiment of the present application includes a processor and a memory, where the memory stores program code which, when executed by the processor, causes the processor to execute the steps of any one of the comment generation methods described above.
Embodiments of the present application also provide a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. The processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the steps of any of the comment generation methods described above.
An embodiment of the present application provides a computer-readable storage medium that includes program code; when the program code runs on an electronic device, it causes the electronic device to execute the steps of any one of the comment generation methods described above.
The beneficial effects of this application are as follows:
The embodiments of the application provide a comment generation method and apparatus, an electronic device, and a storage medium. A personalized representation of the object is learned while comments are generated, and both the target multimedia content and the object's personalized description information are used when providing an automatic comment function for a user, so that comments are generated quickly and accurately. The generated comments better match the personalized needs of the current object, which in turn improves the object's acceptance rate of automatic comments and enhances the usefulness of the automatic comment function for object interaction.
Additional features and advantages of the present application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the present application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
fig. 1 is an alternative schematic diagram of an application scenario in an embodiment of the present application;
fig. 2 is a schematic flowchart of a comment generation method in the embodiment of the present application;
FIG. 3 is a flowchart of a method for generating personalized automatic comments of a user in an embodiment of the present application;
FIG. 4 is a flowchart illustrating a process of constructing a model for generating personalized comments for a user according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a comment generation model in the embodiment of the present application;
FIG. 6 is a flow chart illustrating an alternative training method for a comment generation model in an embodiment of the present application;
fig. 7 is a schematic diagram of a user vector acquisition method for users with few comments in an embodiment of the present application;
fig. 8 is a schematic structural diagram of a composition of a comment generating apparatus in an embodiment of the present application;
fig. 9 is a schematic structural diagram of an electronic device to which an embodiment of the present application is applied;
fig. 10 is a schematic structural diagram of another electronic device to which the embodiments of the present application are applied.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments, but not all embodiments, of the technical solutions of the present application. All other embodiments obtained by a person skilled in the art without any inventive step based on the embodiments described in the present application are within the scope of the protection of the present application.
Some concepts related to the embodiments of the present application are described below.
Multimedia content: an information exchange and transmission medium for human-computer interaction that combines two or more media, such as text, pictures, audio, and video. In the embodiments of the present application, the multimedia content may be articles, information feeds, videos, music, and the like.
Description information: in the embodiments of the present application, description information is divided into description information of multimedia content and description information of an object. The description information of multimedia content mainly describes the attributes of the content; taking video as an example, it mainly refers to the video's textual content, including the title text, dialog text recognized by Automatic Speech Recognition (ASR), subtitle text recognized by Optical Character Recognition (OCR), and the like. The description information of an object includes one or both of an object portrait of the object and an object identifier of the object; when the object is a user, the object portrait may be a user portrait. The first description information and the second description information both refer to description information, and "first" and "second" are used only for distinction: the first description information is the description information of the multimedia content, and the second description information is the description information of the object.
Candidate object description feature set: a feature set pre-constructed in the embodiments of the present application, containing the description features of a number of candidate objects. The description features may take the form of deep vector representations and may be obtained by machine learning. A candidate object refers to a sample object.
User portrait: also called a user persona, an effective tool for profiling a target user and linking user needs with design direction. A user portrait mainly contains basic information such as the user's age, gender, occupation, and behavioral preferences, and has been widely applied in many fields. In practice, it links the user's attributes, behavior, and expectations in simple, everyday terms. As a virtual representation of an actual user, the user persona formed from the user portrait is constructed from the user's multimedia operation data and preferences over a period of time, and should represent the main audience and target group of a product. In the embodiments of the present application, the user portraits of the candidate objects and the target object can be represented as a set of weighted information tags, where the information tags mainly describe the user's interests and may also be called interest tags. The set comprises at least one interest tag and the weight corresponding to each interest tag.
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, involving both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies the theories and methods that enable effective communication between humans and computers in natural language. Natural language processing is a science that integrates linguistics, computer science, and mathematics; research in this field involves natural language, i.e., the language people use every day, so it is closely related to linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question answering, knowledge graphs, and the like.
Machine Learning (ML) is a multi-disciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It specializes in how computers simulate or implement human learning behavior to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve their performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied across all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from instruction.
With research on and progress in artificial intelligence technology, it has been studied and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, drones, robots, smart healthcare, and smart customer service.
The solution provided by the embodiments of the present application involves artificial intelligence, natural language processing, and machine learning technologies. The method for training the comment generation model provided in the embodiments can be divided into two parts: a training part and an application part. In the training part, a comment generation model is trained using machine learning techniques: training samples containing sample objects and sample multimedia content are passed through the comment generation model to obtain the predicted comments it outputs, and the model parameters are continuously adjusted by an optimization algorithm using the real comments with which the training samples are labeled. The application part uses the comment generation model trained in the training part to generate the comment to be posted by the target object for the target multimedia content. It should also be noted that, in the embodiments of the present application, the comment generation model may be trained online or offline; this is not specifically limited here.
The following briefly introduces the design concept of the embodiments of the present application:
with the rapid development of information technology and the internet, multimedia contents such as online information, videos, short videos, electronic books, internet articles, forum posts and the like which allow readers or audiences to make comments are more and more popular with people, and become a main way for people to obtain information in daily life. People can acquire and browse various multimedia contents presented in the form of pictures, texts or videos through some main web portals or large news websites or short video Applications (APPs).
Taking video as an example of multimedia content, when video comments are generated for users in the related art, the comments are generally carried over from similar videos or generated based on the video content, and the generated comments are essentially the same for all users; as a result, the generated comments are not accurate enough, and the acceptance rate of the comments is not high.
In view of this, the embodiments of the present application provide a comment generation method and apparatus, an electronic device, and a storage medium, which learn personalized representations of users while generating comments. When providing an automatic comment function for a user, both the target multimedia content and the user's personalized description information are used to generate comments quickly and accurately; the comments better match the personalized needs of the current user, which improves the user acceptance rate of automatic comments and enhances the usefulness of the automatic comment function for user interaction.
The preferred embodiments of the present application will be described below with reference to the accompanying drawings of the specification, it being understood that the preferred embodiments described herein are merely for illustrating and explaining the present application and are not intended to limit the present application, and that the embodiments and features described in the embodiments in the present application may be combined with each other without conflict.
Fig. 1 is a schematic view of an application scenario in an embodiment of the present application. The application scenario diagram includes two terminal devices 110 and a server 120. A client is installed on each terminal device 110, and the client can be logged in through the terminal device 110. The client in the embodiments of the present application may be software, a web page, an applet, or the like, and the server 120 is the background server corresponding to that software, web page, or applet; the specific type of the client is not limited.
It should be noted that fig. 1 is only an example, and the number of the terminal devices and the servers is not limited in practice, and is not specifically limited in the embodiment of the present application.
The terminal device 110 and the server 120 may communicate with each other through a communication network. In an alternative embodiment, the communication network is a wired network or a wireless network. The terminal device 110 and the server 120 may be directly or indirectly connected through wired or wireless communication, and the application is not limited herein.
In this embodiment, the terminal device 110 is an electronic device used by a user. The electronic device may be a computer device that has a certain computing capability and runs instant messaging or social software or websites, such as a personal computer, mobile phone, tablet computer, notebook computer, e-book reader, smart TV, or smart home device. The server 120 may be an independent physical server, a server cluster or distributed system formed by multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a Content Delivery Network (CDN), and big data and artificial intelligence platforms.
The comment generation model may be deployed on the server 120 for training, and a large number of training samples may be stored on the server 120 for this purpose. Optionally, after the comment generation model is trained using the training method in the embodiments of the present application, the trained model may be deployed directly on the server 120 or on the terminal device 110. In the embodiments of the present application, the comment generation model is mainly used to automatically generate a personalized comment for a target object when the target object intends to comment on target multimedia content.
In a possible application scenario, the training samples in the present application may be stored using cloud storage technology. A distributed cloud storage system (hereinafter referred to as a storage system) is a storage system that uses functions such as cluster applications, grid technology, and distributed storage file systems to aggregate, through application software or application interfaces, a large number of storage devices of different types in a network (also referred to as storage nodes) so that they work together, providing data storage and service access functions to the outside.
In one possible application scenario, servers 120 may be deployed in different regions to reduce communication latency, or different servers 120 may each serve the regions corresponding to different terminal devices 110 for load balancing. The multiple servers 120 share data through a blockchain and together constitute a data sharing system. For example, a terminal device 110 located at place a is communicatively connected to one server 120, and a terminal device 110 located at place b is communicatively connected to another server 120.
Each server 120 in the data sharing system has a node identifier corresponding to it, and each server 120 may store the node identifiers of the other servers 120 in the system, so that a generated block can be broadcast to the other servers 120 according to their node identifiers. Each server 120 may maintain a node identifier list as shown in the following table, storing the name of the server 120 and its node identifier correspondingly. The node identifier may be an IP (Internet Protocol) address or any other information that can identify the node; Table 1 uses IP addresses only as an example.
TABLE 1

Server name    Node identifier
Node 1         119.115.151.174
Node 2         118.116.189.145
Node N         119.124.789.258
The comment generation method provided by the exemplary embodiment of the present application is described below with reference to the drawings in conjunction with the application scenarios described above, it should be noted that the above application scenarios are only shown for the convenience of understanding the spirit and principles of the present application, and the embodiments of the present application are not limited in any way in this respect.
Note that, the comment generating method in the embodiment of the present application may be executed by a server or a terminal device alone, or may be executed by both the server and the terminal device.
Referring to fig. 2, which is an implementation flowchart of the comment generation method provided in the embodiments of the present application, described here using execution by a terminal device as an example, the specific implementation flow of the method is as follows:
S21: in response to a comment request triggered by a target object for target multimedia content, the terminal device performs word segmentation on first description information of the target multimedia content to obtain each word segment in the first description information; and
S22: the terminal device obtains a target description feature for the target object based on second description information of the target object and a preset candidate object description feature set.
the multimedia content refers to online information, audio and video, short videos, electronic books, internet articles, forum posts and other network contents allowing readers or audiences to comment. In the embodiments of the present application, details are mainly described by taking video as an example.
In the embodiments of the present application, a comment request can be triggered when a target object intends to comment on target multimedia content, for example, when a user clicks the control for sending a bullet-screen comment while watching a video, or clicks the comment control while watching a short video. A personalized comment of the target object for the target multimedia content can then be generated based on the automatic comment generation method in the embodiments of the present application. Automatic comment generation increases the target object's comment input speed, reduces its comment input cost, and improves its interaction efficiency and experience.
Specifically, feature extraction is performed for the target object, and word segmentation is performed on the target multimedia content; the comment is then generated in step S23.
S23: and the terminal equipment predicts comment information to be issued of the target object aiming at the target multimedia content based on the target description characteristics and the word segmentation.
According to the embodiments of the application, a personalized representation of the user is learned while comments are generated. When the automatic comment function is provided for a user, both the target multimedia content and the user's personalized description information are used, so that comments are generated quickly and accurately and better match the personalized needs of the current user, which in turn improves the user acceptance rate of automatic comments and enhances the usefulness of the automatic comment function for user interaction.
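To make the S21-S23 flow concrete, the following is a minimal sketch in Python; all helper names, the whitespace tokenizer, and the placeholder prediction are hypothetical stand-ins, not the patent's actual components:

```python
# Minimal sketch of the S21-S23 flow. All names and data structures here
# are hypothetical illustrations, not the patent's actual components.

def segment(first_description):
    # S21: word segmentation of the first description information.
    # A real system would use a Chinese tokenizer; whitespace splitting
    # stands in for it here.
    return first_description.split()

def target_description_feature(user_id, candidate_features):
    # S22: if the target object is a known (high-frequency) candidate,
    # its feature is looked up by ID; otherwise it would be derived from
    # similar candidates (see the weighted-sum sketch further below).
    return candidate_features.get(user_id)

def predict_comment(feature, word_segments):
    # S23: stand-in for the trained comment generation model.
    return "placeholder comment"

candidates = {"user_1": [0.1, 0.3, 0.7]}                     # feature set
words = segment("title dialog subtitle text")                # S21
feature = target_description_feature("user_1", candidates)   # S22
print(predict_comment(feature, words))                       # S23
```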
It should be noted that, in the embodiments of the present application, a candidate object description feature set needs to be constructed in advance. The feature set contains the description features of many candidate objects; the description features may take the form of deep vector representations and may be obtained by machine learning. A candidate object refers to a sample object.
In an optional implementation, the candidate objects in the embodiments of the present application are all objects whose number of posted comments reaches a certain threshold, that is, high-frequency commenting objects.
In an alternative embodiment, the following two cases can be distinguished when step S22 is executed:
Case one: the target object is one of the candidate objects.
In this case, the second description information of the target object includes the object identifier of the target object, i.e., the user ID. Based on the user ID, the description feature corresponding to that ID, i.e., the target description feature of the target object, can be queried from the candidate object description feature set.
Case two: the target object is not one of the candidate objects.
When the target object has posted few or no comments, its comment frequency is below the set threshold, i.e., the target object is a low-frequency commenting object. In this case, similar objects can be screened from the preset candidate object description feature set, and the target description feature of the target object is then determined based on the description features of those similar objects.
Specifically, when searching for similar objects, the similarity between each candidate object and the target object is determined based on the object portrait in the second description information, and the candidates are then screened by similarity: the description features of at least one candidate object whose similarity reaches a first preset threshold are obtained from the candidate object description feature set, and the target description feature of the target object is determined based on the obtained description features of the at least one candidate object.
In the embodiments of the present application, considering that the number of queried candidate objects whose similarity reaches the first preset threshold is uncertain, the target description feature of the target object can be determined from the obtained description features of the at least one candidate object in either of the following two ways:
in the first mode, if the at least one inquired candidate object does not reach the set number, the corresponding similarity is used as the weight, and the description characteristics of the at least one inquired candidate object are subjected to weighted summation to obtain the target description characteristics.
For example, the number is set to 5, assuming that there are 4 candidate objects queried, the corresponding feature describing vectors are D1, D2, D3 and D4, and the corresponding weights, i.e., the similarities between the candidate objects and the target objects, are assumed to be s1, s2, s3 and s4, respectively. Then, the target description feature D of the target object is s1 · D1+ s2 · D2+ s3 · D3+ s4 · D4.
Mode two: if the number of queried candidate objects reaches the set number, the set number of candidate objects is screened out according to the corresponding similarities, and the description features of the screened candidates are then weighted and summed, using the corresponding similarities as weights, to obtain the target description feature.
For example, suppose the set number is 5 and 8 candidate objects are queried. The top 5 candidates by similarity are selected; with description feature vectors D1, D2, D3, D4 and D5 and corresponding similarities s1, s2, s3, s4 and s5, the target description feature of the target object is D = s1·D1 + s2·D2 + s3·D3 + s4·D4 + s5·D5.
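The two modes differ only in whether a top-k screening step precedes the weighted sum. Below is a minimal sketch of both, assuming NumPy and toy two-dimensional vectors; whether the similarity weights should additionally be normalized is not specified in the text, so they are used as-is:

```python
import numpy as np

def target_feature(qualified, set_number=5):
    """qualified: list of (similarity, description_feature) pairs for the
    candidates whose similarity reached the first preset threshold."""
    # Mode two: if more candidates qualify than the set number, keep the
    # top ones by similarity; mode one skips this step implicitly.
    qualified = sorted(qualified, key=lambda p: p[0], reverse=True)[:set_number]
    # Both modes: weighted sum with the similarities as weights.
    return sum(s * np.asarray(d) for s, d in qualified)

pairs = [(0.9, [1.0, 0.0]), (0.6, [0.0, 1.0]), (0.5, [1.0, 1.0])]
print(target_feature(pairs))  # 0.9*D1 + 0.6*D2 + 0.5*D3 = [1.4, 1.1]
```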
In the embodiments of the application, a high-frequency commenting user can have its user vector representation queried directly by user ID, while for a low-frequency commenting user, similar high-frequency commenting users first need to be found, after which its vector representation is computed from the representations of those similar high-frequency users.
It should be noted that the candidate object description feature set in the embodiments of the present application is also referred to as the user personalized vector table; this table stores only the vector representations of high-frequency users. When similar high-frequency users are queried for a low-frequency commenting user, the portrait similarity with the high-frequency users is compared, and the weighted sum of the vectors of at least one user whose similarity satisfies a certain threshold is used as the vector representation of the current low-frequency user. Because the video content and the user's personalized representation are used together to generate the comment, the automatically generated comment better matches the personalized needs of the current user, achieving the effect of customizing automatic comments for the user.
In an alternative embodiment, the similarity between the target object and a candidate object is obtained from the degree of similarity of their object portraits. An object portrait comprises at least one information tag and the tag weight corresponding to each information tag; the information tags are obtained by analyzing the object's historical behavior and can express the object's interests, so they may also be called interest tags.
For example, user A's user portrait is {(Liu somebody, 0.231), (funny jokes, 0.226), …, (baking, 0.097)}, where each value is the user's preference weight for that interest tag; the user portrait is constructed by iterative learning over the user's historical playback behavior.
In the embodiments of the application, the similarity of two users is the weighted sum of the overlap degrees of their interest tags. Specifically, for any candidate object, the object portrait of the candidate is first obtained; each interest tag in the target object's portrait is then compared with each interest tag in the candidate's portrait to determine the overlap degree between each pair of tags; finally, the sum of the products of each pair's overlap degree and its corresponding weight is taken as the similarity between the target object and the candidate object.
The text similarity between a pair of interest tags can be used as the overlap degree of the two tags, and the weight corresponding to the pair is determined from the tag weights of the two tags. For example, if the overlap degree between interest tag A of the target object and interest tag B of the candidate object is 0.8 (the overlap degree ranges from 0 to 1), the tag weight of interest tag A is a, and the tag weight of interest tag B is b, then the weight for this pair of tags is (a + b)/2, and the product of the pair's overlap degree and its weight is x1 = 0.8 × (a + b)/2.
Let xi denote the product of the overlap degree of the i-th pair of interest tags and its corresponding weight (i = 1, 2, …, 6, assuming six tag pairs). The similarity between the target object and the candidate object is then the sum of these products, i.e., x1 + x2 + … + x6.
Of course, to increase computation speed, it is also possible to directly compare whether two interest tags are exactly identical: if they are, the overlap degree is 1, otherwise it is 0. In practice there are many possible ways to compute the overlap degree, which are not specifically limited here.
In the above embodiment, the similarity between users is computed from the similarity of their user portraits, where a user portrait is the user's set of weighted interest tags.
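A minimal sketch of this portrait-based similarity, assuming each portrait is a dict from interest tag to tag weight and using the fast exact-match overlap (1 or 0) mentioned above; a text-similarity score in [0, 1] could be substituted for the overlap function:

```python
def tag_overlap(tag_a, tag_b):
    # Fast variant from the text: exact match -> 1, otherwise 0.
    return 1.0 if tag_a == tag_b else 0.0

def portrait_similarity(target_portrait, candidate_portrait):
    """Weighted sum of overlap degrees over all pairs of interest tags.
    Each portrait maps interest tag -> tag weight."""
    similarity = 0.0
    for tag_a, w_a in target_portrait.items():
        for tag_b, w_b in candidate_portrait.items():
            # The pair's weight is the mean of the two tag weights,
            # as in the (a + b)/2 example above.
            similarity += tag_overlap(tag_a, tag_b) * (w_a + w_b) / 2
    return similarity

target = {"Liu somebody": 0.231, "funny jokes": 0.226, "baking": 0.097}
candidate = {"funny jokes": 0.30, "cooking": 0.15}
print(portrait_similarity(target, candidate))  # (0.226 + 0.30) / 2 = 0.263
```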
It should be noted that the comment generation method in the embodiments of the present application may also be implemented based on machine learning. Specifically, the first description information of the target multimedia content is input into a trained comment generation model; the encoding part of the model performs word segmentation and encoding on the first description information to obtain each word segment in the first description information and its word vector; the word vectors of the word segments and the target description feature are then input into the decoding part of the model, which performs decoding to obtain the comment information output by the trained comment generation model.
The trained comment generation model in the embodiments of the present application is obtained by training on a training sample data set; each training sample in the set comprises an object portrait of a sample object, description information of sample multimedia content, and a real comment posted by the sample object for the sample multimedia content.
Taking video as the multimedia content, refer to fig. 3, which is a flowchart of a method for generating personalized automatic comments for users in an embodiment of the present application. In the embodiments of the application, a user personalized comment generation model is trained from a video library and a user comment library: the video library stores sample multimedia content and its description information, and the user comment library stores the real comments posted by sample objects for the sample multimedia content together with the description information of that content. In addition, user vectors can be exported from model training to build a user vector library, i.e., the candidate object description feature set described in the embodiments of the present application.
When a user intends to comment on a video, for example when the user clicks the comment button on the client, a user vector can be obtained based on the user portrait, and a personalized comment is then generated based on the video content and the user vector. The user portraits are stored in a user portrait library, and the corresponding portrait information can be queried from the library by user ID.
In the embodiments of the application, comments matching the user's personalization are automatically generated based on the current video content and user information, so the user can directly use the automatic comments for interaction, improving the comment interaction experience.
Referring to fig. 4, which is a flowchart for constructing the user personalized comment generation model in an embodiment of the application: an automatic user personalized comment generation model is built from a large amount of platform video comment data, the trained deep user representations are exported into a user vector representation library, and personalized user features are later used to generate video comments. The specific process is as follows: the description information of videos in the video library and the description information of users in the user comment library are input into the comment generation model to obtain the comments the model predicts each user would post for each video; the predicted comments are compared with the corresponding real comments in the user comment library, the model parameters are adjusted, and iterative training continues until the trained comment generation model is obtained.
The following describes in detail a comment generation model in the embodiment of the present application:
fig. 5 is a schematic structural diagram of a comment generation model in the embodiment of the present application. The model is an automatic comment generation model constructed based on a Transformer Encoder-Decoder model, and specifically comprises two parts: an encoding portion and a decoding portion.
In the embodiments of the application, the training data of the user personalized comment generation model is selected as follows. To make the training of the automatic comment generation model and of the user vector representations more adequate, only users whose comment count is greater than UC (the third preset threshold) and videos whose comment count is greater than VC (the fourth preset threshold) are selected, and only comments between these users and videos are kept in the training data. The generated user vector table likewise contains only the high-frequency commenting users. For example, users who have posted more than 1000 comments and videos with more than 1500 comments are selected. The third and fourth preset thresholds may be the same or different; they are not specifically limited here.
Different users can post different comments on the same video text, and the same user can post different comments on different video texts. Thus, the data structure of the training samples may be: (video 1 text content, user ID1, user comment 1); (video 1 text content, user ID2, user comment 2); (video 2 text content, user ID1, user comment 3); (video 2 text content, user ID3, user comment 4); (video 2 text content, user ID4, user comment 5); …; (video v text content, user IDu, user comment c).
Each training sample comprises the description information of a video, the description information of a user, and the real comment posted by the user for the video. The video text content constitutes the video's description information, and the user ID constitutes the user's description information; in addition, the user's description information may further contain the user's portrait information, or the portrait may be queried directly from the user portrait library by user ID.
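As an illustration, a minimal sketch of this training-data selection and sample layout in Python; the record fields are hypothetical, and the 1000/1500 cutoffs follow the example above (with such toy data the filtered list is of course empty):

```python
from collections import Counter

# Hypothetical raw records: (video text content, user ID, user comment).
records = [("video 1 text content", "user ID1", "user comment 1"),
           ("video 1 text content", "user ID2", "user comment 2"),
           ("video 2 text content", "user ID1", "user comment 3")]

UC, VC = 1000, 1500  # third / fourth preset thresholds from the example

user_counts = Counter(user for _, user, _ in records)
video_counts = Counter(video for video, _, _ in records)

# Keep only comments between high-frequency users and high-frequency videos.
training_samples = [(video, user, comment) for video, user, comment in records
                    if user_counts[user] > UC and video_counts[video] > VC]
print(len(training_samples))
```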
In the embodiments of the application, the input features of the encoder part of the user personalized automatic comment generation model are the video text content, including the title text, the dialog text recognized by ASR (automatic speech recognition), and the subtitle text recognized by OCR (optical character recognition). Since the dialog and subtitle texts may be long, keywords can be extracted from them to retain the key information. The video text content is segmented into words and converted into word vector representations via ID-based embedding lookup, i.e., word 1, word 2, …, word n in fig. 5. A deep representation of the video text is then built by the Transformer encoder part, i.e., the word 1 representation, word 2 representation, …, word n representation shown in fig. 5.
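A minimal sketch of assembling the encoder input from the three text sources, with whitespace tokenization and a simple frequency-based keyword cut standing in for a real Chinese tokenizer and keyword extractor (both stand-ins are assumptions, not the patent's method):

```python
from collections import Counter

def keywords(text, top_k=20):
    # Stand-in keyword extraction: keep only the top_k most frequent tokens.
    tokens = text.split()
    keep = {w for w, _ in Counter(tokens).most_common(top_k)}
    return [w for w in tokens if w in keep]

def encoder_input_ids(title, asr_text, ocr_text, vocab):
    # The title is kept whole; long ASR dialog and OCR subtitle texts are
    # reduced to keywords to retain the key information.
    words = title.split() + keywords(asr_text) + keywords(ocr_text)
    # ID-based lookup; unknown words map to a reserved <unk> id (0 here).
    return [vocab.get(w, 0) for w in words]

vocab = {"funny": 1, "cat": 2, "video": 3}
print(encoder_input_ids("funny cat video", "the cat jumps", "cat", vocab))
```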
In an optional implementation, if the comment information to be output by the model comprises a plurality of comment words, then when the decoding part decodes the word vectors of the word segments (the word 1 representation, word 2 representation, … shown in fig. 5) and the target description feature (the user representation in fig. 5) to obtain the comment information output by the trained comment generation model, the comment words are generated one by one in a loop-iteration manner. In each iteration, the following operations are performed:
the comment word output in the previous iteration is fed back into the decoding part, and decoding is then performed based on that comment word, the word vectors of the word segments, and the target description feature, generating the comment word output in the current iteration.
For example, as shown in fig. 5, when comment word 1 is generated, the corresponding input is the start marker <S>; when comment word 2 is generated, the corresponding input is comment word 1; …; and when comment word n is generated, the corresponding input is comment word n-1.
That is, when the model's decoder generates each comment word, the comment word generated in the previous step is used as input, and the current user vector and the word vectors of the word segments are used as model input features at the same time. The user vector representation is queried by the current user ID, and when a word is generated, a copy mechanism decides whether to copy a word from the original video text or to generate one from the vocabulary.
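A minimal PyTorch sketch of this step-by-step decoding, assuming a toy vocabulary and injecting the user vector by prepending it to the encoder memory, which is one plausible way to condition the decoder on the user; the copy mechanism is omitted for brevity:

```python
import torch
import torch.nn as nn

class CommentGenerator(nn.Module):
    def __init__(self, vocab_size=1000, d_model=64, user_dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, d_model)
        self.user_proj = nn.Linear(user_dim, d_model)
        enc = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        dec = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, num_layers=2)
        self.decoder = nn.TransformerDecoder(dec, num_layers=2)
        self.out = nn.Linear(d_model, vocab_size)

    def decode_step(self, memory, generated):
        tgt = self.emb(generated)              # embed words generated so far
        h = self.decoder(tgt, memory)
        return self.out(h[:, -1])              # logits for the next word

    @torch.no_grad()
    def generate(self, word_ids, user_vec, bos=1, eos=2, max_len=20):
        memory = self.encoder(self.emb(word_ids))      # word representations
        user = self.user_proj(user_vec).unsqueeze(1)   # user representation
        memory = torch.cat([user, memory], dim=1)      # condition on the user
        generated = torch.tensor([[bos]])              # start marker <S>
        for _ in range(max_len):
            next_id = self.decode_step(memory, generated).argmax(-1, keepdim=True)
            generated = torch.cat([generated, next_id], dim=1)
            if next_id.item() == eos:
                break
        return generated[0, 1:]                        # drop <S>

model = CommentGenerator()
print(model.generate(torch.tensor([[5, 6, 7]]), torch.randn(1, 64)))
```

In the patent's model, a copy mechanism would additionally decide at each step whether to copy a word from the original video text or generate one from the vocabulary.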
In the training stage, the loss is computed from the difference between the generated comment words and the real comments in the training samples, and the model parameters and user vector representations are updated by error backpropagation. In this way, the comments generated by the model both fit the current video content and satisfy the user's personalized needs.
Fig. 6 is a schematic flowchart of a training method for a comment generation model in the embodiment of the present application, where the method may be executed by a server or a terminal device alone, or may be executed by both the server and the terminal device, and here, it is described that the method is executed by the server as an example. The specific implementation flow of the method is as follows:
step S601: the server selects training samples from the training sample data set;
step S602: the server inputs the object portrait of the sample object in the training sample and the description information of the sample multimedia content into an untrained comment generation model;
step S603: the server obtains the prediction comments output by the untrained comment generation model;
step S604: the server adjusts parameters of the untrained comment generation model based on errors between the predicted comment and the real comment in the corresponding training sample;
step S605: the server judges whether the parameter-adjusted comment generation model has converged; if so, step S606 is executed; otherwise, the flow returns to step S601;
step S606: and the server takes the comment generation model obtained after the parameter adjustment as a trained comment generation model.
In step S601, when training samples are selected from the training sample data set, specifically, training samples are selected whose sample objects have a comment count reaching the third preset threshold, or whose sample multimedia content has a comment count reaching the fourth preset threshold. Alternatively, if the training sample data set was already constructed from samples satisfying these conditions, the training samples can simply be selected at random, without re-checking the conditions.
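A minimal, self-contained PyTorch sketch of one training iteration (steps S602-S604), using a compact stand-in encoder-decoder with teacher forcing; cross-entropy against the real comment plays the role of the error in step S604, the convergence check of step S605 is reduced to a fixed step count, and the user vector input is omitted here (it could be injected as in the earlier decoding sketch):

```python
import torch
import torch.nn as nn

class TinySeq2Seq(nn.Module):
    # Compact stand-in for the encoder-decoder comment generation model.
    def __init__(self, vocab=1000, d=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, d)
        self.tf = nn.Transformer(d_model=d, nhead=4, num_encoder_layers=2,
                                 num_decoder_layers=2, batch_first=True)
        self.out = nn.Linear(d, vocab)

    def forward(self, src, tgt_in):
        mask = self.tf.generate_square_subsequent_mask(tgt_in.size(1))
        return self.out(self.tf(self.emb(src), self.emb(tgt_in), tgt_mask=mask))

model = TinySeq2Seq()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

src = torch.randint(3, 1000, (8, 12))       # S602: video description text
comment = torch.randint(3, 1000, (8, 9))    # real comments from the samples
tgt_in, tgt_out = comment[:, :-1], comment[:, 1:]   # teacher-forcing shift

for step in range(3):                        # S605 reduced to a fixed count
    logits = model(src, tgt_in)              # S603: predicted comment
    loss = loss_fn(logits.reshape(-1, 1000), tgt_out.reshape(-1))  # S604
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    print(step, float(loss))
```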
When the user-and-video-based personalized comment generation model is used to build a personalized comment for a user, the video text content is input into the model in the format required by the model's encoder to build the deep representation of the video text, and the model's decoder part then generates the personalized automatic comment step by step based on the current user vector and the word generated in the previous step.
Fig. 7 illustrates a user vector acquisition method for users with few comments according to an embodiment of the present application. When similar high-frequency users are queried for a low-frequency commenting user, the portrait similarity with the high-frequency users is compared, and the weighted sum of the vectors of the top k users whose similarity satisfies a certain threshold is used as the vector representation of the current low-frequency user. Because the video content and the user's personalized representation are used together to generate the comment, the automatically generated comment better matches the personalized needs of the current user, achieving the effect of customizing automatic comments for the user.
In summary, a method for generating personalized automatic comments for users is provided. When automatic comments are generated for a user, a deep user vector representation is introduced and the user's personalized characteristics are learned; when the automatic comment function is provided, the video content and the user's personalized information are used at the same time. This better matches the personalized needs of the current user, improves the user acceptance rate of automatic comments, enhances the usefulness of the automatic comment function for user interaction, and improves the user's video interaction efficiency.
Based on the same inventive concept, the embodiments of the application also provide a comment generation apparatus. Fig. 8 is a schematic structural diagram of a comment generation apparatus 800 provided in an embodiment of the present application; the apparatus may include:
a word segmentation processing unit 801, configured to, in response to a comment request triggered by a target object for target multimedia content, perform word segmentation on first description information of the target multimedia content to obtain each word segment in the first description information;
a feature extraction unit 802, configured to obtain a target description feature for the target object based on second description information of the target object and a preset candidate object description feature set; and
a prediction unit 803, configured to predict, based on the target description feature and the word segments, comment information to be posted by the target object for the target multimedia content.
Optionally, the feature extraction unit 802 is specifically configured to:
if the target object is one of the candidate objects, inquiring corresponding target description characteristics from the candidate object description characteristic set based on the object identification in the second description information;
if the target object is not one of the candidate objects, determining the similarity between each candidate object and the target object based on the object portrait in the second description information; obtaining the description features of at least one candidate object with the similarity reaching a first preset threshold from the candidate object description feature set; determining a target description feature of the target object based on the obtained description feature of the at least one candidate object;
and the candidate object is an object with the number of times of comments reaching a second preset threshold value.
Optionally, the feature extraction unit 802 is specifically configured to:
if the at least one inquired candidate object does not reach the set number, taking the corresponding similarity as the weight, and carrying out weighted summation on the description characteristics of the at least one inquired candidate object to obtain the target description characteristics;
if at least one inquired candidate object reaches the set number, screening out the candidate objects meeting the set number according to the corresponding similarity, and taking the corresponding similarity as weight to perform weighted summation on the description characteristics of each screened candidate object to obtain target description characteristics.
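A small sketch of that branch on the set number, assuming the description features are numeric vectors and at least one candidate was found:

```python
import numpy as np

def target_feature_from_matches(matches, set_number):
    # matches: non-empty list of (similarity, feature_vector) pairs whose
    # similarity already reached the first preset threshold.
    if len(matches) >= set_number:
        # screen out exactly `set_number` candidates with the highest similarity
        matches = sorted(matches, key=lambda m: m[0], reverse=True)[:set_number]
    # weighted summation, taking the similarity as the weight
    return np.sum([s * np.asarray(f) for s, f in matches], axis=0)
```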
Optionally, the feature extraction unit 802 is specifically configured to:
for any candidate object, acquiring an object portrait of any candidate object, wherein the object portrait comprises at least one information tag and tag weight corresponding to the at least one information tag, and the information tag is obtained according to object historical behavior analysis;
respectively comparing each information label in the object portrait of the target object with at least one information label in the object portrait of any candidate object, and determining the coincidence degree between every two information labels;
and taking the sum of products of the coincidence degree between every two information labels and the corresponding weight as the similarity between the target object and any one candidate object, wherein the weight corresponding to every two information labels is determined according to the label weight of each information label in every two information labels.
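One possible realization, assuming the per-pair weight is the product of the two tag weights and the coincidence degree is a Jaccard overlap (both are assumptions the patent leaves open):

```python
def label_coincidence(tag_a, tag_b):
    # Assumed coincidence measure: Jaccard overlap of the tags' characters;
    # the patent does not fix a concrete formula.
    a, b = set(tag_a), set(tag_b)
    return len(a & b) / len(a | b) if a | b else 0.0

def portrait_similarity(target_portrait, candidate_portrait):
    # Portraits: lists of (information_tag, tag_weight) pairs obtained from
    # historical behavior analysis.
    score = 0.0
    for tag_t, weight_t in target_portrait:
        for tag_c, weight_c in candidate_portrait:
            pair_weight = weight_t * weight_c  # assumption: product of weights
            score += label_coincidence(tag_t, tag_c) * pair_weight
    return score
```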
Optionally, the word segmentation processing unit 801 is specifically configured to:
inputting first description information of target multimedia content into a trained comment generation model;
and performing word segmentation and coding processing on the first description information based on a coding part in the trained comment generation model to obtain each word segmentation in the first description information and a word vector of each word segmentation.
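A hedged sketch of such a coding part in PyTorch; the whitespace segmentation and the bare embedding layer are illustrative stand-ins for whatever the trained model actually uses:

```python
import torch
import torch.nn as nn

class EncodingPartSketch(nn.Module):
    # Illustrative stand-in for the coding part: it segments the first
    # description information and maps each segment to a word vector.
    def __init__(self, vocab, dim=128):
        super().__init__()
        self.vocab = vocab                       # token -> integer id
        self.embed = nn.Embedding(len(vocab), dim)

    def forward(self, first_description):
        tokens = first_description.split()       # stand-in segmentation
        ids = torch.tensor([self.vocab.get(t, 0) for t in tokens])
        return tokens, self.embed(ids)           # segments and their word vectors
```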
Optionally, the prediction unit 803 is specifically configured to:
inputting the word vector and the target description characteristics of each participle into a decoding part in the trained comment generation model, and performing decoding processing based on the decoding part to obtain comment information output by the trained comment generation model;
the trained comment generation model is obtained by training based on a training sample data set, and each training sample in the training sample data set comprises an object portrait of a sample object, description information of sample multimedia content and a real comment issued by the sample object aiming at the sample multimedia content.
Optionally, if the comment information includes a plurality of comment words, the prediction unit 803 is specifically configured to:
sequentially generating each comment word in the comment information in a loop iteration mode; wherein, in a loop iteration process, the following operations are executed:
inputting the comment words output last time into a decoding part, wherein the comment words input into the decoding part for the first time are preset initial marker words;
and decoding the comment words output last time, the word vectors of all the participles and the target description characteristics to generate the comment words output this time.
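The loop could be sketched as below; `decoder` is a hypothetical callable, and greedy argmax decoding is an assumption, since the patent does not specify the sampling strategy:

```python
import torch

def run_decoding(decoder, word_vectors, target_feature, start_id, end_id,
                 max_words=30):
    # `decoder` returns next-word logits and a hidden state; each pass
    # consumes the comment word output by the previous pass.
    prev_id, state, comment_ids = start_id, None, []
    for _ in range(max_words):
        logits, state = decoder(prev_id, word_vectors, target_feature, state)
        next_id = int(torch.argmax(logits))  # greedy choice of the next word
        if next_id == end_id:
            break
        comment_ids.append(next_id)
        prev_id = next_id                    # fed back in the next iteration
    return comment_ids
```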
Optionally, the apparatus further comprises:
the model training unit 804 is configured to obtain a trained comment generation model through the following training:
selecting training samples from a training sample data set;
performing loop iteration training on the untrained comment generation model according to the training samples to obtain a trained comment generation model, wherein each iteration of training comprises the following operations:
inputting an object portrait of a sample object in a training sample and description information of sample multimedia content into an untrained comment generation model, and obtaining a prediction comment output by the untrained comment generation model;
the untrained comment-generating model is parametrically adjusted based on the error between the predicted comment and the true comment in the corresponding training sample.
Optionally, the model training unit 804 is specifically configured to:
and selecting a training sample containing a sample object with the number of times of comment reaching a third preset threshold value from the training sample data set, or selecting a training sample containing a sample multimedia content with the number of times of comment reaching a fourth preset threshold value.
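The sample selection itself reduces to a filter; the field names here are assumed for illustration:

```python
def build_training_sample_set(raw_samples, third_threshold, fourth_threshold):
    # Keep a sample when its object has commented often enough, or when its
    # multimedia content has received enough comments.
    return [s for s in raw_samples
            if s.object_comment_count >= third_threshold
            or s.content_comment_count >= fourth_threshold]
```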
For convenience of description, the above parts are described separately as modules (or units) divided by function. Of course, when implementing the present application, the functionality of the various modules (or units) may be implemented in one or more pieces of software or hardware.
Having described the comment generating method and apparatus of the exemplary embodiment of the present application, next, an electronic device according to another exemplary embodiment of the present application is described.
As will be appreciated by one skilled in the art, aspects of the present application may be embodied as a system, method, or program product. Accordingly, various aspects of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, which may all generally be referred to herein as a "circuit," "module," or "system."
Based on the same inventive concept as the method embodiments, an embodiment of the present application further provides an electronic device. The electronic device may be used to generate user comments. In one embodiment, the electronic device may be a server, such as server 120 shown in fig. 1. In this embodiment, the electronic device may be structured as shown in fig. 9, including a memory 901, a communication module 903, and one or more processors 902.
A memory 901 for storing computer programs executed by the processor 902. The memory 901 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, programs required for running an instant messaging function, and the like; the storage data area can store various instant messaging information, operation instruction sets and the like.
Memory 901 may be a volatile memory, such as random-access memory (RAM); memory 901 may also be a non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); or memory 901 may be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. Memory 901 may also be a combination of the above.
The processor 902 may include one or more central processing units (CPUs), a digital processing unit, and the like. The processor 902 is configured to implement the above comment generation method when calling the computer program stored in the memory 901.
The communication module 903 is used for communicating with terminal equipment and other servers.
The embodiment of the present application does not limit the specific connection medium among the memory 901, the communication module 903, and the processor 902. In this embodiment, the memory 901 and the processor 902 are connected through the bus 904 in fig. 9, where the bus 904 is depicted by a thick line; the connection manner between other components is merely illustrative and not limiting. The bus 904 may be divided into an address bus, a data bus, a control bus, and the like. For ease of description, only one thick line is depicted in fig. 9, but this does not mean that there is only one bus or only one type of bus.
The memory 901 includes a computer storage medium in which computer-executable instructions are stored, and the computer-executable instructions are used for implementing the comment generation method according to the embodiments of the present application. The processor 902 is configured to execute the comment generation method described above, as shown in fig. 2.
In another embodiment, the electronic device may also be other electronic devices, such as the terminal device 110 shown in fig. 1. In this embodiment, the structure of the electronic device may be as shown in fig. 10, including: a communications component 1010, a memory 1020, a display unit 1030, a camera 1040, a sensor 1050, audio circuitry 1060, a bluetooth module 1070, a processor 1080, and the like.
The communication component 1010 is configured to communicate with a server. In some embodiments, it may include a Wireless Fidelity (WiFi) module; WiFi is a short-range wireless transmission technology through which the electronic device can help the user send and receive information.
Memory 1020 may be used to store software programs and data. Processor 1080 performs various functions of terminal device 110 and data processing by running software programs or data stored in memory 1020. The memory 1020 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. The memory 1020 stores an operating system that enables terminal device 110 to operate. In the present application, the memory 1020 may store an operating system and various application programs, and may also store code for executing the comment generation method according to the embodiments of the present application.
The display unit 1030 may be used to display a Graphical User Interface (GUI), which presents information input by or provided to the user and the various menus of terminal device 110. Specifically, the display unit 1030 may include a display screen 1032 disposed on the front surface of terminal device 110. The display screen 1032 may be configured in the form of a liquid crystal display, light-emitting diodes, or the like. The display unit 1030 may be configured to display the multimedia content playing screen in the embodiments of the present application.
The display unit 1030 may also be used to receive input numeric or character information and to generate signal input related to user settings and function control of terminal device 110. In particular, the display unit 1030 may include a touch screen 1031 disposed on the front surface of terminal device 110, which can collect touch operations by the user on or near it, such as clicking a button or dragging a scroll box.
The touch screen 1031 may cover the display screen 1032, or the touch screen 1031 and the display screen 1032 may be integrated to implement the input and output functions of the terminal device 110, and after the integration, the touch screen 1031 may be referred to as a touch display screen for short. In the present application, the display unit 1030 may display the application program and the corresponding operation steps.
The camera 1040 may be used to capture still images, and the user may post comments on the images captured by the camera 1040 through the application. There may be one or more cameras 1040. Light from the photographed object passes through the lens to form an optical image, which is projected onto the photosensitive element. The photosensitive element may be a charge-coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, which is then passed to the processor 1080 for conversion into a digital image signal.
The terminal device may further comprise at least one sensor 1050, such as an acceleration sensor 1051, a distance sensor 1052, a fingerprint sensor 1053, a temperature sensor 1054. The terminal device may also be configured with other sensors such as a gyroscope, barometer, hygrometer, thermometer, infrared sensor, light sensor, motion sensor, and the like.
Audio circuitry 1060, speaker 1061, and microphone 1062 may provide an audio interface between a user and terminal device 110. The audio circuit 1060 can convert received audio data into an electrical signal and transmit it to the speaker 1061, which converts it into a sound signal for output. Terminal device 110 may also be configured with a volume button for adjusting the volume of the sound signal. In the other direction, the microphone 1062 converts collected sound signals into electrical signals, which the audio circuit 1060 receives and converts into audio data; the audio data is then output to the communication component 1010 to be sent to, for example, another terminal device 110, or output to the memory 1020 for further processing.
The bluetooth module 1070 is used for exchanging information with other bluetooth devices having a bluetooth module through a bluetooth protocol. For example, the terminal device may establish a bluetooth connection with a wearable electronic device (e.g., a smart watch) having a bluetooth module via the bluetooth module 1070, so as to perform data interaction.
The processor 1080, which is a control center of the terminal device, connects various parts of the entire terminal device using various interfaces and lines, performs various functions of the terminal device and processes data by operating or executing software programs stored in the memory 1020 and calling data stored in the memory 1020. In some embodiments, processor 1080 may include one or more processing units; processor 1080 may also integrate an application processor, which primarily handles operating systems, user interfaces, applications, etc., and a baseband processor, which primarily handles wireless communications. It will be appreciated that the baseband processor described above may not be integrated into the processor 1080. In the present application, the processor 1080 may run an operating system, an application program, a user interface display, a touch response, and a comment generating method according to the embodiment of the present application. Further, processor 1080 is coupled to a display unit 1030.
In some possible embodiments, various aspects of the comment generation method provided by the present application may also be implemented in the form of a program product including program code. When the program product runs on a computer device, the program code causes the computer device to perform the steps of the comment generation method according to the various exemplary embodiments of the present application described above in this specification; for example, the computer device may perform the steps shown in fig. 2.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The program product of embodiments of the present application may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a computing device. However, the program product of the present application is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with a command execution system, apparatus, or device.
A readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with a command execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Those of ordinary skill in the art will understand that: all or part of the steps for implementing the method embodiments may be implemented by hardware related to program instructions, and the program may be stored in a computer readable storage medium, and when executed, the program performs the steps including the method embodiments; and the aforementioned storage medium includes: various media capable of storing program codes, such as a removable Memory device, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, and an optical disk.
Alternatively, the integrated unit in the embodiment of the present application may be stored in a computer-readable storage medium if it is implemented in the form of a software functional module and sold or used as a stand-alone product. Based on such understanding, the technical solutions of the embodiments of the present application may be essentially implemented or portions thereof that contribute to the prior art may be embodied in the form of a software product stored in a storage medium, and including several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: a removable storage device, a ROM, a RAM, a magnetic or optical disk, or various other media that can store program code.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following appended claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such changes and modifications of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such changes and modifications.

Claims (15)

1. A comment generation method, characterized in that the method comprises:
responding to a comment request triggered by a target object aiming at target multimedia content, and performing word segmentation processing on first description information of the target multimedia content to obtain each word segmentation in the first description information; and
obtaining a target description feature for the target object based on second description information of the target object and a preset candidate object description feature set;
and predicting comment information to be issued by the target object aiming at the target multimedia content based on the target description characteristics and the word segmentation.
2. The method according to claim 1, wherein the obtaining of the target description feature for the target object based on the second description information of the target object and a preset candidate object description feature set specifically includes:
if the target object is one of the candidate objects, inquiring corresponding target description characteristics from the candidate object description characteristic set based on the object identification in the second description information;
if the target object is not one of the candidate objects, determining the similarity between each candidate object and the target object based on the object portrait in the second description information; obtaining the description features of at least one candidate object with the similarity reaching a first preset threshold from the candidate object description feature set; determining a target description feature of the target object based on the obtained description feature of the at least one candidate object;
and the candidate object is an object with the number of times of comments reaching a second preset threshold value.
3. The method according to claim 2, wherein the determining the target descriptive feature of the target object based on the obtained descriptive feature of the at least one candidate object specifically comprises:
if the at least one inquired candidate object does not reach the set number, taking the corresponding similarity as weight, and carrying out weighted summation on the description features of the at least one inquired candidate object to obtain the target description features;
if the at least one inquired candidate object reaches the set number, screening out the candidate objects meeting the set number according to the corresponding similarity, and taking the corresponding similarity as weight to perform weighted summation on the description characteristics of each screened candidate object to obtain the target description characteristics.
4. The method of claim 2 or 3, wherein determining a similarity between each candidate object and the target object based on the object portrait in the second description information comprises:
for any candidate object, acquiring an object portrait of the any candidate object, wherein the object portrait comprises at least one information tag and tag weight corresponding to the at least one information tag, and the information tag is obtained according to object historical behavior analysis;
respectively comparing each information label in the object portrait of the target object with at least one information label in the object portrait of any candidate object, and determining the coincidence degree between every two information labels;
and taking the sum of products of the coincidence degree between every two information labels and the corresponding weight as the similarity between the target object and the any one candidate object, wherein the weight corresponding to every two information labels is determined according to the label weight of each of the two information labels.
5. The method according to claim 1, wherein the performing word segmentation processing on the first description information of the target multimedia content to obtain each word segmentation in the first description information specifically includes:
inputting the first description information of the target multimedia content into a trained comment generation model;
performing word segmentation and coding processing on the first description information based on a coding part in the trained comment generation model to obtain each word segmentation in the first description information and a word vector of each word segmentation.
6. The method as claimed in claim 5, wherein the predicting comment information of the target object to be published for the target multimedia content based on the target description features and the respective participles specifically includes:
inputting the word vector of each participle and the target description feature into a decoding part in the trained comment generation model, and performing decoding processing based on the decoding part to acquire the comment information output by the trained comment generation model;
the trained comment generation model is obtained by training based on a training sample data set, wherein each training sample in the training sample data set comprises an object portrait of a sample object, description information of sample multimedia content, and a real comment issued by the sample object aiming at the sample multimedia content.
7. The method according to claim 6, wherein if the comment information includes a plurality of comment words, the step of inputting the word vector of each participle and the target description feature into a decoding portion of the trained comment generation model, and performing decoding processing based on the decoding portion to obtain the comment information output by the trained comment generation model specifically includes:
sequentially generating each comment word in the comment information in a loop iteration mode; wherein, in a loop iteration process, the following operations are executed:
inputting the comment words output last time into the decoding part, wherein the comment words input into the decoding part for the first time are preset initial marker words;
and performing decoding based on the comment word output last time, the word vectors of the respective participles, and the target description feature, to generate the comment word output this time.
8. The method of any of claims 5-7, wherein the trained comment generation model is trained by:
selecting training samples from the training sample data set;
performing loop iteration training on the untrained comment generation model according to the training samples to obtain the trained comment generation model, wherein each iteration of training comprises the following operations:
inputting the object portrait of the sample object in the training sample and the description information of the sample multimedia content into an untrained comment generation model, and obtaining a predicted comment output by the untrained comment generation model;
performing parameter adjustment on the untrained comment-generating model based on an error between the predicted comment and a true comment in a corresponding training sample.
9. The method of claim 8, wherein said selecting training samples from said set of training samples comprises:
and selecting a training sample containing a sample object with the number of times of comments reaching a third preset threshold value from the training sample data set, or selecting a training sample containing a sample multimedia content with the number of times of comments reaching a fourth preset threshold value.
10. A comment generation apparatus characterized by comprising:
the word segmentation processing unit is used for responding to a comment request triggered by a target object aiming at target multimedia content, performing word segmentation processing on first description information of the target multimedia content, and obtaining each word segmentation in the first description information; and
the feature extraction unit is used for obtaining a target description feature for the target object based on second description information of the target object and a preset candidate object description feature set;
and the predicting unit is used for predicting comment information to be issued by the target object aiming at the target multimedia content based on the target description characteristics and the participles.
11. The apparatus of claim 10, wherein the feature extraction unit is specifically configured to:
if the target object is one of the candidate objects, inquiring corresponding target description characteristics from the candidate object description characteristic set based on the object identification in the second description information;
if the target object is not one of the candidate objects, determining the similarity between each candidate object and the target object based on the object portrait in the second description information; obtaining the description features of at least one candidate object with the similarity reaching a first preset threshold from the candidate object description feature set; determining a target description feature of the target object based on the obtained description feature of the at least one candidate object;
and the candidate object is an object with the number of times of comments reaching a second preset threshold value.
12. The apparatus of claim 11, wherein the feature extraction unit is specifically configured to:
if the at least one inquired candidate object does not reach the set number, taking the corresponding similarity as weight, and carrying out weighted summation on the description features of the at least one inquired candidate object to obtain the target description features;
if the at least one inquired candidate object reaches the set number, screening the candidate objects meeting the set number according to the corresponding similarity, taking the corresponding similarity as the weight, and carrying out weighted summation on the description characteristics of the screened candidate objects to obtain the target description characteristics.
13. The apparatus according to claim 11 or 12, wherein the feature extraction unit is specifically configured to:
for any candidate object, acquiring an object portrait of the any candidate object, wherein the object portrait comprises at least one information tag and tag weight corresponding to the at least one information tag, and the information tag is obtained according to object historical behavior analysis;
respectively comparing each information label in the object portrait of the target object with at least one information label in the object portrait of any candidate object, and determining the coincidence degree between every two information labels;
and taking the sum of products of the coincidence degree between every two information labels and the corresponding weight as the similarity between the target object and the any one candidate object, wherein the weight corresponding to every two information labels is determined according to the label weight of each of the two information labels.
14. An electronic device, comprising a processor and a memory, wherein the memory stores program code which, when executed by the processor, causes the processor to perform the steps of the method of any of claims 1 to 9.
15. A computer-readable storage medium, characterized in that it comprises program code for causing an electronic device to carry out the steps of the method according to any one of claims 1 to 9, when said program code is run on said electronic device.
CN202110211022.2A 2021-02-25 2021-02-25 Comment generation method and device, electronic equipment and storage medium Pending CN114970494A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110211022.2A CN114970494A (en) 2021-02-25 2021-02-25 Comment generation method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114970494A true CN114970494A (en) 2022-08-30

Family

ID=82973246

Country Status (1)

Country Link
CN (1) CN114970494A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116320621A (en) * 2023-05-17 2023-06-23 苏州极易科技股份有限公司 NLP-based streaming media content analysis method and system
CN116320621B (en) * 2023-05-17 2023-08-04 苏州极易科技股份有限公司 NLP-based streaming media content analysis method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40073413

Country of ref document: HK