CN114329005A - Information processing method, information processing device, computer equipment and storage medium - Google Patents

Publication number
CN114329005A
Authority
CN
China
Prior art keywords
information
reply
sequence
target
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111133730.5A
Other languages
Chinese (zh)
Inventor
陈楠
陈小帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202111133730.5A
Publication of CN114329005A

Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

An embodiment of the invention discloses an information processing method, an information processing device, computer equipment and a storage medium. The method comprises: acquiring content information of multimedia data and target comment information for the multimedia data; acquiring global description information, wherein the global description information describes the information semantics of the content information and the target comment information; identifying a reply type for the target comment information by using the global description information, and acquiring a reply strategy matching the reply type; and generating reply information for the target comment information based on the content information of the multimedia data according to the reply strategy, and outputting the reply information. The accuracy of the generated reply information can thereby be improved.

Description

Information processing method, information processing device, computer equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to an information processing method and apparatus, a computer device, and a storage medium.
Background
With the continued development of computer technology, more and more multimedia data is being published. To make published multimedia data more attractive to users, reply information can be generated for comment information on the multimedia data, creating interaction that helps attract users. Most existing approaches generate reply information from a reply template. Template-based replies lack flexibility, and the templates must be updated regularly, so the generated reply information has low accuracy. How to flexibly generate highly accurate reply information has therefore become a current research focus.
Disclosure of Invention
The embodiment of the invention provides an information processing method, an information processing device, computer equipment and a storage medium, which can improve the accuracy of generated reply information.
In one aspect, an embodiment of the present invention provides an information processing method, including:
acquiring content information of multimedia data and target comment information for the multimedia data;
acquiring global description information, wherein the global description information describes the information semantics of the content information and the target comment information;
identifying a reply type for the target comment information by using the global description information, and acquiring a reply strategy matching the reply type;
and generating reply information for the target comment information based on the content information of the multimedia data according to the reply strategy, and outputting the reply information.
In another aspect, an embodiment of the present invention provides an information processing apparatus, including:
an acquisition unit, configured to acquire content information of multimedia data and target comment information for the multimedia data;
the acquisition unit is further configured to acquire global description information, wherein the global description information describes the information semantics of the content information and the target comment information;
a processing unit, configured to identify a reply type for the target comment information by using the global description information, and to acquire a reply strategy matching the reply type;
and the processing unit is further configured to generate reply information for the target comment information based on the content information of the multimedia data according to the reply strategy, and to output the reply information.
In still another aspect, an embodiment of the present invention provides a computer device, including a processor, an input device, an output device, and a memory that are connected to each other, where the memory is used to store a computer program that supports the computer device in executing the above method, the computer program includes program instructions, and the processor is configured to call the program instructions to perform the following steps:
acquiring content information of multimedia data and target comment information for the multimedia data;
acquiring global description information, wherein the global description information describes the information semantics of the content information and the target comment information;
identifying a reply type for the target comment information by using the global description information, and acquiring a reply strategy matching the reply type;
and generating reply information for the target comment information based on the content information of the multimedia data according to the reply strategy, and outputting the reply information.
In still another aspect, an embodiment of the present invention provides a computer-readable storage medium in which program instructions are stored, where the program instructions, when executed by a processor, are used to execute the information processing method according to the first aspect.
In the embodiments of the present application, when the computer device needs to generate reply information for comment information on multimedia data, it may acquire the content information of the multimedia data, the target comment information, and global description information that describes the information semantics of the content information and the target comment information. The computer device may then identify a reply type for the target comment information by using the global description information, and generate reply information for the target comment information according to the reply strategy matching the reply type and the content information of the multimedia data. In this way, the computer device applies differentiated reply strategies to comment information of different information types, which increases the diversity of the generated reply information. In addition, because the content information of the multimedia data is fully considered when the reply information is generated, the reasonableness and accuracy of the reply information generated by the computer device can be improved.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a diagram illustrating a model call relationship according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart diagram of an information processing method provided by an embodiment of the invention;
FIG. 3 is a schematic flow chart diagram of an information processing method provided by an embodiment of the invention;
FIG. 4a is a diagram of a comment response model provided by an embodiment of the present invention;
FIG. 4b is a schematic diagram of a generative model provided by an embodiment of the present invention;
FIG. 5 is a schematic block diagram of an information processing apparatus provided by an embodiment of the present invention;
FIG. 6 is a schematic block diagram of a computer device provided by an embodiment of the present invention.
Detailed Description
An embodiment of the present application provides an information processing method. When generating reply information for comment information on multimedia data (such as target comment information), the computer device can apply different reply strategies according to the information type of the target comment information to be replied to. The generated reply information is therefore closely related to the target comment information, which effectively improves the accuracy of the reply information that the computer device generates. Multimedia data fuses data in several media forms, such as text, sound, and images; it may be, for example, video data or audio data. This embodiment mainly uses video data as the example, and other multimedia data can be handled by reference to it. Comment information comprises subjective or objective statements about the multimedia data sent by a user; it may be a question asked by the user, or descriptive information that is related (or unrelated) to the multimedia data. Comment information on multimedia data may include one or more of text information, expression information, and audio/video information. The target comment information mentioned in the embodiments of the present application is the text information in the comment information; when the target comment information includes expression information and/or audio/video information, that information can be converted to text to obtain textual comment information.
The computer device may be implemented as various types of terminal devices such as a notebook computer, a tablet computer, a desktop computer, a set-top box, a mobile device (e.g., a mobile phone, a portable music player, a personal digital assistant, a dedicated messaging device, a portable game device), a vehicle-mounted terminal, or may be a server.
In one embodiment, the target comment information can be classified into different information types based on the semantics it expresses. If the expressed semantics relate to an objective question, such as asking about a place or a person, or to discussion of the content of the multimedia data, personal attitude comments, and the like, the computer device may determine that the information type of the target comment information is the objective fact type. If the expressed semantics relate to none of these, the computer device may treat the information type as the general type. When the computer device determines that the information type is the objective fact type, it generates the reply information with the reply strategy associated with that type. This strategy uses open-domain technology, an intelligent question-answering technology based on deep learning, and constructs the reply by combining the content information of the multimedia data with the target comment information. Because a reply constructed this way fully considers both the content of the multimedia data and the semantics of the target comment information, its accuracy can be improved.
If the computer device determines that the information type of the target comment information is the general type, it can generate reply information for the target comment information by combining the plot type of the multimedia data with the content information of the multimedia data. Generating reply information for comments sent by users can improve user stickiness toward the multimedia data and improve user satisfaction. In addition, because the computer device can apply different reply strategies when generating reply information for the target comment information, the diversity of the generated reply information is improved.
In one embodiment, the computer device applies different reply strategies according to the information type of the target comment information to be replied to: when it determines that the information type is the objective fact type, it may call the comment reply model to generate the reply information, and when it determines that the information type is the general type, it calls the generation model. Both the comment reply model and the generation model are network models obtained through deep learning, and they may be connected as shown in FIG. 1. When the computer device determines that a comment on the multimedia data needs a reply, it determines the target comment information to be replied to and inputs it into the comment reply model, which processes the target comment information to determine whether it is of the objective fact type. In one embodiment, if inputting the target comment information into the comment reply model yields reply information for it, the information type of the target comment information is the objective fact type; otherwise, the comment reply model may pass the target comment information on to the generation model, which then generates the corresponding reply information.
That is, when generating reply information for the target comment information, the computer device first calls the comment reply model to process the target comment information and thereby determine its information type. Based on that information type, it then decides whether the comment reply model generates the reply information directly or the generation model is called to generate it.
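The call relationship described above can be sketched as a simple cascade. This is only an illustrative sketch: the function names, the toy models, and the convention that the comment reply model returns `None` when it finds no answer are assumptions made here, not details given in the application.

```python
from typing import Callable, Optional, Tuple

# Hypothetical signature: (comment, content_text) -> reply, or None if no answer.
CommentReplyModel = Callable[[str, str], Optional[str]]
GenerationModel = Callable[[str, str], str]


def generate_reply(comment: str, content_text: str,
                   comment_reply_model: CommentReplyModel,
                   generation_model: GenerationModel) -> Tuple[str, str]:
    """Return (information_type, reply) following the cascade of FIG. 1."""
    reply = comment_reply_model(comment, content_text)
    if reply:
        # An answer was found, so the comment is treated as objective fact type.
        return "objective_fact", reply
    # No answer was found: treat the comment as general type and fall back.
    return "general", generation_model(comment, content_text)


# Toy stand-ins for the two deep-learning models, used only to show control flow.
def toy_comment_reply_model(comment: str, content_text: str) -> Optional[str]:
    if "where" in comment.lower() and "filmed in " in content_text:
        return content_text.split("filmed in ", 1)[1].rstrip(".")
    return None


def toy_generation_model(comment: str, content_text: str) -> str:
    return "Thanks for watching!"
```

With these toy models, a question such as "Where was this shot?" is answered from the content text, while a remark such as "So cute!" falls through to the generation model.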
FIG. 2 is a schematic flowchart of an information processing method proposed in an embodiment of the present application. The method can be executed by the computer device described above and, as shown in FIG. 2, can include:
S201, content information of the multimedia data and target comment information for the multimedia data are obtained.
S202, global description information is obtained, and the global description information is used for describing the information semantics of the content information and the target comment information.
In steps S201 and S202, the multimedia data includes video, audio, or the like published to a platform (or application). After the multimedia data is published, the computer device may generate reply information for comment information on it, in order to make the multimedia data more attractive to users or to improve user stickiness on the publishing platform. Such replies can guide the user who posted the comment to discuss the multimedia data further, and that discussion can in turn attract other users. This helps make the multimedia data more attractive to users and thus improves user stickiness on the platform where it is published.
In one embodiment, if the computer device determines that reply information needs to be generated for target comment information among the comment information on the multimedia data, it may first acquire the content information of the multimedia data and the target comment information. When the multimedia data is a video, the text information included in the content information of the video may include text extracted from the video, and may further include text such as the title of the video, topic tags, the authoring type, the operation type, and existing comments and their replies. The target comment information may be any comment submitted by any user; the embodiments of the present application do not limit how the computer device selects the target comment information from the one or more pieces of comment information on the multimedia data. The content information of the multimedia data further includes image information of the multimedia data; when the multimedia data is a video, the image information may include some or all of the video frames in the video. In one embodiment, the text information may also include author information for the target comment information, such as the author identification (ID), profile, rating, and user representation.
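One way to picture the content information described above is as a small record type. The sketch below is an assumption for illustration only: the class name, field names, and the `text_fields` helper are hypothetical and do not appear in the application.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ContentInfo:
    """Hypothetical container for the content information of one video."""
    title: str = ""
    topic_tags: List[str] = field(default_factory=list)
    authoring_type: str = ""
    operation_type: str = ""
    extracted_text: List[str] = field(default_factory=list)  # text taken from the video
    replied_comments: List[Tuple[str, str]] = field(default_factory=list)  # (comment, reply)
    frames: List[bytes] = field(default_factory=list)  # image information (video frames)

    def text_fields(self) -> List[str]:
        """Flatten all non-empty text fields into one list, in a fixed order."""
        parts = [self.title, *self.topic_tags, self.authoring_type,
                 self.operation_type, *self.extracted_text]
        for comment, reply in self.replied_comments:
            parts.extend([comment, reply])
        return [p for p in parts if p]
```

Keeping the text fields and the frames in one record makes it straightforward to hand the text to word segmentation while passing the frames to whatever model consumes image information.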
In one embodiment, after acquiring the content information of the multimedia data and the target comment information, the computer device further acquires global description information for them. The global description information describes both the content information and the target comment information; that is, it contains the semantics of the content information and the semantics of the target comment information. By acquiring the global description information, the computer device can fully consider both sets of semantics while generating reply information for the target comment information, which helps ensure that the generated reply information is reasonable and accurate. To obtain the global description information, the computer device can construct a first coding sequence from the coding sequence of the text information included in the content information and the coding sequence of the target comment information, and then derive the global description information from the word vectors of all participles in the first coding sequence in combination with an attention mechanism.
After acquiring the content information of the multimedia data, the target comment information, and the global description information, the computer device can determine the reply type for the target comment information based on the global description information and generate the reply information with the corresponding reply strategy; that is, it proceeds to step S203.
S203, a reply type for the target comment information is identified by using the global description information, and a reply strategy matching the reply type is acquired.
S204, reply information for the target comment information is generated based on the content information of the multimedia data according to the reply strategy, and the reply information is output.
In steps S203 and S204, to identify the reply type for the target comment information, the computer device first determines the information type of the target comment information based on the global description information, and then determines the reply type from that information type. The information types of comment information include an objective fact type and a general type. Objective-fact comment information is comment information for which an answer can be found from the comment information and the multimedia data, for example a question about a character appearing in the multimedia data; general comment information is comment information for which no specific answer can be found. In one embodiment, each information type is associated with one reply type, so the computer device obtains the reply type of the target comment information once it has determined the information type. Based on the reply type, it can acquire the matching reply strategy and generate the reply information with it.
In one embodiment, acquiring a reply strategy matching the determined reply type and generating the reply information with it amounts to calling different models to generate the reply information. If the reply type determined by the computer device is associated with the objective fact information type, the computer device calls the comment reply model to generate the reply information; if the reply type is associated with the general information type, the computer device calls the generation model to generate the reply information.
When the computer device generates the reply information by calling the comment reply model, the model generates it from a first coding sequence obtained by encoding the text information included in the content information and the target comment information. In a specific implementation, the computer device calls two linear layers in the comment reply model to process each word vector in the first coding sequence, one determining the probability that each word vector is the start position of the reply information and the other determining the probability that each word vector is the end position. Based on the start-position and end-position probabilities of each word vector, the computer device can then select part or all of the first coding sequence and decode it to obtain the reply information for the target comment information. If the computer device determines that the generation model needs to be called, the generation model generates the reply information by decoding a second coding sequence generated from the text information and image information included in the content information.
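The start/end selection performed by the two linear layers can be illustrated with a minimal pure-Python sketch. Several details here are assumptions rather than statements from the application: the softmax normalization, the rule of choosing the span that maximizes the product of start and end probabilities, the `max_len` limit, and all function and parameter names.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(vectors: Sequence[Sequence[float]], weight: Sequence[float],
           bias: float) -> List[float]:
    """Apply one linear layer with a single output unit to every word vector."""
    return [sum(w * x for w, x in zip(weight, vec)) + bias for vec in vectors]


def extract_reply_span(tokens: Sequence[str], vectors: Sequence[Sequence[float]],
                       w_start: Sequence[float], b_start: float,
                       w_end: Sequence[float], b_end: float,
                       max_len: int = 16) -> List[str]:
    """Pick the span whose start and end probabilities have the largest product."""
    p_start = softmax(linear(vectors, w_start, b_start))
    p_end = softmax(linear(vectors, w_end, b_end))
    best_score, best_span = -1.0, (0, 0)
    for i, ps in enumerate(p_start):
        # The end position may not precede the start position.
        for j in range(i, min(i + max_len, len(tokens))):
            score = ps * p_end[j]
            if score > best_score:
                best_score, best_span = score, (i, j)
    start, end = best_span
    return list(tokens[start:end + 1])
```

In a trained model the weights of the two layers would be learned; here they are plain lists so that the selection logic can be followed by hand.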
In the embodiments of the present application, when the computer device needs to generate reply information for comment information on multimedia data, it may acquire the content information of the multimedia data, the target comment information, and global description information that describes the information semantics of the content information and the target comment information. The computer device may then identify a reply type for the target comment information by using the global description information, and generate reply information for the target comment information according to the reply strategy matching the reply type and the content information of the multimedia data. In this way, the computer device applies differentiated reply strategies to comment information of different information types, which increases the diversity of the generated reply information. In addition, because the content information of the multimedia data is fully considered when the reply information is generated, the reasonableness and accuracy of the reply information generated by the computer device can be improved.
FIG. 3 is a schematic flowchart of an information processing method according to an embodiment of the present disclosure. The method may be executed by the computer device described above and, as shown in FIG. 3, may include:
S301, content information of the multimedia data and target comment information for the multimedia data are obtained, wherein the content information comprises text information of the multimedia data.
The content information of the multimedia data acquired by the computer device includes at least text information of the multimedia data. The text information includes text extracted from the multimedia data, such as its OCR and ASR results and its title, and can also include text related to the multimedia data, such as its topic tags and creation type. In one embodiment, the content information may further include image information of the multimedia data, which may include some or all of the image frames extracted from the multimedia data. The target comment information acquired by the computer device may be any comment selected from all the comment information on the multimedia data. Having acquired the content information and the target comment information, the computer device acquires global description information for the text information and the target comment information. Note that the text information included in the content information also includes comment information on the multimedia data that has already been replied to, together with the corresponding replies. The computer device can therefore take the already-replied comments into account when generating reply information for the target comment information, which further improves the reasonableness of the generated reply information.
After acquiring the content information of the multimedia data and the target comment information, the computer device can perform word segmentation on the text information in the content information and on the target comment information separately, and obtain the global description information for the text information and the target comment information based on the semantics of all the resulting participles.
S302, a first coding sequence is obtained, wherein the first coding sequence comprises word vectors of all participles in the text information and word vectors of all participles in the target comment information.
S303, the similarity between any participle and any other participle is calculated according to their word vectors, and the word vector of each participle is weighted based on the similarity.
S304, a vector sequence formed by the weighted word vectors is taken as the global description information for the text information and the target comment information.
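Steps S302 to S304 resemble a self-attention pass over the word vectors. The following minimal sketch shows one possible reading, under assumptions that are not stated in the application: similarity is taken to be a scaled dot product, the similarities are normalized with softmax, and no learned projection matrices are applied. The function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def softmax(scores: Sequence[float]) -> List[float]:
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def global_description(word_vectors: Sequence[Vector]) -> List[List[float]]:
    """Weight every word vector by its similarity to all participles (S303, S304)."""
    dim = len(word_vectors[0])
    described = []
    for query in word_vectors:
        # S303: similarity between this participle and every participle.
        sims = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                for key in word_vectors]
        weights = softmax(sims)
        # Weighted sum of the word vectors, one output vector per participle.
        described.append([sum(w * vec[d] for w, vec in zip(weights, word_vectors))
                          for d in range(dim)])
    # S304: the sequence of weighted vectors is the global description information.
    return described
```

Each output vector mixes information from every participle in the first coding sequence, which is one way the resulting sequence can carry the semantics of both the text information and the target comment information.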
In steps S302 to S304, to acquire the first coding sequence, the computer device, after acquiring the text information of the multimedia data and the target comment information, may perform word segmentation on the target comment information to obtain the corresponding word segmentation sequence, and perform word segmentation on the text information of the multimedia data (such as OCR, ASR, title, topic tag, creation type, operation type, existing comments, and replies) to obtain the word segmentation sequence of the text information. The computer device can then splice the word segmentation sequence of the target comment information with the word segmentation sequence of the text information to obtain a target splicing sequence. In the target splicing sequence, the computer device adds a separator between the word segmentation sequence of the target comment information and the word segmentation sequence of the text information, and adds a start character at the start position. The separator may be [SEP] and the start character [CLS]. If the word segmentation sequence of the target comment information is an A sequence (assumed to be X1X2X3) and the word segmentation sequence of the text information is a B sequence (assumed to be Y1Y2Y3), then after the computer device splices the two sequences and adds the separator and the start character, the resulting target splicing sequence is [CLS] X1 X2 X3 [SEP] Y1 Y2 Y3.
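The splicing just described can be written in a few lines. In this sketch the function name is hypothetical, and the participles are assumed to have been produced already by a word segmentation step that is not shown.

```python
from typing import List, Sequence

CLS = "[CLS]"  # start character
SEP = "[SEP]"  # separator


def build_target_splicing_sequence(comment_participles: Sequence[str],
                                   text_participles: Sequence[str]) -> List[str]:
    """Splice the A sequence (comment) and the B sequence (text) with [CLS] and [SEP]."""
    return [CLS, *comment_participles, SEP, *text_participles]
```

For the A sequence X1X2X3 and the B sequence Y1Y2Y3, this yields the participles of [CLS] X1 X2 X3 [SEP] Y1 Y2 Y3 in order.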
In an embodiment, when splicing the word segmentation sequence of the target comment information and the word segmentation sequence of the text information, the computer device may first obtain the sequence length of each, that is, the length of sequence A (X1 X2 X3) and the length of sequence B (Y1 Y2 Y3) described above, and then determine the sum of the two lengths. If the sum of the lengths is less than or equal to a length threshold, the computer device can directly splice the two word segmentation sequences and use the result as the target splicing sequence. If the sum of the lengths is greater than the length threshold, the computer device can segment the word segmentation sequence of the text information based on the sequence length of the word segmentation sequence of the target comment information and the length threshold, and splice each segment with the word segmentation sequence of the target comment information; each spliced sequence obtained in this way is a target splicing sequence. In one embodiment, the length threshold may be 512. In that case, if the computer device determines that the sum of the lengths of sequence A (X1 X2 X3) and sequence B (Y1 Y2 Y3) is less than or equal to 512, the target splicing sequence is [CLS] X1 X2 X3 [SEP] Y1 Y2 Y3. In another implementation, if the sum of the lengths is greater than 512 and the length of sequence A (X1 X2 X3) is 300, the computer device can segment sequence B (Y1 Y2 Y3) into subsequences of length at most 212 (that is, 512 minus 300). If sequence B is divided into the sequence Y1 Y2 (with a sequence length of 212) and the sequence Y3 (with a sequence length of 106), the target splicing sequences finally obtained by the computer device are [CLS] X1 X2 X3 [SEP] Y1 Y2 and [CLS] X1 X2 X3 [SEP] Y3.
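The splicing and segmentation rule described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `build_target_sequences` and the token strings are hypothetical, and, as in the worked example, the start character and separator are assumed not to count toward the length threshold.

```python
from typing import List

CLS, SEP = "[CLS]", "[SEP]"

def build_target_sequences(comment: List[str], text: List[str],
                           max_len: int = 512) -> List[List[str]]:
    """Splice comment and text tokens, segmenting the text when the pair is too long."""
    if len(comment) + len(text) <= max_len:
        return [[CLS, *comment, SEP, *text]]
    # Each text segment may use whatever length the comment leaves free,
    # e.g. 512 - 300 = 212 in the example from the description.
    chunk = max_len - len(comment)
    if chunk <= 0:
        raise ValueError("comment alone reaches the length threshold")
    return [[CLS, *comment, SEP, *text[i:i + chunk]]
            for i in range(0, len(text), chunk)]

comment = [f"x{i}" for i in range(300)]   # sequence A, length 300
text = [f"y{i}" for i in range(318)]      # sequence B, length 212 + 106
sequences = build_target_sequences(comment, text)
print([len(s) for s in sequences])
```

With a 300-token comment and a 318-token text, the sketch yields two target splicing sequences whose text parts have lengths 212 and 106, matching the example in the description.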
After obtaining the target splicing sequence, the computer device can encode each participle in the target splicing sequence to obtain the first coding sequence, so that the first coding sequence comprises the word vectors of all participles in the text information of the multimedia data and the word vectors of all participles in the target comment information. Having determined these word vectors, the computer device can apply a self-attention mechanism so that each participle obtains global description information from multiple angles, which allows a deeper understanding of the semantics of the text information and the target comment information. In one embodiment, when the computer device obtains the global description information by using the self-attention mechanism, the global description information is determined according to the word vector of any participle in the first coding sequence and the word vector of any other participle. It can be understood that acquiring the global description information effectively improves the computer device's understanding of the overall semantics of the text information of the multimedia data and the target comment information, and thereby improves the rationality of the reply information subsequently generated for the target comment information. In one embodiment, after obtaining the global description information, the computer device may record the global description information at the start character of the first coding sequence and use a target character to indicate that start character, where the target character may be the character [CLS].
It can be understood that the computer device may obtain the global description information by reading the start character [CLS] of the first coding sequence. After obtaining the global description information, the computer device may use it to identify a reply type for the target comment information and generate the reply information with a reply policy matching the reply type, that is, proceed to step S305.
In one embodiment, when encoding each participle in the target splicing sequence to obtain the first coding sequence, the computer device may convert each participle into a vector directly, based on a mapping relationship between participles and vectors. Alternatively, the computer device may introduce additional information into each participle vector during encoding. In a specific implementation, when obtaining the participle vector of each participle, the computer device may further generate a position vector and/or a type vector for that participle vector. This enriches the information carried by each participle vector of the first coding sequence, and the additional information improves the accuracy of the global description information that the computer device obtains from the first coding sequence. In one embodiment, the position vector indicates the position, in the target splicing sequence, of the participle corresponding to the participle vector, and the type vector indicates which kind of information that participle comes from, for example OCR, ASR, title, topic tag, creation type, operation type, an existing comment and its corresponding reply, or the target comment information.
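One way to picture the combination of participle, position and type vectors is the sketch below. The description does not say how the three vectors are combined; element-wise summation is assumed here, and the random lookup tables, the vector size `DIM`, and all names are hypothetical stand-ins for learned parameters.

```python
import random
from typing import Dict, Hashable, Iterable, List

DIM = 4  # illustrative vector size (assumption)
rng = random.Random(0)

def make_table(keys: Iterable[Hashable]) -> Dict[Hashable, List[float]]:
    """Assign a random vector to every key (stand-in for learned embeddings)."""
    return {key: [rng.uniform(-1.0, 1.0) for _ in range(DIM)] for key in keys}

tokens = ["[CLS]", "who", "sings", "[SEP]", "song", "title"]
sources = ["comment", "comment", "comment", "comment", "title", "title"]

token_table = make_table(sorted(set(tokens)))
position_table = make_table(range(len(tokens)))
type_table = make_table(sorted(set(sources)))

def embed(tokens: List[str], sources: List[str]) -> List[List[float]]:
    """Sum the participle, position and type vectors for every participle."""
    embedded = []
    for position, (token, source) in enumerate(zip(tokens, sources)):
        embedded.append([
            t + p + s
            for t, p, s in zip(token_table[token],
                               position_table[position],
                               type_table[source])
        ])
    return embedded

embedded = embed(tokens, sources)
print(len(embedded), len(embedded[0]))
```

Each output vector then depends on what the participle is, where it sits in the target splicing sequence, and which source it came from.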
S305, recognizing the reply type aiming at the target comment information by using the global description information, and acquiring a reply strategy matched with the reply type.
In one embodiment, the first coding sequence may be generated by invoking a comment reply model, which is a model obtained by deep learning training. The first coding sequence is obtained by the encoder of the comment reply model encoding the text information of the multimedia data included in the content information together with the target comment information. The model structure of the comment reply model may be as shown in fig. 4a: the comment reply model includes an encoder, a discrimination network and a decoder, and the discrimination network and the decoder are each connected to the encoder. The encoder is configured to encode the target splicing sequence composed of the text information of the multimedia data and the target comment information to obtain the first coding sequence. The discrimination network is a linear layer (e.g., the linear layer labeled 40 in fig. 4a) that determines the information type of the target comment information (or the reply type for the target comment information) from the first coding sequence output by the encoder. The decoder generates the reply information based on the discrimination result of the discrimination network and the first coding sequence obtained by the encoder. The first coding sequence is obtained after the text information and the target comment information are input into an Embedding network (a serialization network) for serialization; the Embedding network may be an independent network in the comment reply model or may be arranged inside the encoder of the comment reply model.
When the computer device obtains the first coding sequence through the encoder, the global description information can be recorded at the start character of the first coding sequence. When the computer device uses the global description information to identify the reply type for the target comment information, it can call the discrimination network in the comment reply model to process the global description information and obtain a type discrimination score. The discrimination network first maps the global description information to a numerical initial score, and then calls a sigmoid function (a smoothing function) to convert the initial score into a probability value between 0 and 1. The probability value indicates the probability that an accurate answer to the target comment information can be found based on the first coding sequence. In one embodiment, the converted probability value is the type discrimination score, and after obtaining the type discrimination score for the target comment information, the computer device can determine the information type of the target comment information according to it. In one embodiment, after the encoder generates the first coding sequence, the first coding sequence can be input directly into the discrimination network, so that the discrimination network obtains the global description information by reading the start character of the first coding sequence and determines the reply type for the target comment information based on it. Alternatively, after the encoder generates the first coding sequence, only the global description information recorded at the start character of the first coding sequence is input into the discrimination network, so that the discrimination network identifies the reply type for the target comment information based on the global description information.
In one embodiment, each information type is associated with one reply type, and the information type is either an objective fact type or a general type. The computer device can therefore obtain the corresponding reply type based on the determined information type of the target comment information; that is, the reply types include a reply type for target comment information of the objective fact type and a reply type for target comment information of the general type. After determining the reply type for the target comment information, the computer device can obtain a reply strategy matched with it. If the type discrimination score determined by the discrimination network is greater than or equal to a preset score threshold, the computer device determines that an accurate answer to the target comment information can be obtained based on the first coding sequence. The computer device can then use the reply type associated with the objective fact type as the reply type of the target comment information, and use a strategy indicating that the reply is produced by the comment reply model as the reply strategy, so that the comment reply model (namely, the decoder included in the comment reply model) can be called to generate the reply information.
In another implementation manner, if the type discrimination score determined by the computer device based on the discrimination network is smaller than the preset score threshold, it indicates that the computer device cannot find an answer corresponding to the target comment information based on the first coding sequence, and then the computer device may use the reply type associated with the general type as the reply type of the target comment information, and use a policy for instructing information reply through the generative model as the reply policy, so that the generative model may be invoked to generate the reply information.
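The scoring and routing logic of the two branches above can be sketched as follows. The weights, bias, threshold value of 0.5, and the policy names are hypothetical placeholders; in the described model the linear layer's parameters would be learned and the preset score threshold chosen by the implementer.

```python
import math
from typing import List

def sigmoid(x: float) -> float:
    """Squash a real-valued score into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def type_discrimination_score(cls_vector: List[float],
                              weights: List[float], bias: float) -> float:
    """Linear layer over the [CLS] vector followed by a sigmoid."""
    initial_score = sum(w * v for w, v in zip(weights, cls_vector)) + bias
    return sigmoid(initial_score)

def choose_reply_policy(score: float, threshold: float = 0.5) -> str:
    """Objective-fact comments go to the comment reply model, others to the generative model."""
    return "comment_reply_model" if score >= threshold else "generative_model"

cls_vector = [1.0, -2.0]           # hypothetical global description vector
weights, bias = [0.5, 1.5], 0.0    # hypothetical learned parameters
score = type_discrimination_score(cls_vector, weights, bias)
print(round(score, 4), choose_reply_policy(score))
```

A score at or above the threshold selects extraction by the comment reply model's decoder; a lower score falls back to the generative model.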
In an embodiment, the encoder in the comment reply model may be ALBERT (A Lite BERT, a lightweight word vector coding model), which improves training convergence speed and effect; the discrimination network is a linear network layer, and the decoder consists of two linear network layers. The comment reply model mentioned above is a trained model. It may be obtained by jointly training the encoder, the discrimination network and the decoder, or by training the encoder, the discrimination network and the decoder separately. When they are trained separately, a cross entropy loss function may be used to train the discrimination network in the comment reply model.
S306, generating reply information of the target comment information based on the content information of the multimedia data according to the reply strategy, and outputting the reply information.
When the computer device generates the reply information according to the obtained reply strategy, if the reply strategy indicates that the reply information is generated by calling the comment reply model, the computer device may generate the reply information by calling the decoder included in the comment reply model. As shown in fig. 4a, the decoder is composed of two linear layers, that is, a first linear layer and a second linear layer. In one embodiment, one of the two linear layers maps all the word vectors included in the first coding sequence to obtain, for each word vector, a score (or probability) of being the reply start position, and the other linear layer likewise maps the word vectors in the first coding sequence to obtain, for each word vector, a score (or probability) of being the reply end position. That is to say, when generating the reply information of the target comment information based on the content information of the multimedia data according to the reply strategy, the computer device may use the first linear layer of the decoder to process the first coding sequence acquired from the encoder and determine the probability that the word vector corresponding to each participle in the first coding sequence is the start position of the reply information, and use the second linear layer of the decoder to process the same first coding sequence and determine the probability that the word vector corresponding to each participle is the end position of the reply information. The computer device may then intercept a partial coding sequence from the first coding sequence according to these start-position and end-position probabilities, and decode the partial coding sequence to obtain the reply information of the target comment information.
In one embodiment, when the decoder determines the probability that each word vector in the first coding sequence corresponds to the start position (or the end position), each probability value is obtained through a sigmoid function, lies in the interval from 0 to 1, and is independent of the others. Likewise, cross entropy may be used as the loss function when training the decoder. When intercepting the partial coding sequence from the first coding sequence according to the start-position and end-position probabilities, the computer device can determine the pairs of start and end positions that satisfy a reply limitation condition, rank these pairs, and select the word vectors corresponding to the pair with the maximum combined probability for generating the reply information.
In a specific implementation, the computer device may select, from the first coding sequence, reply start positions and reply end positions that satisfy the reply limitation condition, where the reply limitation condition includes one or both of the following: the reply end position is after the reply start position, and the length of the coding sequence determined by the selected reply start position and reply end position is greater than the length threshold. Further, based on the probability of the word vector at each selected reply start position and the probability of the word vector at the corresponding reply end position, the computer device may select the start word vector and end word vector with the highest combined probability, and use the selected word vectors together with the word vectors between them as the partial coding sequence. For example, suppose the first coding sequence is [CLS] X1 X2 X3 [SEP] Y1 Y2 Y3, the word vectors at start positions satisfying the limitation condition include X1 and X3, and the word vectors at end positions include Y1 and Y2. If, on the principle of maximizing the combined probability, the computer device determines that the word vectors with the maximum combined probability are X1 and Y2, it may determine that the partial coding sequence selected from the first coding sequence is [CLS] X1 X2 X3 [SEP] Y1 Y2, and may obtain the reply information by decoding this partial coding sequence.
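The selection of a start and end position under the reply limitation condition can be illustrated with the sketch below. It assumes that the combined probability of a pair is the product of its start and end probabilities, which the description does not specify; the function name `select_span`, the `min_len` parameter, the tokens, and the probability values are all hypothetical.

```python
from typing import List, Optional, Tuple

def select_span(start_probs: List[float], end_probs: List[float],
                min_len: int = 1) -> Optional[Tuple[int, int, float]]:
    """Return (start, end, combined probability) of the best valid span, if any."""
    best: Optional[Tuple[int, int, float]] = None
    for start, p_start in enumerate(start_probs):
        for end, p_end in enumerate(end_probs):
            if end <= start:                 # end position must follow start position
                continue
            if end - start + 1 <= min_len:   # span must be longer than the threshold
                continue
            combined = p_start * p_end       # assumed way of combining the two scores
            if best is None or combined > best[2]:
                best = (start, end, combined)
    return best

tokens = ["[CLS]", "when", "released", "[SEP]", "released", "in", "2019", "today"]
start_probs = [0.01, 0.05, 0.02, 0.01, 0.30, 0.80, 0.10, 0.05]
end_probs = [0.01, 0.02, 0.03, 0.01, 0.10, 0.20, 0.90, 0.15]

span = select_span(start_probs, end_probs)
if span is not None:
    start, end, _ = span
    print(tokens[start:end + 1])
```

Because the start and end probabilities come from independent sigmoid outputs, every valid pair has to be compared explicitly; the quadratic loop is acceptable for sequences of at most a few hundred participles.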
In one embodiment, if the reply strategy indicates that the reply information is to be generated by calling the generative model, then when generating the reply information of the target comment information based on the content information of the multimedia data, the computer device may call the generative model to encode the image information of the multimedia data and the word segmentation sequence of the text information of the multimedia data, both included in the content information, to obtain a second coding sequence, where the second coding sequence includes the word vectors of the participles in the text information. The computer device then calls the generative model to add the corresponding global description information to each word vector in the second coding sequence, obtaining a new second coding sequence. After obtaining the new second coding sequence, the computer device can acquire a target plot label corresponding to the multimedia data and generate the reply information of the target comment information by using the new second coding sequence and the target plot label.
In one embodiment, when calling the generative model to encode the image information and the word segmentation sequence of the text information to obtain the second coding sequence, the computer device may first obtain the image vector of each image in the multimedia data and the sequence length of the word segmentation sequence. When the sequence length is less than or equal to a length threshold, the image vector is added at the start character of the word segmentation sequence (such as the [CLS] described above) to obtain the second coding sequence. When the sequence length is greater than the length threshold, the computer device may segment the word segmentation sequence based on the length threshold and add the image vector at the start character of each segment to obtain new segments, each of which is a second coding sequence. This length threshold may be the same as or different from the length threshold used in determining the first coding sequence; for example, it may also be 512.
In one embodiment, since the text information of the multimedia data includes the video title, tags, OCR, ASR, existing comments and replies, and so on, when obtaining the word segmentation sequence of the text information the computer device may first segment the video title, tags, OCR, ASR, and existing comments and replies separately, and then concatenate the results, using a separator (such as the above-mentioned [SEP]) to delimit the participles from each source, thereby obtaining the word segmentation sequence of the text information. The computer device can then encode each participle in the word segmentation sequence, and the word vectors of the participles form the second coding sequence. Similarly, to enrich the semantics of each word vector in the second coding sequence, a corresponding position vector and a corresponding type vector can be added to each word vector after the second coding sequence is generated. After obtaining the image vector of the multimedia data, the computer device can apply a linear transformation to the image vector so that its dimension is consistent with that of each word vector in the second coding sequence, and the linearly transformed image vector can then be added at the start character of the second coding sequence. Moreover, the computer device can pad the sequence based on the length threshold: if the sequence length of the word segmentation sequence is less than the length threshold, placeholders can be added so that the sequence length after padding equals the length threshold, where the placeholder can be, for example, [PAD].
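The separator-delimited concatenation and padding described above might look like the following sketch. The function name `build_text_sequence`, the source names, and the example participles are hypothetical; sequences longer than the threshold are left untouched here because their segmentation is handled by the separate rule described earlier.

```python
from typing import Dict, List

SEP, PAD = "[SEP]", "[PAD]"

def build_text_sequence(sources: Dict[str, List[str]], max_len: int) -> List[str]:
    """Join the participles of every source with [SEP] and pad up to max_len."""
    sequence: List[str] = []
    for participles in sources.values():
        if sequence:
            sequence.append(SEP)  # delimit participles that come from different sources
        sequence.extend(participles)
    if len(sequence) < max_len:
        sequence.extend([PAD] * (max_len - len(sequence)))
    return sequence

sources = {
    "title": ["cat", "video"],
    "ocr": ["hello"],
    "asr": ["meow", "meow"],
}
padded = build_text_sequence(sources, max_len=10)
print(padded)
```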
In one embodiment, the generative model is a recurrent neural network structure. As shown in fig. 4b, the generative model includes an encoder (e.g., the network labeled 41 in fig. 4b), a plot classification network (e.g., the network labeled 42 in fig. 4b), and a generation network (e.g., the network labeled 43 in fig. 4b). The encoder is used for generating the second coding sequence. It adopts a Transformer encoder structure (a coding structure) and, through multiple layers of attention operations (self-attention operations), converts the input information (the text information and the image information of the multimedia data) from the original vectors into a vector representation with abstract semantics (that is, a vector sequence, such as the second coding sequence), in which each vector contains both the semantic information of a word and the current context information. The attention operation is executed by a self-attention module. The self-attention operation calculates the similarity between every two vectors, normalizes the similarities between each vector and all other vectors into weights that sum to 1, and then computes the weighted sum of the other vectors using these weights to obtain the current vector, thereby obtaining the global description information of each participle in the coding sequence. After the self-attention operation, each vector carries information from the other vectors, so each vector obtains the global description information related to it, and the generative model can capture the corresponding grammar, semantics and the like.
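The self-attention operation described above can be sketched in a few lines. This is a deliberately simplified illustration: it assumes dot-product similarity and softmax normalization, includes each vector's similarity with itself, and omits the learned query, key and value projections and the multiple heads that a real Transformer encoder layer would use.

```python
import math
from typing import List

def self_attention(vectors: List[List[float]]) -> List[List[float]]:
    """Replace every vector by a similarity-weighted sum over all vectors."""
    outputs = []
    for query in vectors:
        # Similarity between the current vector and every vector in the sequence.
        scores = [sum(q * k for q, k in zip(query, key)) for key in vectors]
        # Softmax normalization: the weights are positive and sum to 1.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the vectors gives the new, context-aware vector.
        outputs.append([
            sum(w * vec[d] for w, vec in zip(weights, vectors))
            for d in range(len(query))
        ])
    return outputs

sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for vector in self_attention(sequence):
    print([round(x, 3) for x in vector])
```

Every output vector is a convex combination of the inputs, which is how each participle comes to carry information from all the others.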
The plot classification network is used for enabling the computer device to obtain the target plot label corresponding to the multimedia data. The plot classification network takes a vector output by the encoder (such as any word vector in the first coding sequence) as input and performs a dot product between that vector and each plot type vector, obtaining a score for each plot type. Each score is passed through a sigmoid function to obtain a probability value in the interval from 0 to 1, which represents the probability that the multimedia data belongs to that plot type. After obtaining the probabilities of all plot types, the network selects the plot types whose probability is greater than a preset threshold and outputs them as target plot types. When the plot classification network is trained, a corresponding vector is randomly generated for each plot type, and the type label is represented by a multi-hot vector (a multi-classification vector), in which the position corresponding to a plot type is set to 1 if that plot type is included and to 0 otherwise.
When the computer device uses the trained plot classification network to acquire the target plot label of the multimedia data, it can first acquire the label vector corresponding to each plot label, perform a vector operation between the new second coding sequence and each label vector to obtain the matching degree between them, and then select the plot labels whose label vectors have a matching degree greater than or equal to a matching degree threshold as the target plot labels of the multimedia data. After obtaining the target plot label, the computer device can call the generation network in the generative model to generate the reply information of the target comment information by using the new second coding sequence and the target plot label. In one embodiment, the generation network adopts a Transformer decoder structure (a decoding structure) whose input comprises the whole vector sequence output by the encoder (i.e. the second coding sequence) and the target plot labels (label vectors) output by the plot classification network. The initial state of the decoder is the plot type label vector mentioned above, so that the generated comments are related to the plot type of the video (multimedia data), and multiple comments related to different plot types can be generated for the multimedia data. The vector sequence output by the encoder is used for the attention operation of the decoder, which helps the decoder acquire more relevant information and makes the comment generation result more reasonable and fluent.
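The plot classification step (dot product with each plot type vector, sigmoid, threshold) and the multi-hot training label can be sketched as follows. The plot type names, vectors and the threshold of 0.5 are hypothetical; the sketch uses a single encoder output vector as input and a greater-than-or-equal comparison, which is one of the two threshold conventions mentioned in the description.

```python
import math
from typing import Dict, List, Tuple

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def classify_plots(video_vector: List[float],
                   plot_vectors: Dict[str, List[float]],
                   threshold: float = 0.5) -> Tuple[List[str], Dict[str, float]]:
    """Score every plot type independently and keep those at or above the threshold."""
    probabilities = {
        name: sigmoid(sum(v * p for v, p in zip(video_vector, vector)))
        for name, vector in plot_vectors.items()
    }
    selected = [name for name, prob in probabilities.items() if prob >= threshold]
    return selected, probabilities

def multi_hot(labels: List[str], all_types: List[str]) -> List[int]:
    """Training target: 1 where the plot type applies, 0 elsewhere."""
    return [1 if plot_type in labels else 0 for plot_type in all_types]

plot_vectors = {
    "comedy": [2.0, 0.0],
    "action": [0.0, 2.0],
    "romance": [1.0, 0.5],
}
target_plots, probabilities = classify_plots([1.0, -1.0], plot_vectors)
print(target_plots)
print(multi_hot(target_plots, list(plot_vectors)))
```

Because each plot type gets its own sigmoid rather than sharing a softmax, a video can be assigned several plot types at once, which is what the multi-hot label encodes.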
When the computer device generates the reply information based on the second coding sequence and the target plot label, the generation network decodes as follows. It selects the several video plot vectors (plot type labels) with the highest probability and starts decoding on the decoder side with each selected video plot vector as the initialization vector of the Transformer decoder. At a high level, the decoder predicts the first reply participle (token) from the initialization vector, predicts the second token from the initialization vector and the first token, and so on, until the predicted token is a termination symbol or the predicted length reaches a threshold. In detail, the selected video plot vector first passes through one layer of linear mapping; the similarity between the mapped vector and the output vectors of the Transformer encoder is then calculated, and the similarity scores and the encoder vectors are used to compute a vector related to the video plot. A dot product between this vector and the vocabulary matrix gives the probability distribution of the first token, and the k tokens with the highest probability are selected as first tokens. Each of these k tokens is input into the Transformer decoder together with the plot vector to predict the second token, and the process is repeated. The output resembles a tree structure in which each path from the root node to a leaf node is a comment; the score of each comment is the product of the probabilities of the tokens on its path, and the k comments with the highest scores are finally selected. After the model prediction is completed, the computer device may pick the top-k comments for each video plot type and output them as replies to the target comment information.
That is, the computer device may predict the ith reply participle based on the label vector corresponding to the target plot label and the new second coding sequence, where i is a positive integer greater than or equal to 1. The computer device may then predict the (i+1)th reply participle based on the ith reply participle, and when the (i+j)th reply participle obtained meets the prediction termination condition, the computer device can generate the reply information of the target comment information based on the (i+j) predicted reply participles, where j is a positive integer greater than or equal to 1. When predicting the ith reply participle based on the label vector corresponding to the target plot label and the new second coding sequence, the computer device can map the label vector of the target plot label to obtain a mapping vector, and calculate the similarity between the new second coding sequence and the mapping vector; generate, according to the similarity and the new second coding sequence, a label vector of the plot label associated with the target plot label; and then, based on the generated label vector of the associated plot label and the vocabulary matrix, generate the probability distribution that each participle in the vocabulary matrix is selected as the ith reply participle, and select one or more participles from the vocabulary matrix as the ith reply participle based on this probability distribution. Inference can then be performed over the resulting tree structure, and the final reply information is selected for output.
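The token-by-token prediction and path scoring described above resemble a beam search, which can be sketched as below. This is an assumption-laden illustration: the Transformer decoder is replaced by a hypothetical lookup table `toy_next_probs`, the termination symbol `[END]` is invented for the example, and the search prunes to the k best partial paths at every step, which is a common simplification of expanding the full tree and scoring every root-to-leaf path.

```python
from typing import Callable, Dict, List, Tuple

END = "[END]"  # hypothetical termination symbol

def beam_search(next_probs: Callable[[List[str]], Dict[str, float]],
                k: int = 2, max_len: int = 5) -> List[Tuple[List[str], float]]:
    """Keep the k best partial replies; a reply's score is the product of token probabilities."""
    beams: List[Tuple[List[str], float]] = [([], 1.0)]
    finished: List[Tuple[List[str], float]] = []
    for _ in range(max_len):
        candidates = [
            (tokens + [token], score * prob)
            for tokens, score in beams
            for token, prob in next_probs(tokens).items()
        ]
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = []
        for tokens, score in candidates[:k]:
            if tokens[-1] == END:
                finished.append((tokens[:-1], score))  # reply complete
            else:
                beams.append((tokens, score))
        if not beams:
            break
    finished.extend(beams)  # replies cut off by the length threshold
    finished.sort(key=lambda item: item[1], reverse=True)
    return finished[:k]

def toy_next_probs(prefix: List[str]) -> Dict[str, float]:
    """Stand-in for the decoder: next-token distribution for a given prefix."""
    table = {
        (): {"great": 0.6, "nice": 0.4},
        ("great",): {"plot": 0.7, END: 0.3},
        ("nice",): {"twist": 0.9, END: 0.1},
    }
    return table.get(tuple(prefix), {END: 1.0})

replies = beam_search(toy_next_probs, k=2)
for tokens, score in replies:
    print(" ".join(tokens), round(score, 2))
```

In the described system this search would be run once per selected plot type, with the plot label vector conditioning the decoder, so that each plot type contributes its own top-k candidate replies.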
In the embodiment of the application, after obtaining the content information of the multimedia data and the target comment information, the computer device can obtain the first coding sequence based on the text information of the multimedia data included in the content information and the target comment information. The computer device can then obtain the global description information for the text information and the target comment information according to the similarity between the word vector of any participle in the first coding sequence and the word vector of any other participle, and determine different reply strategies for the target comment information based on the global description information. For comment information of the objective fact type, the corresponding reply can be determined from the text information of the multimedia data based on the comment reply model, which greatly improves the accuracy of replies to comment information of the objective fact type. In addition, because the computer device refers to existing comments and replies when generating the reply information, and refers to related information of the multimedia data in different modalities when constructing the coding sequence, the accuracy and diversity of the reply information generated from the coding sequence are also ensured.
Based on the description of the above embodiment of the information processing method, an embodiment of the present invention also provides an information processing apparatus, which may be a computer program (including program code) running in the above computer device. The information processing apparatus may be configured to execute the information processing method described in fig. 2 and fig. 3. Referring to fig. 5, the information processing apparatus includes an obtaining unit 501 and a processing unit 502.
An obtaining unit 501, configured to obtain content information of multimedia data and target comment information for the multimedia data;
the obtaining unit 501 is further configured to obtain global description information, where the global description information is used to describe information semantics of the content information and the target comment information;
the processing unit 502 is configured to identify a reply type for the target comment information by using the global description information, and acquire a reply policy matched with the reply type;
the processing unit 502 is further configured to generate reply information of the target comment information based on the content information of the multimedia data according to the reply policy, and output the reply information.
In an embodiment, the obtaining unit 501 is specifically configured to:
acquiring a first coding sequence, wherein the first coding sequence comprises word vectors of all participles in the text information and word vectors of all participles in the target comment information;
calculating the similarity between any participle and any other participle according to the word vector of any participle and the word vector of any other participle, and performing weighting processing on the word vector of each participle based on the similarity;
and taking a vector sequence formed by the word vectors after weighting processing as global description information aiming at the text information and the target comment information.
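The similarity-based weighting in the steps above resembles a self-attention pass, and can be sketched as follows. This is a minimal sketch under assumptions: similarity is taken to be a dot product normalized with softmax, and each participle is also compared with itself (as in standard self-attention), which the text does not state. The function names are hypothetical.

```python
import math

def softmax(scores):
    """Normalize raw similarity scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def weight_by_similarity(word_vectors):
    """Replace each word vector with a similarity-weighted sum of all word vectors.

    The returned vector sequence stands in for the global description
    information of the spliced comment and text sequence.
    """
    weighted = []
    for query in word_vectors:
        sims = [sum(a * b for a, b in zip(query, key)) for key in word_vectors]
        weights = softmax(sims)
        dim = len(query)
        weighted.append(
            [sum(w * key[d] for w, key in zip(weights, word_vectors)) for d in range(dim)]
        )
    return weighted
```

The output has the same length and dimensionality as the input, so each position now carries information from every other participle in the first coding sequence.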
In an embodiment, the obtaining unit 501 is specifically configured to:
performing word segmentation processing on the target comment information to obtain a word segmentation sequence corresponding to the target comment information, and performing word segmentation processing on the text information to obtain a word segmentation sequence of the text information;
performing sequence splicing on the word segmentation sequence of the target comment information and the word segmentation sequence of the text information to obtain a target splicing sequence;
and coding each participle in the target splicing sequence to obtain a first coding sequence.
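The segmentation, splicing, and coding steps above can be sketched as follows, together with the length-threshold rule recited later in the claims. Several details are assumptions made only for illustration: whitespace splitting stands in for a real word segmenter, the `[CLS]` and `[SEP]` marker names are hypothetical choices for the starting character and separator, the threshold counts only comment and text participles, and "coding" is reduced to assigning integer ids.

```python
def segment(text):
    """Stand-in word segmentation; a real system would use a proper segmenter."""
    return text.split()

def splice(comment_tokens, text_tokens, max_len, start="[CLS]", sep="[SEP]"):
    """Splice the comment with the text, splitting the text if the pair is too long.

    Returns a list of target splicing sequences (one if everything fits).
    """
    if len(comment_tokens) + len(text_tokens) <= max_len:
        chunks = [text_tokens]
    else:
        room = max_len - len(comment_tokens)  # space left for text in each sequence
        if room <= 0:
            raise ValueError("length threshold must exceed the comment length")
        chunks = [text_tokens[i:i + room] for i in range(0, len(text_tokens), room)]
    return [[start] + comment_tokens + [sep] + chunk for chunk in chunks]

def encode(tokens, vocab):
    """Map each participle to an integer id, growing the vocabulary as needed."""
    return [vocab.setdefault(tok, len(vocab)) for tok in tokens]
```

A short usage example: `splice(segment("who is the lead actor"), segment(text), max_len=32)` yields a single sequence when the comment and text fit within the threshold, and several sequences that each repeat the comment otherwise.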
In one embodiment, the global description information is used as a starting character of a first coding sequence, the first coding sequence is generated by calling a comment reply model, the comment reply model is a model obtained by deep learning training, the first coding sequence is obtained by coding text information of the multimedia data included in the content information and the target comment information by a coder of the comment reply model, and the coder is connected with a discrimination network; the processing unit 502 is specifically configured to:
calling a discrimination network in the comment reply model to identify and process the global description information to obtain a type discrimination score; the global description information is obtained by acquiring a starting character of the first coding sequence through the encoder;
and determining the information types of the target comment information according to the type discrimination scores, wherein one information type is associated with one reply type, and the information type comprises at least one of an objective fact type and a general type.
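The type discrimination and the choice of reply strategy can be sketched as follows. The routing rule (a score at or above a preset threshold selects the objective fact type and the comment reply model, otherwise the general type and the generation model) follows the claims. The form of the discrimination network is an assumption: it is shown as a single linear layer with a sigmoid, and the function names, return labels, and the default threshold of 0.5 are hypothetical.

```python
import math

def discriminate(start_vector, weights, bias):
    """Score the start-character vector; assumed to be one linear layer plus sigmoid."""
    z = sum(w * x for w, x in zip(weights, start_vector)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def choose_strategy(score, threshold=0.5):
    """Map the type discrimination score to (information type, reply strategy)."""
    if score >= threshold:
        return "objective_fact", "comment_reply_model"
    return "general", "generation_model"
```

Routing on a single score keeps the two reply paths independent: the extractive comment reply model is used only when the comment is judged to ask about an objective fact.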
In one embodiment, if the reply policy indicates that information is replied by invoking a comment reply model, the comment reply model further includes a decoder, the decoder is connected to the encoder, the encoder is configured to encode text information of the multimedia data included in the content information and the target comment information to obtain a first encoded sequence, and the decoder includes a first linear layer and a second linear layer; the processing unit 502 is specifically configured to:
adopting the first linear layer in a decoder in the comment reply model to identify the first coding sequence obtained from the encoder, and determining the probability that the word vector corresponding to each participle forming the first coding sequence is the initial position of reply information;
adopting the second linear layer in a decoder in the comment reply model to identify the first coding sequence obtained from the encoder, and determining the probability that the word vector corresponding to each participle forming the first coding sequence is the end position of reply information;
and intercepting part of the coding sequence from the first coding sequence according to the probability that each word vector in the first coding sequence is respectively a reply starting position and a reply ending position, and decoding the part of the coding sequence to obtain reply information of the target comment information.
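The span selection performed by the two linear layers can be sketched as follows. This is an illustrative sketch under assumptions: each linear layer is reduced to one weight vector followed by softmax over positions, the combination probability is taken to be the product of the start and end probabilities, and only the restriction that the end position is greater than the start position is enforced through a hypothetical `min_len` parameter. All names are hypothetical.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def position_probs(vectors, weights, bias=0.0):
    """One linear layer over every word vector, normalized across positions."""
    return softmax([sum(w * x for w, x in zip(weights, v)) + bias for v in vectors])

def best_span(start_probs, end_probs, min_len=1):
    """Pick the (start, end) pair with the highest combined probability.

    Only pairs with end > start and span length greater than min_len are kept.
    """
    best, best_p = None, -1.0
    for s, ps in enumerate(start_probs):
        for e in range(s + 1, len(end_probs)):
            if e - s + 1 <= min_len:
                continue
            p = ps * end_probs[e]  # assumed combination: product of probabilities
            if p > best_p:
                best, best_p = (s, e), p
    return best
```

Given the selected pair `(s, e)`, the reply is the slice `tokens[s:e + 1]` of the spliced sequence, that is, the participles at the two positions and everything between them.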
In one embodiment, if the reply policy indicates that the reply information is generated by calling a generation model, the processing unit 502 is specifically configured to:
calling the generation model to encode the image information of the multimedia data included in the content information and the word segmentation sequence of the text information of the multimedia data included in the content information to obtain a second encoding sequence; the second coding sequence comprises word vectors of all participles in the text information;
calling the generation model, and adding corresponding global description information to each word vector in the second coding sequence to obtain a new second coding sequence;
and acquiring a target plot label corresponding to the multimedia data, and generating reply information of the target comment information by adopting the new second coding sequence and the target plot label.
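The construction of the second coding sequence and the selection of the target plot label can be sketched as follows. The length-threshold splitting and the matching-degree threshold follow the claims. Everything else is an assumption for illustration: the image vector is prepended as one extra position, the global description information is added element-wise, the "vector operation" is a dot product against a mean-pooled sequence, and the function names and example labels are hypothetical.

```python
def build_second_sequences(image_vector, word_vectors, max_len):
    """Prepend the image vector to the word vectors, splitting long sequences."""
    if len(word_vectors) <= max_len:
        chunks = [word_vectors]
    else:
        chunks = [word_vectors[i:i + max_len] for i in range(0, len(word_vectors), max_len)]
    return [[image_vector] + chunk for chunk in chunks]

def add_global(sequence, global_vectors):
    """Add the matching global description vector to each position (assumed element-wise)."""
    return [[a + b for a, b in zip(v, g)] for v, g in zip(sequence, global_vectors)]

def mean_pool(sequence):
    """Average the sequence into one vector so it can be compared with a label vector."""
    n = len(sequence)
    return [sum(v[d] for v in sequence) / n for d in range(len(sequence[0]))]

def select_labels(sequence, label_vectors, threshold):
    """Keep the plot labels whose matching degree reaches the threshold."""
    pooled = mean_pool(sequence)
    return [
        label
        for label, vec in label_vectors.items()
        if sum(a * b for a, b in zip(pooled, vec)) >= threshold
    ]
```

For example, with hypothetical labels `{"reunion": [1.0, 1.0], "chase": [-1.0, -1.0]}`, only the labels whose dot product with the pooled sequence reaches the threshold are returned as target plot labels.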
In this embodiment of the application, when reply information needs to be generated for comment information of the multimedia data, the obtaining unit 501 may obtain the content information of the multimedia data, the target comment information, and the global description information describing the information semantics of the content information and the target comment information. The processing unit 502 may then identify a reply type for the target comment information by using the global description information, and generate the reply information for the target comment information according to the reply policy matched with the reply type and the content information of the multimedia data. In this way, reply information is generated with differentiated reply policies for comment information of different information types, which improves the diversity of the reply generation process. In addition, because the content information of the multimedia data is fully considered when generating the reply information, the reasonableness and accuracy of the generated reply information can be improved.
Fig. 6 is a schematic block diagram of a computer device according to an embodiment of the present invention. As shown in fig. 6, the computer device in the present embodiment may include one or more processors 601, one or more input devices 602, one or more output devices 603, and a memory 604. The processor 601, the input device 602, the output device 603, and the memory 604 are connected by a bus 605. The memory 604 is used for storing a computer program comprising program instructions, and the processor 601 is used for executing the program instructions stored by the memory 604.
The memory 604 may include a volatile memory (volatile memory), such as a random-access memory (RAM); the memory 604 may also include a non-volatile memory (non-volatile memory), such as a flash memory (flash memory), a solid-state drive (SSD), etc.; the memory 604 may also comprise a combination of the above types of memory.
The processor 601 may be a central processing unit (CPU). The processor 601 may further include a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or the like. The PLD may be a field-programmable gate array (FPGA), a generic array logic (GAL) device, or the like. The processor 601 may also be a combination of the above structures.
In an embodiment of the present invention, the memory 604 is used for storing a computer program, the computer program includes program instructions, and the processor 601 is used for executing the program instructions stored in the memory 604 to implement the steps of the corresponding methods as described above in fig. 2 and fig. 3.
In one embodiment, the processor 601 is configured to call the program instructions to perform:
acquiring content information of multimedia data and target comment information aiming at the multimedia data;
acquiring global description information, wherein the global description information is used for describing information semantics of the content information and the target comment information;
recognizing a reply type aiming at the target comment information by adopting the global description information, and acquiring a reply strategy matched with the reply type;
and generating reply information of the target comment information based on the content information of the multimedia data according to the reply strategy, and outputting the reply information.
Embodiments of the present invention provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method embodiments as shown in fig. 2 or fig. 3. The computer-readable storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
While the invention has been described with reference to a particular embodiment, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (18)

1. An information processing method characterized by comprising:
acquiring content information of multimedia data and target comment information aiming at the multimedia data;
acquiring global description information, wherein the global description information is used for describing information semantics of the content information and the target comment information;
recognizing a reply type aiming at the target comment information by adopting the global description information, and acquiring a reply strategy matched with the reply type;
and generating reply information of the target comment information based on the content information of the multimedia data according to the reply strategy, and outputting the reply information.
2. The method of claim 1, wherein the content information includes text information of the multimedia data; the acquiring of the global description information includes:
acquiring a first coding sequence, wherein the first coding sequence comprises word vectors of all participles in the text information and word vectors of all participles in the target comment information;
calculating the similarity between any participle and any other participle according to the word vector of any participle and the word vector of any other participle, and performing weighting processing on the word vector of each participle based on the similarity;
and taking a vector sequence formed by the word vectors after weighting processing as global description information aiming at the text information and the target comment information.
3. The method of claim 2, wherein said obtaining a first coding sequence comprises:
performing word segmentation processing on the target comment information to obtain a word segmentation sequence corresponding to the target comment information, and performing word segmentation processing on the text information to obtain a word segmentation sequence of the text information;
performing sequence splicing on the word segmentation sequence of the target comment information and the word segmentation sequence of the text information to obtain a target splicing sequence;
and coding each participle in the target splicing sequence to obtain a first coding sequence.
4. The method of claim 3, wherein the performing sequence splicing on the word segmentation sequence of the target comment information and the word segmentation sequence of the text information to obtain a target splicing sequence comprises:
acquiring the sequence length of the word segmentation sequence of the target comment information and the sequence length of the word segmentation sequence of the text information;
if the sum of the sequence length of the word segmentation sequence of the target comment information and the sequence length of the word segmentation sequence of the text information is less than or equal to a length threshold, directly splicing the word segmentation sequence of the target comment information and the word segmentation sequence of the text information, and taking the resulting sequence as the target splicing sequence;
and when the sum of the sequence lengths is greater than the length threshold, performing sequence segmentation on the word segmentation sequence of the text information based on the sequence length of the word segmentation sequence of the target comment information and the length threshold, and splicing each segmented sequence with the word segmentation sequence of the target comment information respectively, each resulting spliced sequence being a target splicing sequence.
5. The method of claim 3, wherein the method further comprises:
adding separators between the word segmentation sequence of the target comment information and the word segmentation sequence of the text information;
and adding global description information as a starting character at the starting position of the target splicing sequence.
6. The method of claim 1, wherein the global description information is used as a starting character of a first coding sequence, the first coding sequence is generated by calling a comment reply model, the comment reply model is a model obtained by deep learning training, the first coding sequence is obtained by encoding text information of the multimedia data included in the content information and the target comment information by an encoder of the comment reply model, and the encoder is connected with a discrimination network; the identifying, by using the global description information, a reply type for the target comment information includes:
calling a discrimination network in the comment reply model to identify and process the global description information to obtain a type discrimination score; the global description information is obtained by acquiring a starting character of the first coding sequence through the encoder;
and determining the information types of the target comment information according to the type discrimination scores, wherein one information type is associated with one reply type, and the information type comprises at least one of an objective fact type and a general type.
7. The method of claim 6, wherein the obtaining a reply policy that matches the reply type comprises:
when the type discrimination score is larger than or equal to a preset score threshold value, taking a reply type associated with an objective fact type as a reply type of the target comment information, and taking a strategy for indicating information reply through a comment reply model as a reply strategy;
and when the type discrimination score is smaller than the preset score threshold, taking the reply type associated with the general type as the reply type of the target comment information, and taking a strategy for indicating information reply through a generation model as a reply strategy.
8. The method of claim 1, wherein if the reply policy indicates that the information is replied by invoking a comment reply model, the comment reply model further comprises a decoder, the decoder is connected to the encoder, the encoder is configured to encode the text information of the multimedia data included in the content information and the target comment information to obtain a first encoded sequence, and the decoder comprises a first linear layer and a second linear layer; generating reply information of the target comment information based on the content information of the multimedia data according to the reply strategy comprises the following steps:
adopting the first linear layer in a decoder in the comment reply model to identify the first coding sequence obtained from the encoder, and determining the probability that the word vector corresponding to each participle forming the first coding sequence is the initial position of reply information;
adopting the second linear layer in a decoder in the comment reply model to identify the first coding sequence obtained from the encoder, and determining the probability that the word vector corresponding to each participle forming the first coding sequence is the end position of reply information;
and intercepting part of the coding sequence from the first coding sequence according to the probability that each word vector in the first coding sequence is respectively a reply starting position and a reply ending position, and decoding the part of the coding sequence to obtain reply information of the target comment information.
9. The method of claim 8, wherein the intercepting part of the coding sequence from the first coding sequence according to the probability that each word vector in the first coding sequence is respectively a reply starting position and a reply ending position comprises:
selecting a reply starting position and a reply ending position which meet a reply limitation condition from the first coding sequence, wherein the reply limitation condition comprises one or two of the following items: the reply ending position is greater than the reply starting position, and the length of the coding sequence determined based on the selected reply starting position and the reply ending position is greater than a length threshold value;
and selecting, according to the probability that the word vector at each selected reply starting position is the reply starting position and the probability that the word vector at the corresponding reply ending position is the reply ending position, the pair of word vectors with the maximum combination probability as the reply starting position and the reply ending position, and taking the selected word vectors and the word vectors between them as the partial coding sequence.
10. The method of claim 1, wherein if the reply policy indicates a reply to the message by invoking a generative model; generating reply information of the target comment information based on the content information of the multimedia data according to the reply strategy comprises the following steps:
calling the generation model to encode the image information of the multimedia data included in the content information and the word segmentation sequence of the text information of the multimedia data included in the content information to obtain a second encoding sequence; the second coding sequence comprises word vectors of all participles in the text information;
calling the generation model, and adding corresponding global description information to each word vector in the second coding sequence to obtain a new second coding sequence;
and acquiring a target plot label corresponding to the multimedia data, and generating reply information of the target comment information by adopting the new second coding sequence and the target plot label.
11. The method of claim 10, wherein the invoking the generative model encodes a segmentation sequence of the image information of the multimedia data included in the content information and the text information of the multimedia data included in the content information to obtain a second encoded sequence, comprising:
acquiring an image vector of each image in the multimedia data and the sequence length of the word segmentation sequence;
when the sequence length is less than or equal to a length threshold value, adding the image vector to the initial character of the word segmentation sequence to obtain a second coding sequence;
and when the sequence length is greater than the length threshold, performing sequence segmentation on the word segmentation sequence based on the length threshold, and adding the image vector at the initial character of each segmentation sequence to obtain a new segmentation sequence, wherein each obtained new segmentation sequence is the second coding sequence.
12. The method of claim 10, wherein the obtaining the target episode tag corresponding to the multimedia data comprises:
acquiring a label vector corresponding to any plot label;
carrying out vector operation on the new second coding sequence and any label vector to obtain the matching degree between the new second coding sequence and any label vector;
and selecting the plot label corresponding to the label vector with the corresponding matching degree greater than or equal to the matching degree threshold value as the target plot label of the multimedia data.
13. The method of claim 10, wherein said generating a reply message to said targeted commentary message using said new second encoded sequence and said targeted episode tag comprises:
predicting the ith reply participle based on the label vector corresponding to the target plot label and the new second coding sequence, where i is a positive integer greater than or equal to 1;
predicting the (i+1)th reply participle by using the ith reply participle;
when the obtained (i+j)th reply participle meets the prediction termination condition, generating reply information of the target comment information based on the i+j reply participles obtained through prediction, where j is a positive integer greater than or equal to 1.
14. The method of claim 13, wherein the predicting the ith reply participle based on the label vector corresponding to the target plot label and the new second coding sequence comprises:
mapping the label vector of the target plot label to obtain a mapping vector of the label vector, and calculating the similarity between the new second coding sequence and the mapping vector;
generating a label vector of the plot label associated with the target plot label according to the similarity and the new second coding sequence;
and generating probability distribution of each participle in the word list matrix as the ith reply participle based on the generated label vector of the associated plot label and the word list matrix, and selecting one or more participles from the word list matrix as the ith reply participle based on the probability distribution.
15. An information processing apparatus characterized by comprising:
an obtaining unit, configured to obtain content information of multimedia data and target comment information for the multimedia data;
the obtaining unit is further configured to obtain global description information, where the global description information is used to describe information semantics of the content information and the target comment information;
the processing unit is used for identifying a reply type aiming at the target comment information by adopting the global description information and acquiring a reply strategy matched with the reply type;
and the processing unit is further used for generating reply information of the target comment information based on the content information of the multimedia data according to the reply strategy and outputting the reply information.
16. A computer device comprising a processor, an input device, an output device and a memory, the processor, the input device, the output device and the memory being interconnected, wherein the memory is configured to store a computer program comprising program instructions, the processor being configured to invoke the program instructions to perform the method of any of claims 1 to 14.
17. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to carry out the method according to any one of claims 1 to 14.
18. A computer program product, characterized in that the computer program product comprises a computer program or computer instructions which, when executed by a processor, carry out the method according to any one of claims 1 to 14.
CN202111133730.5A 2021-09-26 2021-09-26 Information processing method, information processing device, computer equipment and storage medium Pending CN114329005A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111133730.5A CN114329005A (en) 2021-09-26 2021-09-26 Information processing method, information processing device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111133730.5A CN114329005A (en) 2021-09-26 2021-09-26 Information processing method, information processing device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114329005A true CN114329005A (en) 2022-04-12

Family

ID=81045199

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111133730.5A Pending CN114329005A (en) 2021-09-26 2021-09-26 Information processing method, information processing device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114329005A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115114900A (en) * 2022-08-29 2022-09-27 北京达佳互联信息技术有限公司 Text comment association method and device, electronic equipment, storage medium and product

Similar Documents

Publication Publication Date Title
CN113591902B (en) Cross-modal understanding and generating method and device based on multi-modal pre-training model
CN112069302A (en) Training method of conversation intention recognition model, conversation intention recognition method and device
CN113792177B (en) Scene character visual question-answering method based on knowledge-guided deep attention network
CN110795549B (en) Short text conversation method, device, equipment and storage medium
CN112632314A (en) Image retrieval method, system, device and medium
CN112804558B (en) Video splitting method, device and equipment
CN114328807A (en) Text processing method, device, equipment and storage medium
CN116756577B (en) Model training method, device, equipment and storage medium
WO2023226239A1 (en) Object emotion analysis method and apparatus and electronic device
CN113392265A (en) Multimedia processing method, device and equipment
CN114282055A (en) Video feature extraction method, device and equipment and computer storage medium
CN114091466A (en) Multi-modal emotion analysis method and system based on Transformer and multi-task learning
CN114360502A (en) Processing method of voice recognition model, voice recognition method and device
US20230034414A1 (en) Dialogue processing apparatus, learning apparatus, dialogue processing method, learning method and program
CN111368066B (en) Method, apparatus and computer readable storage medium for obtaining dialogue abstract
CN114329005A (en) Information processing method, information processing device, computer equipment and storage medium
CN117349402A (en) Emotion cause pair identification method and system based on machine reading understanding
CN113704466B (en) Text multi-label classification method and device based on iterative network and electronic equipment
CN115203388A (en) Machine reading understanding method and device, computer equipment and storage medium
CN115019137A (en) Method and device for predicting multi-scale double-flow attention video language event
CN114648005A (en) Multi-fragment machine reading understanding method and device for multitask joint learning
CN114330701A (en) Model training method, device, computer equipment, storage medium and program product
CN114328910A (en) Text clustering method and related device
CN114282537A (en) Social text-oriented cascade linear entity relationship extraction method
CN117540007B (en) Multi-mode emotion analysis method, system and equipment based on similar mode completion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination