CN113590983A - Description text generation method and device and text processing model training method

Description text generation method and device and text processing model training method

Info

Publication number
CN113590983A
Authority
CN
China
Prior art keywords
text
comment
target
label
network
Prior art date
Legal status
Pending
Application number
CN202110121602.2A
Other languages
Chinese (zh)
Inventor
霍腾飞 (Huo Tengfei)
刘志强 (Liu Zhiqiang)
张金超 (Zhang Jinchao)
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202110121602.2A
Publication of CN113590983A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The present application relates to the field of artificial intelligence, and in particular to a description text generation method and apparatus, a computer device, and a storage medium. The method includes: acquiring a shared content text and an auxiliary description text corresponding to a target task, where the text type of the auxiliary description text is one of a comment type and a tag type; performing semantic encoding on the shared content text and the auxiliary description text respectively to obtain a corresponding shared vector sequence and auxiliary vector sequence; randomizing the shared vector sequence and determining a shared hidden variable based on the randomization result; and performing semantic decoding based on the shared hidden variable and the auxiliary vector sequence to output a target description text corresponding to the target task, where the text type of the target description text is one of the comment type and the tag type and differs from the text type of the auxiliary description text. This method improves the generation efficiency of the target description text.

Description

Description text generation method and device and text processing model training method
Technical Field
The present application relates to the field of computer technologies, and in particular to a description text generation method and apparatus and a text processing model training method.
Background
With the development of network technologies, users are exposed to all kinds of text information anytime and anywhere, such as music comments, news tags, blog comments, report tags, and paper comments. At present, such text is mainly written manually; for example, a music comment can be typed into the corresponding comment input area of a music application, or a news tag can be added manually in the corresponding tag-adding area of a news application. However, as internet technology develops, the amount of text information accessible to people keeps growing, and so does the demand for authored text of this kind; for example, people want to read more music comments and news tags. A text author therefore has to write a large amount of text content by hand, and this manual approach is inefficient.
Disclosure of Invention
In view of the above technical problems, it is necessary to provide a description text generation method, apparatus, computer device, and storage medium capable of improving the generation efficiency of target description texts.
A description text generation method, the method comprising:
acquiring a shared content text and an auxiliary description text corresponding to a target task; the text type of the auxiliary description text is one of a comment type and a tag type;
performing semantic encoding on the shared content text and the auxiliary description text respectively to obtain a corresponding shared vector sequence and auxiliary vector sequence;
randomizing the shared vector sequence and determining a shared hidden variable based on the randomization result; and
performing semantic decoding based on the shared hidden variable and the auxiliary vector sequence, and outputting a target description text corresponding to the target task, where the text type of the target description text is one of the comment type and the tag type and is different from the text type of the auxiliary description text.
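To make these four steps concrete, the following is a minimal sketch of the encode-randomize-decode pipeline in PyTorch. It is an illustration only: the GRU modules, the dimensions, and conditioning the decoder on the shared hidden variable alone (a fuller model would also attend over the auxiliary vector sequence) are assumptions, since the method does not prescribe a particular network architecture.

```python
# Hedged sketch of the claimed pipeline: encode both texts, randomize the
# shared representation into a hidden variable, then decode. All module
# choices (GRU, sizes) are illustrative assumptions.
import torch
import torch.nn as nn

class DescriptionTextGenerator(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.content_encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)  # shared content text
        self.aux_encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)      # auxiliary description text
        self.to_mu = nn.Linear(hid_dim, hid_dim)      # randomization: mean head
        self.to_logvar = nn.Linear(hid_dim, hid_dim)  # randomization: variance head
        self.decoder = nn.GRU(emb_dim + hid_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, shared_ids, aux_ids, target_ids, sample=True):
        # 1) semantic encoding of the shared and auxiliary texts
        _, h_shared = self.content_encoder(self.embed(shared_ids))
        aux_seq, _ = self.aux_encoder(self.embed(aux_ids))   # auxiliary vector sequence
        # 2) randomize the shared vector sequence into a shared hidden variable
        mu = self.to_mu(h_shared[-1])
        logvar = self.to_logvar(h_shared[-1])
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu) if sample else mu
        # 3) semantic decoding conditioned on the shared hidden variable
        #    (attention over aux_seq is omitted here for brevity)
        tgt = self.embed(target_ids)
        z_rep = z.unsqueeze(1).expand(-1, tgt.size(1), -1)
        dec, _ = self.decoder(torch.cat([tgt, z_rep], dim=-1))
        return self.out(dec)   # per-step vocabulary logits of the target description text
```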
In one embodiment, obtaining the shared content text and the auxiliary description text corresponding to the target task includes:
determining the target task and acquiring a clue keyword corresponding to the target task;
and performing retrieval based on the clue keyword to obtain a shared content text and an auxiliary description text that match the target task.
In one embodiment, the description text generation is performed by a text processing model. The text processing model comprises a comment generation network and a tag generation network. The comment generation network is used for generating a target description text belonging to the comment type according to the shared content text and an auxiliary description text belonging to the tag type; and the tag generation network is used for generating a target description text belonging to the tag type according to the shared content text and an auxiliary description text belonging to the comment type.
In one embodiment, the text processing model further includes a content coding network, and performing semantic encoding on the shared content text and the auxiliary description text respectively to obtain the corresponding shared vector sequence and auxiliary vector sequence includes:
performing semantic encoding on the shared content text through the content coding network to obtain a shared vector sequence corresponding to the shared content text;
when the auxiliary description text belongs to the tag type, performing semantic encoding on the auxiliary description text through the comment generation network to obtain a corresponding first auxiliary vector sequence;
and when the auxiliary description text belongs to the comment type, performing semantic encoding on the auxiliary description text through the tag generation network to obtain a corresponding second auxiliary vector sequence.
Randomizing the shared vector sequence and determining the shared hidden variable based on the randomization result includes:
when the auxiliary description text belongs to the tag type, randomizing the shared vector sequence through the comment generation network, and determining a first shared hidden variable based on the processing result;
and when the auxiliary description text belongs to the comment type, randomizing the shared vector sequence through the tag generation network, and determining a second shared hidden variable based on the processing result.
Performing semantic decoding based on the shared hidden variable and the auxiliary vector sequence and outputting the target description text corresponding to the target task includes:
when the auxiliary description text belongs to the tag type, decoding the first auxiliary vector sequence through the comment generation network based on the first shared hidden variable to obtain a target description text belonging to the comment type;
and when the auxiliary description text belongs to the comment type, decoding the second auxiliary vector sequence through the tag generation network based on the second shared hidden variable to obtain a target description text belonging to the tag type.
In an embodiment, decoding the first auxiliary vector sequence through the comment generation network based on the first shared hidden variable to obtain a target description text belonging to the comment type includes:
decoding, through the comment generation network, the first auxiliary vector sequence based on the first shared hidden variable and the word vector of the previously output target word to obtain a current target-end vector sequence;
determining, through the comment generation network, the currently output target word according to the current target-end vector sequence;
and forming, through the comment generation network, a target description text belonging to the comment type from the output target words.
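As an illustration of this step-by-step decoding, the loop below greedily picks each target word from the current target-end vectors and feeds it back in. The `decode_step` interface, the greedy choice, and the stop token are assumptions not fixed by the text above.

```python
# Hedged sketch of autoregressive comment decoding: each step conditions on
# the previously output target word, the shared hidden variable z, and the
# first auxiliary vector sequence. decode_step is an assumed interface
# mapping (prev_word_id, z, aux_seq, state) -> (vocab_logits, state).
import torch

@torch.no_grad()
def generate_comment(model, z, aux_seq, bos_id, eos_id, max_len=50):
    word, state, output = bos_id, None, []
    for _ in range(max_len):
        logits, state = model.decode_step(word, z, aux_seq, state)
        word = int(logits.argmax(dim=-1))   # greedy choice of the current target word
        if word == eos_id:
            break
        output.append(word)
    return output   # word ids forming the comment-type target description text
```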
In one embodiment, the training of the text processing model comprises:
acquiring a first sample training set, wherein the first sample training set comprises a first content text, a first sample comment and a first sample label, and the first sample comment and the first sample label correspond to the first content text;
performing semantic encoding on the first content text to obtain a corresponding sample content vector sequence;
encoding and decoding the sample content vector sequence and the first sample label through a comment generation network in a text processing model to be trained to obtain a corresponding first predicted comment;
encoding and decoding the sample content vector sequence and the first sample comment through a label generation network in the text processing model to be trained to obtain a corresponding first predicted label, wherein the comment generation network and the label generation network form a dual structure;
determining a dual loss according to a first difference between the first predicted label and the first sample label and a second difference between the first predicted comment and the first sample comment;
and training the text processing model based on the dual loss until a training end condition is met to obtain a trained text processing model, wherein the trained text processing model is used for generating a target description text corresponding to a target task, and the target description text is at least one of a target comment text and a target label text.
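As one possible realization of this dual loss, the sketch below sums a cross-entropy term for each direction; the use of cross-entropy for both differences and their equal weighting are assumptions, as the text only requires that the loss combine the two differences.

```python
# Hedged sketch of the dual loss over the two branches of the dual structure.
import torch.nn.functional as F

def dual_loss(comment_logits, sample_comment_ids, label_logits, sample_label_ids):
    # first difference: first predicted label vs. first sample label
    label_diff = F.cross_entropy(label_logits.flatten(0, 1), sample_label_ids.flatten())
    # second difference: first predicted comment vs. first sample comment
    comment_diff = F.cross_entropy(comment_logits.flatten(0, 1), sample_comment_ids.flatten())
    return label_diff + comment_diff   # equal weighting is an assumed choice
```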
A description text generation apparatus, the apparatus comprising:
the text acquisition module is used for acquiring a shared content text and an auxiliary description text corresponding to the target task; the text type of the auxiliary description text is one of a comment type and a tag type;
the randomization processing module is used for performing semantic encoding on the shared content text and the auxiliary description text respectively to obtain a corresponding shared vector sequence and auxiliary vector sequence, randomizing the shared vector sequence, and determining a shared hidden variable based on the randomization result;
and the decoding module is used for performing semantic decoding based on the shared hidden variable and the auxiliary vector sequence and outputting a target description text corresponding to the target task, where the text type of the target description text is one of the comment type and the tag type and is different from the text type of the auxiliary description text.
In one embodiment, the text obtaining module is further configured to determine a target task, obtain a clue keyword corresponding to the target task, and perform retrieval based on the clue keyword to obtain a shared content text and an auxiliary description text that match the target task.
In one embodiment, the randomization processing module further comprises a vector sequence determination module, configured to determine a shared word sequence corresponding to the shared content text and an auxiliary word sequence corresponding to the auxiliary description text; perform forward and reverse semantic encoding on the shared word sequence to obtain a forward shared encoding vector sequence and a reverse shared encoding vector sequence; fuse the forward and reverse shared encoding vector sequences into the shared vector sequence corresponding to the shared content text; perform forward and reverse semantic encoding on the auxiliary word sequence to obtain a forward auxiliary encoding vector sequence and a reverse auxiliary encoding vector sequence; and fuse the forward and reverse auxiliary encoding vector sequences into the auxiliary vector sequence corresponding to the auxiliary description text.
In one embodiment, the randomization processing module further comprises a hidden variable determination module, configured to convert the shared vector sequence into a corresponding target probability distribution and determine a corresponding target mean and target variance based on the target probability distribution; when the auxiliary description text belongs to the tag type, sample at least once from the target probability distribution according to the target mean and the target variance to obtain at least one shared hidden variable corresponding to the auxiliary description text; and when the auxiliary description text belongs to the comment type, take the target mean as the shared hidden variable corresponding to the auxiliary description text.
In one embodiment, the decoding module is further configured to combine each shared hidden variable with the auxiliary vector sequence to form a group of objects to be decoded, perform semantic decoding on each group separately, and output at least one target description text corresponding to the target task, where each output target description text is different.
In one embodiment, the description text generation apparatus executes the description text generation method through a text processing model. The text processing model comprises a comment generation network and a tag generation network. The comment generation network is used for generating a target description text belonging to the comment type according to the shared content text and an auxiliary description text belonging to the tag type; and the tag generation network is used for generating a target description text belonging to the tag type according to the shared content text and an auxiliary description text belonging to the comment type.
In one embodiment, the description text generation apparatus is further configured to perform semantic encoding on the shared content text through the content coding network to obtain the shared vector sequence corresponding to the shared content text; when the auxiliary description text belongs to the tag type, perform semantic encoding on the auxiliary description text through the comment generation network to obtain a corresponding first auxiliary vector sequence; when the auxiliary description text belongs to the comment type, perform semantic encoding on the auxiliary description text through the tag generation network to obtain a corresponding second auxiliary vector sequence; when the auxiliary description text belongs to the tag type, randomize the shared vector sequence through the comment generation network and determine a first shared hidden variable based on the processing result; when the auxiliary description text belongs to the comment type, randomize the shared vector sequence through the tag generation network and determine a second shared hidden variable based on the processing result; when the auxiliary description text belongs to the tag type, decode the first auxiliary vector sequence through the comment generation network based on the first shared hidden variable to obtain a target description text belonging to the comment type; and when the auxiliary description text belongs to the comment type, decode the second auxiliary vector sequence through the tag generation network based on the second shared hidden variable to obtain a target description text belonging to the tag type.
In an embodiment, the decoding module is further configured to decode, through the comment generation network, the first auxiliary vector sequence based on the first shared hidden variable and the word vector of the previously output target word to obtain a current target-end vector sequence; determine, through the comment generation network, the currently output target word according to the current target-end vector sequence; and form, through the comment generation network, a target description text belonging to the comment type from the output target words.
In one embodiment, the description text generation apparatus is further configured to obtain a first sample training set, where the first sample training set includes a first content text and a first sample comment and first sample label corresponding to the first content text; perform semantic encoding on the first content text to obtain a corresponding sample content vector sequence; encode and decode the sample content vector sequence and the first sample label through the comment generation network in a text processing model to be trained to obtain a corresponding first predicted comment; encode and decode the sample content vector sequence and the first sample comment through the label generation network in the text processing model to be trained to obtain a corresponding first predicted label, where the comment generation network and the label generation network form a dual structure; determine a dual loss according to a first difference between the first predicted label and the first sample label and a second difference between the first predicted comment and the first sample comment; and train the text processing model based on the dual loss until a training end condition is met to obtain a trained text processing model, where the trained text processing model is used for generating a target description text corresponding to a target task, and the target description text is at least one of a target comment text and a target label text.
In one embodiment, when the target task is a music comment generation task, the shared content text includes at least one of a song name, a lyric text, a prosody description text and an author attribute text corresponding to a target song, the auxiliary description text is a music tag set corresponding to the target song, and the target description text is a music comment set corresponding to the target song; when the target task is a music tag generation task, the shared content text comprises at least one of a song name, a lyric text, a rhythm description text and an author attribute text corresponding to a target song, the auxiliary description text is a music comment set corresponding to the target song, and the target description text is a music tag set corresponding to the target song.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
acquiring a shared content text and an auxiliary description text corresponding to a target task; the text type of the auxiliary description text is one of a comment type and a tag type;
performing semantic encoding on the shared content text and the auxiliary description text respectively to obtain a corresponding shared vector sequence and auxiliary vector sequence;
randomizing the shared vector sequence and determining a shared hidden variable based on the randomization result; and
performing semantic decoding based on the shared hidden variable and the auxiliary vector sequence, and outputting a target description text corresponding to the target task, where the text type of the target description text is one of the comment type and the tag type and is different from the text type of the auxiliary description text.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
acquiring a shared content text and an auxiliary description text corresponding to a target task; the text type of the auxiliary description text is one of a comment type and a tag type;
performing semantic encoding on the shared content text and the auxiliary description text respectively to obtain a corresponding shared vector sequence and auxiliary vector sequence;
randomizing the shared vector sequence and determining a shared hidden variable based on the randomization result; and
performing semantic decoding based on the shared hidden variable and the auxiliary vector sequence, and outputting a target description text corresponding to the target task, where the text type of the target description text is one of the comment type and the tag type and is different from the text type of the auxiliary description text.
According to the description text generation method and apparatus, the computer device, and the storage medium, the shared content text and the auxiliary description text corresponding to the target task are obtained, and encoding them yields the shared vector sequence and the auxiliary vector sequence. Because the shared vector sequence can be shared between the two target tasks, the corresponding shared hidden variable can be determined from it, so that semantic decoding based on the shared hidden variable and the auxiliary vector sequence yields the target description text corresponding to the target task. Since the target description text is output automatically, the generation efficiency of target description texts is greatly improved compared with traditional manual writing.
In addition, because the target description text is obtained from a hidden variable shared by the two target tasks, the method can make full use of the semantic information common to the comment generation task and the tag generation task and thereby output a more accurate target description text. Moreover, since a comment-type target description text can be generated with the assistance of a tag-type auxiliary text, and a tag-type target description text with the assistance of a comment-type auxiliary text, the accuracy of the target description text is further improved.
A method of training a text processing model, the method comprising:
acquiring a first sample training set, wherein the first sample training set comprises a first content text, a first sample comment and a first sample label, and the first sample comment and the first sample label correspond to the first content text;
performing semantic encoding on the first content text to obtain a corresponding sample content vector sequence;
encoding and decoding the sample content vector sequence and the first sample label through a comment generation network in a text processing model to be trained to obtain a corresponding first predicted comment;
encoding and decoding the sample content vector sequence and the first sample comment through a label generation network in the text processing model to be trained to obtain a corresponding first predicted label, wherein the comment generation network and the label generation network form a dual structure;
determining a dual loss according to a first difference between the first predicted label and the first sample label and a second difference between the first predicted comment and the first sample comment;
and training the text processing model based on the dual loss until a training end condition is met to obtain a trained text processing model, wherein the trained text processing model is used for generating a target description text corresponding to a target task, and the target description text is at least one of a target comment text and a target label text.
An apparatus for training a text processing model, the apparatus comprising:
the encoding module is used for acquiring a first sample training set, where the first sample training set includes a first content text and a first sample comment and first sample label corresponding to the first content text, and for performing semantic encoding on the first content text to obtain a corresponding sample content vector sequence;
the output module is used for encoding and decoding the sample content vector sequence and the first sample label through the comment generation network in a text processing model to be trained to obtain a corresponding first predicted comment, and for encoding and decoding the sample content vector sequence and the first sample comment through the label generation network in the text processing model to be trained to obtain a corresponding first predicted label, where the comment generation network and the label generation network form a dual structure;
and a training module, configured to determine a dual loss based on a first difference between the first predicted label and the first sample label and a second difference between the first predicted comment and the first sample comment, and to train the text processing model based on the dual loss until a training end condition is met to obtain a trained text processing model, where the trained text processing model is used for generating a target description text corresponding to a target task, and the target description text is at least one of a target comment text and a target label text.
In one embodiment, the text processing model further includes a connection network for connecting the comment generation network and the label generation network, and the training module is further used for performing first training on the comment generation network and the label generation network in the text processing model based on the dual loss until a first stopping condition is met; acquiring a second sample training set, where the second sample training set includes a second content text and a second sample label corresponding to the second content text, and a third content text and a third sample comment corresponding to the third content text; performing second training on the connection network in the text processing model obtained from the first training through the second sample training set until a second stopping condition is met; and performing third training on the text processing model obtained from the second training through the first sample training set and the second sample training set until a third stopping condition is met, to obtain the trained text processing model.
In one embodiment, the connection network comprises a label connection network. The training module is further used for encoding and decoding the second content text and the second sample label through the comment generation network in the text processing model obtained from the first training to obtain a predicted comment vector sequence; converting the predicted comment vector sequence into a substitute comment vector sequence through the label connection network; decoding the substitute comment vector sequence through the label generation network to obtain a corresponding second predicted label; and determining the label difference between the second predicted label and the second sample label and performing second training on the label connection network through the label difference until the second stopping condition is met.
In one embodiment, the connection network comprises a comment connection network. The training module is further configured to encode and decode the third content text and the third sample comment through the label generation network in the text processing model obtained from the first training to obtain a predicted label vector sequence; convert the predicted label vector sequence into a substitute label vector sequence through the comment connection network; decode the substitute label vector sequence through the comment generation network to obtain a corresponding second predicted comment; and determine the comment difference between the second predicted comment and the third sample comment and perform second training on the comment connection network through the comment difference until the second stopping condition is met.
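A connection network of either kind maps one branch's predicted vector sequence into a substitute vector sequence for the other branch. The following sketch of the label connection network uses a single projection layer; that architecture is an assumption, since the text does not fix the form of the connection network.

```python
# Hedged sketch of a label connection network: converts a predicted comment
# vector sequence into a substitute comment vector sequence, which the label
# generation network then decodes into a second predicted label. Only this
# module's parameters are updated during the second training.
import torch
import torch.nn as nn

class LabelConnectionNetwork(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(dim, dim), nn.Tanh())  # assumed form

    def forward(self, predicted_comment_vectors):
        # (batch, seq_len, dim) -> substitute sequence of the same shape
        return self.proj(predicted_comment_vectors)
```

A comment connection network would be symmetric, converting a predicted label vector sequence into a substitute label vector sequence for the comment generation network.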
In one embodiment, the training module is further configured to perform branch training on the comment generation network and the label connection network in the text processing model obtained from the second training, jointly based on the first content text with its corresponding first sample label and the second content text with its corresponding second sample label; to perform branch training on the label generation network and the comment connection network, jointly based on the first content text with its corresponding first sample comment and the third content text with its corresponding third sample comment; and to finish training when the third stopping condition is met across the branches, obtaining the trained text processing model.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
acquiring a first sample training set, wherein the first sample training set comprises a first content text, a first sample comment and a first sample label, and the first sample comment and the first sample label correspond to the first content text;
performing semantic encoding on the first content text to obtain a corresponding sample content vector sequence;
encoding and decoding the sample content vector sequence and the first sample label through a comment generation network in a text processing model to be trained to obtain a corresponding first predicted comment;
encoding and decoding the sample content vector sequence and the first sample comment through a label generation network in the text processing model to be trained to obtain a corresponding first predicted label, wherein the comment generation network and the label generation network form a dual structure;
determining a dual loss according to a first difference between the first predicted label and the first sample label and a second difference between the first predicted comment and the first sample comment;
and training the text processing model based on the dual loss until a training end condition is met to obtain a trained text processing model, wherein the trained text processing model is used for generating a target description text corresponding to a target task, and the target description text is at least one of a target comment text and a target label text.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
acquiring a first sample training set, wherein the first sample training set comprises a first content text, a first sample comment and a first sample label, and the first sample comment and the first sample label correspond to the first content text;
performing semantic encoding on the first content text to obtain a corresponding sample content vector sequence;
encoding and decoding the sample content vector sequence and the first sample label through a comment generation network in a text processing model to be trained to obtain a corresponding first predicted comment;
encoding and decoding the sample content vector sequence and the first sample comment through a label generation network in the text processing model to be trained to obtain a corresponding first predicted label, wherein the comment generation network and the label generation network form a dual structure;
determining a dual loss according to a first difference between the first predicted label and the first sample label and a second difference between the first predicted comment and the first sample comment;
and training the text processing model based on the dual loss until a training end condition is met to obtain a trained text processing model, wherein the trained text processing model is used for generating a target description text corresponding to a target task, and the target description text is at least one of a target comment text and a target label text.
According to the text processing model training method and apparatus, the computer device, and the storage medium, by obtaining the first sample training set, semantic encoding can be performed on the shared first content text in the set to obtain the corresponding sample content vector sequence, so that the comment generation network and the label generation network can encode and decode the shared sample content vector sequence together with the corresponding first sample label or first sample comment to obtain the first predicted comment and the first predicted label. With these predictions, the dual loss can be determined from the first difference between the first predicted label and the first sample label and the second difference between the first predicted comment and the first sample comment, and the text processing model can be trained on this dual loss to obtain a trained model. Since the trained text processing model can output the target description text automatically, the generation efficiency of target description texts is greatly improved compared with traditional manual writing.
In addition, because the label generation network and the comment generation network in the text processing model are trained on the shared first content text, the semantic information they have in common can be fully exploited during training, which improves the accuracy of the text processing model.
Drawings
FIG. 1 is a diagram of an application environment of the description text generation method in one embodiment;
FIG. 2 is a flow diagram of the description text generation method in one embodiment;
FIG. 3 is a flow diagram of text acquisition in one embodiment;
FIG. 4 is a model diagram of the text processing model in one embodiment;
FIG. 5 is a flowchart of the text processing model training method in one embodiment;
FIG. 6A is a schematic diagram of the connections of a connection network in one embodiment;
FIG. 6B is a schematic diagram of a connection network in another embodiment;
FIG. 7 is a flowchart of the description text generation method in an exemplary embodiment;
FIG. 8 is a flowchart of the text processing model training method in an exemplary embodiment;
FIG. 9 is a block diagram of the description text generation apparatus in one embodiment;
FIG. 10 is a block diagram of the text processing model training apparatus in one embodiment;
FIG. 11 is a diagram of the internal structure of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
FIG. 1 is a diagram of an application environment of the description text generation method in one embodiment. Referring to FIG. 1, the description text generation method is applied to a description text generation system 100. The description text generation system 100 includes a terminal 102 and a server 104. The terminal 102 and the server 104 may cooperate to execute the description text generation method of the present application, or either may execute it alone. For example, the terminal 102 may send the shared content text and the auxiliary description text corresponding to the target task to the server 104, and the server 104 executes the description text generation method to obtain the target description text corresponding to the target task and returns it to the terminal 102 for display. The terminal 102 may also execute the description text generation method itself after acquiring the shared content text and the auxiliary description text. The terminal 102 may be, but is not limited to, a personal computer, a notebook computer, a smartphone, a tablet computer, or a portable wearable device, and the server 104 may be implemented by an independent server or a server cluster formed by a plurality of servers.
It is also noted that the present application relates to the field of Artificial Intelligence (AI). AI is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines can perceive, reason, and make decisions.
Artificial intelligence is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic AI technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. AI software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
The present application specifically relates to Natural Language Processing (NLP) and Machine Learning (ML) in the field of artificial intelligence. Natural language processing is an important direction in computer science and artificial intelligence. It studies theories and methods that enable effective communication between humans and computers in natural language. It is a science that integrates linguistics, computer science, and mathematics; research in this field involves natural language, the language people use every day, and is therefore closely related to linguistics. Natural language processing techniques include text processing, semantic understanding, machine translation, question answering, knowledge graphs, and the like.
Machine Learning is a multi-disciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It studies how computers can simulate or implement human learning behavior to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; it is applied in all fields of AI. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from instruction.
It should be understood that the terms "first," "second," and the like in the present disclosure are not intended to indicate any order, quantity, or importance, but are used to distinguish one element from another. The singular forms "a," "an," and "the" do not denote a limitation of quantity, but rather denote the presence of at least one, unless the context clearly dictates otherwise.
In one embodiment, as shown in FIG. 2, a description text generation method is provided. The method is described by way of example as applied to a computer device in FIG. 1, which may be the terminal 102 or the server 104. Referring to FIG. 2, the description text generation method specifically includes the following steps:
step S202, obtaining a shared content text and an auxiliary description text corresponding to a target task; the text type of the auxiliary description text is one of a comment type and a tag type.
Specifically, when the target task is obtained, the computer device may directly extract the corresponding shared content text and auxiliary description text from the target task, or may use the target task as an index to pull them from other computer devices. The shared content text is the initial text that can be shared when generating both a comment-type and a tag-type target description text. It may be, for example, a sentence, a song lyric, or a poem title, and may be a Chinese text or an English text. The auxiliary description text is a text that describes the shared content text, and its text type is one of the comment type and the tag type: when it belongs to the comment type, it may specifically be a comment text commenting on the shared content text; when it belongs to the tag type, it may specifically be a tag text describing a classification attribute of the shared content text.
In one embodiment, the target task may specifically be a music comment generation task, a music tag generation task, a news comment generation task, or a news tag generation task. When the target task is a music comment generation task, the shared content text may be at least one of the song name, lyric text, prosody description text, and author attribute text of the target song for which music comments are to be generated, and the auxiliary description text may be a music tag text describing the classification attributes of the target song. When the target task is a music tag generation task, the shared content text may be at least one of the song name, lyric text, prosody description text, and author attribute text of the target song for which music tags are to be generated, and the auxiliary description text may be a music comment text commenting on the target song. It is readily understood that the shared content text is the common input text required by the different types of target tasks, while the auxiliary description text is an input text whose type the computer device selects according to the task type of the target task.
In one embodiment, when the shared content text and the auxiliary description text are obtained, the computer device may perform word segmentation on the shared content text to obtain a shared word sequence composed of its words, and on the auxiliary description text to obtain an auxiliary word sequence. The computer device inputs the shared word sequence and the auxiliary word sequence into the trained text processing model for further processing. When the shared content text or auxiliary description text is Chinese, dictionary-based or statistics-based word segmentation may be adopted; when it is English, words may be split on spaces. The text processing model is the machine learning model used to execute the description text generation method, as illustrated by the sketch below.
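As one possible realization of this segmentation step, the sketch below uses the jieba library for Chinese text (a dictionary- and statistics-based segmenter) and whitespace splitting for English; the choice of jieba is an assumption, as the text names no specific tool.

```python
# Hedged sketch of word segmentation producing the shared and auxiliary
# word sequences. jieba is an assumed choice of Chinese segmenter.
import jieba

def to_word_sequence(text: str) -> list[str]:
    if any('\u4e00' <= ch <= '\u9fff' for ch in text):  # contains CJK ideographs
        return jieba.lcut(text)   # dictionary/statistics-based segmentation
    return text.split()           # English text: split on spaces

shared_word_seq = to_word_sequence("明月几时有 把酒问青天")    # hypothetical lyric
aux_word_seq = to_word_sequence("classic poetic nostalgia")    # hypothetical tags
```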
In one embodiment, obtaining shared content text and auxiliary description text corresponding to a target task comprises: determining a target task and acquiring clue keywords corresponding to the target task; and searching based on the clue keywords to obtain a shared content text and an auxiliary description text which are matched with the target task.
The clue keyword is a keyword used to determine the shared content text and the auxiliary description text matching the target task. When the target task is a music comment generation task or a music tag generation task, the clue keyword may specifically be a song name, a singer name, or the title of the content text to be generated.
Specifically, when the target task is determined, the computer device may obtain the clue keyword corresponding to the target task and perform a local or network search based on it to obtain the shared content text and the auxiliary description text corresponding to the current target task. Illustratively, referring to FIG. 3, when the target task is a music comment generation task and the clue keyword is a song name, the computer device may perform a network search based on the song name, determine the song identifier corresponding to the song name, and obtain the complete lyrics and the music tag text corresponding to the song name based on that identifier. The computer device then takes the retrieved complete lyrics as the shared content text and inputs the music tag text as the auxiliary description text into a pre-trained text processing model, so as to obtain the music comments output by the model, such as comments related to the lyric content, the lyric style, or listener behavior, and sends at least one generated music comment to the terminal for display. FIG. 3 illustrates a flow diagram of text acquisition in one embodiment.
In this embodiment, expanding the clue keywords through retrieval allows the shared content text to carry richer text information, so that target description texts from different perspectives can be output based on it, enriching the diversity of the output target description texts.
Step S204: perform semantic encoding on the shared content text and the auxiliary description text respectively to obtain a corresponding shared vector sequence and auxiliary vector sequence.
Semantic encoding of the shared content text or the auxiliary description text is the process of converting the text into vectors. A vector sequence is obtained by semantically encoding the word sequence of the shared content text or the auxiliary description text.
Specifically, the computer device may semantically encode each word in the shared word sequence, extract its semantic features, and convert it into a vector representation, then assemble the shared vector sequence from the vector representations of the words. Similarly, the computer device may semantically encode each word in the auxiliary word sequence and obtain the auxiliary vector sequence from the resulting vector representations.
In one embodiment, the semantic encoding process is performed by a text processing model. The text processing model is a pre-trained machine learning model and includes a content coding network, a comment generation network, and a tag generation network. The content coding network is used for converting the shared content text into the shared vector sequence; the comment generation network is used for converting an auxiliary description text belonging to the tag type into a first auxiliary vector sequence and outputting a comment-type target description text according to the first auxiliary vector sequence and the shared vector sequence; and the tag generation network is used for converting an auxiliary description text belonging to the comment type into a second auxiliary vector sequence and outputting a tag-type target description text according to the second auxiliary vector sequence and the shared vector sequence.
The content coding network, the comment generation network, and the tag generation network may adopt the same type of neural network model or different types. For example, each may be a CNN (Convolutional Neural Network) model or an RNN (Recurrent Neural Network) model. Alternatively, they may adopt different neural network models from one another; for example, the encoding structure may adopt an RNN model, the feature extraction structure a DNN (Deep Neural Network) model, and the decoding structure a CNN model.
In one embodiment, the multi-layer neural network of the content coding network in the text processing model may semantically encode the shared content text layer by layer to obtain the encoding hidden-layer vector output by each neural network layer, and combine these vectors into the corresponding shared vector sequence. Specifically, the computer device may input the spatial representation vector sequence corresponding to the shared word sequence into the first layer of the content coding network's multi-layer neural network, perform semantic encoding through that layer, and output its encoding hidden-layer vector. The encoding hidden-layer vector output by the first layer is then taken as the input of the second layer, which performs semantic encoding to obtain its own encoding hidden-layer vector, and so on until the encoding hidden-layer vector output by the last layer is obtained. The content coding network may fuse the encoding hidden-layer vectors output by all layers to obtain the shared vector sequence. Similarly, the tag generation network or the comment generation network may encode the auxiliary description text in the same way to obtain the corresponding auxiliary vector sequence.
An encoding hidden-layer vector is the vector obtained by feeding a word sequence into a hidden layer of the content coding network's multi-layer neural network and transforming it through that hidden layer. "Hidden layer" is a term from neural network models: it is an intermediate layer relative to the input layer and the output layer, and it contains model parameters obtained by training the model. Here, the hidden layers of the content coding network are the intermediate layers between its input layer and the output layer of the encoding structure, and may comprise several neural network layers.
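A minimal sketch of this layer-by-layer encoding follows, assuming GRU layers and summation as the fusion of the per-layer encoding hidden-layer vectors; the text leaves both choices open.

```python
# Hedged sketch of the content coding network: each layer consumes the
# previous layer's hidden vectors, and the per-layer outputs are fused
# (here by summation, an assumed choice) into the shared vector sequence.
import torch
import torch.nn as nn

class ContentCodingNetwork(nn.Module):
    def __init__(self, emb_dim=128, hid_dim=256, num_layers=3):
        super().__init__()
        dims = [emb_dim] + [hid_dim] * num_layers
        self.layers = nn.ModuleList(
            [nn.GRU(dims[i], hid_dim, batch_first=True) for i in range(num_layers)]
        )

    def forward(self, word_vectors):              # (batch, seq_len, emb_dim)
        hidden_seqs, x = [], word_vectors
        for gru in self.layers:
            x, _ = gru(x)                         # encoding hidden-layer vectors
            hidden_seqs.append(x)
        return torch.stack(hidden_seqs).sum(0)    # fused shared vector sequence
```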
Step S206: randomize the shared vector sequence and determine the shared hidden variable based on the randomization result.
Specifically, once the shared vector sequence is generated, the content coding network may randomize it. For example, the content coding network may process the shared vector sequence through a preset Gaussian function and convert it into a Gaussian distribution, or it may convert the shared vector sequence into a gamma distribution. The text processing model may then determine the shared hidden variable based on the randomization result.
In one embodiment, randomizing the shared vector sequence and determining the shared hidden variable based on the result of the randomization includes: converting the shared vector sequence into corresponding target probability distribution, and determining corresponding target mean and target variance based on the target probability distribution; when the auxiliary description text belongs to the label type, performing at least one sampling based on target probability distribution according to a target mean value and a target variance to obtain at least one shared hidden variable corresponding to the auxiliary description text; and when the auxiliary description text belongs to the comment type, taking the target mean value as a shared hidden variable corresponding to the auxiliary description text.
Specifically, the content encoding network in the text processing model may randomize the shared vector sequence, convert it into a corresponding target probability distribution, and determine the target mean and target variance of that distribution. For example, the content encoding network may convert the shared vector sequence into a Gaussian distribution and determine the mean and variance of the Gaussian distribution. When the auxiliary description text belongs to the tag type, the comment generation network can sample the target probability distribution at least once according to the target mean and the target variance to obtain at least one first shared hidden variable; when the auxiliary description text belongs to the comment type, the tag generation network takes the target mean as the second shared hidden variable corresponding to the auxiliary description text. For example, continuing the Gaussian case, the comment generation network may sample from the Gaussian distribution a preset number of times according to its mean and variance to obtain at least one first shared hidden variable, and the tag generation network may use the mean of the Gaussian distribution as the second shared hidden variable.
In one embodiment, the content coding network in the text processing model may obtain the mean and variance of the Gaussian distribution by the following equations:

μ = W_μ · H_c + b_μ,  σ = W_σ · H_c + b_σ

where μ represents the mean of the Gaussian distribution; σ represents the variance of the Gaussian distribution; H_c represents the shared vector sequence; and W_μ, b_μ, W_σ, b_σ are network parameters of the content encoding network. The comment generation network may obtain at least one first shared hidden variable by the formula z1 = μ + σ · ε, where ε is a random value in (0, 1). The tag generation network may obtain the second shared hidden variable by the formula z2 = μ.
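A minimal sketch of these formulas follows. Mean-pooling H_c to a single vector before the linear projections is an assumption, and ε is drawn uniformly from (0, 1) to follow the formula literally; VAE-style implementations more commonly draw ε from a standard normal.

```python
import torch
import torch.nn as nn

class SharedLatent(nn.Module):
    """Illustrative randomization of the shared vector sequence into z1 / z2."""

    def __init__(self, hidden_dim=256, latent_dim=64):
        super().__init__()
        self.w_mu = nn.Linear(hidden_dim, latent_dim)      # W_mu, b_mu
        self.w_sigma = nn.Linear(hidden_dim, latent_dim)   # W_sigma, b_sigma

    def forward(self, h_c, num_samples=1, use_mean=False):
        pooled = h_c.mean(dim=1)            # pool H_c to one vector (assumption)
        mu = self.w_mu(pooled)              # mu    = W_mu * H_c + b_mu
        sigma = self.w_sigma(pooled).abs()  # sigma = W_sigma * H_c + b_sigma, kept non-negative
        if use_mean:
            return [mu]                     # z2 = mu, for the tag generation path
        # z1 = mu + sigma * eps, with a fresh eps per sample for diversity
        return [mu + sigma * torch.rand_like(sigma) for _ in range(num_samples)]
```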
In the above embodiment, by randomizing the shared vector sequence, the comment generation network can sample from a distribution over random variables, thereby introducing randomness. With this randomness, the comment generation network can obtain a plurality of different shared hidden variables, so the diversity of the generated comment-type target description texts is improved on the basis of these different shared hidden variables. Since the target mean reflects the average of the randomization result, taking the target mean as the corresponding shared hidden variable makes the subsequently generated tag-type target description text more accurate.
And S208, performing semantic decoding processing based on the shared hidden variables and the auxiliary vector sequence, and outputting a target description text corresponding to the target task, wherein the text type of the target description text is one of a comment type and a tag type, and the text type of the target description text is different from that of the auxiliary description text.
The decoding is a process of converting an input auxiliary vector sequence into a target description text based on a shared hidden variable. Specifically, when the auxiliary description text belongs to the tag type, the comment generation network in the text processing model may perform decoding processing on the first shared hidden variable and the first auxiliary vector sequence, and output a target description text belonging to the comment type and corresponding to the target task. When the auxiliary description text belongs to the comment type, the tag generation network in the text processing model can decode the second shared hidden variable and the second auxiliary vector sequence and output the target description text which belongs to the tag type and corresponds to the target task.
For example, suppose the target task is to generate music comments, the shared content text is the song "Missing Is a Disease - Zhangzheng" together with its lyrics ("When you are on the other side, crossing mountain after mountain, I am alone on an isolated road without end; how many people does a lifetime meet before one finds that the most important things have been lost"), and the auxiliary description text is the tag-type music tag "sad". The music comment decoded and generated by the comment generation network in the text processing model is then: "I want to hold you in my arms; I miss you." When the target task is to generate a music tag, the shared content text is the song "No Exception - Marinated Teacher" together with its lyrics ("Things are rehearsed in our minds countless times; when you walk away, you experience sadness and may say a simple sentence to show your feelings, but that simple sentence also leaves us at a loss …"), and the auxiliary description text is a comment-type music comment, "we are all each other's past, all lights on the other shore". The music tag obtained by decoding through the tag generation network in the text processing model is then: "healing, lonely".
In one embodiment, decoding, by the comment generation network, the first auxiliary vector sequence based on the first shared hidden variable to obtain a target description text belonging to the comment type includes: decoding, through the comment generation network, the first auxiliary vector sequence based on the first shared hidden variable and the word vector of the previously output target word to obtain the current target-end vector sequence; determining, through the comment generation network, the currently output target word according to the current target-end vector sequence; and forming, through the comment generation network, a target description text belonging to the comment type based on the output target words.
Specifically, when the first shared hidden variable is obtained, the computer device may input it to the decoding structure in the comment generation network. The comment generation network uses the first shared hidden variable as the initial decoding hidden layer vector for the first step of the multi-layer neural network in the decoding structure, and decodes the first auxiliary vector sequence based on that initial vector to obtain the word vector of the target word output at the first step. The computer device then takes the first shared hidden variable, the decoding hidden layer vector output at the first step, and the word vector of that target word as the input of the second step, determines the decoding hidden layer vector corresponding to the second step from them, and decodes the first auxiliary vector sequence based on the second-step decoding hidden layer vector to obtain the word vector of the target word output at the second step. This continues until the word vector of the target word output at the last step is obtained, so the comment generation network can combine the word vectors of the target words output at all steps to obtain the target description text belonging to the comment type.
In one embodiment, the comment generation network may decode the first auxiliary vector sequence based on the first shared hidden variable through the following formulas to obtain a target description text belonging to the comment type:

x'_j = argmax_{d' ∈ D} p(d' | x'_{0:j-1}, y, c)

p(x_j | x'_{0:j-1}, y, c) = softmax(s_j, c_j, z1)

s_j = LSTM(s_{j-1}, e(x'_{j-1}))

c_j = Attention(s_j, H_y),  z1 = μ + σ · ε, ε ∈ (0, 1)

where x'_j represents the target word output by the comment generation network at time step j; y represents the auxiliary description text; c represents the shared content text; D represents a preset comment set; argmax returns the argument that maximizes a function; softmax maps its input to real numbers between 0 and 1; LSTM is a long short-term memory artificial neural network; e(x'_{j-1}) represents the word vector of the target word output by the comment generation network at time step j-1; c_j represents the context semantic vector obtained by the attention mechanism at time step j; H_y represents the first auxiliary vector sequence; and z1 represents the first shared hidden variable.
In the above embodiment, through the decoding structure in the comment generation network, the first auxiliary vector sequence is decoded according to the word vector of the previously output target word to obtain the word vector of the current target word, and the target word currently output by the comment generation network is determined from that word vector; the corresponding target description text is then generated from all the target words output by the comment generation network. Therefore, when the first auxiliary vector sequence is decoded through the decoding structure, the information of all hidden layers can be fused to learn a better hidden layer representation, the loss of effective information during model processing is reduced, and the accuracy of target description text generation is greatly improved.
In an embodiment, when the tag generation network obtains the second shared hidden variable and the second auxiliary vector sequence, it may select the best-matching target tag text from a preset tag text library according to the second shared hidden variable and the second auxiliary vector sequence, and output that target tag text as the target description text. The tag generation network can decode the second shared hidden variable and the second auxiliary vector sequence through the following formula and output the target description text corresponding to the target task and belonging to the tag type.
y' = argmax_{s' ∈ S} p(s' | y, c),  p(y_i | y, c) = softmax(H_x, z2)

where y' represents the target description text belonging to the tag type; S represents the preset tag text library; H_x represents the second auxiliary vector sequence; and z2 represents the second shared hidden variable.
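A minimal sketch of this classification-style decoding follows, assuming H_x is mean-pooled and concatenated with z2 before a linear classifier over the preset tag library S; both choices are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TagClassifier(nn.Module):
    """Illustrative tag selection: y' = argmax softmax(H_x, z2) over the tag library."""

    def __init__(self, hidden_dim=256, latent_dim=64, num_tags=500):
        super().__init__()
        self.out = nn.Linear(hidden_dim + latent_dim, num_tags)

    def forward(self, h_x, z2):
        pooled = h_x.mean(dim=1)                       # summarize the auxiliary sequence
        logits = self.out(torch.cat([pooled, z2], dim=1))
        return logits.argmax(dim=1)                    # index of the best-matching tag in S
```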
In the description text generation method, the shared content text and the auxiliary description text corresponding to the target task are obtained, and the shared vector sequence and the auxiliary vector sequence can be obtained by encoding the shared content text and the auxiliary description text. By acquiring the shared vector sequence which can be shared between the two target tasks, the corresponding shared hidden variable can be determined based on the shared vector sequence, so that semantic decoding processing is performed based on the shared hidden variable and the auxiliary vector sequence, and a target description text corresponding to the target tasks is obtained. Because the target description text can be automatically output, compared with the traditional manual writing, the target description text generation method and device greatly improve the generation efficiency of the target description text.
In addition, because the corresponding target description text is obtained based on the shared hidden variable shared in the two target tasks, the method and the device can fully utilize the semantic information shared in the comment generation task and the label generation task, and therefore more accurate target description text is output based on the shared semantic information. In addition, the target description text of the comment type can be output in an auxiliary mode based on the auxiliary text of the label type, and the target description text of the label type can be output in an auxiliary mode based on the auxiliary text of the comment type, so that the accuracy of the target description text can be further improved.
In one embodiment, the semantic coding processing is performed on the shared content text and the auxiliary description text respectively to obtain a corresponding shared vector sequence and an auxiliary vector sequence, and the semantic coding processing includes: determining a shared word sequence corresponding to the shared content text, and determining an auxiliary word sequence corresponding to the auxiliary description text; respectively carrying out forward semantic coding and reverse semantic coding on the shared word sequence to obtain a forward shared coding vector sequence and a reverse shared coding vector sequence; carrying out sequence fusion processing on the forward sharing coding vector sequence and the reverse sharing coding vector sequence to obtain a sharing vector sequence corresponding to the shared content text; respectively carrying out forward semantic coding and reverse semantic coding on the auxiliary word sequence to obtain a forward auxiliary coding vector sequence and a reverse auxiliary coding vector sequence; and carrying out sequence fusion processing on the forward auxiliary coding vector sequence and the reverse auxiliary coding vector sequence to obtain an auxiliary vector sequence corresponding to the auxiliary description text.
Specifically, when the shared content text or the auxiliary description text is obtained, the computer device may perform word segmentation processing on it through a preset word segmentation algorithm to obtain the corresponding word sequence. The preset word segmentation algorithm can be freely chosen according to requirements; for example, it may be the ICTCLAS algorithm, the jieba algorithm, the HanLP algorithm, or the like. Further, the content coding network in the text processing model can perform forward semantic coding and reverse semantic coding on the shared word sequence of the shared content text to obtain a forward shared coding vector sequence and a reverse shared coding vector sequence, and fuse the two sequences based on a preset sequence fusion mode to obtain the shared vector sequence corresponding to the shared content text. For example, a linear superposition fusion mode can be adopted to fuse the forward and reverse shared coding vector sequences into the shared vector sequence.
Similarly, when the auxiliary description text belongs to the tag type, forward semantic coding and reverse semantic coding can be performed on the auxiliary word sequence of the auxiliary description text through a coding structure in a comment generation network to obtain a forward auxiliary coding vector sequence and a reverse auxiliary coding vector sequence, and sequence fusion is performed on the forward auxiliary coding vector sequence and the reverse auxiliary coding vector sequence based on a preset sequence fusion mode to obtain a first auxiliary vector sequence corresponding to the auxiliary description text. When the auxiliary description text belongs to the comment type, the word sequence of the auxiliary description text can be coded through a coding structure in the tag generation network, and a corresponding second auxiliary vector sequence is obtained.
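For the segmentation step, here is a quick sketch using jieba, one of the segmentation tools named above; the sample sentence is illustrative.

```python
import jieba

shared_content_text = "微笑的力量"                 # an illustrative shared content text
shared_word_sequence = jieba.lcut(shared_content_text)
print(shared_word_sequence)                       # e.g. ['微笑', '的', '力量']
```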
In one embodiment, the content encoding network, the encoding structure in the comment generation network, or the encoding structure in the tag generation network may use an LSTM (Long Short-Term Memory) neural network to perform forward semantic encoding and reverse semantic encoding on the shared content text or the auxiliary description text. The following describes, by way of example, the process of performing forward and reverse semantic encoding on the word sequence of the shared content text through the multi-layer LSTM neural network. Taking a word sequence of length m as an example: first, the computer device segments the shared content text into words to obtain the shared word sequence x = (x_1, x_2, ..., x_m), and converts it into a continuous space representation vector sequence e(x) = (e(x_1), e(x_2), ..., e(x_m)) by word embedding. Through the multi-layer LSTM neural network, the word sequence is semantically encoded based on the previously output encoding hidden layer vectors, yielding the forward shared encoding vector sequence with elements

h_i^fwd = LSTM_forward(h_{i-1}^fwd, e(x_i))

where LSTM_forward denotes forward semantic encoding, h_i^fwd denotes the encoding hidden layer vector output by the LSTM at the i-th position, and e(x_i) denotes the i-th vector in the space representation vector sequence. Similarly, the computer device performs reverse semantic encoding on the shared word sequence through the multi-layer LSTM neural network to obtain the reverse encoding vector sequence with elements

h_i^bwd = LSTM_backward(h_{i+1}^bwd, e(x_i))

where LSTM_backward denotes reverse semantic encoding and h_i^bwd denotes the corresponding encoding hidden layer vector. The encoded connection sequence obtained by joining the forward shared encoding vector sequence and the reverse shared encoding vector sequence is H_c = {h_1, h_2, ..., h_m}, where each h_i combines h_i^fwd and h_i^bwd. It is easily understood that the auxiliary description text can also be encoded in the same manner.
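A minimal PyTorch sketch of this bidirectional encoding follows, assuming a single bidirectional LSTM layer and element-wise summation of the two directions as the fusion mode (matching the linear-superposition option mentioned earlier; concatenation is the other obvious choice). All sizes are illustrative.

```python
import torch
import torch.nn as nn

class BiEncoder(nn.Module):
    """Illustrative forward/reverse semantic encoding with fusion into H_c."""

    def __init__(self, vocab_size=10000, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)   # e(x)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                            bidirectional=True)

    def forward(self, word_ids):                     # x = (x_1, ..., x_m)
        e_x = self.embedding(word_ids)               # continuous space representations
        h, _ = self.lstm(e_x)                        # (batch, m, 2 * hidden)
        forward_h, backward_h = h.chunk(2, dim=2)    # split the two directions
        return forward_h + backward_h                # fused H_c = {h_1, ..., h_m}
```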
In the embodiment, forward semantic coding and reverse semantic coding are performed on the shared content text and the auxiliary description text, so that the coded shared vector sequence and the auxiliary vector sequence can contain richer semantic information, and the target description text obtained by subsequently decoding the shared vector sequence and the auxiliary vector sequence can be more accurate.
In one embodiment, when the auxiliary description text belongs to a tag type, performing semantic decoding processing based on the shared hidden variable and the auxiliary vector sequence, and outputting a target description text corresponding to a target task, including: respectively forming a group of objects to be decoded by each shared hidden variable and the auxiliary vector sequence; and respectively performing semantic decoding processing on each group of objects to be decoded, and outputting at least one target description text corresponding to the target task, wherein each output target description text is different.
Specifically, when the comment generation network samples a plurality of first shared hidden variables, it may pair each first shared hidden variable with the first auxiliary vector sequence to form a group of objects to be decoded, perform semantic decoding processing on each group in the manner described above, and output at least one target description text corresponding to the target task. For example, suppose the target task is to generate music comments, the shared content text is the song "The Power of Smiling - The Courtesy" together with its lyrics ("When it suddenly rains on a sunny day, the courage of the sun is occasionally lost; even if its wings are soaked, the dream never stops flying far; but crying is inevitable, like a person walking into the rain, who keeps going forward and looks at the rainbow after the rain …"), and the auxiliary description text is "inspirational"; the music comments generated by the comment generation network in the text processing model may then be "I feel really moved", "the lyrics are very inspirational", "the power of smiling is very strong", and so on. It is easy to understand that the comment generation network can decode each group of objects to be decoded separately to obtain the corresponding target description texts, or decode the objects to be decoded simultaneously to obtain a plurality of target description texts. The present embodiment is not limited thereto.
In this embodiment, the randomized shared vector sequence is sampled to obtain a plurality of first shared hidden variables, and the first auxiliary vector sequence can be decoded based on different first shared hidden variables, so as to obtain a plurality of different target description texts, thereby greatly improving the diversity of the generated target description texts.
In one embodiment, the descriptive text generation method is performed by a text processing model; the text processing model comprises a comment generation network and a label generation network; the comment generation network is used for generating a target description text belonging to the comment type according to the shared content text and the auxiliary description text belonging to the label type; and the label generation network is used for generating a target description text belonging to the label type according to the shared content text and the auxiliary description text belonging to the comment type.
Specifically, the text processing model may include a comment generation network and a tag generation network, wherein the comment generation network is used to generate the object description text belonging to the comment type, and the tag generation network is used to generate the object description text belonging to the tag type. For example, referring to fig. 4, a content coding network in a text processing model may be used to perform semantic coding processing on a shared content text, so as to obtain a shared vector sequence; the comment generation network in the text processing model can comprise an encoding structure and a decoding structure, so that the target description text of the comment type is output based on the encoding structure and the decoding structure; the tag generation network in the text processing model may include an encoding structure and a classification structure, such that the target description text of the tag type is output based on the encoding structure and the classification structure. FIG. 4 illustrates a model diagram of a text processing model in one embodiment.
In this embodiment, by providing both a comment generation network and a tag generation network, a comment-type target description text can be output by the comment generation network and a tag-type target description text can be output by the tag generation network, so the text types of the output target description texts are greatly enriched.
In one embodiment, the text processing model further includes a content coding network, which performs semantic coding processing on the shared content text and the auxiliary description text respectively to obtain a corresponding shared vector sequence and an auxiliary vector sequence, including: performing semantic coding processing on the shared content text through a content coding network to obtain a shared vector sequence corresponding to the shared content text; when the auxiliary description text belongs to the label type, performing semantic coding processing on the auxiliary description text through a comment generation network to obtain a corresponding first auxiliary vector sequence; when the auxiliary description text belongs to the comment type, performing semantic coding processing on the auxiliary description text through a tag generation network to obtain a corresponding second auxiliary vector sequence; randomizing the sharing vector sequence, and determining a sharing hidden variable based on the randomization result, comprising: when the auxiliary description text belongs to the label type, generating a network through comments, randomizing the sharing vector sequence, and determining a first sharing hidden variable based on a processing result; when the auxiliary description text belongs to the comment type, generating a network through a tag, randomizing a sharing vector sequence, and determining a second sharing hidden variable based on a processing result; performing semantic decoding processing based on the shared hidden variables and the auxiliary vector sequence, and outputting a target description text corresponding to a target task, wherein the semantic decoding processing comprises the following steps: when the auxiliary description text belongs to the label type, a network is generated through the comment, and the first auxiliary vector sequence is decoded based on the first shared hidden variable to obtain a target description text belonging to the comment type; and when the auxiliary description text belongs to the comment type, generating a network through the label, and decoding the second auxiliary vector sequence based on the second shared hidden variable to obtain the target description text belonging to the label type.
Specifically, when the shared content text is obtained, the content coding network in the text processing model may perform semantic coding processing on the shared content text to obtain a shared vector sequence. When the auxiliary description text of the label type is obtained, the comment generation network in the text processing model can carry out semantic coding processing on the auxiliary description text of the label type to obtain a corresponding first auxiliary vector sequence. When the comment type auxiliary description text is obtained, the label generation network in the text processing model can perform semantic coding processing on the label type auxiliary description text to obtain a corresponding second auxiliary vector sequence.
Further, when the shared vector sequence and the first auxiliary vector sequence are obtained, the comment generation network may randomize the shared vector sequence and sample from the randomization result to obtain the first shared hidden variable, and then perform semantic decoding processing on the first auxiliary vector sequence based on the first shared hidden variable to obtain the target description text belonging to the comment type.
Further, when the shared vector sequence and the second auxiliary vector sequence are obtained, the tag generation network can randomize the shared vector sequence, take the target mean corresponding to the randomization result as the second shared hidden variable, and perform semantic decoding processing on the second auxiliary vector sequence based on the second shared hidden variable to obtain the target description text belonging to the tag type. It is easy to understand that the randomization processing can also be performed directly on the shared vector sequence by the content encoding network in the text processing model to obtain the randomization result. The present embodiment is not limited thereto.
In the embodiment, the content coding network, the comment generating network and the label generating network are arranged, so that the text processing model obtained based on the content coding network, the comment generating network and the label generating network can generate the target description text of the label type and also can generate the target description text of the comment type, and the diversity of the generated target description text is greatly enriched.
In one embodiment, the training step of the text processing model comprises: acquiring a first sample training set, wherein the first sample training set comprises a first content text, a first sample comment corresponding to the first content text and a first sample label; carrying out semantic coding processing on the first content text to obtain a corresponding sample content vector sequence; generating a network through comments in a text processing model to be trained, and performing coding and decoding processing on the sample content vector sequence and the first sample label to obtain a corresponding first prediction comment; generating a network through a label in a text processing model to be trained, and coding and decoding the sample content vector sequence and the first sample comment to obtain a corresponding first prediction label; the comment generation network and the label generation network are of a structure which is in dual with each other; determining a dual loss from a first difference between the first predictive tag and the first sample tag, and a second difference between the first predictive comment and the first sample comment; training the text processing model based on the dual loss until the training end condition is met, and obtaining a trained text processing model; the trained text processing model is used for generating a target description text corresponding to the target task; the target description text is at least one of a target comment text and a target label text.
Specifically, the text processing model may be trained before the target description text is output based on it. When the first sample training set is obtained, the text processing model to be trained may encode the first content text in the first sample training set to obtain the sample content vector sequence, encode and decode the sample content vector sequence and the first sample label through the comment generation network to obtain the corresponding first prediction comment, and encode and decode the sample content vector sequence and the first sample comment through the tag generation network to obtain the corresponding first prediction label. Further, the computer device determines the dual loss according to the first difference between the first prediction label and the first sample label and the second difference between the first prediction comment and the first sample comment, trains the text processing model in the direction of reducing the dual loss, and ends training when the training end condition is met to obtain the trained text processing model.
In this embodiment, the text processing model is trained, so that the trained text processing model can generate a more accurate target description text.
In one embodiment, the target task is a music comment generation task or a music tag generation task; when the target task is a music comment generation task, the shared content text comprises at least one of a song name, a lyric text, a rhythm description text and an author attribute text corresponding to a target song, the auxiliary description text is a music label set corresponding to the target song, and the target description text is a music comment set corresponding to the target song; when the target task is a music tag generation task, the shared content text comprises at least one of a song name, a lyric text, a rhythm description text and an author attribute text corresponding to the target song, the auxiliary description text is a music comment set corresponding to the target song, and the target description text is a music tag set corresponding to the target song.
Specifically, when the target task is a music comment generation task, the shared content text may include at least one of the song name, lyric text, prosody description text, and author attribute text corresponding to the target song for which music comments are to be generated, and the auxiliary description text may be the music tag set corresponding to the target song, so that the text processing model can generate the music comments in the music comment set corresponding to the target song based on the shared content text and the auxiliary description text. The lyric text records the music lyrics; for example, the lyric text may be "when it suddenly rains on a sunny day, the courage of the sun is occasionally lost, and even if its wings are soaked, the dream never stops flying far …". The prosody description text describes the musical prosody; for example, it may be "the tempo of the first measure of this song is four-eight time, and the tempo of the second measure is five-four time". The author attribute text records information related to the authors, where an author may be the lyricist, the composer, the singer, and so on, and the recorded information may be the author's name, hobbies, style, personality, and the like. For example, the author attribute text may be "lyricist - Xiaofang, hobby - writing lyrics, style - good at the ancient style, personality - gentle; composer - Xiaoming, hobby - composing, style - good at creating soft melodies, personality - gentle; singer - Xiaohong, hobby - singing, style - good at rap, personality - fiery". The music tag set is a set containing at least one music tag; for example, it may be {healing, lonely}. The music comment set is a set containing at least one music comment; for example, it may contain "the four-eight time is really beautiful" and "hope your dream never stops flying far".
Correspondingly, when the target task is a music tag generation task, the shared content text may also include at least one of a song name, a lyric text, a rhythm description text, and an author attribute text corresponding to the target song, and the auxiliary description text is a music comment set corresponding to the target song, so that the target description text is a music tag set corresponding to the target song.
In the embodiment, the comment generation network can generate music comments of multiple dimensions based on the shared content texts of multiple dimensions by acquiring the shared content texts of multiple dimensions, so that the diversity of the music comments is greatly improved. In addition, by acquiring the shared content texts in multiple dimensions, the label generation network can synthesize the shared content texts in multiple dimensions to generate more appropriate music labels.
In an embodiment, as shown in fig. 5, a method for training a text processing model is provided, where the method specifically includes the following steps:
s502, a first sample training set is obtained, wherein the first sample training set comprises a first content text, a first sample comment corresponding to the first content text and a first sample label.
Specifically, the first sample training set refers to the training data required for model training; based on the first sample training set, the text processing model can adjust its model parameters accordingly. The first sample training set includes a first content text, and a first sample comment and a first sample label corresponding to the first content text. The first content text refers to the shared content text required to generate the first prediction label or the first prediction comment. The first sample comment refers to a comment text obtained by commenting on the first content text. The first sample label refers to a label text obtained by classifying the text attributes of the first content text. For example, the first content text may be at least one of the song name, lyric text, prosody description text, and author attribute text of a sample song for which a predicted music comment is to be generated; the first sample comment may be a comment text obtained by commenting on the sample song; the first sample label may be a label text that classifies the attributes of the sample song. The first content text, and the first sample comment and first sample label corresponding to it, may be obtained from a plurality of public data sets.
Since one first content text may have a corresponding first sample label and first sample comment, the one first content text, and the first sample label and first sample comment corresponding to the first content text may be taken as one training data pair, so that the computer device may train the text processing model to be trained based on a plurality of training data pairs in the first sample training set.
S504, semantic coding processing is carried out on the first content text, and a corresponding sample content vector sequence is obtained.
S506, generating a network through the comments in the text processing model to be trained, and performing coding and decoding processing on the sample content vector sequence and the first sample label to obtain a corresponding first prediction comment.
S508, generating a network through a label in the text processing model to be trained, and coding and decoding the sample content vector sequence and the first sample comment to obtain a corresponding first prediction label; the comment generation network and the label generation network are of a structure which is in mutual duality.
Specifically, when the text processing model is subjected to model training, the first content text, the first sample label, and the first sample comment in a training data pair may be input to the text processing model, so that the text processing model, through the comment generation network, generates a first prediction comment based on the first content text and the first sample label, and, through the tag generation network, generates a first prediction label based on the first content text and the first sample comment, and the model parameters are adjusted according to a first difference between the first prediction label and the input first sample label and a second difference between the first prediction comment and the input first sample comment. More specifically, the text processing model may perform semantic coding processing on the first content text to obtain the sample content vector sequence corresponding to the first content text, and perform randomization processing on the sample content vector sequence to obtain a randomization processing result. The comment generation network in the text processing model to be trained can perform semantic coding processing on the first sample label, determine a corresponding first sample hidden vector based on the randomization processing result, and perform decoding processing on the first sample hidden vector and the semantically encoded first sample label to obtain the corresponding first prediction comment. The first prediction comment is the prediction comment text output by the text processing model to be trained according to the first content text and the first sample label.
Furthermore, the label generation network in the text processing model to be trained can perform semantic coding processing on the first sample comment, determine a corresponding second sample hidden vector based on a randomization processing result, and perform decoding processing on the second sample hidden vector and the first sample comment which is subjected to the semantic coding processing to obtain a corresponding first prediction label. The first prediction label refers to a prediction label text output by the text processing model to be trained according to the first content text and the first sample comment. The comment generation network and the label generation network are of a dual structure, that is, the label generation task corresponding to the label generation network and the comment task corresponding to the comment generation network are dual tasks, so that a closed-loop feedback system is formed, and therefore mutual supervision can be performed by using the association between the comment generation network and the label generation network, and dual combined training is achieved.
S510, determining a dual loss according to a first difference between the first prediction label and the first sample label and a second difference between the first prediction comment and the first sample comment.
S512, training the text processing model based on the dual loss until the training end condition is met, and obtaining a trained text processing model; the trained text processing model is used for generating a target description text corresponding to the target task; the target description text is at least one of a target comment text and a target label text.
Specifically, when the first prediction label and the first prediction comment are obtained, the computer device can determine the first difference between the first prediction label and the input first sample label, determine the second difference between the first prediction comment and the input first sample comment, determine the dual loss based on the first difference and the second difference, and adjust the model parameters through the dual loss until the training end condition is met. For example, suppose the first content text in a training data pair is the song "Missing Is a Disease - Zhangzheng" with the lyrics "When you are on the other side, crossing mountain after mountain, I am alone on an isolated road without end", the first sample label is "sad", and the first sample comment is "I want to hold you in my arms; I miss you". The text processing model to be trained can generate the first prediction label "sentimental" based on the first content text and the first sample comment, and generate the first prediction comment "I miss you very much" based on the first content text and the first sample label, so that the computer device can determine the first difference from "sad" and "sentimental", determine the second difference from "I want to hold you in my arms; I miss you" and "I miss you very much", and perform dual joint training on the comment generation network and the tag generation network in the text processing model based on the first difference and the second difference. It is readily appreciated that, since the first sample label and the first sample comment both correspond to the first content text, the text processing model can be considered to be first trained on labeled training data.
In one embodiment, the dual loss L includes a comment generation network loss term L_gen, a tag generation network loss term L_class, and a dual constraint loss term L_dua:

L = L_gen + γ · L_class + β · L_dua

L_gen = -log P(x | y, c)

L_class = -log P(y | x, c)
Wherein x represents the first sample comment; c represents a first content text; y represents a first sample label; p (x | y, c) represents a conditional probability of outputting the first sample comment when input as the first sample tag and the first content text; p (y | x, c) represents the conditional probability of outputting the first sample tag when the input is the first sample comment and the first content text.
Further, the computer device may determine the dual constraint loss term L_dua as follows. Specifically, the tag generation network and the comment generation network in the text processing model can be trained, that is, their parameters learned, through the following formulas:

y' = f(x, c; θ_{x,c→y}) = argmax P(y' | x, c; θ_{x,c→y})

x' = g(y, c; θ_{y,c→x}) = argmax P(x' | y, c; θ_{y,c→x})

where the f function represents the tag generation network; θ_{x,c→y} represents the learnable parameters in the tag generation network; P(y' | x, c; θ_{x,c→y}) represents the conditional probability of outputting y' when the inputs are x and c and the parameters are θ_{x,c→y}; and y' represents the first prediction label. The g function represents the comment generation network; θ_{y,c→x} represents the learnable parameters in the comment generation network; P(x' | y, c; θ_{y,c→x}) represents the conditional probability of outputting x' when the inputs are y and c and the parameters are θ_{y,c→x}; and x' represents the first prediction comment.
Because the comment generation network and the tag generation network form a dual structure, they have a probabilistic correlation, which can be expressed by the following formula:

P(x, y | c) = P(x | c) · P(y | x, c; θ_{x,c→y}) = P(y | c) · P(x | y, c; θ_{y,c→x})

The purpose of model training is to obtain the parameters θ_{x,c→y} and θ_{y,c→x} such that, for the first sample training set, the prediction results output by the comment generation network and the tag generation network are close to the true values. Therefore, for supervised learning, the text processing model can be trained by minimizing the first difference and the second difference:

min_{θ_{x,c→y}} (1/n) Σ_{i=1}^{n} l1(f(x_i, c_i; θ_{x,c→y}), y_i)

min_{θ_{y,c→x}} (1/n) Σ_{i=1}^{n} l2(g(y_i, c_i; θ_{y,c→x}), x_i)
wherein the l1 function represents a first difference; the l2 function represents a second difference; n represents the number of training data pairs contained in the first set of training samples.
Therefore, by combining the probabilistic correlation with the minimization of the first difference and the second difference, the equality-constrained formulation of dual training can be obtained:

Constraint 1: min_{θ_{x,c→y}} (1/n) Σ_{i=1}^{n} l1(f(x_i, c_i; θ_{x,c→y}), y_i)

Constraint 2: min_{θ_{y,c→x}} (1/n) Σ_{i=1}^{n} l2(g(y_i, c_i; θ_{y,c→x}), x_i)

Constraint 3: P(x | c) · P(y | x, c; θ_{x,c→y}) = P(y | c) · P(x | y, c; θ_{y,c→x})

Based on the equality constraint of dual training, the dual constraint loss term can be obtained:

L_dua = (log P(x | c) + log P(y | x, c; θ_{x,c→y}) - log P(y | c) - log P(x | y, c; θ_{y,c→x}))²
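A small sketch of the combined dual loss follows. The log-probability inputs are assumed to be computed elsewhere (the patent does not specify how the marginals log P(x|c) and log P(y|c) are estimated), and γ, β are the weighting hyperparameters from the formula above; the function works on plain floats or tensors.

```python
def dual_loss(log_p_x_given_yc, log_p_y_given_xc,
              log_p_x_given_c, log_p_y_given_c,
              gamma=1.0, beta=1.0):
    l_gen = -log_p_x_given_yc            # L_gen   = -log P(x | y, c)
    l_class = -log_p_y_given_xc          # L_class = -log P(y | x, c)
    # L_dua: squared violation of P(x|c) P(y|x,c) = P(y|c) P(x|y,c), in log space
    l_dua = (log_p_x_given_c + log_p_y_given_xc
             - log_p_y_given_c - log_p_x_given_yc) ** 2
    return l_gen + gamma * l_class + beta * l_dua
```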
in the method for training the text processing model, the semantic coding processing can be performed on the shared first content text in the first sample training set by obtaining the first sample training set to obtain the corresponding sample content vector sequence, so that the comment generating network and the label generating network can perform coding and decoding processing on the shared sample content vector sequence and the corresponding first sample label or first sample comment to obtain the first prediction comment and the first prediction label. By obtaining the first prediction comment and the first prediction label, dual loss can be determined according to a first difference between the first prediction label and the first sample label and a second difference between the first prediction comment and the first sample comment, and text processing is trained on the basis of the dual loss, so that a trained text processing model is obtained. Because the target description text can be automatically output through the trained text processing model, compared with the traditional manual writing, the generation efficiency of the target description text is greatly improved.
In addition, because the label generation network and the comment generation network in the text processing model can be trained through the shared first content text, semantic information which can be shared can be fully utilized in the model training process, and therefore the accuracy of the text processing model is improved.
In one embodiment, the text processing model further comprises a connection network for connecting the comment generating network and the tag generating network, the method further comprising: acquiring a second sample training set, wherein the second sample training set comprises a second content text, a second sample label corresponding to the second content text, a third content text and a third sample comment corresponding to the third content text; training the text processing model based on the dual loss until the training end condition is met, and obtaining the trained text processing model, wherein the training end condition comprises the following steps: performing first training on a comment generation network and a label generation network in a text processing model based on dual loss until a first stopping condition is met; performing second training on a connecting network in the text processing model obtained by executing the first training through a second sample training set until a second stopping condition is met; and performing third training on the text processing model obtained by performing the second training through the first sample training set and the second sample training set until a third stopping condition is met, so as to obtain the trained text processing model.
Specifically, based on the particularity of the dual structure, model training may be performed based on unlabeled training data, for example, the unlabeled first content text and the unlabeled first sample label may be input to a comment generating network, the comment generating network outputs a first predicted comment, the first predicted comment is input to the label generating network, and the label generating network outputs the first predicted label, so that the computer device may train the text processing model by the difference between the first predicted label and the first sample label. However, because the utilization degree of the unlabeled training data is limited due to the limitations of the discrete target gradient propagation and the like, in order to further improve the utilization rate of the unlabeled training data, a connection network may be added to the text processing model, and the comment generation network and the label generation network are connected through the connection network.
For example, referring to fig. 6A, to avoid being limited by discrete-target gradient propagation, a tag connection network and a comment connection network may be added to the text processing model. The tag connection network points from the comment generation network to the tag generation network and is used to convert the predicted comment vector sequence generated by the comment generation network into a substitute comment vector sequence that is input into the tag generation network. The comment connection network points from the tag generation network to the comment generation network and is used to convert the predicted tag vector sequence generated by the tag generation network into a substitute tag vector sequence that is input into the comment generation network. The predicted comment vector sequence refers to the word vector sequence corresponding to the predicted comment to be generated by the comment generation network. The substitute comment vector sequence refers to a word vector sequence that can be directly decoded by the tag generation network to obtain the predicted tag. The predicted tag vector sequence refers to the word vector sequence corresponding to the predicted tag to be generated by the tag generation network. The substitute tag vector sequence refers to a word vector sequence that can be directly decoded by the comment generation network to obtain the predicted comment. FIG. 6A illustrates a connection diagram for connecting networks in one embodiment.
It is easily understood that, with the addition of the tag connection network, the computer device can directly convert the predicted comment vector sequence in the comment generation network into the substitute comment vector sequence for the tag generation network. This skips the process in which the comment generation network outputs a predicted comment based on the predicted comment vector sequence, the predicted comment is input into the tag generation network, and the tag generation network semantically encodes the predicted comment to obtain the substitute comment vector sequence, thereby reducing the limitation of discrete-target gradient propagation. Similarly, with the addition of the comment connection network, the predicted tag vector sequence in the tag generation network can be directly converted into a substitute tag vector sequence for the comment generation network, likewise reducing the limitation of discrete-target gradient propagation.
Further, in order to further improve the accuracy of the text processing model, after the first training is performed on the comment generating network and the tag generating network in the text processing model based on the dual loss, a second training and a third training may be performed on the text processing model. And the computer equipment acquires a second sample training set, performs second training on a connecting network in the text processing model obtained by executing the first training through the second sample training set until a second stopping condition is met, performs third training on the text processing model obtained by executing the second training through the first training set and the second training set until a third stopping condition is met, and obtains the trained text processing model.
In one embodiment, the training end condition may be considered satisfied when the first training is performed until the first stop condition is met, in which case the text processing model obtained from the first training is used as the trained text processing model; alternatively, the training end condition may be considered satisfied when the third training is performed until the third stop condition is met, in which case the text processing model obtained from the third training is used as the trained text processing model. The present embodiment is not limited thereto.
In this embodiment, by adding the connection network, the limitation of gradient propagation of the discrete target can be reduced, and the utilization rate of the unmarked training data is improved.
In one embodiment, the connectivity network comprises a tag connectivity network; and performing second training on the connection network in the text processing model obtained by executing the first training through a second sample training set until a second stopping condition is met, wherein the second training comprises the following steps: generating a network through comments in a text processing model obtained by executing the first training, and performing coding and decoding processing on a second content text and a second sample label to obtain a predicted comment vector sequence; converting the predicted comment vector sequence into a substitute comment vector sequence through a label connection network in a text processing model obtained by executing first training; generating a network through a label in a text processing model obtained by executing the first training, and decoding the alternative comment vector sequence to obtain a corresponding second prediction label; and determining the label difference between the second predicted label and the second sample label, and performing second training on the label connection network through the label difference until a second stop condition is met.
The second sample training set comprises a second content text, a second sample label corresponding to the second content text, a third content text and a third sample comment corresponding to the third content text.
Specifically, when the first training is executed until a first stopping condition is met, the computer device performs encoding and decoding processing on a second content text and a second sample label through a comment generation network and a content encoding network in a text processing model obtained by executing the first training in a manner of encoding and decoding the content text and the sample label text to obtain a predicted comment vector sequence, and converts the predicted comment vector sequence into a substitute comment vector sequence through a label connection network in the text processing model obtained by executing the first training, and inputs the substitute comment vector sequence to the label generation network, so that the label generation network in the text processing model can decode the substitute comment vector in the decoding manner to obtain a second predicted label. Further, the computer device determines the label difference between a second predicted label and a second sample label, and performs second training on the label connection network towards the direction of reducing the label difference until a second stopping condition is met, so as to obtain the label connection network after the pre-training is completed.
It is noted that, in the second training process, the computer device fixes the network parameters of the label generation network and the comment generation network in the text processing model, and only adjusts the network parameters of the connection network.
In one embodiment, the connection network may be a fully connected layer. Referring to fig. 6A, when the comment generation network generates a predicted comment vector sequence (which may also be referred to as a comment probability distribution), the comment generation network suspends the step of determining the corresponding predicted comment from that sequence; instead, the tag connection network converts the predicted comment vectors into the substitute comment vectors H_x. The tag generation network can therefore decode the substitute comment vectors based on the randomization processing result output by the content coding network to obtain the second prediction label, determine the corresponding label difference from the second prediction label and the second sample label, and determine the corresponding label loss based on the label difference: LR_y = -log P(y' = y), where y' represents the second prediction label and y represents the second sample label. The tag connection network is then trained in the direction of reducing the label loss.
In the above embodiment, by performing the second training on the label connection network, the label connection network on which the second training has been performed can convert a predicted comment vector into a more accurate substitute comment vector.
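For concreteness, the following PyTorch sketch mocks up this second-training step: a fully-connected connection layer is updated while a stand-in label classifier stays frozen. The module shapes, vocabulary sizes, pooling step, and the cross-entropy form of the label loss are illustrative assumptions, not the architecture fixed by this application.

```python
import torch
import torch.nn as nn

COMMENT_VOCAB, LABEL_VOCAB, HIDDEN = 5000, 200, 256

# Stand-in for the label generation network's decoding head (frozen here).
label_classifier = nn.Linear(HIDDEN, LABEL_VOCAB)
for p in label_classifier.parameters():
    p.requires_grad = False

# The label connection network: a fully-connected layer mapping a predicted
# comment probability distribution to a substitute comment vector H'_x.
label_connection = nn.Linear(COMMENT_VOCAB, HIDDEN)

optimizer = torch.optim.Adam(label_connection.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def second_training_step(predicted_comment_probs, second_sample_label):
    """One second-training step: only the connection network is updated."""
    # predicted_comment_probs: (batch, seq_len, COMMENT_VOCAB), produced
    # upstream by the (likewise frozen) comment generation network.
    substitute = label_connection(predicted_comment_probs)  # H'_x per position
    pooled = substitute.mean(dim=1)           # crude pooling over the sequence
    logits = label_classifier(pooled)         # second predicted label scores
    loss = criterion(logits, second_sample_label)  # label difference -> label loss
    optimizer.zero_grad()
    loss.backward()                           # gradients reach only the connection
    optimizer.step()
    return loss.item()

probs = torch.softmax(torch.randn(4, 12, COMMENT_VOCAB), dim=-1)
labels = torch.randint(0, LABEL_VOCAB, (4,))
print(second_training_step(probs, labels))
```

Freezing the two generation networks in the sketch mirrors the note above that only the connection network's parameters are adjusted during the second training.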
In one embodiment, the connection network comprises a comment connection network; and performing the second training on the connection network in the text processing model obtained by performing the first training through the second sample training set until the second stopping condition is met comprises the following steps: performing, through the label generation network in the text processing model obtained by performing the first training, encoding and decoding processing on a third content text and a third sample comment to obtain a predicted label vector sequence; converting the predicted label vector sequence into a substitute label vector sequence through the comment connection network in the text processing model obtained by performing the first training; decoding the substitute label vector sequence through the comment generation network in the text processing model obtained by performing the first training to obtain a corresponding second predicted comment; and determining the comment difference between the second predicted comment and the third sample comment, and performing the second training on the comment connection network through the comment difference until the second stopping condition is met.
Specifically, the computer device may perform encoding and decoding processing on the third content text and the third sample comment through the label generation network in the text processing model obtained by performing the first training to obtain a predicted label vector sequence, convert the predicted label vector sequence into a substitute label vector sequence through the comment connection network, and perform decoding processing on the substitute label vector sequence through the comment generation network in the text processing model obtained by performing the first training to obtain a second predicted comment. Further, the computer device determines the comment difference between the second predicted comment and the third sample comment, and performs the second training on the comment connection network in the direction of reducing the comment difference until the second stopping condition is met, so as to obtain the pre-trained comment connection network. In one embodiment, the second training of the text processing model includes the second training of the label connection network and the second training of the comment connection network. Correspondingly, the second stopping condition includes that the label connection network on which the second training is performed meets a preset stopping condition, or that the comment connection network on which the second training is performed meets a preset stopping condition.
In one embodiment, referring to fig. 6B, when the label generation network generates a predicted label vector sequence (which may also be referred to as a label probability distribution), the label generation network suspends the process of determining corresponding predicted labels based on the predicted label vector sequence, and the predicted label vector sequence is instead converted into a substitute label vector H′y through the comment connection network. The comment generation network can therefore decode the substitute label vector based on the randomization processing result output by the content coding network to obtain a second predicted comment, determine the corresponding comment difference according to the second predicted comment and the third sample comment, and determine the corresponding comment loss based on the comment difference: LRx = -log P(x′|x), and the comment connection network is subjected to the second training in the direction of reducing the comment loss. Here, x′ represents the second predicted comment; x represents the third sample comment. Fig. 6B shows a connection diagram of a connection network in another embodiment.
In one embodiment, the second training can also be performed on the text processing model through the first sample training set: the computer device splits the paired first sample labels and first sample comments, performs the second training on the label connection network through the first content text and the corresponding first sample label, and performs the second training on the comment connection network through the first content text and the corresponding first sample comment. That is, the computer device performs the second training on the label connection network by using the first content text and the first sample label in the first sample training set as the second content text and the second sample label, and performs the second training on the comment connection network by using the first content text and the first sample comment in the first sample training set as the third content text and the third sample comment.
In the above embodiment, by performing the second training on the comment connection network, the comment connection network on which the second training has been performed can convert a predicted label vector into a more accurate substitute label vector.
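The comment connection network's second training mirrors the label side, as the following minimal sketch shows; here, too, every module is a simplified stand-in, and giving the label and comment sequences the same length is purely for brevity.

```python
import torch
import torch.nn as nn

LABEL_VOCAB, COMMENT_VOCAB, HIDDEN = 200, 5000, 256

# Frozen stand-in for the comment generation network's decoder.
comment_decoder = nn.Linear(HIDDEN, COMMENT_VOCAB)
for p in comment_decoder.parameters():
    p.requires_grad = False

# Trainable comment connection network: predicted label distribution -> H'_y.
comment_connection = nn.Linear(LABEL_VOCAB, HIDDEN)
optimizer = torch.optim.Adam(comment_connection.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# Predicted label distribution from the frozen label generation network, and
# the third sample comment it should reconstruct (dummy tensors here).
predicted_label_probs = torch.softmax(torch.randn(4, 6, LABEL_VOCAB), dim=-1)
third_sample_comment = torch.randint(0, COMMENT_VOCAB, (4, 6))

substitute = comment_connection(predicted_label_probs)   # H'_y
logits = comment_decoder(substitute)                     # second predicted comment
loss = criterion(logits.reshape(-1, COMMENT_VOCAB),      # comment difference
                 third_sample_comment.reshape(-1))
optimizer.zero_grad(); loss.backward(); optimizer.step()
```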
In one embodiment, performing the third training on the text processing model obtained by performing the second training through the first sample training set and the second sample training set until the third stopping condition is met to obtain the trained text processing model comprises: jointly performing corresponding branch training on the comment generation network and the label connection network in the text processing model obtained by performing the second training, based on the first content text and the first sample label corresponding to the first content text, and the second content text and the second sample label corresponding to the second content text; jointly performing corresponding branch training on the label generation network and the comment connection network in the text processing model obtained by performing the second training, based on the first content text and the first sample comment corresponding to the first content text, and the third content text and the third sample comment corresponding to the third content text; and ending the training when it is determined, based on each branch training, that the third stopping condition is met, to obtain the trained text processing model.
Specifically, after the second training is performed, in order to further improve the accuracy of the text processing model by fully utilizing the unlabeled training data, the computer device may further fine-tune each network in the text processing model obtained by performing the second training based on the first sample training set and the second sample training set. The computer device takes the first content text and the first sample label corresponding to the first content text as one training data pair, takes the second content text and the second sample label corresponding to the second content text as another training data pair, and jointly performs corresponding branch training on the comment generation network and the label connection network based on these training data pairs. Correspondingly, the computer device takes the first content text and the first sample comment corresponding to the first content text as one training data pair, takes the third content text and the third sample comment corresponding to the third content text as another training data pair, jointly performs corresponding branch training on the label generation network and the comment connection network, and ends the training when it is determined based on each branch training that the third stopping condition is met, to obtain the trained text processing model.
In one embodiment, because the comment generation network and the label generation network form a dual structure and constitute a closed loop in the data conversion process, the unsupervised second training and third training can be performed by virtue of this property, so that unlabeled data is fully utilized. For a better understanding of the unsupervised training, the following description takes as an example the third training of the text processing model based on the first content text and the first sample label corresponding to the first content text. Since the comment generation network and the label generation network form a dual structure, the following relationships hold:
g(f(x, c)) ≡ x
f(g(y, c)) ≡ y
the f function represents a label generation network, the g function represents a comment generation network, x represents a sample comment, c represents a content text, and y represents a sample label.
When the first content text and the first sample label are input, the first content text and the first sample label can be coded and decoded through the comment generation network, so that probability distribution corresponding to a third predicted comment to be output is obtained, namely a predicted comment vector corresponding to the third predicted comment to be output is obtained:
Hy = Encoder(y)
Px = Decoder(Hy, z)
where Hy represents the encoding result obtained by encoding the first sample label; Px represents the predicted comment vector corresponding to the third predicted comment to be output; and z represents the shared hidden variable corresponding to the first content text.
For convenience of description, during the third training, the predicted comment output by the comment generation network is referred to as the third predicted comment, and the predicted label output by the label generation network is referred to as the third predicted label. When the predicted comment vector corresponding to the third predicted comment to be output is obtained, it can be converted, through the label connection network, into a substitute comment vector corresponding to the third predicted comment to be output, so as to skip the process in which a discrete target is input into the label generation network and re-encoded by the label generation network:
H′x = Connection(Px)
Py = Classifier(H′x, z)
y′ = argmax_{s′∈S} Py(s′)
where H′x represents the substitute comment vector corresponding to the third predicted comment to be output, that is, a vector that can be decoded directly to obtain the probability distribution of the predicted label; S represents the label text library; z represents the shared hidden variable; and y′ represents the third predicted label.
Further, when the third predicted label obtained by decoding the substitute comment vector through the label generation network is obtained, the computer device can determine the corresponding label branch loss Lby based on the difference between the third predicted label and the first sample label, and perform the third training on the comment generation network and the label connection network in the direction of reducing the label branch loss. Similarly, when jointly performing corresponding branch training on the label generation network and the comment connection network in the text processing model obtained by performing the second training, based on the first content text and the first sample comment corresponding to the first content text, and the third content text and the third sample comment corresponding to the third content text, the computer device may further determine, when obtaining a third predicted comment output by the comment generation network, a corresponding comment branch loss Lbx based on the difference between the third predicted comment and the corresponding first sample comment (or the third sample comment), and perform the third training on the label generation network and the comment connection network in the direction of reducing the comment branch loss.
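Putting the equations above together, one label-branch step of the third training can be sketched as follows. The linear stand-ins, the collapse of the comment to a single-step distribution, and the negative log-likelihood form of Lby are assumptions; the point the sketch makes is that the connection network keeps the whole path differentiable, with the discrete argmax applied only after the loss.

```python
import torch
import torch.nn as nn

COMMENT_VOCAB, LABEL_VOCAB, HIDDEN = 5000, 200, 256

label_encoder = nn.Embedding(LABEL_VOCAB, HIDDEN)        # H_y = Encoder(y)
comment_decoder = nn.Linear(2 * HIDDEN, COMMENT_VOCAB)   # P_x = Decoder(H_y, z)
label_connection = nn.Linear(COMMENT_VOCAB, HIDDEN)      # H'_x = Connection(P_x)
label_classifier = nn.Linear(2 * HIDDEN, LABEL_VOCAB)    # P_y = Classifier(H'_x, z)

modules = [label_encoder, comment_decoder, label_connection, label_classifier]
params = [p for m in modules for p in m.parameters()]
optimizer = torch.optim.Adam(params, lr=1e-4)

y = torch.randint(0, LABEL_VOCAB, (4,))   # first sample label
z = torch.randn(4, HIDDEN)                # shared hidden variable of the content

h_y = label_encoder(y)                                          # encode the label
p_x = torch.softmax(comment_decoder(torch.cat([h_y, z], -1)), dim=-1)
h_x = label_connection(p_x)               # stays soft: no argmax, gradients flow
p_y = torch.softmax(label_classifier(torch.cat([h_x, z], -1)), dim=-1)

# Label branch loss Lb_y: difference between the third predicted label and y.
loss = nn.functional.nll_loss(torch.log(p_y + 1e-9), y)
optimizer.zero_grad(); loss.backward(); optimizer.step()

y_pred = p_y.argmax(dim=-1)               # y' = argmax_{s'} P_y(s'), inspection only
```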
In the above embodiment, the third training is performed on the text processing model through the connection networks obtained by performing the second training, so that the process of re-encoding a discrete target after it is input can be skipped. This not only improves the encoding efficiency of the model, but also alleviates the limitation that a discrete target blocks gradient propagation, so that unlabeled training data can be fully utilized to perform unsupervised training on the text processing model, greatly improving the accuracy of the text processing model.
In addition, because the supervised first training can be performed on the text processing model based on the paired first sample label and first sample comment, and the unsupervised second training and third training can be performed based on the unpaired second sample label and third sample comment, various types of training data are fully utilized, which improves the accuracy of the text processing model.
In another specific embodiment, as shown in fig. 7, the description text generation method provided by the present application includes the following steps:
S702, determining a target task and acquiring a clue keyword corresponding to the target task.
S704, retrieving based on the clue keywords to obtain a shared content text and an auxiliary description text which are matched with the target task; the text type of the auxiliary description text is one of a comment type and a tag type.
S706, determining a shared word sequence corresponding to the shared content text, and determining an auxiliary word sequence corresponding to the auxiliary description text.
S708, respectively performing forward semantic coding and reverse semantic coding on the shared word sequence to obtain a forward shared coding vector sequence and a reverse shared coding vector sequence; and carrying out sequence fusion processing on the forward sharing coding vector sequence and the reverse sharing coding vector sequence to obtain a sharing vector sequence corresponding to the shared content text.
S710, respectively carrying out forward semantic coding and reverse semantic coding on the auxiliary word sequence to obtain a forward auxiliary coding vector sequence and a reverse auxiliary coding vector sequence; and carrying out sequence fusion processing on the forward auxiliary coding vector sequence and the reverse auxiliary coding vector sequence to obtain an auxiliary vector sequence corresponding to the auxiliary description text.
S712, converting the shared vector sequence into a corresponding target probability distribution, and determining a corresponding target mean and target variance based on the target probability distribution.
S714, when the auxiliary description text belongs to the label type, performing at least one sampling based on the target probability distribution according to the target mean and the target variance to obtain at least one shared hidden variable corresponding to the auxiliary description text, and forming a group of objects to be decoded from each shared hidden variable and the auxiliary vector sequence respectively.
S716, decoding the first auxiliary vector sequence based on the first shared hidden variable in a group of objects to be decoded and the word vector of the previously output target word to obtain the current target-end vector sequence; determining the currently output target word according to the current target-end vector sequence; and constructing, based on the output target words, a target description text which corresponds to the group of objects to be decoded and belongs to the comment type.
S718, when the auxiliary description text belongs to the comment type, taking the target mean value as a shared hidden variable corresponding to the auxiliary description text; and performing semantic decoding processing based on the shared hidden variables and the auxiliary vector sequence, and outputting a target description text belonging to the label type.
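A minimal sketch of steps S708-S718 under stated assumptions: a bidirectional GRU whose per-position states serve as the fused vector sequence, mean-pooling of the shared sequence, and a Gaussian parameterization in the style of the usual reparameterization trick. None of these choices is mandated by this application.

```python
import torch
import torch.nn as nn

VOCAB, EMB, HIDDEN, LATENT = 5000, 128, 256, 64

class BiEncoder(nn.Module):
    """Forward + reverse semantic coding with concatenation as the fusion
    step (steps S708/S710); the cell and fusion operator are assumptions."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, EMB)
        self.gru = nn.GRU(EMB, HIDDEN, batch_first=True, bidirectional=True)

    def forward(self, word_ids):
        states, _ = self.gru(self.embed(word_ids))
        return states  # (batch, seq_len, 2 * HIDDEN): fused vector sequence

to_mean = nn.Linear(2 * HIDDEN, LATENT)
to_logvar = nn.Linear(2 * HIDDEN, LATENT)

def shared_hidden_variables(shared_vec_seq, aux_is_label_type, n_samples=3):
    """Steps S712-S718: Gaussian target distribution from the shared sequence."""
    pooled = shared_vec_seq.mean(dim=1)                # (batch, 2 * HIDDEN)
    mean, logvar = to_mean(pooled), to_logvar(pooled)  # target mean / variance
    if aux_is_label_type:
        # Sample at least once; each z is later paired with the auxiliary
        # vector sequence as one group of objects to be decoded.
        std = torch.exp(0.5 * logvar)
        return [mean + std * torch.randn_like(std) for _ in range(n_samples)]
    # Comment-type auxiliary text: the target mean is the shared hidden variable.
    return [mean]

encoder = BiEncoder()
shared_seq = encoder(torch.randint(0, VOCAB, (2, 20)))   # shared content text
aux_seq = encoder(torch.randint(0, VOCAB, (2, 8)))       # auxiliary description text
zs = shared_hidden_variables(shared_seq, aux_is_label_type=True)
groups = [(z, aux_seq) for z in zs]                      # objects to be decoded
```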
In the above embodiment, the shared content text and the auxiliary description text corresponding to the target task are obtained, and the shared content text and the auxiliary description text can be encoded to obtain a shared vector sequence and an auxiliary vector sequence. By acquiring the shared vector sequence that can be shared between the two target tasks, the corresponding shared hidden variable can be determined based on the shared vector sequence, so that semantic decoding processing is performed based on the shared hidden variable and the auxiliary vector sequence to obtain the target description text corresponding to the target task. Because the target description text can be output automatically, the generation efficiency of the target description text is greatly improved compared with traditional manual writing.

In another embodiment, as shown in fig. 8, the method for training the text processing model provided by the present application includes the following steps:
S802, a first sample training set is obtained, wherein the first sample training set comprises a first content text, a first sample comment corresponding to the first content text and a first sample label; and semantic coding processing is performed on the first content text to obtain a corresponding sample content vector sequence.
S804, encoding and decoding processing is performed on the sample content vector sequence and the first sample label through the comment generation network in the text processing model to be trained to obtain a corresponding first predicted comment.
S806, encoding and decoding processing is performed on the sample content vector sequence and the first sample comment through the label generation network in the text processing model to be trained to obtain a corresponding first predicted label; the comment generation network and the label generation network form a dual structure.
S808, determining a dual loss according to a first difference between the first predicted label and the first sample label and a second difference between the first predicted comment and the first sample comment; and performing the first training on the comment generation network and the label generation network in the text processing model based on the dual loss until the first stopping condition is met.
S810, acquiring a second sample training set, wherein the second sample training set comprises a second content text, a second sample label corresponding to the second content text, a third content text and a third sample comment corresponding to the third content text.
S812, performing encoding and decoding processing on the second content text and the second sample label through the comment generation network in the text processing model obtained by performing the first training to obtain a predicted comment vector sequence; and converting the predicted comment vector sequence into a substitute comment vector sequence through the label connection network in the text processing model obtained by performing the first training.
S814, decoding the substitute comment vector sequence through the label generation network in the text processing model obtained by performing the first training to obtain a corresponding second predicted label; and determining the label difference between the second predicted label and the second sample label, and performing the second training on the label connection network through the label difference until the second stopping condition is met.
S816, performing encoding and decoding processing on the third content text and the third sample comment through the label generation network in the text processing model obtained by performing the first training to obtain a predicted label vector sequence; and converting the predicted label vector sequence into a substitute label vector sequence through the comment connection network in the text processing model obtained by performing the first training.
S818, decoding the substitute label vector sequence through the comment generation network in the text processing model obtained by performing the first training to obtain a corresponding second predicted comment; and determining the comment difference between the second predicted comment and the third sample comment, and performing the second training on the comment connection network through the comment difference until the second stopping condition is met.
S820, jointly performing corresponding branch training on the comment generation network and the label connection network in the text processing model obtained by performing the second training, based on the first content text and the first sample label corresponding to the first content text, and the second content text and the second sample label corresponding to the second content text; and jointly performing corresponding branch training on the label generation network and the comment connection network in the text processing model obtained by performing the second training, based on the first content text and the first sample comment corresponding to the first content text, and the third content text and the third sample comment corresponding to the third content text.
S822, ending the training when it is determined, based on each branch training, that the third stopping condition is met, to obtain the trained text processing model.
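The dual loss of step S808 can be pictured as below, with cross-entropy standing in for both the label difference and the comment difference; the equal-weight sum is an assumption, since the application only requires that the two differences feed one joint objective.

```python
import torch
import torch.nn as nn

ce = nn.CrossEntropyLoss()

def dual_loss(pred_label_logits, first_sample_label,
              pred_comment_logits, first_sample_comment):
    # first difference: first predicted label vs. first sample label
    label_diff = ce(pred_label_logits, first_sample_label)
    # second difference: first predicted comment vs. first sample comment
    comment_diff = ce(pred_comment_logits.flatten(0, 1),
                      first_sample_comment.flatten())
    return label_diff + comment_diff       # equal weighting is an assumption

pred_label_logits = torch.randn(4, 200, requires_grad=True)
pred_comment_logits = torch.randn(4, 12, 5000, requires_grad=True)
loss = dual_loss(pred_label_logits, torch.randint(0, 200, (4,)),
                 pred_comment_logits, torch.randint(0, 5000, (4, 12)))
loss.backward()   # drives the first training of both generation networks
```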
It should be understood that although the steps in the flowcharts of figs. 2, 5 and 7-8 are displayed in sequence as indicated by the arrows, these steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated otherwise herein, the execution of these steps is not strictly limited in order, and they may be performed in other orders. Moreover, at least some of the steps in figs. 2, 5 and 7-8 may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
The present application also provides an application scenario to which the above description text generation method is applied. Specifically, the application of the description text generation method in this scenario is as follows:
When the target task is a news comment generation task, the computer device can acquire the clue keywords corresponding to the target task and perform a network search based on the clue keywords to obtain the complete news content and the news tags corresponding to the clue keywords. The computer device inputs the complete news content and the news tags into the text processing model, and the text processing model outputs at least one news comment based on the input news content and news tags according to the description text generation method. Similarly, when the target task is a news tag generation task, the computer device may input news content and news comments to the text processing model, so that the text processing model correspondingly outputs news tags.
The application further provides an application scenario applying the description text generation method. Specifically, the description text generation method is applied to the application scenario as follows:
When the target task is a poem generation task, the computer device can obtain the poem title and the poem label (also called the poetry situation) of the poem to be generated, input the poem title and the poem label into the text processing model, and the text processing model outputs the corresponding poem. Correspondingly, when the target task is a poem label generation task, the computer device can obtain the poem title and the complete poem content, so that the text processing model can output the corresponding poem label based on the poem title and the complete poem content.
It is easy to understand that the above-mentioned description text generation method can be used not only for outputting music comments and music labels, news comments and news labels, poems and poem labels, but also for outputting blog comments and blog labels, novel comments and novel labels, etc. It is to be understood that the above scenarios are illustrative only and are not intended to limit the present application.
In one embodiment, as shown in fig. 9, there is provided a descriptive text generating apparatus 900, which may be a part of a computer device using a software module or a hardware module, or a combination of the two, specifically including: a text acquisition module 902, a randomization module 904, and a decoding module 906, wherein:
a text obtaining module 902, configured to obtain a shared content text and an auxiliary description text corresponding to the target task; the text type of the auxiliary description text is one of a comment type and a tag type.
A randomization processing module 904, configured to perform semantic coding processing on the shared content text and the auxiliary description text, respectively, to obtain a corresponding shared vector sequence and an auxiliary vector sequence; and randomizing the sharing vector sequence, and determining a sharing hidden variable based on the randomization processing result.
And the decoding module 906 is configured to perform semantic decoding processing based on the shared hidden variable and the auxiliary vector sequence, and output a target description text corresponding to the target task, where a text type of the target description text is one of a comment type and a tag type, and the text type of the target description text is different from that of the auxiliary description text.
In one embodiment, the text obtaining module 902 is further configured to determine a target task and obtain a clue keyword corresponding to the target task; and searching based on the clue keywords to obtain a shared content text and an auxiliary description text which are matched with the target task.
In one embodiment, the randomization processing module 904 further includes a vector sequence determination module 9041, configured to determine a shared word sequence corresponding to the shared content text, and determine an auxiliary word sequence corresponding to the auxiliary description text; respectively carrying out forward semantic coding and reverse semantic coding on the shared word sequence to obtain a forward shared coding vector sequence and a reverse shared coding vector sequence; carrying out sequence fusion processing on the forward sharing coding vector sequence and the reverse sharing coding vector sequence to obtain a sharing vector sequence corresponding to the shared content text; respectively carrying out forward semantic coding and reverse semantic coding on the auxiliary word sequence to obtain a forward auxiliary coding vector sequence and a reverse auxiliary coding vector sequence; and carrying out sequence fusion processing on the forward auxiliary coding vector sequence and the reverse auxiliary coding vector sequence to obtain an auxiliary vector sequence corresponding to the auxiliary description text.
In one embodiment, the randomization processing module 904 further includes a hidden variable determination module 9042, configured to convert the sequence of sharing vectors into a corresponding target probability distribution, and determine a corresponding target mean and a target variance based on the target probability distribution; when the auxiliary description text belongs to the label type, performing at least one sampling based on target probability distribution according to a target mean value and a target variance to obtain at least one shared hidden variable corresponding to the auxiliary description text; and when the auxiliary description text belongs to the comment type, taking the target mean value as a shared hidden variable corresponding to the auxiliary description text.
In one embodiment, the decoding module 906 is further configured to combine each shared hidden variable with an auxiliary vector sequence to form a group of objects to be decoded; and respectively performing semantic decoding processing on each group of objects to be decoded, and outputting at least one target description text corresponding to the target task, wherein each output target description text is different.
In one embodiment, the description text generation apparatus 900 is configured to perform a description text generation method, and the description text generation method is performed by a text processing model; the text processing model comprises a comment generation network and a label generation network; the comment generation network is used for generating a target description text belonging to the comment type according to the shared content text and an auxiliary description text belonging to the label type; and the label generation network is used for generating a target description text belonging to the label type according to the shared content text and an auxiliary description text belonging to the comment type.
In one embodiment, the description text generation apparatus 900 is further configured to perform semantic coding processing on the shared content text through the content coding network to obtain a shared vector sequence corresponding to the shared content text; when the auxiliary description text belongs to the label type, perform semantic coding processing on the auxiliary description text through the comment generation network to obtain a corresponding first auxiliary vector sequence; when the auxiliary description text belongs to the comment type, perform semantic coding processing on the auxiliary description text through the label generation network to obtain a corresponding second auxiliary vector sequence; when the auxiliary description text belongs to the label type, perform randomization processing on the shared vector sequence through the comment generation network and determine a first shared hidden variable based on the processing result; when the auxiliary description text belongs to the comment type, perform randomization processing on the shared vector sequence through the label generation network and determine a second shared hidden variable based on the processing result; when the auxiliary description text belongs to the label type, decode the first auxiliary vector sequence based on the first shared hidden variable through the comment generation network to obtain a target description text belonging to the comment type; and when the auxiliary description text belongs to the comment type, decode the second auxiliary vector sequence based on the second shared hidden variable through the label generation network to obtain a target description text belonging to the label type.
In one embodiment, the decoding module 906 is further configured to decode the first auxiliary vector sequence through the comment generation network based on the first shared hidden variable and the word vector of the previously output target word to obtain the current target-end vector sequence; determine the currently output target word through the comment generation network according to the current target-end vector sequence; and form a target description text belonging to the comment type through the comment generation network based on the output target words.
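The word-by-word decoding this module performs can be illustrated with the following greedy-decoding sketch; the GRU cell, the mean-pooled auxiliary context, and greedy argmax word selection are assumptions standing in for the decoder the model actually uses.

```python
import torch
import torch.nn as nn

VOCAB, EMB, HIDDEN, LATENT = 5000, 128, 256, 64
BOS, EOS = 1, 2

embed = nn.Embedding(VOCAB, EMB)
cell = nn.GRUCell(EMB + LATENT + HIDDEN, HIDDEN)
to_vocab = nn.Linear(HIDDEN, VOCAB)

def decode(z, aux_seq, max_len=30):
    # z: (LATENT,) first shared hidden variable
    # aux_seq: (seq_len, HIDDEN) first auxiliary vector sequence
    aux = aux_seq.mean(dim=0)              # pooled auxiliary context (assumption)
    h = torch.zeros(1, HIDDEN)             # decoder state
    word = torch.tensor([BOS])
    words = []
    for _ in range(max_len):
        # previous target word's vector + shared hidden variable + aux context
        x = torch.cat([embed(word)[0], z, aux]).unsqueeze(0)
        h = cell(x, h)                     # current target-end vector
        word = to_vocab(h).argmax(dim=-1)  # currently output target word
        if word.item() == EOS:
            break
        words.append(word.item())
    return words  # word ids forming the comment-type target description text

print(decode(torch.randn(LATENT), torch.randn(8, HIDDEN)))
```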
In one embodiment, the description text generation apparatus 900 is further configured to obtain a first sample training set, where the first sample training set includes the first content text, and the first sample comment and the first sample label corresponding to the first content text; perform semantic coding processing on the first content text to obtain a corresponding sample content vector sequence; perform encoding and decoding processing on the sample content vector sequence and the first sample label through the comment generation network in the text processing model to be trained to obtain a corresponding first predicted comment; perform encoding and decoding processing on the sample content vector sequence and the first sample comment through the label generation network in the text processing model to be trained to obtain a corresponding first predicted label, where the comment generation network and the label generation network form a dual structure; determine a dual loss according to a first difference between the first predicted label and the first sample label and a second difference between the first predicted comment and the first sample comment; and train the text processing model based on the dual loss until a training end condition is met, to obtain a trained text processing model; the trained text processing model is used for generating a target description text corresponding to a target task; and the target description text is at least one of a target comment text and a target label text.
In one embodiment, in the description text generation apparatus 900, when the target task is a music comment generation task, the shared content text comprises at least one of a song name, a lyric text, a rhythm description text and an author attribute text corresponding to the target song, the auxiliary description text is a music tag set corresponding to the target song, and the target description text is a music comment set corresponding to the target song; when the target task is a music tag generation task, the shared content text comprises at least one of a song name, a lyric text, a rhythm description text and an author attribute text corresponding to the target song, the auxiliary description text is a music comment set corresponding to the target song, and the target description text is a music tag set corresponding to the target song.
As shown in fig. 10, there is provided a training apparatus 1000 for a text processing model, which may adopt a software module or a hardware module, or a combination of the two modules, as a part of a computer device, and specifically includes: an encoding module 1002, an output module 1004, and a training module 1006, wherein:
the encoding module 1002 is configured to obtain a first sample training set, where the first sample training set includes a first content text, and a first sample comment and a first sample tag that correspond to the first content text; carrying out semantic coding processing on the first content text to obtain a corresponding sample content vector sequence;
the output module 1004 is configured to perform encoding and decoding processing on the sample content vector sequence and the first sample label through the comment generation network in the text processing model to be trained to obtain a corresponding first predicted comment; and perform encoding and decoding processing on the sample content vector sequence and the first sample comment through the label generation network in the text processing model to be trained to obtain a corresponding first predicted label; the comment generation network and the label generation network form a dual structure;
a training module 1006, configured to determine a dual loss according to a first difference between the first predicted label and the first sample label and a second difference between the first predicted comment and the first sample comment; and train the text processing model based on the dual loss until a training end condition is met, to obtain a trained text processing model; the trained text processing model is used for generating a target description text corresponding to a target task; and the target description text is at least one of a target comment text and a target label text.
In one embodiment, the text processing model further includes a connection network, the connection network is used for connecting the comment generation network and the label generation network, and the training module 1006 is further configured to perform the first training on the comment generation network and the label generation network in the text processing model based on the dual loss until a first stopping condition is met; acquire a second sample training set, where the second sample training set comprises a second content text, a second sample label corresponding to the second content text, a third content text and a third sample comment corresponding to the third content text; perform the second training on the connection network in the text processing model obtained by performing the first training through the second sample training set until a second stopping condition is met; and perform the third training on the text processing model obtained by performing the second training through the first sample training set and the second sample training set until a third stopping condition is met, to obtain the trained text processing model.
In one embodiment, the connection network comprises a label connection network; the training module 1006 is further configured to perform encoding and decoding processing on the second content text and the second sample label through the comment generation network in the text processing model obtained by performing the first training to obtain a predicted comment vector sequence; convert the predicted comment vector sequence into a substitute comment vector sequence through the label connection network in the text processing model obtained by performing the first training; decode the substitute comment vector sequence through the label generation network in the text processing model obtained by performing the first training to obtain a corresponding second predicted label; and determine the label difference between the second predicted label and the second sample label, and perform the second training on the label connection network through the label difference until the second stopping condition is met.
In one embodiment, the connection network comprises a comment connection network; the training module 1006 is further configured to perform encoding and decoding processing on the third content text and the third sample comment through the label generation network in the text processing model obtained by performing the first training to obtain a predicted label vector sequence; convert the predicted label vector sequence into a substitute label vector sequence through the comment connection network in the text processing model obtained by performing the first training; decode the substitute label vector sequence through the comment generation network in the text processing model obtained by performing the first training to obtain a corresponding second predicted comment; and determine the comment difference between the second predicted comment and the third sample comment, and perform the second training on the comment connection network through the comment difference until the second stopping condition is met.
In one embodiment, the training module 1006 is further configured to jointly perform corresponding branch training on the comment generation network and the label connection network in the text processing model obtained by performing the second training, based on the first content text and the first sample label corresponding to the first content text, and the second content text and the second sample label corresponding to the second content text; jointly perform corresponding branch training on the label generation network and the comment connection network in the text processing model obtained by performing the second training, based on the first content text and the first sample comment corresponding to the first content text, and the third content text and the third sample comment corresponding to the third content text; and end the training when it is determined, based on each branch training, that the third stopping condition is met, to obtain the trained text processing model.
For specific limitations of the description text generation apparatus and the training apparatus of the text processing model, reference may be made to the limitations of the description text generation method and the training method of the text processing model above; details are not described herein again. The modules in the description text generation apparatus and the training apparatus of the text processing model may be implemented in whole or in part by software, hardware, or combinations thereof. The modules may be embedded in or independent of a processor in the computer device in a hardware form, or may be stored in a memory in the computer device in a software form, so that the processor can invoke and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, and its internal structure diagram may be as shown in fig. 11. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used to store description text processing data and model training data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by a processor, implements the description text generation method and the training method of the text processing model.
In one embodiment, a computer device is further provided, which includes a memory and a processor, the memory stores a computer program, and the processor implements the steps of the above method embodiments when executing the computer program.
In an embodiment, a computer-readable storage medium is provided, in which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
In one embodiment, a computer program product or computer program is provided that includes computer instructions stored in a computer-readable storage medium. The computer instructions are read by a processor of a computer device from a computer-readable storage medium, and the computer instructions are executed by the processor to cause the computer device to perform the steps in the above-mentioned method embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database or other medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), among others.
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (15)

1. A description text generation method, characterized in that the method comprises:
acquiring a shared content text and an auxiliary description text corresponding to a target task; the text type of the auxiliary description text is one of a comment type and a tag type;
semantic coding processing is respectively carried out on the shared content text and the auxiliary description text to obtain a corresponding shared vector sequence and an auxiliary vector sequence;
randomizing the sharing vector sequence, and determining a sharing hidden variable based on a randomizing result;
semantic decoding processing is carried out on the basis of the shared hidden variables and the auxiliary vector sequence, a target description text corresponding to the target task is output, the text type of the target description text is one of comment type and label type, and the text type of the target description text is different from that of the auxiliary description text.
2. The method according to claim 1, wherein the semantic coding the shared content text and the auxiliary description text respectively to obtain a corresponding shared vector sequence and an auxiliary vector sequence comprises:
determining a shared word sequence corresponding to the shared content text, and determining an auxiliary word sequence corresponding to the auxiliary description text;
respectively carrying out forward semantic coding and reverse semantic coding on the shared word sequence to obtain a forward shared coding vector sequence and a reverse shared coding vector sequence;
performing sequence fusion processing on the forward sharing coding vector sequence and the reverse sharing coding vector sequence to obtain a sharing vector sequence corresponding to the shared content text;
respectively carrying out forward semantic coding and reverse semantic coding on the auxiliary word sequence to obtain a forward auxiliary coding vector sequence and a reverse auxiliary coding vector sequence;
and carrying out sequence fusion processing on the forward auxiliary coding vector sequence and the backward auxiliary coding vector sequence to obtain an auxiliary vector sequence corresponding to the auxiliary description text.
3. The method of claim 1, wherein randomizing the sequence of shared vectors and determining a shared hidden variable based on the result of the randomization comprises:
converting the shared vector sequence into a corresponding target probability distribution, and determining a corresponding target mean and a target variance based on the target probability distribution;
when the auxiliary description text belongs to the label type, performing at least one sampling based on the target probability distribution according to the target mean and the target variance to obtain at least one shared hidden variable corresponding to the auxiliary description text;
when the auxiliary description text belongs to the comment type, the target mean value is used as a shared hidden variable corresponding to the auxiliary description text.
4. The method according to claim 3, wherein when the auxiliary description text belongs to a tag type, performing semantic decoding processing based on the shared hidden variable and the auxiliary vector sequence to output a target description text corresponding to the target task includes:
respectively forming a group of objects to be decoded by each shared hidden variable and the auxiliary vector sequence;
and respectively performing semantic decoding processing on each group of objects to be decoded, and outputting at least one target description text corresponding to the target task, wherein each output target description text is different.
5. The method of claim 1, wherein the method is performed by a text processing model; the text processing model comprises a comment generation network and a label generation network; the comment generation network is used for generating a target description text belonging to a comment type according to the shared content text and the auxiliary description text belonging to a tag type; and the label generation network is used for generating a target description text belonging to the label type according to the shared content text and the auxiliary description text belonging to the comment type.
6. The method according to claim 5, wherein the text processing model further includes a content coding network, and the semantic coding processing is performed on the shared content text and the auxiliary description text respectively to obtain a corresponding shared vector sequence and an auxiliary vector sequence, including:
performing semantic coding processing on the shared content text through the content coding network to obtain a shared vector sequence corresponding to the shared content text;
when the auxiliary description text belongs to the label type, performing semantic coding processing on the auxiliary description text through the comment generation network to obtain a corresponding first auxiliary vector sequence;
when the auxiliary description text belongs to the comment type, performing semantic coding processing on the auxiliary description text through the tag generation network to obtain a corresponding second auxiliary vector sequence;
the randomizing the sharing vector sequence and determining a sharing hidden variable based on a randomizing result comprise:
when the auxiliary description text belongs to the type of the label, generating a network through the comment, randomizing the sharing vector sequence, and determining a first sharing hidden variable based on a processing result;
when the auxiliary description text belongs to a comment type, generating a network through the tag, randomizing the sharing vector sequence, and determining a second sharing hidden variable based on a processing result;
performing semantic decoding processing based on the shared hidden variable and the auxiliary vector sequence, and outputting a target description text corresponding to the target task, including:
when the auxiliary description text belongs to the label type, generating a network through the comment, and decoding the first auxiliary vector sequence based on the first shared hidden variable to obtain a target description text belonging to the comment type;
and when the auxiliary description text belongs to the comment type, generating a network through the tag, and decoding the second auxiliary vector sequence based on the second shared hidden variable to obtain a target description text belonging to the tag type.
7. The method according to any one of claims 1 to 6, wherein the target task is a music comment generation task or a music tag generation task;
when the target task is a music comment generation task, the shared content text comprises at least one of a song name, a lyric text, a rhythm description text and an author attribute text corresponding to a target song, the auxiliary description text is a music label set corresponding to the target song, and the target description text is a music comment set corresponding to the target song;
when the target task is a music tag generation task, the shared content text comprises at least one of a song name, a lyric text, a rhythm description text and an author attribute text corresponding to a target song, the auxiliary description text is a music comment set corresponding to the target song, and the target description text is a music tag set corresponding to the target song.
8. A method for training a text processing model, the method comprising:
acquiring a first sample training set, wherein the first sample training set comprises a first content text, a first sample comment and a first sample label, and the first sample comment and the first sample label correspond to the first content text;
carrying out semantic coding processing on the first content text to obtain a corresponding sample content vector sequence;
performing, through a comment generation network in a text processing model to be trained, encoding and decoding processing on the sample content vector sequence and the first sample label to obtain a corresponding first predicted comment;
performing, through a label generation network in the text processing model to be trained, encoding and decoding processing on the sample content vector sequence and the first sample comment to obtain a corresponding first predicted label; wherein the comment generation network and the label generation network form a dual structure;
determining a dual loss according to a first difference between the first predicted label and the first sample label and a second difference between the first predicted comment and the first sample comment; and
training the text processing model based on the dual loss until the training end condition is met, and obtaining a trained text processing model; the trained text processing model is used for generating a target description text corresponding to a target task; the target description text is at least one of a target comment text and a target label text.
9. The method of claim 8, wherein the text processing model further comprises a connection network for connecting the comment generation network and the label generation network, the method further comprising:
acquiring a second sample training set, wherein the second sample training set comprises a second content text, a second sample label corresponding to the second content text, a third content text and a third sample comment corresponding to the third content text;
training the text processing model based on the dual loss until the training end condition is met to obtain the trained text processing model comprises:
performing first training on the comment generation network and the label generation network in the text processing model based on the dual loss until a first stopping condition is met;
performing second training on the connection network in the text processing model obtained by performing the first training through the second sample training set until a second stopping condition is met;
and performing third training on the text processing model obtained by executing the second training through the first sample training set and the second sample training set until a third stopping condition is met, so as to obtain the trained text processing model.
10. The method of claim 9, wherein the connection network comprises a label connection network; and performing second training on the connection network in the text processing model obtained by performing the first training through the second sample training set until a second stopping condition is met comprises:
performing, through the comment generation network in the text processing model obtained by performing the first training, encoding and decoding processing on the second content text and the second sample label to obtain a predicted comment vector sequence;
converting the predicted comment vector sequence into a substitute comment vector sequence through the label connection network in the text processing model obtained by performing the first training;
decoding the substitute comment vector sequence through the label generation network in the text processing model obtained by performing the first training to obtain a corresponding second predicted label; and
determining the label difference between the second predicted label and the second sample label, and performing second training on the label connection network through the label difference until the second stopping condition is met.
11. The method of claim 9, wherein the connection network comprises a comment connection network; and performing second training on the connection network in the text processing model obtained by performing the first training through the second sample training set until a second stopping condition is met comprises:
performing, through the label generation network in the text processing model obtained by performing the first training, encoding and decoding processing on the third content text and the third sample comment to obtain a predicted label vector sequence;
converting the predicted label vector sequence into a substitute label vector sequence through the comment connection network in the text processing model obtained by performing the first training;
decoding the substitute label vector sequence through the comment generation network in the text processing model obtained by performing the first training to obtain a corresponding second predicted comment; and
and determining comment differences between the second predicted comment and the third sample comment, and performing second training on the comment connecting network through the comment differences until a second stopping condition is met.
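Claims 10 and 11 are mirror images of one pipeline: a frozen generation network produces a vector sequence, the connection network maps that sequence into the other network's space, the other (also frozen) generation network decodes it, and only the connection network is updated from the resulting difference. A minimal PyTorch sketch under that reading; the module choices, shapes, and names are hypothetical, since the claims fix the data flow but not the architectures:

```python
import torch
import torch.nn as nn

DIM, NUM_CLASSES = 32, 10

source_net  = nn.GRU(DIM, DIM, batch_first=True)  # frozen generation network (comment net in claim 10, label net in claim 11)
connect     = nn.Linear(DIM, DIM)                 # connection network: the only trainable part in the second training
target_head = nn.Linear(DIM, NUM_CLASSES)         # frozen decoder of the dual generation network

for p in source_net.parameters():
    p.requires_grad_(False)
for p in target_head.parameters():
    p.requires_grad_(False)

opt = torch.optim.Adam(connect.parameters(), lr=1e-3)

def second_training_step(input_vecs: torch.Tensor, target_ids: torch.Tensor) -> float:
    predicted_seq, _ = source_net(input_vecs)         # e.g., predicted comment vector sequence
    substitute_seq = connect(predicted_seq)           # e.g., substitute comment vector sequence
    logits = target_head(substitute_seq.mean(dim=1))  # frozen dual network decodes the substitute sequence
    loss = nn.functional.cross_entropy(logits, target_ids)  # the label (or comment) difference
    opt.zero_grad(); loss.backward(); opt.step()      # updates the connection network only
    return loss.item()

# Toy batch: 4 encoded sequences of length 7 plus their supervision targets.
print(second_training_step(torch.randn(4, 7, DIM),
                           torch.randint(0, NUM_CLASSES, (4,))))
```

In claim 10 the supervision target is the second sample label; in claim 11 it is the third sample comment, with the roles of the comment and label generation networks swapped.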
12. The method of claim 9, wherein the performing, through the first sample training set and the second sample training set, third training on the text processing model obtained by the second training until a third stopping condition is met to obtain the trained text processing model comprises:
performing branch training on the comment generation network and the label connection network in the text processing model obtained by the second training, jointly based on the first content text and its corresponding first sample label and on the second content text and its corresponding second sample label;
performing branch training on the label generation network and the comment connection network in the text processing model obtained by the second training, jointly based on the first content text and its corresponding first sample comment and on the third content text and its corresponding third sample comment;
and, based on the training of each branch, finishing the training when the third stopping condition is met, to obtain the trained text processing model.
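In the third training both branches are optimized together: the comment generation network with the label connection network on (content, label) pairs, and the label generation network with the comment connection network on (content, comment) pairs. A minimal sketch of summing the two branch losses per step; the quadratic losses and the fixed step budget are placeholders, not anything the claim prescribes:

```python
import torch

weights = torch.nn.Parameter(torch.zeros(3))   # stand-in for all trainable parameters
opt = torch.optim.Adam([weights], lr=1e-2)

def label_branch_loss(batch: torch.Tensor) -> torch.Tensor:
    # comment generation network + label connection network, content/label pairs
    return ((weights - batch) ** 2).mean()

def comment_branch_loss(batch: torch.Tensor) -> torch.Tensor:
    # label generation network + comment connection network, content/comment pairs
    return ((weights + batch) ** 2).mean()

for step in range(200):                        # a fixed budget stands in for the third stopping condition
    loss = label_branch_loss(torch.randn(3)) + comment_branch_loss(torch.randn(3))
    opt.zero_grad(); loss.backward(); opt.step()
```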
13. A description text generation apparatus, characterized in that the apparatus comprises:
the text acquisition module is used for acquiring a shared content text and an auxiliary description text corresponding to a target task; the text type of the auxiliary description text is one of a comment type and a tag type;
the randomization processing module is used for respectively performing semantic encoding processing on the shared content text and the auxiliary description text to obtain a corresponding shared vector sequence and auxiliary vector sequence, randomizing the shared vector sequence, and determining a shared hidden variable based on the randomization result;
and the decoding module is used for performing semantic decoding processing based on the shared hidden variable and the auxiliary vector sequence and outputting a target description text corresponding to the target task, wherein the text type of the target description text is one of a comment type and a tag type and is different from the text type of the auxiliary description text.
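The randomization of the shared vector sequence followed by a shared hidden variable reads like the reparameterization step of a variational encoder-decoder; the sketch below assumes that interpretation. All modules, dimensions, and the pooling step are hypothetical stand-ins, and a full model would project the decoder output to a vocabulary to produce the target description text:

```python
import torch
import torch.nn as nn

DIM = 32
encode_shared = nn.GRU(DIM, DIM, batch_first=True)  # semantic encoder for the shared content text
encode_aux    = nn.GRU(DIM, DIM, batch_first=True)  # semantic encoder for the auxiliary description text
to_mu, to_logvar = nn.Linear(DIM, DIM), nn.Linear(DIM, DIM)
decoder = nn.GRU(2 * DIM, DIM, batch_first=True)    # semantic decoder over hidden variable + auxiliary sequence

def generate(shared_vecs: torch.Tensor, aux_vecs: torch.Tensor) -> torch.Tensor:
    shared_seq, _ = encode_shared(shared_vecs)       # shared vector sequence
    aux_seq, _    = encode_aux(aux_vecs)             # auxiliary vector sequence
    pooled = shared_seq.mean(dim=1)
    mu, logvar = to_mu(pooled), to_logvar(pooled)
    z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # randomization -> shared hidden variable
    z_seq = z.unsqueeze(1).expand(-1, aux_seq.size(1), -1)   # broadcast z along the auxiliary sequence
    out, _ = decoder(torch.cat([z_seq, aux_seq], dim=-1))    # semantic decoding
    return out

print(generate(torch.randn(2, 5, DIM), torch.randn(2, 6, DIM)).shape)
```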
14. An apparatus for training a text processing model, the apparatus comprising:
the encoding module is used for acquiring a first sample training set, wherein the first sample training set comprises a first content text and a first sample comment and a first sample label that correspond to the first content text; and for performing semantic encoding processing on the first content text to obtain a corresponding sample content vector sequence;
the output module is used for performing, through a comment generation network in a text processing model to be trained, encoding and decoding processing on the sample content vector sequence and the first sample label to obtain a corresponding first predicted comment; and for performing, through a label generation network in the text processing model to be trained, encoding and decoding processing on the sample content vector sequence and the first sample comment to obtain a corresponding first predicted label; wherein the comment generation network and the label generation network form a dual structure;
and the training module is used for determining a dual loss based on a first difference between the first predicted label and the first sample label and a second difference between the first predicted comment and the first sample comment; and for training the text processing model based on the dual loss until a training end condition is met, to obtain a trained text processing model; the trained text processing model is used for generating a target description text corresponding to a target task, and the target description text is at least one of a target comment text and a target label text.
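The dual loss itself can be as simple as adding the two differences. The sketch below assumes cross-entropy for both, with the comment difference computed per token; the claim requires only that both differences contribute, so the choice of loss function is an assumption:

```python
import torch
import torch.nn.functional as F

def dual_loss(label_logits: torch.Tensor, sample_label_ids: torch.Tensor,
              comment_logits: torch.Tensor, sample_comment_ids: torch.Tensor) -> torch.Tensor:
    first_difference = F.cross_entropy(label_logits, sample_label_ids)
    second_difference = F.cross_entropy(comment_logits.flatten(0, 1),  # (batch*seq, vocab)
                                        sample_comment_ids.flatten())
    return first_difference + second_difference

# Toy shapes: 4 samples, 10 label classes; comments of length 7 over a 100-token vocabulary.
loss = dual_loss(torch.randn(4, 10), torch.randint(0, 10, (4,)),
                 torch.randn(4, 7, 100), torch.randint(0, 100, (4, 7)))
print(loss.item())
```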
15. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method according to any one of claims 1 to 12.
CN202110121602.2A 2021-01-28 2021-01-28 Description text generation method and device and text processing model training method Pending CN113590983A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110121602.2A CN113590983A (en) 2021-01-28 2021-01-28 Description text generation method and device and text processing model training method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110121602.2A CN113590983A (en) 2021-01-28 2021-01-28 Description text generation method and device and text processing model training method

Publications (1)

Publication Number Publication Date
CN113590983A 2021-11-02

Family

ID=78238147

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110121602.2A Pending CN113590983A (en) 2021-01-28 2021-01-28 Description text generation method and device and text processing model training method

Country Status (1)

Country Link
CN (1) CN113590983A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115114395A (en) * 2022-04-15 2022-09-27 腾讯科技(深圳)有限公司 Content retrieval and model training method and device, electronic equipment and storage medium
CN115114395B (en) * 2022-04-15 2024-03-19 腾讯科技(深圳)有限公司 Content retrieval and model training method and device, electronic equipment and storage medium
CN116579339A (en) * 2023-07-12 2023-08-11 阿里巴巴(中国)有限公司 Task execution method and optimization task execution method
CN116579339B (en) * 2023-07-12 2023-11-14 阿里巴巴(中国)有限公司 Task execution method and optimization task execution method

Similar Documents

Publication Publication Date Title
CN110321417B (en) Dialog generation method, system, readable storage medium and computer equipment
CN110717017B (en) Method for processing corpus
CN113762322A (en) Video classification method, device and equipment based on multi-modal representation and storage medium
WO2021174898A1 (en) Method and device for compositing action sequence of virtual object
CN109635253A (en) Text style conversion method, device and storage medium, computer equipment
CN110796160A (en) Text classification method, device and storage medium
CN113723166A (en) Content identification method and device, computer equipment and storage medium
CN112131883A (en) Language model training method and device, computer equipment and storage medium
CN113407663B (en) Image-text content quality identification method and device based on artificial intelligence
Yazdian et al. Gesture2Vec: Clustering gestures using representation learning methods for co-speech gesture generation
Wang et al. Cross-modal enhancement network for multimodal sentiment analysis
CN113590983A (en) Description text generation method and device and text processing model training method
CN113392265A (en) Multimedia processing method, device and equipment
CN114282055A (en) Video feature extraction method, device and equipment and computer storage medium
CN113761841A (en) Method for converting text data into acoustic features
Khurram et al. Dense-captionnet: a sentence generation architecture for fine-grained description of image semantics
CN117574904A (en) Named entity recognition method based on contrast learning and multi-modal semantic interaction
CN113641854B (en) Method and system for converting text into video
CN111368531B (en) Translation text processing method and device, computer equipment and storage medium
Wang et al. MT-TCCT: Multi-task learning for multimodal emotion recognition
CN114611529B (en) Intention recognition method and device, electronic equipment and storage medium
CN115169472A (en) Music matching method and device for multimedia data and computer equipment
CN115130461A (en) Text matching method and device, electronic equipment and storage medium
CN115203388A (en) Machine reading understanding method and device, computer equipment and storage medium
CN112115718B (en) Content text generation method and device and music comment text generation method

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code: ref country code HK; ref legal event code DE; ref document number 40054069; country of ref document HK

SE01 Entry into force of request for substantive examination