CN114861610A - Title generation method and device, storage medium and electronic equipment - Google Patents

Title generation method and device, storage medium and electronic equipment

Info

Publication number
CN114861610A
CN114861610A
Authority
CN
China
Prior art keywords
title
style
model
sample
trained
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210502505.2A
Other languages
Chinese (zh)
Inventor
保俊杉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd filed Critical Beijing QIYI Century Science and Technology Co Ltd
Priority to CN202210502505.2A
Publication of CN114861610A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/186 Templates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Machine Translation (AREA)

Abstract

The application discloses a title generation method and device, a storage medium, and an electronic device, belonging to the field of computers. The method comprises: determining a plurality of title styles for a title to be generated; inputting title material and the plurality of title styles into a pre-trained T5 model to generate a plurality of candidate titles, wherein each candidate title corresponds to one title style; and selecting one of the candidate titles for output as the target title. The method solves the technical problem in the related art that titles in a preset style cannot be generated automatically, and improves title generation efficiency.

Description

Title generation method and device, storage medium and electronic equipment
Technical Field
The present application relates to the field of computers, and in particular, to a title generation method and apparatus, a storage medium, and an electronic device.
Background
In the related art, whether media content such as an article or a short video pushed to a terminal is clicked by a user often depends on how attractive its title is. Pushed titles are often too plain to catch the user's attention.
Rewriting an original title into a more attractive one is a popular but difficult problem in academia, and existing rewriting approaches do not yet meet the accuracy and diversity (generating titles of different styles) requirements for industrial deployment. Because training data is hard to obtain (for a given title, it is difficult to find a corresponding title in another style), generating a title of a specific style from the original media content works poorly, and titles of a specific style are still produced mainly by manual editing.
In view of the above problems in the related art, no effective solution has been found at present.
Disclosure of Invention
In order to solve the technical problems or at least partially solve the technical problems, the application provides a title generation method and device, a storage medium and an electronic device.
According to an aspect of an embodiment of the present application, a title generation method is provided, comprising: determining a plurality of title styles for a title to be generated; inputting title material and the plurality of title styles into a pre-trained T5 model to generate a plurality of candidate titles, wherein each candidate title corresponds to one title style; and selecting one of the candidate titles for output as the target title.
Further, inputting the title material and the plurality of title styles into the pre-trained T5 model to generate the plurality of candidate titles comprises: for each title style, generating structural parameters of the conditional layer normalization (LN) in the pre-trained T5 model according to that title style; inputting the title material into the adjusted pre-trained T5 model and outputting a title string together with probability information for each character in the string; and performing a beam search over the title string based on the probability information to generate the corresponding candidate title.
Further, generating the structural parameters of the conditional layer normalization (LN) in the pre-trained T5 model according to the title style includes: converting the title style into a vector matrix with one row and M columns, where M is a positive integer; multiplying the vector matrix by an N-dimensional fully connected layer to obtain a fully connected matrix with M rows and N columns, where N is a positive integer; and summing the M rows of the fully connected matrix to obtain the structural parameter.
Further, before inputting the title material and the plurality of title styles into the pre-trained T5 model, the method further comprises: extracting sample title material and a sample title, and configuring a first title style for the sample title; inputting the sample title material, the first title style, and the sample title into an initial T5 model and outputting an intermediate text; and calculating a gradient descent parameter of the initial T5 model from the sample title and the intermediate text, and updating the initial T5 model based on the gradient descent parameter to obtain the pre-trained T5 model.
Further, calculating a gradient descent parameter of the initial T5 model from the sample title and the intermediate text comprises: calculating a content difference value between the intermediate text and the sample title, and calculating a style difference value between the intermediate text and the sample title; configuring the content difference value and the style difference value as a content loss value and a style loss value of the initial T5 model, respectively; and carrying out weighted summation on the content loss value and the style loss value to obtain a gradient descent parameter of the initial T5 model.
Further, calculating the style difference value between the intermediate text and the sample title comprises: sampling the intermediate text to generate an intermediate title with the same size as the sample title; using a pre-trained long short-term memory (LSTM) network to identify a second title style of the intermediate title; and calculating the style difference value between the first title style and the second title style.
Further, before determining a plurality of title genres of a title to be generated, the method further comprises: determining a target video of a title to be generated; and extracting subtitle information in the target video, and determining the subtitle information as the title material.
According to another aspect of the embodiments of the present application, there is also provided a title generation apparatus, including: the processing module is used for determining a plurality of title styles of the title to be generated; a generating module, configured to input the title material and the multiple title styles into a pre-trained T5 model, and generate multiple candidate titles, where each candidate title corresponds to one title style; and the selection module is used for selecting one of the multiple candidate titles to be output as the target title.
Further, the generating module includes: a first generating unit, configured to generate, for each title style, structural parameters of the conditional layer normalization (LN) in the pre-trained T5 model according to that title style; an output unit, configured to input the title material into the adjusted pre-trained T5 model and output a title string together with probability information for each character in the string; and a second generating unit, configured to perform a beam search over the title string based on the probability information and generate the corresponding candidate title.
Further, the first generating unit includes: a conversion subunit, configured to convert the title style into a vector matrix with one row and M columns, where M is a positive integer; a first calculation subunit, configured to multiply the vector matrix by an N-dimensional fully connected layer to obtain a fully connected matrix with M rows and N columns, where N is a positive integer; and a second calculation subunit, configured to sum the M rows of the fully connected matrix to obtain the structural parameter.
Further, the apparatus further comprises: a configuration module, configured to extract a sample title material and a sample title and configure a first title style of the sample title before the generation module inputs the subtitle information and the plurality of title styles into a pre-trained T5 model; the output module is used for inputting the sample title material, the first title style and the sample title into an initial T5 model and outputting an intermediate text; and the updating module is used for calculating gradient descent parameters of the initial T5 model according to the sample titles and the intermediate texts, and updating the initial T5 model based on the gradient descent parameters to obtain the pre-trained T5 model.
Further, the update module includes: the first calculating unit is used for calculating the content difference value between the intermediate text and the sample title and calculating the style difference value between the intermediate text and the sample title; a configuration unit, configured to configure the content difference value and the style difference value as a content loss value and a style loss value of the initial T5 model, respectively; and the second calculation unit is used for performing weighted summation on the content loss value and the style loss value to obtain a gradient descent parameter of the initial T5 model.
Further, the first calculation unit includes: the generating subunit is used for sampling the intermediate text and generating an intermediate title with the same size as the sample title; the identification subunit is used for identifying a second title style of the intermediate title by adopting a pre-trained long-short term memory network LSTM; and the calculating subunit is used for calculating the style difference value between the first title style and the second title style.
Further, the apparatus further comprises: the determining module is used for determining a target video of the title to be generated before the processing module determines a plurality of title styles of the title to be generated; and the extraction module is used for extracting the subtitle information in the target video and determining the subtitle information as the title material.
According to another aspect of the embodiments of the present application, there is also provided a storage medium including a stored program that executes the above steps when the program is executed.
According to another aspect of the embodiments of the present application, there is also provided an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus; wherein: a memory for storing a computer program; a processor for executing the steps of the method by running the program stored in the memory.
Embodiments of the present application also provide a computer program product containing instructions, which when run on a computer, cause the computer to perform the steps of the above method.
Compared with the prior art, the technical scheme provided by the embodiment of the application has the following advantages:
In the embodiments of the present application, a plurality of title styles are determined for a title to be generated; title material and the plurality of title styles are input into a pre-trained T5 model to generate a plurality of candidate titles, each corresponding to one title style; and one candidate title is selected and output as the target title. This realizes a scheme for automatically generating titles in preset styles, solves the technical problem in the related art that such titles cannot be generated automatically, and improves title generation efficiency.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below; those skilled in the art can obtain other drawings from these drawings without inventive effort.
Fig. 1 is a flowchart of a title generation method according to an embodiment of the present invention;
FIG. 2 is a block diagram of layer normalization in the T5 model according to an embodiment of the present invention;
FIG. 3 is a diagram of a framework for training a T5 model according to an embodiment of the present invention;
fig. 4 is a block diagram of a title generation apparatus according to an embodiment of the present invention;
fig. 5 is a block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the purpose, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, but not all, of the embodiments of the present application; the illustrative embodiments and their descriptions serve to explain the application and do not limit it. All other embodiments obtained by those skilled in the art without creative effort shall fall within the protection scope of the present application.
It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another similar entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
In this embodiment, a title generation method is provided, and fig. 1 is a flowchart of a title generation method according to an embodiment of the present invention, as shown in fig. 1, the flowchart includes the following steps:
step S102, determining a plurality of title styles of a title to be generated;
the embodiment can be applied to the title generation scene of videos, news, advertisements and fusion media.
In an application scenario, the title is a video title, and before determining a plurality of title styles of the title to be generated, the method further includes: determining a target video of a title to be generated; and extracting subtitle information in the target video, and determining the subtitle information as the title material.
Alternatively, an OCR (Optical Character Recognition) algorithm may be used to recognize the subtitles in the video frames of the target video. Title styles may be mapped to or characterized by field codes, for example 01 for a first style, 02 for a second style, and so on.
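As a sketch, the field mapping mentioned above could look like the following Python dictionary; the style names and code values here are illustrative assumptions, not the patent's actual encoding.

```python
# Hypothetical mapping of title styles to field codes ("01 for a first
# style, 02 for a second style"). Names and codes are illustrative only.
STYLE_CODES = {
    "question": "01",
    "question-answer": "02",
    "digital": "03",
    "comment": "04",
    "reversal": "05",
    "quotation": "06",
}

def style_field(style_name: str) -> str:
    """Return the field code used to characterize a title style."""
    return STYLE_CODES[style_name]
```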
Optionally, the plurality of title styles include: a question style, a question-and-answer style, a digital style, a comment style, a reversal style, and a quotation style. Different title styles appeal to different audiences, and targeted pushing to the corresponding types of user accounts or video clients increases the click-through rate of the target video among the designated audience. Taking a video about treating cervical spondylosis as an example, the generated titles are shown in Table 1, corresponding to the direct (style-free) title and the six title styles.
TABLE 1

Type                Title
Direct (no style)   Effective treatment method for cervical spondylosis
Question            What to do about cervical spondylosis?
Question-answer     What are the early symptoms of cervical spondylosis? A TCM doctor teaches you to confirm the diagnosis quickly
Comment             This TCM method for treating cervical spondylosis is so useful that netizens are amazed
Reversal            A treatment method that takes effect right away
Quotation           A veteran TCM doctor: I have a secret prescription for treating the cervical spine
Digital             4 methods to teach you to quickly treat cervical spondylosis
Step S104, inputting the title material and the plurality of title styles into a pre-trained T5 model to generate a plurality of candidate titles, wherein each candidate title corresponds to one title style;
the T5(Text-to-Text Transfer Transformer) model of the present embodiment is a Text conversion model.
The title material of this embodiment may be extracted from the target media for which a title is to be generated: for example, from the subtitle information of a target video when a video title is to be generated, or from text related to the target media, such as the lyrics of a song in audio format, or the introduction of the target media on a search website.
Step S106, selecting one of the multiple candidate titles to be output as a target title;
through the steps, a plurality of title styles of the title to be generated are determined, the title material and the title styles are input into a pre-trained T5 model, a plurality of alternative titles are generated, wherein each alternative title corresponds to one title style, one alternative title is selected from the alternative titles and output as a target title, a scheme for automatically generating the title in the preset style is realized, the technical problem that the title in the preset style cannot be automatically generated in the related technology is solved, and the title generation efficiency is improved.
In one embodiment of this embodiment, inputting the title material and the plurality of title styles into a pre-trained T5 model, and generating the plurality of candidate titles comprises:
s11, generating a Layer Normalization (LN) structure parameter of the pre-trained T5 model according to the title style aiming at each title style;
Fig. 2 is a schematic structural diagram of layer normalization in the embodiment of the present invention, where x' is the result after conditional layer normalization:

x' = γ · (x − μ_x) / σ_x + β

Taking 768 as the vector dimension of the T5 model: a sentence of length 40 can be expressed as a (40, 768) matrix, and 8 such sentences as an (8, 40, 768) matrix. Here x is the sentence matrix, of shape (8, 40, 768); x' is the result after conditional layer normalization, also of shape (8, 40, 768); μ_x is the mean of each dimension over the different words of each sentence, of shape (8, 768); σ_x is the standard deviation of each dimension over the different words of each sentence, also of shape (8, 768); and γ and β are structural parameters to be trained by the model, each a vector of shape (1, 768).
In some examples, generating the structural parameters of the conditional layer normalization (LN) in the pre-trained T5 model from the title style includes: converting the title style into a vector matrix with one row and M columns, where M is a positive integer; multiplying the vector matrix by an N-dimensional fully connected layer to obtain a fully connected matrix with M rows and N columns, where N is a positive integer; and summing the M rows of the fully connected matrix to obtain the structural parameter.
Optionally, the structural parameters are the γ and β in the layer normalization; generating γ and β from the title style steers the T5 model toward generating a title of the corresponding style.
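A minimal numpy sketch of this procedure, under the assumptions that the style is a one-hot 1×M vector and that γ and β each come from their own trained M×N fully connected matrix (random stand-ins here). The normalization below is over the hidden dimension, one common LN convention; the exact axis follows the model's own convention.

```python
import numpy as np

M, N = 6, 768                                    # assumed: 6 styles, 768-dim model
rng = np.random.default_rng(0)
W_gamma = rng.normal(0.0, 0.02, (M, N))          # stand-in for trained FC weights
W_beta = rng.normal(0.0, 0.02, (M, N))

def style_param(style_id, W):
    s = np.zeros((1, M))
    s[0, style_id] = 1.0                         # 1-row, M-column style vector
    full = s.T * W                               # M-row, N-column fully connected matrix
    return full.sum(axis=0)                      # sum the M rows -> (N,) structural parameter

def conditional_layer_norm(x, style_id, eps=1e-6):
    # x: (batch, seq, N); x' = gamma * (x - mu) / sigma + beta
    mu = x.mean(axis=-1, keepdims=True)
    sigma = x.std(axis=-1, keepdims=True)
    gamma = 1.0 + style_param(style_id, W_gamma)  # style-conditioned scale
    beta = style_param(style_id, W_beta)          # style-conditioned shift
    return gamma * (x - mu) / (sigma + eps) + beta

x = rng.normal(size=(8, 40, N))                  # 8 sentences of 40 tokens
out = conditional_layer_norm(x, style_id=2)
```

Because the one-hot style vector selects a different row of each weight matrix, changing the style id changes γ and β, and hence the normalized output.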
S12, inputting the title material into the adjusted pre-trained T5 model, and outputting a title string together with probability information for each character in the string;
and generating structural parameters of the conditional layer normalized LN in the pre-trained T5 model by respectively adopting the caption style for each caption style, adjusting the structural parameters, inputting the same caption information into the adjusted pre-trained T5 model, repeating for 6 times if 6 caption styles exist, and outputting 6 different caption character strings and probabilities.
S13, performing a beam search over the title string based on the probability information to generate the corresponding candidate title.
Beam search can be used to select the title characters along the best-scoring paths, converting the per-character probabilities of the title string into candidate titles.
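A toy beam-search sketch of this step, assuming the per-position log-probabilities are already available; a real decoder would re-score each prefix with the model at every step, which is omitted here.

```python
import math

def beam_search(step_logprobs, beam_width=3):
    # step_logprobs: one dict per output position, mapping character -> log-probability
    beams = [((), 0.0)]                        # (partial title, cumulative log-prob)
    for dist in step_logprobs:
        candidates = [(seq + (ch,), score + lp)
                      for seq, score in beams
                      for ch, lp in dist.items()]
        # keep only the beam_width best partial titles
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams[0]

# Toy probability information over a 2-character title
steps = [
    {"a": math.log(0.6), "b": math.log(0.4)},
    {"c": math.log(0.9), "d": math.log(0.1)},
]
best_seq, best_score = beam_search(steps)
```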
In this embodiment, the T5 model is a text generation model: it receives a sentence as input and generates a sentence as output. The embodiment modifies the layer normalization (LN) inside T5 into conditional layer normalization so that T5 can accept the style type. During training, the model's inputs are the subtitle, the style type, and the title of a sample video, and the output is a generated title; at inference time, the subtitle and style type of the target video are input, and the title of the corresponding style is output.
This embodiment also requires training the initial T5 model before using the pre-trained T5 model. Before inputting the title material and the plurality of title styles into the pre-trained T5 model, the solution of the embodiment further comprises:
s21, extracting the sample title material and the sample title, and configuring a first title style of the sample title;
the first title style is an original title or a self-contained title of the sample video, may be non-style or qualified, and can be identified by a regular algorithm or a regular algorithm.
S22, inputting the sample title material, the first title style and the sample title into an initial T5 model, and outputting an intermediate text;
s23, calculating gradient descent parameters of the initial T5 model according to the sample titles and the intermediate texts, and updating the initial T5 model based on the gradient descent parameters to obtain a pre-trained T5 model.
In one embodiment, calculating the gradient descent parameter of the initial T5 model from the sample title and the intermediate text comprises: calculating a content difference value between the intermediate text and the sample title, and calculating a style difference value between the intermediate text and the sample title; configuring the content difference value and the style difference value as the content loss value and the style loss value of the initial T5 model, respectively; and performing a weighted summation of the content loss value and the style loss value to obtain the gradient descent parameter of the initial T5 model.
In this embodiment, cross entropy is used to calculate the content loss between the intermediate text and the sample title. During calculation, the intermediate text first needs to be sampled to generate an intermediate title with the same size as the sample title, and then the difference value is calculated.
The content loss and the style loss are added with weights, gradient descent is performed on the deep learning model inside T5, and its parameters are updated; one training period finishes after the gradient descent step.
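A small numeric sketch of this weighted loss combination; the per-token probabilities and the weights below are made-up values for illustration, not taken from the patent.

```python
import math

def cross_entropy(probs):
    # mean negative log-likelihood of the reference tokens
    return -sum(math.log(p) for p in probs) / len(probs)

# Hypothetical values: probabilities the model assigns to each sample-title
# token, and the probability the discriminator assigns to the target style.
content_loss = cross_entropy([0.7, 0.5, 0.9])
style_loss = cross_entropy([0.4])

w_content, w_style = 1.0, 0.5                  # illustrative weights
total_loss = w_content * content_loss + w_style * style_loss
# total_loss is the quantity used for the gradient descent step
```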
In one example, calculating the style difference value between the intermediate text and the sample title comprises: sampling the intermediate text to generate an intermediate title with the same size as the sample title; using a pre-trained long short-term memory (LSTM) network to identify a second title style of the intermediate title; and calculating the style difference value between the first title style and the second title style.
In this embodiment, the intermediate text output by the initial T5 is passed through a Gumbel softmax, the Gumbel softmax output is fed into the LSTM, and the LSTM outputs the second title style of the intermediate title generated by the initial T5. This second title style is then compared with the first title style that was input with the sample. That is: the first title style is input; the initial T5 model generates the intermediate text; the Gumbel softmax samples this intermediate text into an intermediate title of the same size as the sample title; the intermediate title is input into the LSTM to discriminate its style; and finally the difference between the first and second title styles is computed with cross entropy and taken as the style loss.
Because the T5 model outputs the probability of each generated character of the title (characters, letters, punctuation marks), the probabilities are sampled with the Gumbel softmax to obtain a complete intermediate title. The complete intermediate title is then input into the previously trained attention-based LSTM style discrimination model, and the style loss is calculated. The content loss from T5 and the style loss from the LSTM are weighted and taken as the total loss of the model, and the final model is obtained by gradient descent (the parameters of the LSTM style discriminator are not updated; only the T5 parameters are updated). After T5 training finishes, the Gumbel softmax and the LSTM used during training are discarded, and the target title can be generated from the subtitle information of the target video and the desired title style.
Taking 768 as the vector dimension of the T5 model, suppose 8 sentences are trained together, a subtitle has at most 512 words, and a title has at most 32 words. The subtitle is then a tensor of shape (8, 512, 768), the style type a vector of dimension 6, the style-specific title a tensor of shape (8, 32), and the output intermediate text a tensor of shape (8, 32, 50000): each of the 32 positions in each of the 8 sentences is a distribution over 50,000 vocabulary entries. Taking one entry per position reduces (8, 32, 50000) to (8, 32, 1), i.e., (8, 32); this embodiment uses the Gumbel softmax function, taking the intermediate text output of the initial T5 model as input and producing an (8, 32) output, which realizes sampling through which gradients can flow. The LSTM in this embodiment takes a sentence (the intermediate title) as input and discriminates its style, i.e., outputs the style type.
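A numpy sketch of the Gumbel-softmax sampling described above, with a reduced vocabulary (50 instead of 50,000) to keep it small; the temperature and shapes are illustrative assumptions.

```python
import numpy as np

def gumbel_softmax(logits, tau=1.0, rng=None):
    # Add Gumbel(0, 1) noise to the logits, then apply a temperature softmax.
    # The soft output stays differentiable, which is what lets gradients
    # pass through the sampling step during training.
    rng = rng or np.random.default_rng()
    u = rng.uniform(1e-12, 1.0, size=logits.shape)
    g = -np.log(-np.log(u))                     # Gumbel noise
    y = (logits + g) / tau
    y = y - y.max(axis=-1, keepdims=True)       # numerical stability
    e = np.exp(y)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 32, 50))           # (sentences, title length, vocab)
soft = gumbel_softmax(logits, tau=0.5, rng=rng)
intermediate_title = soft.argmax(axis=-1)       # (8, 32), fed to the LSTM discriminator
```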
Fig. 3 is a schematic diagram of a framework for training the T5 model according to an embodiment of the present invention, applied to the scenario of generating a video title. The framework includes the T5 model, the Gumbel softmax, and an LSTM used as a discriminator; the training procedure and application procedure are explained as follows:
sample data sorting: extracting subtitles in a sample video by using an OCR (optical character recognition), and classifying original video titles by using a regular method and the like;
training an LSTM style discrimination model with attention mechanism (attention);
Input the sample subtitle and the first title style of the sample title into the T5 model; the layer normalization structure inside T5 is rewritten so that γ and β are generated from the first title style, which in turn influences the style T5 generates; the T5 result, i.e., the intermediate text, is output;
Sample the intermediate text with the Gumbel softmax to obtain an intermediate title;
Input the sampling result into the LSTM for discrimination, obtaining the second title style of the intermediate title;
calculating style loss based on the discrimination result, calculating content loss based on the intermediate text and the sample title, and performing weighted sum of the style loss and the content loss;
performing gradient descent training on the initial T5 model by adopting a weighted sum, and fixing model parameters by using an LSTM;
and finishing training to obtain a pre-trained T5 model for standby.
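Under stated assumptions, the training steps above can be condensed into one sketch. The T5 forward pass and the LSTM discriminator are replaced by random stand-ins, and the loss forms, the 0.3 style weight, and the scalar parameter update are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, BATCH, TITLE_LEN, NUM_STYLES = 50, 8, 32, 6  # toy sizes; real vocab is ~50000

# Stand-ins for the real networks; the actual T5 and LSTM are assumptions here.
def t5_forward(t5_params, captions, style_id):
    # Intermediate-text logits, shape (batch, title_len, vocab).
    return rng.normal(size=(BATCH, TITLE_LEN, VOCAB)) + t5_params

def lstm_style_logits(token_ids):
    # Frozen style discriminator: title token ids -> style logits (batch, styles).
    return rng.normal(size=(BATCH, NUM_STYLES))

def train_step(t5_params, captions, style_id, sample_title_ids, w_style=0.3, lr=0.01):
    """One step of the Fig. 3 framework: T5 forward, Gumbel-softmax sampling,
    frozen-LSTM discrimination, then a weighted content + style loss."""
    logits = t5_forward(t5_params, captions, style_id)
    # Gumbel-softmax sampling of the intermediate title (hard indices here).
    g = -np.log(-np.log(rng.uniform(1e-9, 1.0, logits.shape)))
    ids = (logits + g).argmax(axis=-1)                      # (batch, title_len)
    style_logits = lstm_style_logits(ids)                   # LSTM params stay fixed
    # Content loss: disagreement between sampled title and sample title.
    content_loss = float(np.mean(ids != sample_title_ids))
    # Style loss: negative log-probability of the target (first) style.
    p = np.exp(style_logits - style_logits.max(-1, keepdims=True))
    p /= p.sum(-1, keepdims=True)
    style_loss = float(-np.mean(np.log(p[:, style_id] + 1e-9)))
    loss = (1 - w_style) * content_loss + w_style * style_loss
    return t5_params - lr * loss, loss                      # only T5 params update

params, loss = train_step(0.0, captions=None, style_id=2,
                          sample_title_ids=rng.integers(0, VOCAB, (BATCH, TITLE_LEN)))
```

The key point the sketch preserves is structural: the gradient of the weighted sum updates only the T5 parameters, while the LSTM discriminator's parameters are held fixed.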
On the application side, the steps for automatically generating multi-style video titles comprise:
selecting a target video, and extracting the subtitle information in the target video using OCR;
inputting the subtitle information and the title style to be generated into the pre-trained T5 model;
the T5 model generates a video title of the corresponding style; with 6 style types, the above steps are repeated 6 times with a different title style input each time, where each title style is mapped to a field value, for example 1: question style, 2: question-and-answer style, 3: numeric style, 4: review style, 5: reversal style, and 6: the sixth defined style;
Finally, video titles in all 6 styles are output.
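A minimal sketch of this application loop, assuming a hypothetical `generate_title` stand-in for the pre-trained T5 model (the style names are paraphrases, and the sixth style's name does not survive translation in the source):

```python
# Hypothetical style map; the field values 1-5 follow the text, the names are
# paraphrased, and "style-6" is a placeholder for the garbled sixth style.
STYLE_FIELDS = {1: "question", 2: "question-and-answer", 3: "numeric",
                4: "review", 5: "reversal", 6: "style-6"}

def generate_title(subtitle_text, style_id):
    # Placeholder for the pre-trained T5 call; the real model is not shown here.
    return f"[{STYLE_FIELDS[style_id]}] {subtitle_text[:24]}..."

def generate_all_styles(subtitle_text):
    # One model, six styles: only the style input changes between calls.
    return {s: generate_title(subtitle_text, s) for s in STYLE_FIELDS}

titles = generate_all_styles("subtitle text extracted from the target video by OCR")
```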
In the related art, unsupervised algorithms are not accurate enough to meet production requirements, while supervised models lack sufficient data: for example, converting a question-style title into a question-and-answer-style title requires question-style/question-and-answer-style title pairs in the training set, but in reality a video has only one title, and manually writing a question-and-answer title from a question-style title is very time-consuming and labor-intensive. The scheme of this embodiment can train a model with higher accuracy without title pairs. Typical title style migration transfers from one style to another, so with 6 styles a one-to-one correspondence would produce 15 models; the scheme of this embodiment instead outputs titles in all 6 styles from a single model by controlling the input type. This embodiment defines 6 appealing title styles and incorporates the Gumbel-softmax + LSTM style discriminator into the T5 model during training, making the style of the final title more accurate.
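The count of 15 pairwise migration models matches the number of unordered pairs of 6 styles; a quick check:

```python
from math import comb

# One migration model per unordered style pair, versus a single
# style-conditioned model as in this embodiment.
pairwise_models = comb(6, 2)  # C(6, 2) = 15
single_model = 1
```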
Through the above description of the embodiments, those skilled in the art will appreciate that the method of the above embodiments can be implemented by software on a necessary general-purpose hardware platform, or alternatively by hardware, though the former is preferable in many cases. Based on this understanding, the technical solution of the present invention may be embodied as a software product stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk), including instructions that enable a terminal device (e.g., a mobile phone, computer, server, or network device) to execute the method of the embodiments of the present invention.
This embodiment further provides a title generation apparatus for implementing the above embodiments and preferred implementations; what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the following embodiments is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
Fig. 4 is a block diagram of a title generation apparatus according to an embodiment of the present invention, as shown in fig. 4, the apparatus including: a processing module 40, a generating module 42, a selecting module 44, wherein,
a processing module 40, configured to determine a plurality of title styles of a title to be generated;
a generating module 42, configured to input the title material and the multiple title styles into a pre-trained T5 model, and generate multiple candidate titles, where each candidate title corresponds to one title style;
a selecting module 44, configured to select one of the candidate titles for output as the target title.
Optionally, the generating module includes: a first generating unit, configured to generate, for each title style, structural parameters of a conditional layer normalization (LN) in the pre-trained T5 model according to the title style; an output unit, configured to input the title material into the adjusted pre-trained T5 model and output a title character string and probability information of each title character in the title character string; and a second generating unit, configured to perform beam search on the title character strings based on the probability information and generate the corresponding candidate titles.
Optionally, the first generating unit includes: a conversion subunit, configured to convert the title style into a vector matrix with one row and M columns, where M is a positive integer; a first calculation subunit, configured to multiply the vector matrix by an N-dimensional fully connected layer to obtain a fully connected matrix with M rows and N columns, where N is a positive integer; and a second calculation subunit, configured to sum the M rows of the fully connected matrix to obtain the structural parameters.
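A NumPy sketch of this conditional LayerNorm recipe with M = 6 styles and N = 768 (the T5 hidden size). Using a one-hot style vector and separate projections for gamma and beta are assumptions not fixed by the text:

```python
import numpy as np

def style_to_ln_params(style_id, num_styles=6, hidden=768, seed=0):
    """Map a title style to conditional LayerNorm gamma/beta, following the
    recipe above: 1 x M style vector -> M x N fully connected matrix ->
    sum over the M rows, yielding an N-dimensional parameter vector."""
    rng = np.random.default_rng(seed)
    style_vec = np.zeros((1, num_styles))
    style_vec[0, style_id - 1] = 1.0                 # 1 x M row vector (one-hot)
    w_gamma = rng.normal(size=(num_styles, hidden))  # N-dimensional FC layer
    w_beta = rng.normal(size=(num_styles, hidden))
    gamma = (style_vec.T * w_gamma).sum(axis=0)      # sum over the M rows -> (N,)
    beta = (style_vec.T * w_beta).sum(axis=0)
    return gamma, beta

def conditional_layer_norm(x, gamma, beta, eps=1e-5):
    # Standard LayerNorm over the last axis, with style-dependent gamma/beta.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

gamma, beta = style_to_ln_params(style_id=3)
y = conditional_layer_norm(np.random.default_rng(1).normal(size=(8, 32, 768)),
                           gamma, beta)
```

With a one-hot style vector, the row-wise product and sum simply select the style's row of each weight matrix, which is how the style input steers every normalization layer of the generator.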
Optionally, the apparatus further comprises: a configuration module, configured to extract a sample title material and a sample title before the generation module inputs the subtitle information and the plurality of title styles into a pre-trained T5 model, and configure a first title style of the sample title; the output module is used for inputting the sample title material, the first title style and the sample title into an initial T5 model and outputting an intermediate text; and the updating module is used for calculating gradient descent parameters of the initial T5 model according to the sample titles and the intermediate texts, and updating the initial T5 model based on the gradient descent parameters to obtain the pre-trained T5 model.
Optionally, the update module includes: the first calculating unit is used for calculating the content difference value between the intermediate text and the sample title and calculating the style difference value between the intermediate text and the sample title; a configuration unit, configured to configure the content difference value and the style difference value as a content loss value and a style loss value of the initial T5 model, respectively; and the second calculation unit is used for performing weighted summation on the content loss value and the style loss value to obtain a gradient descent parameter of the initial T5 model.
Optionally, the first computing unit includes: the generating subunit is used for sampling the intermediate text and generating an intermediate title with the same size as the sample title; the identification subunit is used for identifying a second title style of the intermediate title by adopting a pre-trained long-short term memory network LSTM; and the calculating subunit is used for calculating the style difference value between the first title style and the second title style.
Optionally, the apparatus further comprises: the determining module is used for determining a target video of the title to be generated before the processing module determines a plurality of title styles of the title to be generated; and the extraction module is used for extracting the subtitle information in the target video and determining the subtitle information as the title material.
It should be noted that, the above modules may be implemented by software or hardware, and for the latter, the following may be implemented, but not limited to: the modules are all positioned in the same processor; alternatively, the modules are respectively located in different processors in any combination.
Fig. 5 is a structural diagram of an electronic device according to an embodiment of the present invention. As shown in Fig. 5, the electronic device includes a processor 51, a communication interface 52, a memory 53 and a communication bus 54, where the processor 51, the communication interface 52 and the memory 53 communicate with one another through the communication bus 54, and the memory 53 is used for storing a computer program; the processor 51 is configured to implement the following steps when executing the program stored in the memory 53: determining a plurality of title styles of a title to be generated; inputting the title materials and the title styles into a pre-trained T5 model to generate a plurality of candidate titles, wherein each candidate title corresponds to one title style; and selecting one of the candidate titles to be output as a target title.
Further, inputting the title material and the plurality of title styles into the pre-trained T5 model and generating the plurality of candidate titles includes: for each title style, generating structural parameters of a conditional layer normalization (LN) in the pre-trained T5 model according to the title style; inputting the title material into the adjusted pre-trained T5 model, and outputting a title character string and probability information of each title character in the title character string; and performing beam search on the title character strings based on the probability information to generate the corresponding candidate titles.
Further, generating the structural parameters of the conditional layer normalization (LN) in the pre-trained T5 model according to the title style includes: converting the title style into a vector matrix with one row and M columns, where M is a positive integer; multiplying the vector matrix by an N-dimensional fully connected layer to obtain a fully connected matrix with M rows and N columns, where N is a positive integer; and summing the M rows of the fully connected matrix to obtain the structural parameters.
Further, prior to inputting the title material and the plurality of title styles into the pre-trained T5 model, the method further comprises: extracting a sample title material and a sample title, and configuring a first title style of the sample title; inputting the sample title material, the first title style and the sample title into an initial T5 model and outputting an intermediate text; calculating gradient descent parameters of the initial T5 model according to the sample titles and the intermediate texts, and updating the initial T5 model based on the gradient descent parameters to obtain the pre-trained T5 model.
Further, calculating a gradient descent parameter of the initial T5 model from the sample title and the intermediate text comprises: calculating a content difference value between the intermediate text and the sample title, and calculating a style difference value between the intermediate text and the sample title; configuring the content difference value and the style difference value as a content loss value and a style loss value of the initial T5 model, respectively; and carrying out weighted summation on the content loss value and the style loss value to obtain a gradient descent parameter of the initial T5 model.
Further, calculating the style difference value between the intermediate text and the sample title comprises: sampling the intermediate text to generate an intermediate title with the same size as the sample title; adopting a pre-trained long-short term memory network (LSTM) to identify a second title style of the intermediate title; and calculating the style difference value between the first title style and the second title style.
Further, before determining a plurality of title styles of the title to be generated, the method further comprises: determining a target video of a title to be generated; and extracting subtitle information in the target video, and determining the subtitle information as the title material.
The communication bus mentioned in the above terminal may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the terminal and other equipment.
The memory may include a Random Access Memory (RAM) or a non-volatile memory, such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In yet another embodiment provided by the present application, there is also provided a computer-readable storage medium having stored therein instructions, which when run on a computer, cause the computer to execute the title generation method described in any of the above embodiments.
In yet another embodiment provided by the present application, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the method of generating a title as described in any of the above embodiments.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, cause the processes or functions described in accordance with the embodiments of the application to occur, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website site, computer, server, or data center to another website site, computer, server, or data center via wired (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium (e.g., floppy Disk, hard Disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
The above description is only for the preferred embodiment of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application are included in the protection scope of the present application.
The above description is merely exemplary of the present application and is presented to enable those skilled in the art to understand and practice the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A title generation method, comprising:
determining a plurality of title styles of the title to be generated;
inputting the title materials and the title styles into a pre-trained T5 model to generate a plurality of candidate titles, wherein each candidate title corresponds to one title style;
and selecting one of the candidate titles to be output as a target title.
2. The method of claim 1, wherein inputting the title material and the plurality of title styles into the pre-trained T5 model and generating the plurality of candidate titles comprises:
for each title style, generating structural parameters of a conditional layer normalization (LN) in the pre-trained T5 model according to the title style;
inputting the title material into the adjusted pre-trained T5 model, and outputting a title character string and probability information of each title character in the title character string;
and performing beam search on the title character strings based on the probability information to generate corresponding candidate titles.
3. The method of claim 2, wherein generating the structural parameters of the conditional layer normalization (LN) in the pre-trained T5 model according to the title style comprises:
converting the title style into a vector matrix with one row and M columns, wherein M is a positive integer;
multiplying the vector matrix by an N-dimensional fully connected layer to obtain a fully connected matrix with M rows and N columns, wherein N is a positive integer;
and summing the M rows of the fully connected matrix to obtain the structural parameters.
4. The method of claim 1, wherein prior to inputting the title material and the plurality of title styles into a pre-trained T5 model, the method further comprises:
extracting a sample title material and a sample title, and configuring a first title style of the sample title;
inputting the sample title material, the first title style and the sample title into an initial T5 model and outputting an intermediate text;
calculating gradient descent parameters of the initial T5 model according to the sample titles and the intermediate texts, and updating the initial T5 model based on the gradient descent parameters to obtain the pre-trained T5 model.
5. The method of claim 4, wherein calculating a gradient descent parameter of the initial T5 model from the sample header and the intermediate text comprises:
calculating a content difference value between the intermediate text and the sample title, and calculating a style difference value between the intermediate text and the sample title;
configuring the content difference value and the style difference value as a content loss value and a style loss value of the initial T5 model, respectively;
and carrying out weighted summation on the content loss value and the style loss value to obtain a gradient descent parameter of the initial T5 model.
6. The method of claim 5, wherein calculating the style difference value between the intermediate text and the sample title comprises:
sampling the intermediate text to generate an intermediate title with the same size as the sample title;
adopting a pre-trained long-short term memory network (LSTM) to identify a second title style of the intermediate title;
and calculating the style difference value between the first title style and the second title style.
7. The method of any of claims 1 to 6, wherein prior to determining a plurality of title styles for a title to be generated, the method further comprises:
determining a target video of a title to be generated;
and extracting subtitle information in the target video, and determining the subtitle information as the title material.
8. A title generation apparatus, comprising:
the processing module is used for determining a plurality of title styles of the title to be generated;
the generating module is used for inputting the title materials and the title styles into a pre-trained T5 model and generating a plurality of candidate titles, wherein each candidate title corresponds to one title style;
and the selection module is used for selecting one of the multiple candidate titles to be output as the target title.
9. A storage medium, characterized in that the storage medium comprises a stored program, wherein the program is operative to perform the method steps of any of the preceding claims 1 to 7.
10. An electronic device comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory are communicated with each other through the communication bus; wherein:
a memory for storing a computer program;
a processor for performing the method steps of any of claims 1 to 7 by executing a program stored on a memory.
CN202210502505.2A 2022-05-09 2022-05-09 Title generation method and device, storage medium and electronic equipment Pending CN114861610A (en)


Publications (1)

Publication Number Publication Date
CN114861610A true CN114861610A (en) 2022-08-05

Family

ID=82636669


Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116033207A (en) * 2022-12-09 2023-04-28 北京奇艺世纪科技有限公司 Video title generation method and device, electronic equipment and readable storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination