CN114297354A - Bullet screen generation method and device, storage medium and electronic device


Info

Publication number
CN114297354A
Authority
CN
China
Prior art keywords
text
sentences
sentence
target
similarity
Prior art date
Legal status
Granted
Application number
CN202111460420.4A
Other languages
Chinese (zh)
Other versions
CN114297354B (en)
Inventor
司马华鹏
华冰涛
汤毅平
汪成
孙雨泽
Current Assignee
Nanjing Silicon Intelligence Technology Co Ltd
Original Assignee
Nanjing Silicon Intelligence Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Nanjing Silicon Intelligence Technology Co Ltd filed Critical Nanjing Silicon Intelligence Technology Co Ltd
Priority to CN202111460420.4A
Publication of CN114297354A
Application granted
Publication of CN114297354B
Legal status: Active

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the application provide a bullet screen generation method and apparatus, a storage medium and an electronic apparatus, which can generate bullet screens from a target text while synthesizing the target video corresponding to that text, and display the bullet screens directly on the target video. This solves the technical problem in the related art that, because a bullet screen can only be formed from user-input text, a user may see no bullet screens at all while watching and the viewing experience suffers. Meanwhile, by providing both question-and-answer barrages and comment-type barrages, the richer barrage types improve the user's interactive experience with the target video in different dimensions.

Description

Bullet screen generation method and device, storage medium and electronic device
Technical Field
The application relates to the technical field of video production, in particular to a bullet screen generation method and device, a storage medium and an electronic device.
Background
A bullet screen is text displayed over a video during playback, and is one of the ways users interact with the video content and with each other. In the related art the bullet screen depends entirely on user-input text: a bullet screen corresponding to a text can be displayed on a video only after some user has entered that text. If no text has been entered before a user watches, for example because the user is the first viewer or because earlier viewers entered nothing, the video carries no bullet screens at all and the user cannot enjoy the bullet screen experience.
For the problem in the related art that a user's viewing experience is poor because no bullet screen is formed when no user inputs text during video playback, no effective solution has yet been proposed.
Disclosure of Invention
The application provides a bullet screen generation method and apparatus, a storage medium and an electronic apparatus, so as to at least solve the technical problem in the related art that the viewing experience is poor because no bullet screen is formed when no user inputs text during video playback.
In an embodiment of the present application, a bullet screen generating method is provided, including:
acquiring the importance of each sentence in a target text, wherein the target text is a text used for synthesizing a target video, and the importance is a weighted average of the similarities between the sentence and the remaining sentences in the target text;
dividing all sentences in the target text into at least three groups in descending order of importance;
dividing all sentences in the target text into a first class of sentences and a second class of sentences, wherein the first class of sentences is the question-and-answer barrage material and the second class of sentences is the comment-type barrage material, the first class comprising all sentences of the highest-ranked group plus part of the sentences of the middle group, and the second class comprising all sentences of the lowest-ranked group plus the remaining sentences of the middle group;
generating the target video based on the target text, generating corresponding questions and answers based on the first class of sentences, and generating first texts and second texts based on the second class of sentences;
displaying the question, the first text, the second text, and/or the answer in the target video.
In one implementation, the obtaining the importance of each sentence in the target text includes:
converting each sentence into a sentence vector;
calculating cosine values and physical distances between the sentence vectors of each sentence and the sentence vectors of other sentences in the target text to obtain the similarity between each sentence and the other sentences in the target text;
and calculating the weighted average of the similarity between each sentence and the rest sentences in the target text to obtain the importance of each sentence.
In one implementation, before the obtaining the importance of each sentence in the target text, the method further includes:
identifying a designated symbol in the target text;
dividing the target text into a plurality of sentences at the specified symbols.
In one implementation, the generating the target video based on the target text and the corresponding question and answer based on the first type of sentence and the generating the first text and the second text based on the second type of sentence includes:
generating the answer and the first text using stochastic decoding, and generating the question and the second text using deterministic decoding.
In one implementation, after the generating the answer and the first text using stochastic decoding and the generating the question and the second text using deterministic decoding, the method further includes:
deleting the second text.
In one implementation, the generating the answer and the first text using stochastic decoding, and the generating the question and the second text using deterministic decoding comprises:
respectively calculating a first similarity of each group of the questions and the answers and a second similarity of each group of the first texts and the second texts;
based on the first similarity and the second similarity, reserving the question, the answer, the first text and the second text according to a specified screening proportion, wherein the specified screening proportion means that the number of reserved groups of the first text and the second text is larger than the number of reserved groups of the question and the answer.
In an embodiment of the present application, there is also provided a bullet screen generating apparatus, including:
the importance calculation module, configured to obtain the importance of each sentence in a target text, wherein the target text is a text used for synthesizing a target video, and the importance is a weighted average of the similarities between the sentence and the remaining sentences in the target text;
the grouping module, configured to divide all sentences in the target text into at least three groups in descending order of importance;
the sentence dividing module, configured to divide all sentences in the target text into a first class of sentences and a second class of sentences, wherein the first class of sentences is the question-and-answer barrage material and the second class of sentences is the comment-type barrage material, the first class comprising all sentences of the highest-ranked group plus part of the sentences of the middle group, and the second class comprising all sentences of the lowest-ranked group plus the remaining sentences of the middle group;
a generating module configured to generate the target video based on the target text, generate corresponding questions and answers based on the first type of sentences, and generate first text and second text based on the second type of sentences;
a display module configured to display the question, the first text, the second text, and/or the answer in the target video.
In an embodiment of the present application, a bullet screen display method is further provided: in the target video of any one of the above method embodiments, the answer is played through a preset avatar.
In an embodiment of the present application, a computer-readable storage medium is also proposed, in which a computer program is stored, wherein the computer program is configured to perform the steps of any of the above-described method embodiments when executed.
In an embodiment of the present application, there is also provided an electronic device, comprising a memory in which a computer program is stored and a processor configured to execute the computer program to perform the steps in any of the above method embodiments.
With the above method and apparatus, bullet screens can be generated from the target text while the target video corresponding to the target text is synthesized, and displayed directly on the target video. This solves the technical problem in the related art that, because a bullet screen can only be formed from user-input text, a user may see no bullet screens at all while watching and the viewing experience suffers. Meanwhile, by providing both question-and-answer barrages and comment-type barrages, the richer barrage types improve the user's interactive experience with the target video in different dimensions.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
fig. 1 is an interaction diagram of a conventional bullet screen generating method;
FIG. 2 is a flow chart of an alternative bullet screen generation method according to an embodiment of the present application;
FIG. 3 is a flow diagram of an alternative method of partitioning sentences according to an embodiment of the present application;
FIG. 4 is a flow diagram of an alternative method for calculating sentence importance according to an embodiment of the present application;
FIG. 5 is a flow chart of an alternative barrage screening method according to an embodiment of the present application;
FIG. 6 is a schematic illustration of an alternative bullet screen display according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of an alternative bullet screen generating device according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of an alternative bullet screen generating device according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of an alternative electronic device according to an embodiment of the present application.
Detailed Description
The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
A video creator synthesizes corresponding video materials into a target video a based on a target text A; the text content of the target text A is presented through the target video a, and the creator publishes the target video a to a website for users to watch. When user 1 watches the target video a, if user 1 is the first viewer, or if user 2 watched the target video a earlier but entered no text, then the target video a that user 1 watches shows only the video content of the target video a (the text content corresponding to the target text A) with no bullet screen, as shown in the first panel of fig. 1. While watching the target video a, user 1 inputs a text a1, which may be user 1's opinion of the target video a, thereby realizing interaction between user 1 and the target video a; accordingly, a bullet screen corresponding to the text a1 is generated on the target video a. If user 3 watches the target video a after user 1, the target video a watched by user 3 shows not only the video content but also the bullet screen (text a1), as shown in the second panel of fig. 1. By browsing the bullet screen, user 3 learns other users' (user 1's) views of the target video a, realizing interaction between user 3 and the other users, and can quickly grasp information related to the target video a so that the viewing gains direction.
In the above scenario, since the target video a watched by user 1 carries no barrage, user 1 cannot enjoy the barrage experience, resulting in a poor experience of watching the target video a. To solve this problem, as shown in fig. 2, an embodiment of the present application provides a bullet screen generation method, including:
s1, obtaining the importance of each sentence in the target text, wherein the target text is the text for synthesizing the target video, and the importance is the average value of the similarity of each sentence and the rest sentences in the target text.
In this embodiment, the target text is a text provided by the video creator, and the video generated from it is the target video; the text content of the target text is consistent with the video content of the target video, so playing the target video presents the text content of the target text in a more visual form that is easier for the user to understand.
In this embodiment, the importance of a sentence reflects the degree of association between that sentence and the remaining sentences of the text it belongs to; the more strongly a sentence is associated with the remaining sentences, the more it can be regarded as their center, i.e., the higher its importance. The degree of association between sentences can be expressed by sentence similarity: the more similar two sentences are, the more strongly they are associated, and conversely, the less similar, the more weakly. The importance of a sentence can therefore be represented by its similarity to the remaining sentences of its text. Specifically, the importance of each sentence satisfies formula (1):
imp_i = (1/(n-1)) · Σ_{j=1, j≠i}^{n} SIM_ij    (1)

where imp_i denotes the importance of the i-th sentence in the target text, n denotes the number of sentences in the target text, and SIM_ij denotes the similarity between the i-th sentence and the j-th sentence.
In this embodiment, before the importance of each sentence is calculated, the n sentences of the target text are first determined. As shown in fig. 3, an embodiment of the present application provides a method for dividing sentences, including:
S01, identifying the designated symbols in the target text.
S02, dividing the target text into a plurality of sentences at the designated symbols.
The designated symbol serves as the division point for sentence division. It may be set to punctuation marks such as "。", "；", "！" and "？", or to serial numbers such as "1., 2., 3." or "一、二、三、"; this embodiment places no restriction on the designated symbols, as long as the text between two adjacent designated symbols expresses a relatively complete meaning. After all designated symbols are recognized, the target text is divided with each designated symbol as a division point: the text between two adjacent designated symbols is one divided sentence, the text between the first character of the target text and the first designated symbol is one sentence, and the text between the last designated symbol and the last character of the target text is one sentence.
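For illustration, a minimal Python sketch of this symbol-based division is given below; the particular symbol set and the regex approach are assumptions, since the embodiment leaves the designated symbols open:

```python
import re

# Hypothetical designated symbols: Chinese full stop, semicolon,
# exclamation mark and question mark (the patent leaves the set open).
DESIGNATED_SYMBOLS = "。；！？"

def split_sentences(target_text: str) -> list[str]:
    """Split the target text at each designated symbol; the text between
    two adjacent symbols (plus the leading and trailing fragments) is
    one divided sentence."""
    parts = re.split(f"[{DESIGNATED_SYMBOLS}]", target_text)
    return [p.strip() for p in parts if p.strip()]
```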
In some embodiments, the above process of dividing the target text into n sentences may be completed by a model. For example, the pre-training process may be: take a large number of text materials as input, each containing at least two sentences separated by designated symbols, and take each sentence as output, so that the model learns to recognize the designated symbols in a text and divide sentences accordingly.
After the n sentences are obtained as above, the similarity between each sentence and the remaining sentences is calculated from the sentence vector of each sentence, as shown in fig. 4:
S11, converting each sentence into a sentence vector.
Converting sentences into sentence vectors and computing similarity on the vectors quantizes the sentences, effectively improving the accuracy of the sentence-similarity calculation. The n sentences can be converted into sentence vectors by a suitable model, for example word2vec, GloVe or BERT.
S12, calculating the cosine values and physical distances between the sentence vector of each sentence and the sentence vectors of the remaining sentences in the target text to obtain the similarity between each sentence and the remaining sentences in the target text.
In the present embodiment, to ensure the accuracy of the similarity calculation, the similarity between each sentence and the remaining sentences may be obtained by combining the cosine value between the sentences with the physical distance between them; that is, SIM_ij in formula (1) satisfies formula (2):
SIM_ij = sim_ij · pos_ij    (2)

where sim_ij denotes the cosine value between the i-th sentence and the j-th sentence, and pos_ij denotes the physical-distance factor between the i-th sentence and the j-th sentence.
sim_ij satisfies formula (3):

sim_ij = cos(V_i, V_j)    (3)

where V_i denotes the sentence vector of the i-th sentence and V_j denotes the sentence vector of the j-th sentence.
pos_ij satisfies formula (4):

pos_ij = d_ij / n    (4)

where d_ij denotes the physical distance between the i-th sentence and the j-th sentence and n denotes the number of all sentences in the target text. Using the physical distance divided by the number n of all sentences as the final distance factor weakens the influence of the physical distance itself, in particular reducing the degree to which physically close sentences dominate the importance of the i-th sentence.
S13, calculating the weighted average of the similarities between each sentence and the remaining sentences in the target text to obtain the importance of each sentence.
Combining formulas (1), (2), (3) and (4), the importance of each sentence in the target text satisfies formula (5):

imp_i = (1/(n-1)) · Σ_{j≠i} cos(V_i, V_j) · d_ij / n    (5)
based on the above process, the importance of each sentence in the target text can be accurately calculated.
And S2, dividing all sentences in the target text into at least three groups according to the sequence of the importance degrees from high to low.
As the above description of sentence importance shows, the higher a sentence's importance, the more strongly it is associated with the remaining sentences and the better it represents their main content; accordingly, the more important a sentence is, the better it reflects the main content of the text. In this embodiment, the bullet screen is carried directly in the target video and should reflect its video content; since the video content of the target video is consistent with the text content of the target text, the bullet screen can be generated directly from the text content of the target text. To improve the effectiveness of the generated barrage, i.e., to make it reflect the main content of the target text, the barrage can be generated based on sentence importance.
According to their importance, the n sentences can be roughly divided into at least three groups: a group of higher importance, a group of lower importance and a group of intermediate importance. The n sentences may also be subdivided into more groups, for example five, according to actual requirements. The sentences in the highest-importance group represent the most central text content of the target text, and the importance of the content represented by the sentences in the other groups decreases group by group.
In this embodiment, the n sentences may be divided into equally sized groups, or grouped at a specified ratio, for example a ratio proportional to the importance of the group, so that the more important groups contain more sentences and the text content of the target text is reflected more accurately.
S3, dividing all sentences in the target text into a first class of sentences and a second class of sentences, where the first class is the question-and-answer barrage material and the second class is the comment-type barrage material; the first class comprises all sentences of the highest-ranked group plus part of the sentences of the middle group, and the second class comprises all sentences of the lowest-ranked group plus the remaining sentences of the middle group.
Each sentence in each group obtained in step S2 serves as barrage material for generating a barrage. Barrages fall into two types: question-and-answer barrages and comment-type barrages. In this embodiment a question-and-answer barrage carries substantive content; for example, for the sentence "Today is sunny", the question-and-answer barrage is "Question: what is the weather today?" and "Answer: sunny.". A comment-type barrage carries no substantive content and usually expresses an emotion; for example, for the sentence "Zhang Fei charges the enemy on the battlefield", the comment-type barrage is "Wow!" or "So fierce!". The question-and-answer barrage conveys information related to the video content of the target video, so that the user learns something of the specific video content; through "Question: what is the weather today? Answer: sunny." the user learns that the target video relates to weather, and the question-and-answer form triggers the user's thinking, raising the user's attention to the target video. The comment-type barrage conveys the emotion of the video content, letting the user grasp the mood and spirit the target video intends to convey; through "Wow!" and "So fierce!" the user senses that the emotion to be conveyed is excitement, which guides the user's viewing mood and evokes emotional resonance, improving the user's experience.
To ensure the user's viewing experience, both barrage types are generated from the target text. In this embodiment the sentences used to generate question-and-answer barrages are called first-class sentences and serve as the question-and-answer barrage material, and the sentences used to generate comment-type barrages are called second-class sentences and serve as the comment-type barrage material.
As the above descriptions of the two barrage types show, the question-and-answer barrage reflects the video content of the target video more directly than the comment-type barrage. On this basis, the more important sentences are taken as first-class sentences, the less important sentences as second-class sentences, one part of the sentences of intermediate importance as first-class sentences and the other part as second-class sentences.
Illustratively, the n sentences are divided into five groups A, B, C, D and E in descending order of importance; all sentences of the two more important groups A and B plus part of the sentences of the middle group C are taken as first-class sentences, and the two less important groups D and E plus the other part of the sentences of group C are taken as second-class sentences.
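The grouping of steps S2 and S3 could be sketched as follows; the five-group split and the fraction of the middle group routed to the first class (mid_take) are illustrative choices, not fixed by the embodiment:

```python
import numpy as np

def split_classes(sentences, importance, n_groups=5, mid_take=0.5):
    """Rank sentences by importance, cut them into n_groups near-equal
    groups (high to low), then assign the upper groups plus part of the
    middle group to the first class (Q&A material) and the rest to the
    second class (comment material)."""
    order = np.argsort(importance)[::-1]          # highest importance first
    groups = np.array_split(order, n_groups)      # near-equal groups
    mid = n_groups // 2
    cut = int(len(groups[mid]) * mid_take)        # middle-group split point
    first = np.concatenate(groups[:mid] + [groups[mid][:cut]])
    second = np.concatenate([groups[mid][cut:]] + groups[mid + 1:])
    return [sentences[i] for i in first], [sentences[i] for i in second]
```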
S4, generating the target video based on the target text, generating corresponding questions and answers based on the first-class sentences, and generating first texts and second texts based on the second-class sentences.
A question-and-answer barrage is generated based on the first-class sentences determined in step S3, and a comment-type barrage is generated based on the second-class sentences determined in step S3. From each first-class sentence two texts are generated, a question and its corresponding answer; from each second-class sentence two texts are generated, a first text and a second text, which express the same emotion.
In some embodiments, the questions and answers, as well as the first and second texts, may be generated with a UNIfied pre-trained Language Model (UniLM), where the UniLM model is obtained by modifying Bidirectional Encoder Representations from Transformers (BERT). First, the UniLM model is pre-trained with first-class barrage material samples (barrage material plus the corresponding question and answer) and second-class barrage material samples (barrage material plus the corresponding first and second texts). For a first-class sample the input is sentence A and the output is question a plus answer a; the sample is tokenized and concatenated in BERT style as "[CLS] sentence A [SEP] answer a [SEP] question a [SEP]", and the UniLM model performs a sequence-to-sequence (Seq2Seq) task under a special self-attention mask in which the input part attends bidirectionally and the output part attends unidirectionally; each first-class sample pre-trains the model by this process. For a second-class sample the input is sentence B and the output is the first text b plus the second text b; the sample is concatenated as "[CLS] sentence B [SEP] second text b [SEP] first text b [SEP]" and the same masked Seq2Seq task is performed; each second-class sample pre-trains the model by this process. When the pre-trained UniLM model is used, the barrage materials (first-class and second-class sentences) are input, and the questions and answers, or the first and second texts, are output, yielding the barrages finally displayed on the target video.
The UniLM model handles both classes automatically, i.e., it computes P(answer, question | first-class sentence) and P(second text, first text | second-class sentence), so the various barrage types can be obtained accurately and quickly.
In some embodiments, to ensure the diversity of the answers in the question-and-answer barrages and of the comment-type barrages, while also ensuring the determinacy of the questions and of the tone of the comment-type barrages, during decoding the questions and the second texts (the second text anchors the tone) are generated with deterministic decoding, and the answers and the first texts with stochastic decoding. Illustratively, the beam_search strategy is chosen for deterministic decoding and the random_search strategy for stochastic decoding.
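A minimal sketch of the two decoding regimes is shown below, assuming the pre-trained generator is exposed through the Hugging Face generate API; the checkpoint name and sampling hyperparameters are placeholders, not taken from the patent:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Hypothetical checkpoint standing in for the pre-trained UniLM generator.
tokenizer = AutoTokenizer.from_pretrained("your-unilm-checkpoint")
model = AutoModelForSeq2SeqLM.from_pretrained("your-unilm-checkpoint")

def generate_deterministic(material: str) -> str:
    """beam_search strategy: used for the questions and the second texts."""
    ids = tokenizer(material, return_tensors="pt").input_ids
    out = model.generate(ids, num_beams=4, do_sample=False, max_new_tokens=32)
    return tokenizer.decode(out[0], skip_special_tokens=True)

def generate_stochastic(material: str) -> str:
    """random_search strategy: used for the answers and the first texts."""
    ids = tokenizer(material, return_tensors="pt").input_ids
    out = model.generate(ids, do_sample=True, top_p=0.9, max_new_tokens=32)
    return tokenizer.decode(out[0], skip_special_tokens=True)
```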
Further, since the second text lacks diversity, it may be deleted, leaving only the first text as the final comment-type barrage.
In some embodiments, to control the number ratio of question-and-answer barrages to comment-type barrages, the data ratio of the barrage material samples used to pre-train the UniLM model may be controlled, and the decoding strategy modified at the same time, following the process shown in fig. 5:
S412, respectively calculating a first similarity for each group of questions and answers and a second similarity for each group of first and second texts.
Under the random_search strategy, the similarity of the two generated texts is compared, i.e., the similarity between the question and the answer of the same group (the first similarity) and the similarity between the first text and the second text of the same group (the second similarity); the similarity between texts can be obtained, for example, by computing the cosine value of the two texts. In general the similarity between an answer and its question is high while the similarity between a first text and a second text is low, so the similarity can distinguish whether two texts are a question and answer or a first and second text.
S413, based on the first similarity and the second similarity, retaining the questions, answers, first texts and second texts according to a specified screening proportion, where the specified screening proportion means that the number of retained groups of first and second texts is larger than the number of retained groups of questions and answers.
In this embodiment, the specified screening ratio may be set according to actual requirements, for example, in order to improve the attention of the user to the target video, the specified screening ratio of the questions and the answers may be increased to keep more question-answering barrages. For another example, to increase the user's interest in the target video, the specified filtering ratio of the first text and the second text may be increased to retain more commenting barrages. For another example, for the above scenario in which the second text is deleted from the comment-type barrage, in order to balance the number of question-answer barrages and comment-type barrages, the specified screening ratio of the first text and the second text may be further increased to ensure that the number of the first text is balanced with the total number of questions and answers.
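The screening of steps S412 and S413 might be sketched like this; the embed function, the cosine scoring and the concrete keep ratios are assumptions standing in for whatever similarity model and specified screening proportion an implementation chooses:

```python
import numpy as np

def screen_barrages(qa_pairs, comment_pairs, embed, qa_keep=0.4, comment_keep=0.8):
    """Score each (question, answer) and (first_text, second_text) pair by
    the cosine similarity of their embeddings, then keep the top fraction
    of each kind; comment_keep > qa_keep implements the specified
    screening proportion (more comment groups survive than Q&A groups)."""
    def cos(a, b):
        va, vb = embed(a), embed(b)
        return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))

    def keep(pairs, ratio):
        ranked = sorted(pairs, key=lambda p: cos(p[0], p[1]), reverse=True)
        return ranked[: int(len(ranked) * ratio)]

    return keep(qa_pairs, qa_keep), keep(comment_pairs, comment_keep)
```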
A corresponding target video is also generated based on the target text. In this embodiment the target text refers to text corresponding to one event; if the text used to generate the video covers several events, it is first divided into several target texts by event, and video material for synthesizing a target video is determined for each target text.
First, the domain corresponding to the target text is determined. In this embodiment different texts correspond to different domains; for example, divided by literary topic, the domains may be poetry, novels, music, film, and so on; divided by work title, they may be the Romance of the Three Kingdoms domain, the Dream of the Red Chamber domain, the Journey to the West domain, the Water Margin domain, and so on. Each domain has corresponding category information, where the category of the important text content of the domain is the core category and the categories of the other text content are non-core categories. Based on the domain's category information, the target text can be divided into two kinds of content: core text corresponding to the core category and non-core text corresponding to the non-core categories.
In this embodiment, extraction models may be used to extract the core text and the non-core text from the target text. In some embodiments a Named Entity Recognition (NER) model, such as BERT, BLSTM or CRF, may be used to recognize the entity nouns in the target text that correspond to the core category, the extracted entity nouns being taken as the core text. Further, to improve the accuracy of the core text extracted by the NER model, the extracted entity nouns may be corrected against the domain vocabulary of the corresponding domain to obtain the final core text. The domain vocabulary contains all text content corresponding to the core category of the domain; for example, the words of the core category may be extracted by crawling all the text material of the domain. Erroneous entity nouns are found by matching the extracted entity nouns against each word of the domain vocabulary, and the error type is judged: if the error is partial, the erroneous entity noun is replaced by the corresponding word of the domain vocabulary; if the error is complete, the erroneous entity noun is discarded. In some embodiments this correction may be performed by the NER model itself, the NER model then being a noun recognition model trained with a correction function.
In some embodiments a classification model, such as a BLSTM or CNN model, may be used to recognize and extract the non-core text corresponding to the non-core categories of the target text. The classification model classifies the events, emotions and the like described in the target text and determines the classification labels of the target text (labels pre-trained into the model; for example, the labels of the "event" category include "horse riding", "fighting", "talking" and the like, and the labels of the "emotion" category include "happy", "angry" and the like), i.e., the non-core text.
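As a sketch of this extraction step, the two models could be driven as below; the pipeline wrappers and checkpoint names are hypothetical, and the vocabulary correction is simplified to exact-match filtering rather than the partial-replacement logic described above:

```python
from transformers import pipeline

# Hypothetical checkpoints; the patent names only model families
# (BERT/BLSTM/CRF for NER, BLSTM/CNN for classification), not weights.
ner = pipeline("ner", model="your-ner-checkpoint", aggregation_strategy="simple")
classifier = pipeline("text-classification", model="your-label-checkpoint", top_k=None)

def extract_core_text(target_text: str, domain_vocabulary: set[str]) -> list[str]:
    """Entity nouns recognized by the NER model, corrected against the
    domain vocabulary (here: keep only exact matches)."""
    entities = [e["word"] for e in ner(target_text)]
    return [e for e in entities if e in domain_vocabulary]

def extract_non_core_labels(target_text: str):
    """Classification labels (event, emotion, ...) with probabilities,
    later reused as the probability similarity against material tags."""
    return classifier(target_text)
```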
In addition, a text abstract of the target text is obtained. In this embodiment the text abstract is the sentence or sentences of the target text that best represent its semantics, such that the similarity between the vector formed by those sentences and the vector of the target text meets a vector similarity threshold. For example, for the target text "Liu Bei, Guan Yu and Zhang Fei swore brotherhood in the Peach Garden; three men of different surnames became brothers, to join hearts and strength, to rescue the distressed, to serve the country above and bring peace to the people below", if one sentence best represents the semantics of the target text, the text abstract is "Liu Bei, Guan Yu and Zhang Fei swore brotherhood in the Peach Garden".
Through the process, the video synthesizer can automatically and accurately obtain the core text, the non-core text and the text abstract of the target text.
A video material library corresponding to the domain is acquired; it contains a number of video materials, each with corresponding tags and description texts. Tags are usually single words, and a video material may have one or more of them. To keep the tags accurate and concise, the tags of the materials are deduplicated and disambiguated, for example by comparing the content similarity of the materials and unifying the tags of materials whose content similarity reaches a threshold into one tag group while discarding tags that recur rarely, and/or by comparing the similarity between the tags of the same material and merging overly similar tags into one. A description text is usually a short sentence, and a material may have one or more description texts; each description text is comparatively long and contains several words whose senses and sentence roles together complete an overall description of the material's video content.
The target video material is extracted from the library according to the text similarity between the core text and each material's tags, the probability similarity between the non-core text and each material's tags, and the sentence similarity between the text abstract and each material's description texts. The specific process is as follows:
First, the text similarity between the core text and the tags of each video material is calculated, and the materials whose text similarity exceeds a preset similarity threshold are taken as candidate video materials.
Then, the probability similarity between the non-core text and the tags of each candidate video material is calculated. Illustratively, the classification model gives probabilities of 0.857, 0.143 and 0 that the non-core text falls under the classification labels "battlefield", "outdoor" and "indoor"; if video material 1 is tagged "battlefield", video material 2 "outdoor" and video material 3 "indoor", the probability similarity between the non-core text and the tag of material 1 is 0.857, of material 2 is 0.143, and of material 3 is 0.
Next, the sentence similarity between the text abstract and the description text of each candidate video material is calculated. Specifically, a first sentence vector is generated for the material's description text and a second sentence vector for the text abstract, and their cosine similarity gives the sentence similarity. Illustratively, the description text of a candidate material is "Zhang Fei kills enemies on the battlefield" and the text abstract is "Zhang Fei gallops into battle"; if the computed sentence similarity reaches the similarity threshold, the candidate material reflects the overall text content of the target text comparatively accurately.
To improve the relevance among the core text, the non-core text and the text abstract, the matching degree, the probability similarity and the sentence similarity can be combined to obtain the content matching degree between the target text and the candidate video material.
Associating the core text and the non-core text obtained above, a first similarity between the target text and the candidate video material is calculated jointly. Specifically, the first similarity satisfies: A1 = xa·score(A) + xb·score(B), where A1 is the first similarity and score(A) measures the overlap between the core text and the candidate material's tags: score(A) = k1·C/A + k2·C/B, where A is the total number of occurrences of the core text in the target text, B is the total number of tags of the core category among the candidate material's tags, C is the size of the intersection between the two, and k1 and k2 are coefficients with k1 + k2 = 1, settable according to the desired emphasis (k1 > k2 to emphasize the target text, k1 < k2 to emphasize the candidate material). score(B) is the probability that each non-core text falls under the corresponding non-core-category tag of the candidate material; xa and xb are the weights of score(A) and score(B), settable as required, subject to xa + xb = 1.
Further, by associating the core text, the non-core text and the text abstract, a second similarity between the target text and the candidate video material, i.e., the content matching degree, is calculated jointly. Specifically, the second similarity satisfies: A2 = Q1·A1 + Q2·P3, where A2 is the second similarity (content matching degree), A1 is the first similarity, P3 is the sentence similarity between the text abstract and the description text, and Q1 and Q2 are the respective weights with Q1 + Q2 = 1, 0 ≤ Q1 ≤ 1 and 0 ≤ Q2 ≤ 1, settable as required (Q1 > Q2 to emphasize the detail information of the candidate material, Q2 > Q1 to emphasize its overall information). A content matching degree threshold is set accordingly; if A2 is greater than or equal to the threshold, the candidate video material is determined to be a target video material, otherwise it is not.
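Put together, the content matching degree reduces to a few lines; the helper below is a sketch with all weights set to illustrative defaults that satisfy the stated constraints:

```python
def content_match_degree(c, a, b, noncore_probs, p3,
                         k1=0.5, k2=0.5, xa=0.5, xb=0.5, q1=0.5, q2=0.5):
    """A2 = Q1*A1 + Q2*P3, with A1 = xa*score(A) + xb*score(B) and
    score(A) = k1*C/A + k2*C/B.  c, a, b are the intersection count and
    the two occurrence totals; noncore_probs are the probabilities of
    the non-core texts under the matching material tags; p3 is the
    sentence similarity between text abstract and description text."""
    score_a = k1 * c / a + k2 * c / b
    score_b = sum(noncore_probs) / len(noncore_probs)   # averaged, an assumption
    a1 = xa * score_a + xb * score_b
    return q1 * a1 + q2 * p3
```

A candidate material whose returned value meets the content matching degree threshold would be kept as target video material.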
The determined target video materials are spliced to obtain the target video corresponding to the target text. The target video obtained by the above process jointly considers the matching degree between the texts of the different content categories of the target text and the materials' tags, and between the text abstract of the target text and the materials' description texts, ensuring that the determined target video materials correspond accurately to the content of the target text.
S5, displaying the question, the answer, the first text and the second text in the target video.
The target video generated above is synthesized with the barrages, i.e., the questions and answers of the question-and-answer barrages and the first and second texts of the comment-type barrages are displayed in the target video. Illustratively, a barrage may be displayed on the picture corresponding to its barrage material: the comment-type barrage "Wow!" has the barrage material "Zhang Fei fights the enemy on the battlefield", and the video content of video material 1 is "Zhang Fei fights the enemy on the battlefield", so "Wow!" is displayed on video material 1. This improves the correspondence between barrage and video content, and thus the accuracy of the content and emotion the barrage reflects. For the question-and-answer barrage, to give the user enough time to think, the question may be displayed separately from the answer; for example, the question is displayed on the picture of its barrage material and the answer on a separate picture following the last video material of the target video.
With the above bullet screen generation method, barrages can be generated from the target text while the target video corresponding to the target text is synthesized, and displayed directly on the target video, solving the problem that when no user-input text forms a barrage, the user sees none while watching and the experience suffers. Meanwhile, the question-and-answer and comment-type barrages provide richer barrage types that improve the user's interactive experience with the target video in different dimensions.
Example 1
The target text is: "Liu Bei, defeated by Lü Bu, had nowhere to turn. Liu Bei planted vegetables in the backyard of his residence every day to mask his ambitions. Liu Bei secretly plotted with Imperial Uncle Dong Cheng to eliminate Cao Cao. Cao Cao invited Liu Bei to a banquet. Liu Bei and Cao Cao discussed heroes over warm wine and green plums. Cao Cao asked Liu Bei who in the present world could count as a hero. Liu Bei suggested that men such as Yuan Shao and Liu Biao were heroes. Cao Cao held that only Liu Bei and Cao Cao himself were true heroes in the world. Cao Cao told the story of quenching thirst by gazing at plums. Liu Bei, startled by a clap of thunder, dropped his chopsticks. Cao Cao laughed at Liu Bei's timidity."
The designated symbol is set to "。". With the designated symbols as division points, the target text is divided into 11 sentences: "Liu Bei, defeated by Lü Bu, had nowhere to turn", "Liu Bei planted vegetables in the backyard of his residence every day to mask his ambitions", "Liu Bei secretly plotted with Imperial Uncle Dong Cheng to eliminate Cao Cao", "Cao Cao invited Liu Bei to a banquet", "Liu Bei and Cao Cao discussed heroes over warm wine and green plums", "Cao Cao asked Liu Bei who in the present world could count as a hero", "Liu Bei suggested that men such as Yuan Shao and Liu Biao were heroes", "Cao Cao held that only Liu Bei and Cao Cao himself were true heroes in the world", "Cao Cao told the story of quenching thirst by gazing at plums", "Liu Bei, startled by a clap of thunder, dropped his chopsticks", and "Cao Cao laughed at Liu Bei's timidity".
Each sentence is converted into a sentence vector, and the similarity between each sentence and the remaining sentences is calculated, giving the importance of each sentence (see formulas (1)-(5)). Based on their importance the 11 sentences are divided into 5 groups. Group A comprises: "Liu Bei and Cao Cao discussed heroes over warm wine and green plums", "Cao Cao asked Liu Bei who in the present world could count as a hero" and "Cao Cao held that only Liu Bei and Cao Cao himself were true heroes in the world". Group B comprises: "Liu Bei suggested that men such as Yuan Shao and Liu Biao were heroes", "Liu Bei, startled by a clap of thunder, dropped his chopsticks" and "Cao Cao laughed at Liu Bei's timidity". Group C comprises: "Liu Bei, defeated by Lü Bu, had nowhere to turn", "Liu Bei secretly plotted with Imperial Uncle Dong Cheng to eliminate Cao Cao" and "Liu Bei planted vegetables in the backyard of his residence every day to mask his ambitions". Group D comprises: "Cao Cao invited Liu Bei to a banquet". Group E comprises: "Cao Cao told the story of quenching thirst by gazing at plums".
All sentences of groups A and B plus part of the sentences of group C are taken as first-class sentences (question-and-answer barrage material), namely: "Liu Bei and Cao Cao discussed heroes over warm wine and green plums", "Cao Cao asked Liu Bei who in the present world could count as a hero", "Cao Cao held that only Liu Bei and Cao Cao himself were true heroes in the world", "Liu Bei suggested that men such as Yuan Shao and Liu Biao were heroes", "Liu Bei, startled by a clap of thunder, dropped his chopsticks", "Cao Cao laughed at Liu Bei's timidity" and "Liu Bei, defeated by Lü Bu, had nowhere to turn". All sentences of groups D and E plus the remaining sentences of group C are taken as second-class sentences (comment-type barrage material), namely: "Liu Bei secretly plotted with Imperial Uncle Dong Cheng to eliminate Cao Cao", "Liu Bei planted vegetables in the backyard of his residence every day to mask his ambitions", "Cao Cao invited Liu Bei to a banquet" and "Cao Cao told the story of quenching thirst by gazing at plums".
The UniLM model is used to generate the barrage texts corresponding to the first-class and second-class sentences. Taking the first-class sentence "Cao Cao asked Liu Bei who in the present world could count as a hero" as an example of generating a question-and-answer barrage: the sentence is input to the UniLM model, and the output is "Question: who does Cao Cao consider a true hero? Answer: Liu Bei and Cao Cao". Taking the second-class sentence "Liu Bei planted vegetables in the backyard of his residence every day to mask his ambitions" as an example of generating a comment-type barrage: the sentence is input to the UniLM model, and the output is "First text: So clever!; second text: How miserable.". Before output, the second text may be deleted so that only the first text is output, ensuring the diversity of the output comment-type barrages. By controlling the pre-training process of the UniLM model, the numbers of output question-and-answer and comment-type barrages are kept balanced.
From the text content of the target text, its domain can be determined to be the "Romance of the Three Kingdoms" domain, whose core category is "character" and whose non-core categories are "scene", "emotion" and "event". Based on these categories, the core text and non-core text of the target text are determined, and the text abstract of the target text is also determined. The video material library of the "Romance of the Three Kingdoms" domain is acquired. First, the matching degree between the core text of the target text and the tags of every material in the library is calculated, determining the candidate video materials. Then the probability similarity between the non-core text and the candidates' tags and the sentence similarity between the text abstract and the candidates' description texts are calculated, and the content similarity between the target text and each candidate is determined by combining the matching degree, the probability similarity and the sentence similarity (see the process above of obtaining the content matching degree from the first and second similarities). The target video material for synthesizing the target video, for example target video material 1, is then determined among the candidates from the content similarities.
Target video material 1 is synthesized with the question-and-answer and comment-type barrages to obtain the target video. Illustratively, the questions of the question-and-answer barrages and the comment-type barrages (first texts) are displayed on the pictures of their barrage materials. For example, the barrage material of the question "Who does Cao Cao consider a true hero?" is "Cao Cao held that only Liu Bei and Cao Cao himself were true heroes in the world", corresponding to picture 1, so the question is displayed on picture 1 as shown in fig. 6; the barrage material of the first text "So clever!" is "Liu Bei planted vegetables in the backyard of his residence every day to mask his ambitions", corresponding to picture 2, so it is displayed on picture 2 as shown in fig. 6. To leave the user enough time to think about the question, the corresponding answer, for example "Liu Bei and Cao Cao", is displayed on a separate picture behind target video material 1, for example picture 3, as shown in fig. 6.
In this way, a user does not need to input any text into the target video to form a corresponding barrage; the target video generated based on the target text directly carries its corresponding barrages, so that even a user who is the first to watch the target video, or who watches it before any other user has input text, can still enjoy the barrage experience.
Embodiment 2
Based on embodiment 1, in embodiment 2 the answer in embodiment 1 can be played through an avatar. In embodiment 1, the answer is displayed on the last, separate screen of the target video, and only the answer in text form appears on that screen; when the user watches the answer, the lack of any other simultaneously displayed picture makes it feel like reading plain text rather than a barrage, which reduces the experience. Therefore, the answer can be played through an avatar. The avatar is a dynamic figure which, by displaying the body motions and/or facial motions corresponding to the answer in synchronization with the audio corresponding to the answer, presents the answer to the user more vividly. In some implementations, the avatar may be arranged directly on the separate screen, and the answer is played through the avatar. In other implementations, the avatar and the separate screen may be divided into two user interfaces displayed simultaneously, so that the user can view both the answer played by the avatar and the answer in text form.
In one embodiment, as shown in fig. 7, there is provided a bullet screen generating apparatus comprising: an importance calculating module 1, a grouping module 2, a sentence dividing module 3, a generating module 4 and a display module 5, wherein:
the importance calculating module 1 is configured to obtain the importance of each sentence in a target text, where the target text is a text for synthesizing a target video and the importance is a weighted average of the similarities between each sentence and the remaining sentences in the target text;
the grouping module 2 is configured to divide all sentences in the target text into at least three groups in descending order of importance;
the sentence dividing module 3 is configured to divide all sentences in the target text into first-class sentences and second-class sentences, wherein the first-class sentences are question-answer barrage materials and the second-class sentences are comment-type barrage materials; the first-class sentences are all sentences in the highest-ranked group plus some sentences in the middle group, and the second-class sentences are all sentences in the lowest-ranked group plus the remaining sentences in the middle group (a sketch of this grouping and division follows the module list below);
the generating module 4 is configured to generate the target video based on the target text, generate corresponding questions and answers based on the first-class sentences, and generate first texts and second texts based on the second-class sentences;
the display module 5 is configured to display the question, the first text, the second text and/or the answer in the target video.
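For illustration, the work of the grouping module 2 and the sentence dividing module 3 can be sketched as follows. The patent does not specify how many middle-group sentences go to each class, so the split point below is an assumption.

```python
# Minimal sketch of grouping by importance and dividing into two classes.
# The 50/50 split of the middle group is an assumption for illustration.
def group_and_divide(scored: list[tuple[str, float]]):
    # scored: (sentence, importance) pairs. Sort in descending importance
    # and cut into three groups, as the grouping module 2 does.
    ranked = [s for s, _ in sorted(scored, key=lambda p: p[1], reverse=True)]
    n = len(ranked)
    top = ranked[: n // 3]
    middle = ranked[n // 3 : 2 * n // 3]
    bottom = ranked[2 * n // 3 :]

    # First class: all of the top group + part of the middle group (QA material).
    # Second class: all of the bottom group + the rest of the middle group.
    split = len(middle) // 2  # assumed split point
    first_class = top + middle[:split]
    second_class = bottom + middle[split:]
    return first_class, second_class
```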
In one embodiment, the importance calculating module 1 is further configured to: convert each sentence into a sentence vector; calculate the cosine value and the physical distance between the sentence vector of each sentence and the sentence vectors of the other sentences in the target text to obtain the similarity between each sentence and the other sentences; and calculate the weighted average of the similarities between each sentence and the remaining sentences in the target text to obtain the importance of each sentence.
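For illustration, a minimal sketch of this computation follows. The patent does not state how the cosine value and the physical distance are fused into a single similarity, nor which weights the weighted average uses, so the fusion term and the uniform weights below are assumptions.

```python
# Minimal sketch of sentence importance. The fusion (cosine minus a scaled
# Euclidean distance) and the uniform averaging weights are assumptions.
import numpy as np

def sentence_importance(vectors: np.ndarray, alpha: float = 0.1) -> np.ndarray:
    # vectors: one sentence vector per row, shape (n, d).
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    cosine = (vectors @ vectors.T) / (norms * norms.T)            # cosine values
    distance = np.linalg.norm(vectors[:, None] - vectors[None, :], axis=-1)
    similarity = cosine - alpha * distance                        # assumed fusion
    np.fill_diagonal(similarity, 0.0)                             # exclude self
    # Uniform-weight average over the remaining sentences (assumed weights).
    return similarity.sum(axis=1) / (len(vectors) - 1)
```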
In one embodiment, as shown in fig. 8, the bullet screen generating apparatus further includes a sentence segmentation module 6, wherein the sentence segmentation module 6 is configured to identify designated symbols in the target text and divide the target text into a plurality of sentences at the designated symbols.
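For illustration, this segmentation step can be sketched as follows. The patent says only "designated symbols"; the symbol set below (Chinese and Latin sentence-ending punctuation) is an assumption.

```python
# Minimal sketch of cutting the target text into sentences at designated
# symbols. The DESIGNATED_SYMBOLS set is assumed for illustration.
import re

DESIGNATED_SYMBOLS = r"[。！？!?；;]"

def split_sentences(target_text: str) -> list[str]:
    parts = re.split(DESIGNATED_SYMBOLS, target_text)
    return [p.strip() for p in parts if p.strip()]
```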
In one embodiment, the generation module 4 is further configured to: generating the answer and the first text using stochastic decoding, and generating the question and the second text using deterministic decoding.
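For illustration, the two decoding strategies can be contrasted with the generic Hugging Face generate() interface; the model checkpoint below is hypothetical and the sampling hyperparameters are assumptions.

```python
# Minimal sketch of stochastic vs. deterministic decoding. Checkpoint name,
# top_p, temperature and beam width are illustrative assumptions.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("your-org/unilm-barrage-zh")   # hypothetical
model = AutoModelForSeq2SeqLM.from_pretrained("your-org/unilm-barrage-zh")

def decode(prompt: str, stochastic: bool) -> str:
    ids = tok(prompt, return_tensors="pt").input_ids
    if stochastic:
        # Stochastic decoding (sampling): varied answers and first texts.
        out = model.generate(ids, do_sample=True, top_p=0.9, temperature=0.8,
                             max_new_tokens=64)
    else:
        # Deterministic decoding (beam search): stable questions and second texts.
        out = model.generate(ids, do_sample=False, num_beams=4,
                             max_new_tokens=64)
    return tok.decode(out[0], skip_special_tokens=True)
```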
In one embodiment, the generation module 4 is further configured to: deleting the second text.
In one embodiment, the generation module 4 is further configured to: respectively calculating a first similarity of each group of the questions and the answers and a second similarity of each group of the first texts and the second texts; based on the first similarity and the second similarity, reserving the question, the answer, the first text and the second text according to a specified screening proportion, wherein the specified screening proportion means that the number of reserved groups of the first text and the second text is larger than the number of reserved groups of the question and the answer.
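For illustration, the screening under the specified proportion can be sketched as follows. The patent requires only that more (first text, second text) groups be reserved than (question, answer) groups; the exact retention counts below are assumptions.

```python
# Minimal sketch of screening by similarity under the specified proportion.
# n_qa and n_comment are assumed counts; the constraint n_comment > n_qa is
# the "specified screening proportion" from the text.
def screen(qa_pairs, comment_pairs, qa_sim, comment_sim,
           n_qa: int = 3, n_comment: int = 5):
    # qa_sim[i] is the first similarity of qa_pairs[i];
    # comment_sim[j] is the second similarity of comment_pairs[j].
    assert n_comment > n_qa, "specified screening proportion"
    top_qa = sorted(range(len(qa_pairs)),
                    key=lambda i: qa_sim[i], reverse=True)[:n_qa]
    top_cm = sorted(range(len(comment_pairs)),
                    key=lambda j: comment_sim[j], reverse=True)[:n_comment]
    return ([qa_pairs[i] for i in top_qa],
            [comment_pairs[j] for j in top_cm])
```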
According to another aspect of the embodiments of the present application, there is also provided an electronic device for implementing the above bullet screen generating method, where the electronic device may be, but is not limited to being, applied in a server. As shown in fig. 9, the electronic device comprises a memory 100 and a processor 200, wherein the memory 100 stores a computer program, and the processor 200 is configured to execute the steps of any of the above method embodiments through the computer program.
Optionally, in this embodiment, the electronic device may be located in at least one of a plurality of network devices of a computer network.
Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:
S1, obtaining the importance of each sentence in a target text, wherein the target text is a text for synthesizing a target video, and the importance is a weighted average of the similarity of each sentence and the remaining sentences in the target text;
S2, dividing all sentences in the target text into at least three groups according to the order of the importance from high to low;
S3, dividing all sentences in the target text into a first type of sentences and a second type of sentences, wherein the first type of sentences are question-answer barrage materials, the second type of sentences are comment-type barrage materials, the first type of sentences refer to all sentences in the highest-ranked group and some sentences in the middle group, and the second type of sentences refer to all sentences in the lowest-ranked group and the remaining sentences in the middle group;
S4, generating the target video based on the target text, generating corresponding questions and answers based on the first type of sentences, and generating first texts and second texts based on the second type of sentences;
S5, displaying the question, the first text, the second text and/or the answer in the target video.
The specific process of executing S1 includes:
S11, converting each sentence into a sentence vector;
S12, calculating cosine values and physical distances between the sentence vector of each sentence and the sentence vectors of the other sentences in the target text to obtain the similarity between each sentence and the other sentences in the target text;
S13, calculating the weighted average of the similarity between each sentence and the remaining sentences in the target text to obtain the importance of each sentence.
Before S1, the following steps are further performed:
S01, identifying a designated symbol in the target text;
S02, dividing the target text into a plurality of sentences by the designated symbols.
The specific process of executing S4 includes:
S41, generating the answer and the first text by stochastic decoding, and generating the question and the second text by deterministic decoding.
The specific process of executing S4 further includes:
S411, deleting the second text.
The specific process of executing S4 further includes:
S412, respectively calculating a first similarity of each group of questions and answers and a second similarity of each group of first texts and second texts;
S413, reserving the question, the answer, the first text and the second text according to a specified screening proportion based on the first similarity and the second similarity, wherein the specified screening proportion means that the number of groups of the reserved first text and second text is larger than that of the groups of the question and the answer.
Alternatively, it can be understood by those skilled in the art that the structure shown in fig. 9 is only an illustration, and the electronic device may also be a terminal device such as a smart phone (e.g., an Android phone, an iOS phone, etc.), a tablet computer, a palm computer, a Mobile Internet Device (MID), a PAD, and the like. Fig. 9 does not limit the structure of the electronic device. For example, the electronic device may also include more or fewer components (e.g., network interfaces, etc.) than shown in FIG. 9, or have a different configuration than shown in FIG. 9.
The memory 100 may be used to store software programs and modules, such as program instructions/modules corresponding to the bullet screen generating method and apparatus in the embodiment of the present application, and the processor 200 executes various functional applications and data processing by running the software programs and modules stored in the memory 100, that is, implements the bullet screen generating method. Memory 100 may include high speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 100 may further include memory located remotely from the processor 200, which may be connected to the terminal over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof. The memory 100 may be, but is not limited to, specifically used for storing program steps of the bullet screen generating method.
Optionally, the transmission device 300 implementing the above network connection function is used to receive or transmit data via a network. Examples of the network may include a wired network and a wireless network. In one example, the transmission device 300 includes a network adapter (NIC) that can be connected to a router and other network devices via a network cable so as to communicate with the internet or a local area network. In another example, the transmission device 300 is a radio frequency (RF) module, which is used to communicate with the internet wirelessly.
In addition, the electronic device further includes: a display 400 for displaying the bullet screen generation process; and a connection bus 500 for connecting the respective module parts in the above-described electronic apparatus.
Embodiments of the present application further provide a computer-readable storage medium, in which a computer program is stored, where the computer program is configured to execute the steps in any of the bullet screen generation method embodiments described above when running.
Alternatively, in the present embodiment, the storage medium may be configured to store a computer program for executing the steps of:
S1, obtaining the importance of each sentence in a target text, wherein the target text is a text for synthesizing a target video, and the importance is a weighted average of the similarity of each sentence and the remaining sentences in the target text;
S2, dividing all sentences in the target text into at least three groups according to the order of the importance from high to low;
S3, dividing all sentences in the target text into a first type of sentences and a second type of sentences, wherein the first type of sentences are question-answer barrage materials, the second type of sentences are comment-type barrage materials, the first type of sentences refer to all sentences in the highest-ranked group and some sentences in the middle group, and the second type of sentences refer to all sentences in the lowest-ranked group and the remaining sentences in the middle group;
S4, generating the target video based on the target text, generating corresponding questions and answers based on the first type of sentences, and generating first texts and second texts based on the second type of sentences;
S5, displaying the question, the first text, the second text and/or the answer in the target video.
The storage medium described above may be configured to store a computer program for executing specific steps of:
S11, converting each sentence into a sentence vector;
S12, calculating cosine values and physical distances between the sentence vector of each sentence and the sentence vectors of the other sentences in the target text to obtain the similarity between each sentence and the other sentences in the target text;
S13, calculating the weighted average of the similarity between each sentence and the remaining sentences in the target text to obtain the importance of each sentence.
The storage medium described above may be configured to store a computer program for executing specific steps of:
before the step S1, the following steps are also executed:
S01, identifying a designated symbol in the target text;
S02, dividing the target text into a plurality of sentences by the designated symbols.
Wherein the storage medium may be configured to store a computer program for executing the following specific steps:
S41, generating the answer and the first text by stochastic decoding, and generating the question and the second text by deterministic decoding.
The storage medium described above may be configured to store a computer program for executing specific steps of:
S411, deleting the second text.
The storage medium described above may be configured to store a computer program for executing specific steps of:
S412, respectively calculating a first similarity of each group of questions and answers and a second similarity of each group of first texts and second texts;
S413, reserving the question, the answer, the first text and the second text according to a specified screening proportion based on the first similarity and the second similarity, wherein the specified screening proportion means that the number of groups of the reserved first text and second text is larger than that of the groups of the question and the answer.
Optionally, the storage medium is further configured to store a computer program for executing the steps included in the method in the foregoing embodiment, which is not described in detail in this embodiment.
Alternatively, in this embodiment, a person skilled in the art may understand that all or part of the steps in the methods of the foregoing embodiments may be implemented by a program instructing hardware associated with the terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: flash disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
The integrated unit in the above embodiments, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in the above computer-readable storage medium. Based on such understanding, the technical solution of the present application may be substantially implemented or a part of or all or part of the technical solution contributing to the prior art may be embodied in the form of a software product stored in a storage medium, and including instructions for causing one or more computer devices (which may be personal computers, servers, network devices, or the like) to execute all or part of the steps of the method described in the embodiments of the present application.
In the above embodiments of the present application, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed client may be implemented in other manners. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The foregoing is only a preferred embodiment of the present application and it should be noted that those skilled in the art can make several improvements and modifications without departing from the principle of the present application, and these improvements and modifications should also be considered as the protection scope of the present application.

Claims (10)

1. A bullet screen generation method is characterized by comprising the following steps:
acquiring the importance of each sentence in a target text, wherein the target text is a text for synthesizing a target video, and the importance is a weighted average of the similarity of each sentence and other sentences in the target text;
dividing all sentences in the target text into at least three groups according to the sequence of the importance degrees from high to low;
dividing all sentences in the target text into a first class of sentences and a second class of sentences, wherein the first class of sentences are question-answering barrage materials, the second class of sentences are comment-type barrage materials, the first class of sentences refer to all sentences in one group with the highest sequence and partial sentences in the middle group, and the second class of sentences refer to all sentences in one group with the lowest sequence and the rest sentences in the middle group;
generating the target video based on the target text, generating corresponding questions and answers based on the first type of sentences, and generating first text and second text based on the second type of sentences;
displaying the question, the first text, the second text, and/or the answer in the target video.
2. The method of claim 1, wherein the obtaining the importance of each sentence in the target text comprises:
converting each sentence into a sentence vector;
calculating cosine values and physical distances between the sentence vectors of each sentence and the sentence vectors of other sentences in the target text to obtain the similarity between each sentence and the other sentences in the target text;
and calculating the weighted average of the similarity between each sentence and the remaining sentences in the target text to obtain the importance of each sentence.
3. The method according to claim 1, further comprising, before said obtaining the importance of each sentence in the target text:
identifying a designated symbol in the target text;
dividing the target text into a plurality of sentences by the designated symbols.
4. The method of claim 1, wherein generating the target video based on the target text and generating the corresponding question and answer based on the first type of sentence, and wherein generating the first text and the second text based on the second type of sentence comprises:
generating the answer and the first text using stochastic decoding, and generating the question and the second text using deterministic decoding.
5. The method of claim 4, wherein generating the answer and the first text using stochastic decoding, and wherein generating the question and the second text using deterministic decoding comprises:
deleting the second text.
6. The method of claim 5, wherein generating the answer and the first text using stochastic decoding, and wherein generating the question and the second text using deterministic decoding comprises:
respectively calculating a first similarity of each group of the questions and the answers and a second similarity of each group of the first texts and the second texts;
based on the first similarity and the second similarity, reserving the question, the answer, the first text and the second text according to a specified screening proportion, wherein the specified screening proportion means that the number of reserved groups of the first text and the second text is larger than the number of reserved groups of the question and the answer.
7. A bullet screen generating device, characterized in that the device comprises:
the importance calculation module is configured to obtain the importance of each sentence in a target text, wherein the target text is a text for synthesizing a target video, and the importance is a weighted average of the similarity of each sentence and the remaining sentences in the target text;
the grouping module is configured to divide all sentences in the target text into at least three groups according to the sequence of the importance degrees from high to low;
a sentence dividing module configured to divide all sentences in the target text into a first class of sentences and a second class of sentences, wherein the first class of sentences are question-answering barrage materials, the second class of sentences are comment-type barrage materials, the first class of sentences refer to all sentences in one group with the highest ranking and partial sentences in the middle group, and the second class of sentences refer to all sentences in one group with the lowest ranking and the rest sentences in the middle group;
a generating module configured to generate the target video based on the target text, generate corresponding questions and answers based on the first type of sentences, and generate first text and second text based on the second type of sentences;
a display module configured to display the question, the first text, the second text, and/or the answer in the target video.
8. A bullet screen display method, characterized in that, in the target video generated according to any one of claims 1 to 6, the answer is played through a preset avatar.
9. A computer-readable storage medium, in which a computer program is stored, wherein the computer program is configured to carry out the method of any one of claims 1 to 6 when executed.
10. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, and wherein the processor is arranged to execute the computer program to perform the method of any of claims 1 to 6.
CN202111460420.4A 2021-12-02 2021-12-02 Bullet screen generation method and device, storage medium and electronic device Active CN114297354B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111460420.4A CN114297354B (en) 2021-12-02 2021-12-02 Bullet screen generation method and device, storage medium and electronic device


Publications (2)

Publication Number Publication Date
CN114297354A true CN114297354A (en) 2022-04-08
CN114297354B CN114297354B (en) 2023-12-12

Family

ID=80964850




Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100287162A1 (en) * 2008-03-28 2010-11-11 Sanika Shirwadkar method and system for text summarization and summary based query answering
CN104731959A (en) * 2015-04-03 2015-06-24 北京威扬科技有限公司 Video abstraction generating method, device and system based on text webpage content
WO2017185640A1 (en) * 2016-04-29 2017-11-02 乐视控股(北京)有限公司 Bullet-screen generation and display method, and device, server, and client thereof
CN107704563A (en) * 2017-09-29 2018-02-16 广州多益网络股份有限公司 A kind of question sentence recommends method and system
CN108495184A (en) * 2018-02-06 2018-09-04 北京奇虎科技有限公司 A kind of method and apparatus for adding barrage for video
CN110035325A (en) * 2019-04-19 2019-07-19 广州虎牙信息科技有限公司 Barrage answering method, barrage return mechanism and live streaming equipment
CN112307738A (en) * 2020-11-11 2021-02-02 北京沃东天骏信息技术有限公司 Method and device for processing text
CN112464675A (en) * 2020-12-02 2021-03-09 科大讯飞股份有限公司 Method, device, equipment and storage medium for detecting language contradiction

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Tang Zihui (唐子惠): "Introduction to Medical Artificial Intelligence", pages 376-377 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115170372A (en) * 2022-09-06 2022-10-11 江西兴智教育科技有限公司 Interactive education platform system and method based on Internet
CN115170372B (en) * 2022-09-06 2022-12-09 江西兴智教育科技有限公司 Interactive education platform system and method based on Internet

Also Published As

Publication number Publication date
CN114297354B (en) 2023-12-12

Similar Documents

Publication Publication Date Title
WO2021174890A1 (en) Data recommendation method and apparatus, and computer device and storage medium
CN109416816B (en) Artificial intelligence system supporting communication
Shroff The Intelligent Web: Search, smart algorithms, and big data
US7783486B2 (en) Response generator for mimicking human-computer natural language conversation
CN110234018B (en) Multimedia content description generation method, training method, device, equipment and medium
US20150243279A1 (en) Systems and methods for recommending responses
CN113380271B (en) Emotion recognition method, system, device and medium
Wahde et al. Conversational agents: Theory and applications
CN110209774A (en) Handle the method, apparatus and terminal device of session information
Jin et al. Combining cnns and pattern matching for question interpretation in a virtual patient dialogue system
CN109783624A (en) Answer generation method, device and the intelligent conversational system in knowledge based library
CN109902187A (en) A kind of construction method and device, terminal device of feature knowledge map
CN109815482B (en) News interaction method, device, equipment and computer storage medium
O'Brien Machine learning for detection of fake news
Sadiq et al. High dimensional latent space variational autoencoders for fake news detection
CN114297354A (en) Bullet screen generation method and device, storage medium and electronic device
CN111931503B (en) Information extraction method and device, equipment and computer readable storage medium
CN113573128A (en) Audio processing method, device, terminal and storage medium
CN114491152B (en) Method for generating abstract video, storage medium and electronic device
CN111767386A (en) Conversation processing method and device, electronic equipment and computer readable storage medium
CN116955591A (en) Recommendation language generation method, related device and medium for content recommendation
CN117152308A (en) Virtual person action expression optimization method and system
CN115269961A (en) Content search method and related device
CN113836273A (en) Legal consultation method based on complex context and related equipment
Borman et al. PicNet: Augmenting Semantic Resources with Pictorial Representations.

Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant