CN114297354B - Bullet screen generation method and device, storage medium and electronic device - Google Patents

Bullet screen generation method and device, storage medium and electronic device

Info

Publication number
CN114297354B
CN114297354B
Authority
CN
China
Prior art keywords
text
sentences
sentence
target
answer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111460420.4A
Other languages
Chinese (zh)
Other versions
CN114297354A (en)
Inventor
司马华鹏
华冰涛
汤毅平
汪成
孙雨泽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Silicon Intelligence Technology Co Ltd
Original Assignee
Nanjing Silicon Intelligence Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Silicon Intelligence Technology Co Ltd filed Critical Nanjing Silicon Intelligence Technology Co Ltd
Priority to CN202111460420.4A priority Critical patent/CN114297354B/en
Publication of CN114297354A publication Critical patent/CN114297354A/en
Application granted granted Critical
Publication of CN114297354B publication Critical patent/CN114297354B/en

Classifications

    • Y: General tagging of new technological developments; general tagging of cross-sectional technologies spanning over several sections of the IPC; technical subjects covered by former USPC cross-reference art collections [XRACs] and digests
    • Y02: Technologies or applications for mitigation or adaptation against climate change
    • Y02D: Climate change mitigation technologies in information and communication technologies [ICT], i.e. information and communication technologies aiming at the reduction of their own energy use
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the application provides a barrage generation method and device, a storage medium and an electronic device, which can generate barrages based on a target text while synthesizing the target video corresponding to that text, and display the barrages directly on the target video. This solves the technical problem in the related art that, because no user has input text to form a barrage, a user watching the video cannot enjoy the barrage and the viewing experience is poor. Meanwhile, by providing both question-answer barrages and comment barrages, the richer barrage types can improve the interaction experience between the user and the target video in different dimensions.

Description

Bullet screen generation method and device, storage medium and electronic device
Technical Field
The application relates to the technical field of video production, and in particular to a bullet screen generation method and device, a storage medium and an electronic device.
Background
A barrage (bullet screen) is text displayed over a video during playback, and is one of the ways users interact with video content and with one another. In the related art, barrages depend entirely on text input by users; that is, only after a user inputs text for a video can the barrage corresponding to that text be displayed on it. However, if no user has input text for a video before a given user watches it (for example, if the user is the first to watch the video, or if earlier viewers input no text), then no barrage appears on the video and the user cannot enjoy the barrage experience.
For the problem in the related art that the viewing experience is poor because no user has input text to form a barrage during video playback, no effective solution has yet been proposed.
Disclosure of Invention
The application provides a barrage generation method and device, a storage medium and an electronic device, which at least solve the technical problem in the related art that the experience of a user watching a video is poor because no user has input text to form a barrage during video playback.
In one embodiment of the present application, a barrage generation method is provided, including:
acquiring the importance of each sentence in a target text, wherein the target text is a text used for synthesizing a target video, and the importance is a weighted average of the similarities between each sentence and the remaining sentences in the target text;
dividing all sentences in the target text into at least three groups in descending order of the importance;
dividing all sentences in the target text into first-type sentences and second-type sentences, wherein the first-type sentences are question-answer barrage material and the second-type sentences are comment barrage material; the first-type sentences are all sentences in the highest-ranked group plus part of the sentences in the middle-ranked group, and the second-type sentences are all sentences in the lowest-ranked group plus the remaining sentences in the middle-ranked group;
generating the target video based on the target text, generating corresponding questions and answers based on the first-type sentences, and generating a first text and a second text based on the second-type sentences;
displaying the question, the first text, the second text and/or the answer in the target video.
In one implementation, the obtaining the importance of each sentence in the target text includes:
converting each sentence into a sentence vector;
calculating cosine values and physical distances between the sentence vector of each sentence and the sentence vectors of the remaining sentences in the target text to obtain the similarity between each sentence and the remaining sentences;
calculating a weighted average of the similarities between each sentence and the remaining sentences in the target text to obtain the importance of each sentence.
In one implementation, before the obtaining the importance of each sentence in the target text, the method further includes:
identifying a designated symbol in the target text;
dividing the target text into a plurality of sentences using the designated symbols as division points.
In one implementation, the generating the target video based on the target text, generating the corresponding questions and answers based on the first-type sentences, and generating the first text and the second text based on the second-type sentences includes:
generating the answer and the first text using stochastic decoding, and generating the question and the second text using deterministic decoding.
In one implementation, after the generating the answer and the first text using stochastic decoding and the generating the question and the second text using deterministic decoding, the method further includes:
deleting the second text.
In one implementation, after the generating the answer and the first text using stochastic decoding and the generating the question and the second text using deterministic decoding, the method further includes:
calculating a first similarity between each group of the questions and the answers, and a second similarity between each group of the first texts and the second texts;
retaining the questions, the answers, the first texts and the second texts according to a specified screening proportion based on the first similarity and the second similarity, wherein the specified screening proportion means that the number of retained groups of the first texts and the second texts is larger than the number of retained groups of the questions and the answers.
In one embodiment of the present application, there is also provided a barrage generation apparatus, the apparatus including:
the importance calculating module is configured to acquire the importance of each sentence in a target text, wherein the target text is used for synthesizing a target video, and the importance is a weighted average value of the similarity of each sentence and the rest sentences in the target text;
A grouping module configured to divide all sentences in the target text into at least three groups in the order of the importance from high to low;
the sentence dividing module is configured to divide all sentences in the target text into first-type sentences and second-type sentences, wherein the first-type sentences are question-answer barrage material and the second-type sentences are comment barrage material; the first-type sentences are all sentences in the highest-ranked group plus part of the sentences in the middle-ranked group, and the second-type sentences are all sentences in the lowest-ranked group plus the remaining sentences in the middle-ranked group;
the generation module is configured to generate the target video based on the target text, generate corresponding questions and answers based on the first class sentences, and generate a first text and a second text based on the second class sentences;
and the display module is configured to display the question, the first text, the second text and/or the answer in the target video.
In an embodiment of the present application, a barrage display method is further provided, in which, in the target video according to any one of the above method embodiments, the answer is played through a preset avatar.
In an embodiment of the application, a computer-readable storage medium is also presented, in which a computer program is stored, wherein the computer program is arranged to perform the steps of any of the method embodiments described above when run.
In an embodiment of the application, an electronic device is also presented comprising a memory having stored therein a computer program and a processor arranged to run the computer program to perform the steps of any of the method embodiments described above.
According to the embodiments of the application, barrages can be generated based on the target text while the target video corresponding to the target text is synthesized, and the barrages are displayed directly on the target video. This solves the technical problem in the related art that, because no user has input text to form a barrage, a user watching the video cannot enjoy the barrage and the viewing experience is poor. Meanwhile, by providing both question-answer barrages and comment barrages, the richer barrage types can improve the interaction experience between the user and the target video in different dimensions.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is an interactive schematic diagram of a prior art barrage generation method;
FIG. 2 is a flowchart of an alternative barrage generation method according to an embodiment of the application;
FIG. 3 is a flowchart of an alternative method of dividing sentences according to an embodiment of the present application;
FIG. 4 is a flowchart of an alternative method of calculating sentence importance in accordance with an embodiment of the present application;
FIG. 5 is a flow chart of an alternative barrage screening method according to an embodiment of the application;
FIG. 6 is a schematic illustration of an alternative barrage display according to an embodiment of the application;
FIG. 7 is a schematic diagram of an alternative barrage generation device according to an embodiment of the application;
FIG. 8 is a schematic diagram of an alternative barrage generation device according to an embodiment of the application;
fig. 9 is a schematic diagram of an alternative electronic device according to an embodiment of the application.
Detailed Description
The application will be described in detail hereinafter with reference to the drawings in conjunction with embodiments. It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order.
A video creator synthesizes a target video a based on a target text A, the text content of the target text A being presented through the target video a, and publishes the target video a to a website for users to watch. When user 1 watches the target video a, if user 1 is the first user to watch it, or if user 2 watched it before user 1 but input no text, then the target video a watched by user 1 displays only the video content (the text content corresponding to the target text A) without any barrage, as shown in (1) of fig. 1. While watching the target video a, user 1 inputs a text A1 (for example, user 1's view of the target video a), thereby interacting with the target video a; accordingly, a barrage corresponding to the text A1 is generated on the target video a. If user 3 watches the target video a after user 1, the target video a watched by user 3 displays not only the video content of the target video a but also the barrage (text A1), as shown in (2) of fig. 1. By browsing the barrage, user 3 can learn the views of other users (user 1) on the target video a, realizing interaction between user 3 and those users, and user 3 can quickly grasp information related to the target video a through the barrage, making the viewing more directed.
In the above scenario, since the target video a watched by the user 1 does not have a barrage, the user 1 cannot enjoy the barrage experience, resulting in poor experience of watching the target video a by the user 1. In order to solve the above problems, as shown in fig. 2, an embodiment of the present application provides a barrage generation method, including:
s1, obtaining importance of each sentence in a target text, wherein the target text is a text used for synthesizing a target video, and the importance is an average value of similarity of each sentence and other sentences in the target text.
In this embodiment, the target text is a text set by the video creator, and the video generated from the target text is the target video. The text content of the target text is consistent with the video content of the target video, so playing the target video presents the text content of the target text, making the target text more vivid and easier for users to understand.
In this embodiment, the importance of a sentence reflects the degree of association between that sentence and the remaining sentences of the text in which it is located: the higher the association between a sentence and the remaining sentences, the more that sentence can be regarded as their center, and the higher its importance. The degree of association between sentences can be represented by sentence similarity; that is, the more similar two sentences are, the higher their degree of association, and conversely, the less similar they are, the lower it is. Thus, the importance of a sentence can be represented by the similarity between that sentence and the remaining sentences of its text. Specifically, the importance of each sentence satisfies formula (1):

imp_i = (1/(n-1)) * Σ_{j≠i} SIM_ij    (1)

where imp_i denotes the importance of the i-th sentence in the target text, n denotes the total number of sentences in the target text, and SIM_ij denotes the similarity between the i-th sentence and the j-th sentence.
In this embodiment, before the importance of each sentence is calculated, the n sentences in the target text are first determined. As shown in fig. 3, an embodiment of the present application provides a method for dividing sentences, including:
s01, identifying a designated symbol in the target text.
S02, dividing the target text into a plurality of sentences by the designated symbols.
The designated symbols serve as division points for sentence division and may, for example, be set to punctuation marks such as "。", "；", "！" and "？", or to enumeration markers such as "1, 2, 3" or "first, second, third". In this embodiment the specific symbols are not limited; it is only required that the text between two adjacent designated symbols expresses a relatively complete meaning. After all designated symbols are identified, the target text is divided using each designated symbol as a division point: the text between two adjacent designated symbols is one sentence, the text between the first character of the target text and the first designated symbol is one sentence, and the text between the last designated symbol and the last character of the target text is one sentence.
In some embodiments, the above process of dividing the target text into n sentences may be performed by a model. For example, the model may be pre-trained as follows: massive text material is taken as input, where each text material contains at least two sentences separated by designated symbols, and the individual sentences are taken as output, so that the model learns to identify the designated symbols in a text and divide it into sentences accordingly.
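As an illustration of steps S01-S02, the following Python sketch splits a target text on a configurable set of designated symbols; the symbol set and the function name are illustrative assumptions, not prescribed by this embodiment.

```python
import re

# Hypothetical set of designated symbols (step S01): Chinese full stop,
# comma, semicolon, exclamation mark and question mark.
DESIGNATED_SYMBOLS = "。，；！？"

def split_sentences(target_text: str) -> list[str]:
    """Divide the target text into sentences, using each designated
    symbol as a division point (step S02)."""
    parts = re.split(f"[{DESIGNATED_SYMBOLS}]", target_text)
    # Text between two adjacent designated symbols is one sentence.
    return [p.strip() for p in parts if p.strip()]
```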
After n sentences are obtained by the method shown above, the similarity between each sentence and the rest of the sentences is calculated based on the sentence vector of each sentence, as shown in fig. 4, specifically as follows:
s11, converting each sentence into a sentence vector.
Converting sentences into sentence vectors and computing inter-sentence similarity on those vectors quantifies the sentences, effectively improving the calculation accuracy of sentence similarity. The n sentences may each be converted into a sentence vector by a corresponding model, which may, for example, be word2vec, GloVe or BERT.
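A minimal sketch of this conversion, assuming the sentence-transformers library as one possible encoder (the embodiment names word2vec, GloVe and BERT; the specific checkpoint here is an assumption for illustration):

```python
from sentence_transformers import SentenceTransformer

# The checkpoint name is a hypothetical choice of multilingual encoder.
encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

def to_sentence_vectors(sentences: list[str]):
    """Encode each sentence into a fixed-size sentence vector;
    returns an (n, d) numpy array, one row per sentence."""
    return encoder.encode(sentences)
```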
S12, calculating cosine values and physical distances between the sentence vectors of each sentence and the sentence vectors of the rest sentences in the target text, and obtaining the similarity between each sentence and the rest sentences in the target text.
In this embodiment, to ensure the calculation accuracy of the similarity between each sentence and the remaining sentences, SIM_ij in formula (1) is obtained by jointly considering the cosine value between sentences and the physical distance between sentences, and satisfies formula (2):

SIM_ij = sim_ij * pos_ij    (2)

where sim_ij denotes the cosine value between the i-th sentence and the j-th sentence, and pos_ij denotes the physical-distance factor between the i-th sentence and the j-th sentence.

sim_ij satisfies formula (3):

sim_ij = cos(V_i, V_j)    (3)

where V_i denotes the sentence vector of the i-th sentence and V_j denotes the sentence vector of the j-th sentence.

pos_ij satisfies formula (4):

pos_ij = d_ij / n    (4)

where d_ij denotes the physical distance between the i-th sentence and the j-th sentence and n denotes the number of sentences in the target text. Dividing the physical distance by the number of sentences n to obtain the final distance factor weakens the influence of physical distance, in particular reducing the influence of physically closer sentences on the importance of the i-th sentence.
S13, calculating the weighted average of the similarities between each sentence and the remaining sentences in the target text to obtain the importance of each sentence.
By combining formulas (1), (2), (3) and (4), the importance of each sentence in the target text satisfies formula (5):

imp_i = (1/(n-1)) * Σ_{j≠i} cos(V_i, V_j) * d_ij / n    (5)

Based on the above process, the importance of each sentence in the target text can be accurately calculated.
S2, dividing all sentences in the target text into at least three groups in descending order of importance.
As described above for sentence importance, the higher the association between a sentence and the remaining sentences, the better that sentence represents their main content; correspondingly, the higher a sentence's importance, the better it reflects the main content of its text. A barrage expresses views, attitudes and the like produced by users based on the video content. In this embodiment, the barrage is carried directly in the target video and should therefore reflect the target video's content; since the video content of the target video is consistent with the text content of the target text, the barrage can be generated directly from the text content of the target text. To increase the effectiveness of the generated barrages, i.e., to make them reflect the primary content of the target text, the barrages may be generated according to sentence importance.
According to sentence importance, the n sentences may first be roughly divided into at least three groups: a higher-importance group, a lower-importance group and an intermediate group. The n sentences may also be subdivided into more groups, for example five, according to actual requirements. The sentences in the higher-importance group represent the most essential text content of the target text, and the importance of the content represented by the sentences in the remaining groups decreases group by group.
In this embodiment, the n sentences may be divided into equal-sized groups, or grouped according to a specified proportion; for example, the division proportion may be set proportional to the importance of the group, so that the higher-importance groups contain more sentences, improving the accuracy with which the text content of the target text is reflected.
S3, dividing all sentences in the target text into first-type sentences and second-type sentences, wherein the first-type sentences are question-answer barrage material and the second-type sentences are comment barrage material; the first-type sentences are all sentences in the highest-ranked group plus part of the sentences in the middle-ranked group, and the second-type sentences are all sentences in the lowest-ranked group plus the remaining sentences in the middle-ranked group.
Each sentence in each group obtained in step S2 serves as barrage material for generating barrages, which fall into two types: question-answer barrages and comment barrages. In this embodiment, a question-answer barrage has substantive content; for example, for the sentence "Today is sunny", the question-answer barrage is "Question: What is the weather today?" and "Answer: Sunny." A comment barrage has no substantive content and typically expresses emotion; for example, for the sentence "Zhang Fei kills enemies on the battlefield", the comment barrages are "Wow!", "Awesome!" and the like. Through question-answer barrages, information related to the video content of the target video can be conveyed to users so that they can, to a certain extent, learn the specific video content; for example, from the question-answer barrage "Question: What is the weather today?" "Answer: Sunny." a user can tell that the target video must relate to weather, and the question-answer form stimulates the user's thinking, increasing attention to the target video. Through comment barrages, emotion related to the video content can be conveyed to users so that they can grasp the emotional tone of the target video and the spirit it intends to convey; for example, from the comment barrages "Wow!" and "Awesome!" a user can tell that the emotion conveyed by the target video is excitement, which guides the user's viewing, arouses emotional resonance and improves the user experience.
In order to ensure the viewing experience of the user, the above two types of barrages are generated based on the target text, and in this embodiment, sentences for generating the question-answer barrages are referred to as first types of sentences which will be materials for generating the question-answer barrages, and sentences for generating the comment-type barrages are referred to as second types of sentences which will be materials for generating the comment-type barrages.
As can be seen from the above description, a question-answer barrage reflects the video content of the target video more significantly than a comment barrage. On this basis, sentences of higher importance are used as first-type sentences, sentences of lower importance as second-type sentences, and of the sentences of intermediate importance, one part is used as first-type sentences and the other part as second-type sentences.
For example, the n sentences are divided into five groups, denoted A, B, C, D and E in order of importance. All sentences in the two higher-importance groups A and B, plus part of the sentences in the middle group C, are used as first-type sentences; all sentences in the two lower-importance groups D and E, plus the other part of the sentences in group C, are used as second-type sentences.
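The grouping and class division described above might be sketched as follows; the equal group sizes and the half-and-half split of the middle group are illustrative assumptions rather than requirements of this embodiment.

```python
def divide_sentence_classes(sentences, importance, num_groups=5):
    """Rank sentences by importance, split them into groups (A..E for
    num_groups=5), and take the upper groups plus part of the middle
    group as first-type (question-answer) material and the rest as
    second-type (comment) material."""
    order = sorted(range(len(sentences)), key=lambda i: -importance[i])
    size = max(1, len(order) // num_groups)
    groups = [order[k:k + size] for k in range(0, len(order), size)]
    mid = num_groups // 2
    cut = len(groups[mid]) // 2  # part of the middle group
    first = [i for g in groups[:mid] for i in g] + groups[mid][:cut]
    second = groups[mid][cut:] + [i for g in groups[mid + 1:] for i in g]
    return ([sentences[i] for i in first], [sentences[i] for i in second])
```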
S4, generating the target video based on the target text, generating corresponding questions and answers based on the first type sentences, and generating a first text and a second text based on the second type sentences.
Question-answer barrages are generated based on the first-type sentences determined in step S3, and comment barrages are generated based on the second-type sentences determined in step S3. Two texts are generated from each first-type sentence, namely a question and an answer, where the question corresponds to the answer; two texts are generated from each second-type sentence, namely a first text and a second text, where the first text and the second text express the same emotion.
In some embodiments, the questions and answers, and the first and second texts, may be generated by a corresponding model, illustratively a UNIfied pre-trained Language Model (UniLM), which is modified on the basis of Bidirectional Encoder Representations from Transformers (BERT). First, the UniLM model is pre-trained with first-type barrage material samples (barrage material plus the question and answer corresponding to that material) and second-type barrage material samples (barrage material plus the first text and second text corresponding to that material). Taking the pre-training on a first-type barrage material sample as an example: the sample is "input: sentence A; output: answer a + question a". The sample is tokenized and concatenated in the BERT format to obtain "[CLS] sentence A [SEP] answer a [SEP] question a [SEP]", and the UniLM model performs a sequence-to-sequence (Seq2Seq) task in a special mode (using a particular self-attention mask) in which attention over the input part is bidirectional and attention over the output part is unidirectional; each first-type sample pre-trains the model in this way. Similarly, a second-type barrage material sample is "input: sentence B; output: first text b + second text b′"; it is tokenized and concatenated to obtain "[CLS] sentence B [SEP] second text b′ [SEP] first text b [SEP]", and the UniLM model performs the Seq2Seq task in the same mode, with bidirectional attention over the input part and unidirectional attention over the output part; each second-type sample pre-trains the model in this way. When the pre-trained UniLM model is used, barrage material (first-type and second-type sentences) is input, and the questions and answers and the first and second texts are output, yielding the barrages finally displayed on the target video.
Through the UniLM model, the first-type and second-type sentences can be handled automatically, i.e., P(answer, question | first-type sentence) and P(second text, first text | second-type sentence) are computed, so that the various barrages can be obtained accurately and quickly.
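The concatenation formats above can be sketched directly; the function name is hypothetical, and the token layout mirrors the [CLS]/[SEP] sequences described in this embodiment.

```python
def build_training_sequence(material: str, first_out: str, second_out: str) -> str:
    """Concatenate one barrage material sample into the UniLM Seq2Seq
    training format, e.g.
    '[CLS] sentence A [SEP] answer a [SEP] question a [SEP]' or
    '[CLS] sentence B [SEP] second text b [SEP] first text b [SEP]'."""
    return f"[CLS] {material} [SEP] {first_out} [SEP] {second_out} [SEP]"

# First-type sample: material + answer + question.
qa_sample = build_training_sequence("sentence A", "answer a", "question a")
# Second-type sample: material + second text + first text.
ct_sample = build_training_sequence("sentence B", "second text b", "first text b")
```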
In some embodiments, to ensure diversity of the answers in question-answer barrages and of the comment barrages while ensuring certainty of the questions in question-answer barrages and of the emotion of the comment barrages, deterministic decoding is used to generate the questions and the second texts (the second text is used to determine the emotion), and random decoding is used to generate the answers and the first texts. Illustratively, the deterministic decoding uses a beam_search strategy and the random decoding uses a random_search strategy.
Further, since the second text lacks diversity, the second text may be deleted, leaving only the first text as the final comment bullet screen.
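A sketch of the two decoding strategies, under the assumption of a HuggingFace-style seq2seq model and generate API (this embodiment names only the beam_search and random_search strategies; the checkpoint name and API choice are assumptions):

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Hypothetical checkpoint name for a UniLM-style seq2seq model.
tokenizer = AutoTokenizer.from_pretrained("my-org/unilm-barrage")
model = AutoModelForSeq2SeqLM.from_pretrained("my-org/unilm-barrage")

def decode(material: str, deterministic: bool) -> str:
    inputs = tokenizer(material, return_tensors="pt")
    if deterministic:
        # beam_search strategy: questions and second texts, where
        # certainty of phrasing/emotion matters.
        ids = model.generate(**inputs, num_beams=4, do_sample=False)
    else:
        # random_search strategy (sampling): answers and first texts,
        # where diversity matters.
        ids = model.generate(**inputs, do_sample=True, top_p=0.9)
    return tokenizer.decode(ids[0], skip_special_tokens=True)
```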
In some embodiments, to control the ratio of question-answer barrages to comment barrages, the data ratio of the barrage material samples used to pre-train the UniLM model may be controlled while the decoding strategy is adjusted; refer to the flow shown in fig. 5:
S412, respectively calculating a first similarity between each group of questions and the answers and a second similarity between each group of the first texts and the second texts.
The similarity of the two texts generated under the random_search strategy may be compared, that is, the similarity between a question and its answer in the same group (the first similarity) and the similarity between the first text and the second text in the same group (the second similarity), for example by computing the cosine similarity between the vectors of the two texts. In general, the similarity between an answer and its question is high, while the similarity between a first text and a second text is low; the two kinds of pairs can therefore be distinguished by their similarity.
S413, retaining the questions, the answers, the first texts and the second texts according to a specified screening proportion based on the first similarity and the second similarity, wherein the specified screening proportion means that the number of retained groups of the first texts and the second texts is larger than the number of retained groups of the questions and the answers.
In this embodiment, the specified screening proportion may be set according to actual requirements. For example, to raise the user's attention to the target video, the screening proportion of questions and answers may be increased so as to keep more question-answer barrages. As another example, to increase the user's interest in the target video, the screening proportion of first texts and second texts may be increased to keep more comment barrages. As another example, in the above scenario where the second text is deleted from the comment barrage, to equalize the numbers of question-answer barrages and comment barrages, the screening proportion of first texts and second texts may be further increased to ensure that the number of first texts is balanced with the total number of questions and answers.
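Steps S412-S413 can be sketched as follows, with cosine similarity between text vectors; the embedding function, the keep ratios and the descending sort order are illustrative assumptions.

```python
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def screen_barrages(qa_pairs, ct_pairs, embed, qa_keep=0.3, ct_keep=0.6):
    """Compute the first similarity for each (question, answer) group
    and the second similarity for each (first text, second text) group,
    then retain each kind according to a specified screening proportion
    in which more comment groups than question-answer groups are kept
    (qa_keep < ct_keep)."""
    qa_sorted = sorted(qa_pairs, key=lambda p: -cosine(embed(p[0]), embed(p[1])))
    ct_sorted = sorted(ct_pairs, key=lambda p: -cosine(embed(p[0]), embed(p[1])))
    return (qa_sorted[: int(len(qa_sorted) * qa_keep)],
            ct_sorted[: int(len(ct_sorted) * ct_keep)])
```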
A corresponding target video is also generated based on the target text. In this embodiment, the target text refers to text corresponding to one event; if the text for generating the target video includes a plurality of events, the text is first divided into a plurality of target texts by event, and video material for synthesizing the target video is determined for each target text.
First, the domain corresponding to the target text is determined. In this embodiment, different texts correspond to different domains; for example, by literary genre, the domains may include a poetry domain, a novel domain, a music domain, a film domain and the like, and by work title, domains such as "Romance of the Three Kingdoms", "Dream of the Red Chamber", "Journey to the West" and "Water Margin". Each domain has corresponding category information, where the category of the important text content in the domain is the core category and the categories of the other text content are non-core categories. Based on the category information of the domain, the target text can be divided into two kinds of text content: core text corresponding to the core category and non-core text corresponding to the non-core categories.
In this embodiment, extraction models may be used to extract the core text and non-core text from the target text. In some embodiments, a Named Entity Recognition (NER) model may be used to identify and extract the core text; the NER model may be a BERT, BLSTM or CRF model, and it identifies the entity nouns in the target text corresponding to the core category, the extracted entity nouns serving as the core text. Further, to improve the accuracy of the core text extracted by the NER model, the extracted entity nouns may be corrected against the domain word list of the domain to obtain the final core text. The domain word list contains all text content corresponding to the core category in that domain; for example, the words corresponding to the core category may be extracted from all text material of the domain by crawling or the like. Erroneous entity nouns are found by matching each extracted entity noun against the words in the domain word list, and the error type is judged: if the error is partial, the erroneous entity noun is replaced by the corresponding word in the domain word list; if it is entirely wrong, the entity noun is discarded. In some embodiments, this correction may be performed by the NER model itself, trained as an entity-noun recognition model with a correction function.
In some embodiments, a classification model may be used to identify and extract the non-core text corresponding to the non-core categories in the target text. The classification model may be a BLSTM or CNN model, and it classifies the events, moods and the like described by the target text to determine the classification labels corresponding to the target text, i.e., the non-core text (the labels are pre-trained into the classification model; e.g., the labels of the "event" category include "riding", "fighting", "talking" and the like, and the labels of the "mood" category include "happy", "unhappy", "angry" and the like).
In addition, a text abstract of the target text needs to be obtained. In this embodiment, the text abstract refers to one or more sentences in the target text that can best represent the semantics of the target text, where the similarity between the vector formed by these sentences and the vector of the target text meets a vector-similarity threshold. For example, the target text is "Liu Bei, Guan Yu and Zhang Fei took an oath in the Peach Garden to be brothers of different surnames; thereafter the three were of one heart, rescuing the distressed and aiding the endangered, serving the country above and bringing peace to the people below." The single sentence that best represents the semantics of the target text is "Liu Bei, Guan Yu and Zhang Fei took an oath in the Peach Garden", so the text abstract of the target text is "Liu Bei, Guan Yu and Zhang Fei took an oath in the Peach Garden".
Through the process, the video synthesizer can automatically and accurately acquire the core text, the non-core text and the text abstract of the target text.
A video material library corresponding to the domain is also acquired; the library contains a plurality of video materials, each with corresponding labels and descriptive text. A label is usually in the form of a word, and one video material may have one or more labels. The labels of the video materials are disambiguated to ensure their accuracy and conciseness, for example by comparing the content similarity of the video materials, unifying the labels of materials whose content similarity is greater than or equal to a threshold into one label group and discarding the labels that occur less frequently, and/or by comparing the similarity between the labels of the materials and merging several highly similar labels into one. The descriptive text is usually in the form of a short sentence, and one video material may have one or more descriptive texts; each descriptive text is relatively long, contains a plurality of words, and describes the overall video content of the material through the word senses of those words together with the sentence components they occupy in the short sentence.
The target video material is then extracted from the video material library according to the text similarity between the core text and the labels of each video material, the probability similarity between the non-core text and the labels of each video material, and the sentence similarity between the text abstract and the descriptive text of each video material. The specific process is as follows:
First, the text similarity between the core text and the labels of each video material is calculated, and the video materials whose text similarity is higher than a preset similarity threshold are determined as candidate video materials.
Then, the probability similarity of the non-core text and the labels of the candidate video materials is calculated. For example, the non-core text is "battlefield", the probability that the non-core text is classified into the classification labels "battlefield", "outdoor" and "indoor" by the classification model is respectively 0.857, 0.143 and 0, the video material 1 is labeled "battlefield", the video material 2 is labeled "outdoor", the video material 3 is labeled "indoor", accordingly, the probability similarity of the non-core text and the label of the video material 1 is 0.857, the probability similarity of the non-core text and the label of the video material 2 is 0.143, and the probability similarity of the non-core text and the label of the video material 3 is 0.
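The probability similarity can be read directly from the classifier's output distribution; a sketch under the assumption of a classifier that returns per-label probabilities:

```python
def probability_similarity(label_probs: dict[str, float], material_label: str) -> float:
    """Probability similarity between a non-core text and a video
    material's label: the probability mass the classifier assigns to
    that label (0 if absent). E.g. with probabilities
    {"battlefield": 0.857, "outdoor": 0.143, "indoor": 0.0}, a material
    labeled "battlefield" has probability similarity 0.857."""
    return label_probs.get(material_label, 0.0)
```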
Then, the sentence similarity between the text abstract and the descriptive text of each candidate video material is calculated. Specifically, a first sentence vector corresponding to the descriptive text of the candidate video material and a second sentence vector corresponding to the text abstract are generated, and the sentence similarity between the descriptive text and the text abstract is obtained by computing the cosine similarity between the first and second sentence vectors. Illustratively, the descriptive text of a candidate video material is "Zhang Fei rides a horse and kills enemies on the battlefield" and the text abstract is "Zhang Fei rides a horse on the battlefield"; if the computed sentence similarity is greater than or equal to the similarity threshold, the candidate video material can accurately reflect the overall text content of the target text.
In order to improve the relevance among the core text, the non-core text and the text abstract, the matching degree, the probability similarity and the sentence similarity can be comprehensively calculated to obtain the content matching degree between the target text and the candidate video materials.
The core text and non-core text obtained above are associated and jointly used to compute the first similarity between the target text and a candidate video material. Specifically, the first similarity satisfies:

score(a) = k1 * C/A + k2 * C/B

where A denotes the total number of core texts appearing in the target text, B denotes the total number of core-category labels among the labels of the candidate video material, C denotes the size of the intersection between the core texts of the target text and the core-category labels of the candidate video material, and k1 and k2 are coefficients with k1 + k2 = 1. The values of k1 and k2 may be set according to the actual emphasis; for example, k1 > k2 may be set if the target text is emphasized, and k1 < k2 if the candidate video material is emphasized. score(b) denotes the probability that each non-core text is classified into the corresponding non-core-category label among the labels of the candidate video material. The first similarity is then

A1 = xa * score(a) + xb * score(b)

where xa and xb are the weights of score(a) and score(b), respectively, and may be set as required, subject to xa + xb = 1.
Further, the core text, the non-core text and the text abstract are associated to compute the second similarity between the target text and the candidate video material, i.e., the content matching degree. Specifically, the second similarity satisfies:

A2 = Q1 * A1 + Q2 * P3

where A2 denotes the second similarity (content matching degree), A1 denotes the first similarity, P3 denotes the sentence similarity between the text abstract and the descriptive text, and Q1 and Q2 are the weights of A1 and P3, respectively, with Q1 + Q2 = 1, 0 ≤ Q1 ≤ 1 and 0 ≤ Q2 ≤ 1. The weights Q1 and Q2 may be set as needed; for example, Q1 > Q2 if the detailed information of the candidate video material is emphasized, and Q2 > Q1 if its overall information is emphasized. Correspondingly, a content-matching-degree threshold is set: if A2 is greater than or equal to the threshold, the candidate video material is determined to be a target video material; otherwise it is not.
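The two similarity formulas can be sketched directly; the default weight values are illustrative placeholders, subject only to the constraints stated above.

```python
def first_similarity(C: int, A: int, B: int, score_b: float,
                     k1: float = 0.5, k2: float = 0.5,
                     xa: float = 0.5, xb: float = 0.5) -> float:
    """A1 = xa * score(a) + xb * score(b), where
    score(a) = k1 * C/A + k2 * C/B, k1 + k2 = 1, xa + xb = 1."""
    score_a = k1 * C / A + k2 * C / B
    return xa * score_a + xb * score_b

def content_matching_degree(A1: float, P3: float,
                            Q1: float = 0.5, Q2: float = 0.5) -> float:
    """A2 = Q1 * A1 + Q2 * P3, where P3 is the sentence similarity
    between the text abstract and the descriptive text (Q1 + Q2 = 1)."""
    return Q1 * A1 + Q2 * P3
```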
The determined target video materials are spliced to obtain the target video corresponding to the target text. The target video obtained by the above process comprehensively considers the matching degree between the texts of the different content categories in the target text and the labels of the video materials, as well as the matching degree between the text abstract of the target text and the descriptive text of the video materials, ensuring that the determined target video materials accurately correspond to the content of the target text.
S5, displaying the question, the answer, the first text and the second text in the target video.
The target video generated above is synthesized with the barrages, i.e., the questions and answers of the question-answer barrages and the first and second texts of the comment barrages are displayed in the target video. For example, a barrage may be displayed on the frames corresponding to its barrage material: the barrage material of the comment barrage "Wow!" is "Zhang Fei kills enemies on the battlefield", and the video content of video material 1 corresponding to that material is Zhang Fei killing enemies on the battlefield, so "Wow!" is displayed on video material 1. This improves the correspondence between the barrage and the video content and thus the accuracy of the content and emotion the barrage reflects. For question-answer barrages, the question may be displayed separately from the answer to give the user sufficient time to think; for example, the question may be displayed on the frames corresponding to its barrage material, and the answer on a separate frame after the last video material of the target video.
In the above bullet screen generation method, barrages can be generated based on the target text while the target video corresponding to the target text is synthesized, and the barrages are displayed directly on the target video, solving the problem that, with barrages formed only from user-input text, a user may watch without any barrage and have a poor experience. Meanwhile, by providing both question-answer barrages and comment barrages, the richer barrage types can improve the interaction experience between the user and the target video in different dimensions.
Example 1
The target text is: "Liu Bei was defeated by Lü Bu and had nowhere to go. Liu Bei planted vegetables in the backyard of his residence every day to conceal his ambition. Liu Bei secretly plotted with the imperial brother-in-law Dong Cheng to eliminate Cao Cao. Cao Cao invited Liu Bei to a banquet. Liu Bei and Cao Cao discussed heroes while warming wine over green plums. Cao Cao asked Liu Bei who in the world could count as a hero. Liu Bei named Yuan Shu, Yuan Shao, Liu Biao and other notables. Cao Cao held that only Liu Bei and Cao Cao were true heroes in the world. Cao Cao told the story of quenching thirst by thinking of plums. Liu Bei was so frightened by a clap of thunder that he dropped his chopsticks. Cao Cao laughed at Liu Bei for being timid."
The designated symbol is set to "。". Using the designated symbol as a division point, the target text can be divided into 11 sentences: "Liu Bei was defeated by Lü Bu and had nowhere to go", "Liu Bei planted vegetables in the backyard of his residence every day to conceal his ambition", "Liu Bei secretly plotted with the imperial brother-in-law Dong Cheng to eliminate Cao Cao", "Cao Cao invited Liu Bei to a banquet", "Liu Bei and Cao Cao discussed heroes while warming wine over green plums", "Cao Cao asked Liu Bei who in the world could count as a hero", "Liu Bei named Yuan Shu, Yuan Shao, Liu Biao and other notables", "Cao Cao held that only Liu Bei and Cao Cao were true heroes in the world", "Cao Cao told the story of quenching thirst by thinking of plums", "Liu Bei was so frightened by a clap of thunder that he dropped his chopsticks", and "Cao Cao laughed at Liu Bei for being timid".
Each sentence is converted into a sentence vector, and the similarity between each sentence and the remaining sentences is calculated to obtain the importance of each sentence (see formulas (1)-(5)). Based on sentence importance, the 11 sentences are divided into 5 groups. Group A includes: "Liu Bei and Cao Cao discussed heroes while warming wine over green plums", "Cao Cao asked Liu Bei who in the world could count as a hero", "Cao Cao held that only Liu Bei and Cao Cao were true heroes in the world". Group B includes: "Liu Bei named Yuan Shu, Yuan Shao, Liu Biao and other notables", "Liu Bei was so frightened by a clap of thunder that he dropped his chopsticks", "Cao Cao laughed at Liu Bei for being timid". Group C includes: "Liu Bei was defeated by Lü Bu and had nowhere to go", "Liu Bei secretly plotted with the imperial brother-in-law Dong Cheng to eliminate Cao Cao", "Liu Bei planted vegetables in the backyard of his residence every day to conceal his ambition". Group D includes: "Cao Cao invited Liu Bei to a banquet". Group E includes: "Cao Cao told the story of quenching thirst by thinking of plums".
All sentences in groups A and B plus part of the sentences in group C are taken as first-type sentences (material for question-answer barrages), i.e., the first-type sentences include: "Liu Bei and Cao Cao discussed heroes while warming wine over green plums", "Cao Cao asked Liu Bei who in the world could count as a hero", "Cao Cao held that only Liu Bei and Cao Cao were true heroes in the world", "Liu Bei named Yuan Shu, Yuan Shao, Liu Biao and other notables", "Liu Bei was so frightened by a clap of thunder that he dropped his chopsticks", "Cao Cao laughed at Liu Bei for being timid", and "Liu Bei was defeated by Lü Bu and had nowhere to go". All sentences in groups D and E plus the remaining sentences in group C are taken as second-type sentences (material for comment barrages), i.e., the second-type sentences include: "Liu Bei secretly plotted with the imperial brother-in-law Dong Cheng to eliminate Cao Cao", "Liu Bei planted vegetables in the backyard of his residence every day to conceal his ambition", "Cao Cao invited Liu Bei to a banquet", and "Cao Cao told the story of quenching thirst by thinking of plums".
The UniLM model is used to generate the barrage texts corresponding to the first-type and second-type sentences. Taking the first-type sentence "Cao Cao asked Liu Bei who in the world could count as a hero" as an example of generating a question-answer barrage: the sentence is input into the UniLM model, which outputs "Question: Who does Cao Cao think are the true heroes? Answer: Liu Bei and Cao Cao." Taking the second-type sentence "Liu Bei planted vegetables in the backyard of his residence every day to conceal his ambition" as an example of generating a comment barrage: the sentence is input into the UniLM model, which outputs a first text, "So clever!", and a second text, "So pitiful!". Before output, the second text may be deleted so that only the first text is output, ensuring the diversity of the comment barrages. The balance between the numbers of question-answer barrages and comment barrages output is ensured by controlling the pre-training process of the UniLM model.
Based on the text content of the target text, its domain can be determined to be the "Romance of the Three Kingdoms" domain, whose core category is "person" and whose non-core categories are "scene", "emotion" and "event". Based on the core and non-core categories, the corresponding core text and non-core text in the target text can be determined, and the text abstract of the target text is also determined. A video material library corresponding to the "Romance of the Three Kingdoms" domain is acquired. First, the matching degree between the core text of the target text and the labels of each video material in the library is calculated to determine the candidate video materials. Then the probability similarity between the non-core text of the target text and the labels of the candidate video materials is calculated, along with the sentence similarity between the text abstract of the target text and the descriptive text of the candidate video materials, and the content similarity between the target text and each candidate video material is determined by jointly computing the matching degree, the probability similarity and the sentence similarity (see the process of obtaining the content matching degree from the first and second similarities above). Based on the content similarities, a target video material for synthesizing the target video, e.g., target video material 1, is determined among the candidates.
The target video material 1 is synthesized with the question-answer barrages and comment barrages to obtain the target video. The question of a question-answer barrage and the comment barrage (the first text) are displayed on the frames corresponding to their barrage material. For example, the barrage material of the question "Who are the true heroes in the world?" is "Cao Cao held that only Liu Bei and Cao Cao were true heroes in the world", whose corresponding frame is frame 1, so the question is displayed on frame 1, as shown in (1) of fig. 6; the barrage material of the first text "So clever!" is "Liu Bei planted vegetables in the backyard of his residence every day to conceal his ambition", whose corresponding frame is frame 2, so the first text is displayed on frame 2, as shown in (2) of fig. 6. To leave the user enough time to think about the question, the answer is displayed on a separate frame after target video material 1, e.g., frame 3, as shown in (3) of fig. 6.
In this way, a user does not need to input text for the target video to form a barrage: the target video generated from the target text already carries corresponding barrages, so that the first user to watch the target video, or a user before whom no text was input, can still enjoy the barrage experience.
Example 2
Based on embodiment 1, in embodiment 2 the answer in embodiment 1 can be played by an avatar. In embodiment 1, the answer is displayed on the last, separate frame of the target video, on which only the answer in text form appears; when the user views this answer, the absence of any accompanying picture makes it feel like reading plain text rather than a barrage, reducing the experience. The answer can therefore be played through an avatar, a dynamic virtual figure that plays the audio corresponding to the answer while displaying matching body movements and/or facial movements, presenting the answer to the user more vividly. In some implementations, the avatar may be placed directly on the separate frame and play the answer there. In some implementations, the avatar and the separate frame may be split into two simultaneously displayed user interfaces, so that the user can watch both the answer played by the avatar and the answer in text form.
In one embodiment, as shown in fig. 7, a bullet screen generating apparatus is provided, including: an importance calculation module 1, a grouping module 2, a sentence division module 3, a generation module 4 and a display module 5, wherein:

the importance calculation module 1 is configured to obtain the importance of each sentence in a target text, wherein the target text is used for synthesizing a target video, and the importance is a weighted average of the similarities between each sentence and the remaining sentences in the target text;

the grouping module 2 is configured to divide all sentences in the target text into at least three groups in descending order of importance;

the sentence division module 3 is configured to divide all sentences in the target text into first-class sentences and second-class sentences, wherein the first-class sentences are question-answer barrage materials and the second-class sentences are comment barrage materials; the first-class sentences are all sentences in the highest-ranking group plus some sentences in the middle-ranking group, and the second-class sentences are all sentences in the lowest-ranking group plus the remaining sentences in the middle-ranking group;

the generation module 4 is configured to generate the target video based on the target text, generate corresponding questions and answers based on the first-class sentences, and generate first texts and second texts based on the second-class sentences;

and the display module 5 is configured to display the questions, the first texts, the second texts and/or the answers in the target video.
In one embodiment, the importance calculation module 1 is further configured to: convert each sentence into a sentence vector; calculate the cosine value and the physical distance between the sentence vector of each sentence and the sentence vectors of the remaining sentences in the target text, obtaining the similarity between each sentence and the remaining sentences; and calculate a weighted average of those similarities, obtaining the importance of each sentence.
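A minimal sketch of this module follows. For every sentence it computes a cosine similarity and a Euclidean ("physical") distance to each other sentence vector, merges them into one similarity score, and averages over the remaining sentences. How the two measures are weighted is not fixed by the text, so the alpha below and the distance-to-similarity conversion are assumptions.

```python
import numpy as np

def sentence_importance(vectors, alpha=0.5):
    """vectors: (n, d) array of sentence vectors for the n sentences of the target text.
    Returns one importance score per sentence: the weighted average of its
    similarities to the remaining sentences."""
    n = vectors.shape[0]
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.clip(norms, 1e-12, None)
    cosine = unit @ unit.T                                # cosine values, shape (n, n)
    dist = np.linalg.norm(vectors[:, None] - vectors[None, :], axis=-1)
    dist_sim = 1.0 / (1.0 + dist)                         # turn distance into a similarity
    sim = alpha * cosine + (1 - alpha) * dist_sim         # assumed combination rule
    np.fill_diagonal(sim, 0.0)                            # exclude the sentence itself
    return sim.sum(axis=1) / (n - 1)                      # average over remaining sentences
```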
In one embodiment, as shown in fig. 8, the bullet screen generating apparatus further includes a sentence segmentation module 6, wherein: the sentence segmentation module 6 is configured to identify designated symbols in the target text and divide the target text into a plurality of sentences at the designated symbols.
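Segmentation at designated symbols is a plain split. A minimal sketch, assuming a symbol set of Chinese and Latin sentence enders; the patent does not name the actual symbols.

```python
import re

# Designated symbols at which the target text is split into sentences.
# This particular set is an assumption for illustration.
DESIGNATED_SYMBOLS = r"[。！？!?；;]"

def split_sentences(target_text: str) -> list[str]:
    parts = re.split(DESIGNATED_SYMBOLS, target_text)
    return [p.strip() for p in parts if p.strip()]
```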
In one embodiment, the generation module 4 is further configured to generate the answer and the first text by stochastic decoding, and to generate the question and the second text by deterministic decoding.
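The difference between the two decoding strategies is whether the next token is sampled from the model's distribution or chosen greedily. A toy sketch over an arbitrary next-token distribution; the step function, the vocab list and the temperature handling (applied directly to probabilities for simplicity) are all assumptions, not the patent's decoder.

```python
import numpy as np

def decode(step, vocab, max_len=20, stochastic=True, temperature=1.0, seed=None):
    """step(prefix) -> probability vector over vocab (any language model head).
    stochastic=True samples each token (used for the answer and the first text);
    stochastic=False takes the argmax (used for the question and the second text)."""
    rng = np.random.default_rng(seed)
    prefix = []
    for _ in range(max_len):
        probs = np.asarray(step(prefix), dtype=float)
        if stochastic:
            p = probs ** (1.0 / temperature)   # sharpen/flatten, then renormalize
            p /= p.sum()
            token = vocab[rng.choice(len(vocab), p=p)]
        else:
            token = vocab[int(np.argmax(probs))]
        if token == "<eos>":
            break
        prefix.append(token)
    return " ".join(prefix)
```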
In one embodiment, the generation module 4 is further configured to delete the second text.
In one embodiment, the generation module 4 is further configured to: calculate a first similarity between each group of questions and answers and a second similarity between each group of first texts and second texts; and retain the question, the answer, the first text and the second text according to a specified screening proportion based on the first similarity and the second similarity, where the specified screening proportion means that the number of retained groups of the first text and the second text is larger than the number of retained groups of the question and the answer.
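One way to read the specified screening proportion is as two retention ratios, with the comment ratio strictly larger and the count constraint enforced at the end. A sketch under that assumption; pair_similarity stands in for whatever similarity the surrounding embodiments use, and the ratios are illustrative.

```python
def screen(qa_pairs, comment_pairs, pair_similarity, qa_keep=0.3, comment_keep=0.6):
    """Keep the highest-similarity groups of each kind. comment_keep > qa_keep and the
    final trim realize the rule that more first/second-text groups survive than
    question/answer groups; the ratio values themselves are assumptions."""
    assert comment_keep > qa_keep
    ranked_qa = sorted(qa_pairs, key=pair_similarity, reverse=True)
    ranked_comments = sorted(comment_pairs, key=pair_similarity, reverse=True)
    kept_qa = ranked_qa[: max(1, int(len(ranked_qa) * qa_keep))]
    kept_comments = ranked_comments[: max(1, int(len(ranked_comments) * comment_keep))]
    if len(kept_comments) <= len(kept_qa):       # enforce the count constraint directly
        kept_qa = kept_qa[: max(0, len(kept_comments) - 1)]
    return kept_qa, kept_comments
```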
According to still another aspect of the embodiments of the present application, an electronic device for implementing the above bullet screen generating method is further provided; the electronic device may be applied to, but is not limited to, a server. As shown in fig. 9, the electronic device includes a memory 100 and a processor 200; the memory 100 stores a computer program, and the processor 200 is arranged to perform the steps of any of the method embodiments described above by means of the computer program.
Optionally, in this embodiment, the electronic device may be located in at least one of a plurality of network devices of a computer network.
Optionally, in this embodiment, the above processor may be configured to execute the following steps by means of the computer program:

S1, acquiring the importance of each sentence in a target text, wherein the target text is used for synthesizing a target video, and the importance is a weighted average of the similarities between each sentence and the remaining sentences in the target text;

S2, dividing all sentences in the target text into at least three groups in descending order of importance;

S3, dividing all sentences in the target text into first-class sentences and second-class sentences, wherein the first-class sentences are question-answer barrage materials and the second-class sentences are comment barrage materials; the first-class sentences are all sentences in the highest-ranking group plus some sentences in the middle-ranking group, and the second-class sentences are all sentences in the lowest-ranking group plus the remaining sentences in the middle-ranking group;

S4, generating the target video based on the target text, generating corresponding questions and answers based on the first-class sentences, and generating a first text and a second text based on the second-class sentences;

S5, displaying the question, the first text, the second text and/or the answer in the target video.
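Putting S1 to S5 together, one possible driver is sketched below. split_sentences and sentence_importance are the sketches shown earlier; embed, gen_qa, gen_comment and synthesize_video are assumed hooks. The tercile split and the half-and-half division of the middle group are illustrative choices, since the text fixes neither.

```python
import numpy as np

def generate_barrage_video(target_text, embed, gen_qa, gen_comment, synthesize_video):
    sentences = split_sentences(target_text)                  # pre-steps S01/S02
    importance = sentence_importance(embed(sentences))        # S1
    order = np.argsort(-importance)                           # S2: high to low
    top, middle, bottom = np.array_split(order, 3)            # at least three groups
    half = len(middle) // 2                                   # S3: split the middle group
    first_class = [sentences[i] for i in np.concatenate([top, middle[:half]])]
    second_class = [sentences[i] for i in np.concatenate([middle[half:], bottom])]
    qa = [gen_qa(s) for s in first_class]                     # S4: (question, answer) pairs
    comments = [gen_comment(s) for s in second_class]         # S4: (first, second text) pairs
    return synthesize_video(target_text, qa, comments)        # S5: display in the video
```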
The specific process of executing S1 includes the following steps:

S11, converting each sentence into a sentence vector;

S12, calculating the cosine values and physical distances between the sentence vector of each sentence and the sentence vectors of the remaining sentences in the target text, and obtaining the similarity between each sentence and the remaining sentences in the target text;

S13, calculating a weighted average of the similarities between each sentence and the remaining sentences in the target text, and obtaining the importance of each sentence.

The following steps are also performed before S1:

S01, identifying designated symbols in the target text;

S02, dividing the target text into a plurality of sentences at the designated symbols.

The specific process of executing S4 includes:

S41, generating the answer and the first text by stochastic decoding, and generating the question and the second text by deterministic decoding.

The specific process of executing S4 further includes:

S411, deleting the second text.

The specific process of executing S4 further includes:

S412, respectively calculating a first similarity between each group of questions and answers and a second similarity between each group of first texts and second texts;

S413, retaining the question, the answer, the first text and the second text according to a specified screening proportion based on the first similarity and the second similarity, wherein the specified screening proportion means that the number of retained groups of the first text and the second text is larger than the number of retained groups of the question and the answer.
Optionally, those skilled in the art will understand that the structure shown in fig. 9 is only schematic; the electronic device may also be a terminal device such as a smart phone (e.g., an Android phone or an iOS phone), a tablet computer, a palm computer, a mobile internet device (MID) or a PAD. Fig. 9 does not limit the structure of the electronic device. For example, the electronic device may include more or fewer components (e.g., a network interface) than shown in fig. 9, or have a configuration different from that shown in fig. 9.
The memory 100 may be used to store software programs and modules, such as the program instructions/modules corresponding to the bullet screen generating method and apparatus in the embodiments of the present application; the processor 200 executes the software programs and modules stored in the memory 100, thereby performing various functional applications and data processing, that is, implementing the bullet screen generating method described above. The memory 100 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 100 may further include memory remotely located relative to the processor 200, which may be connected to the terminal via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof. The memory 100 may specifically, but not exclusively, store the program steps of the barrage generation method.
Alternatively, the transmission device 300 implementing the above-described network connection function is used to receive or transmit data via a network. Specific examples of the network described above may include wired networks and wireless networks. In one example, the transmission apparatus 300 includes a network adapter (Network Interface Controller, NIC) that can connect to other network devices and routers via a network cable to communicate with the internet or a local area network. In one example, the transmission device 300 is a Radio Frequency (RF) module, which is used to communicate with the internet wirelessly.
In addition, the electronic device further includes: a display 400 for displaying the bullet screen generating process; and a connection bus 500 for connecting the respective module parts in the above-described electronic device.
Embodiments of the present application also provide a computer readable storage medium having a computer program stored therein, wherein the computer program is configured to perform the steps of any of the above-described bullet screen generating method embodiments when run.
Alternatively, in the present embodiment, the above-described storage medium may be configured to store a computer program for performing the steps of:
S1, acquiring the importance of each sentence in a target text, wherein the target text is used for synthesizing a target video, and the importance is a weighted average of the similarities between each sentence and the remaining sentences in the target text;

S2, dividing all sentences in the target text into at least three groups in descending order of importance;

S3, dividing all sentences in the target text into first-class sentences and second-class sentences, wherein the first-class sentences are question-answer barrage materials and the second-class sentences are comment barrage materials; the first-class sentences are all sentences in the highest-ranking group plus some sentences in the middle-ranking group, and the second-class sentences are all sentences in the lowest-ranking group plus the remaining sentences in the middle-ranking group;

S4, generating the target video based on the target text, generating corresponding questions and answers based on the first-class sentences, and generating a first text and a second text based on the second-class sentences;

S5, displaying the question, the first text, the second text and/or the answer in the target video.
The above-described storage medium may be configured to store a computer program for performing the following specific steps:
S11, converting each sentence into a sentence vector;

S12, calculating the cosine values and physical distances between the sentence vector of each sentence and the sentence vectors of the remaining sentences in the target text, and obtaining the similarity between each sentence and the remaining sentences in the target text;

S13, calculating a weighted average of the similarities between each sentence and the remaining sentences in the target text, and obtaining the importance of each sentence.
The above-described storage medium may be configured to store a computer program for performing the following specific steps:
the following steps are also performed before S1:

S01, identifying designated symbols in the target text;

S02, dividing the target text into a plurality of sentences at the designated symbols.
Wherein the above-mentioned storage medium may be arranged to store a computer program for performing the following specific steps:
S41, generating the answer and the first text by stochastic decoding, and generating the question and the second text by deterministic decoding.
The above-described storage medium may be configured to store a computer program for performing the following specific steps:
S411, deleting the second text.
The above-described storage medium may be configured to store a computer program for performing the following specific steps:
S412, respectively calculating a first similarity between each group of questions and answers and a second similarity between each group of first texts and second texts;

S413, retaining the question, the answer, the first text and the second text according to a specified screening proportion based on the first similarity and the second similarity, wherein the specified screening proportion means that the number of retained groups of the first text and the second text is larger than the number of retained groups of the question and the answer.
Optionally, the storage medium is further configured to store a computer program for executing the steps included in the methods of the above embodiments, which are not described in detail in this embodiment.
Optionally, in this embodiment, those skilled in the art will understand that all or part of the steps in the methods of the above embodiments may be completed by a program instructing the relevant hardware of a terminal device; the program may be stored in a computer-readable storage medium, and the storage medium may include: a flash disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and the like.
The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
If the integrated units in the above embodiments are implemented in the form of software functional units and sold or used as independent products, they may be stored in the above computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing one or more computer devices (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application.
In the foregoing embodiments of the present application, the description of each embodiment has its own emphasis; for any part not described in detail in one embodiment, reference may be made to the related descriptions of the other embodiments.
In the several embodiments provided by the present application, it should be understood that the disclosed client may be implemented in other manners. The apparatus embodiments described above are merely exemplary; for example, the division of the units is merely a logical function division, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be realized through some interfaces, units or modules, and may be electrical or in other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The foregoing is merely a preferred embodiment of the present application. It should be noted that those skilled in the art may make several improvements and modifications without departing from the principles of the present application, and such improvements and modifications shall also be regarded as falling within the protection scope of the present application.

Claims (10)

1. A method of barrage generation comprising:
acquiring the importance of each sentence in a target text, wherein the target text is used for synthesizing a target video, and the importance is a weighted average of the similarities between each sentence and the remaining sentences in the target text;

dividing all sentences in the target text into at least three groups in descending order of importance;

dividing all sentences in the target text into first-class sentences and second-class sentences, wherein the first-class sentences are question-answer barrage materials and the second-class sentences are comment barrage materials; the first-class sentences are all sentences in the highest-ranking group plus some sentences in the middle-ranking group, and the second-class sentences are all sentences in the lowest-ranking group plus the remaining sentences in the middle-ranking group;
generating the target video based on the target text, and, by using a pre-trained language model based on bidirectional encoder representations from transformers, generating a first text and a second text based on the second-class sentences and generating a question and an answer based on the first-class sentences; the pre-trained language model performs classification on the target text based on a first classification formula and a second classification formula, the first classification formula being:
P(answer, question | first-class sentence);
the second classification formula is:
P(second text, first text | second-class sentence);
controlling the quantity proportion of question-answer barrage materials to comment barrage materials through the data proportion of the barrage material samples of the pre-trained language model, and modifying the decoding strategy, wherein stochastic decoding is adopted to generate the answer and the first text, and deterministic decoding is adopted to generate the question and the second text;
displaying the question, the first text, the second text and/or the answer in the target video.
2. The method of claim 1, wherein the obtaining the importance of each sentence in the target text comprises:
converting each sentence into a sentence vector;
calculating the cosine values and physical distances between the sentence vector of each sentence and the sentence vectors of the remaining sentences in the target text, and obtaining the similarity between each sentence and the remaining sentences in the target text;

and calculating a weighted average of the similarities between each sentence and the remaining sentences in the target text, and obtaining the importance of each sentence.
3. The method of claim 1, further comprising, prior to the obtaining the importance of each sentence in the target text:
identifying designated symbols in the target text;

and dividing the target text into a plurality of sentences at the designated symbols.
4. The method of claim 1, wherein the generating the target video based on the target text, generating corresponding questions and answers based on the first-class sentences, and generating the first text and the second text based on the second-class sentences comprises:

generating the answer and the first text by stochastic decoding, and generating the question and the second text by deterministic decoding.
5. The method of claim 4, wherein the generating the answer and the first text using stochastic decoding and the generating the question and the second text using deterministic decoding comprises:
and deleting the second text.
6. The method of claim 5, wherein the generating the answer and the first text using stochastic decoding and the generating the question and the second text using deterministic decoding comprises:
respectively calculating a first similarity between each group of questions and answers and a second similarity between each group of first texts and second texts;

and retaining the question, the answer, the first text and the second text according to a specified screening proportion based on the first similarity and the second similarity, wherein the specified screening proportion means that the number of retained groups of the first text and the second text is larger than the number of retained groups of the question and the answer.
7. A bullet screen generating apparatus, the apparatus comprising:
the importance calculation module, configured to acquire the importance of each sentence in a target text, wherein the target text is used for synthesizing a target video, and the importance is a weighted average of the similarities between each sentence and the remaining sentences in the target text;

the grouping module, configured to divide all sentences in the target text into at least three groups in descending order of importance;

the sentence division module, configured to divide all sentences in the target text into first-class sentences and second-class sentences, wherein the first-class sentences are question-answer barrage materials and the second-class sentences are comment barrage materials; the first-class sentences are all sentences in the highest-ranking group plus some sentences in the middle-ranking group, and the second-class sentences are all sentences in the lowest-ranking group plus the remaining sentences in the middle-ranking group; wherein the target text is classified, by using a pre-trained language model based on bidirectional encoder representations from transformers, based on a first classification formula and a second classification formula, the first classification formula being:
P(answer, question | first-class sentence);
the second classification formula is:
P(second text, first text | second-class sentence);
the generation module, configured to generate the target video based on the target text, generate corresponding questions and answers based on the first-class sentences by using the pre-trained language model, and generate a first text and a second text based on the second-class sentences; and to control the quantity proportion of question-answer barrage materials to comment barrage materials through the data proportion of the barrage material samples of the pre-trained language model and modify the decoding strategy, wherein stochastic decoding is adopted to generate the answer and the first text, and deterministic decoding is adopted to generate the question and the second text;
and the display module is configured to display the question, the first text, the second text and/or the answer in the target video.
8. A barrage display method, wherein, in the target video according to any one of claims 1 to 6, the answer is played through a preset avatar.
9. A computer-readable storage medium, characterized in that the storage medium has stored therein a computer program, wherein the computer program is arranged to perform the method of any of claims 1 to 6 when run.
10. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to run the computer program to perform the method of any of claims 1 to 6.
GR01 Patent grant