WO2023207463A1 - Voting information generation method and apparatus, and voting information display method and apparatus - Google Patents

Voting information generation method and apparatus, and voting information display method and apparatus

Info

Publication number
WO2023207463A1
Authority
WO
WIPO (PCT)
Prior art keywords
voting
text content
sample
topic
video clip
Prior art date
Application number
PCT/CN2023/083979
Other languages
French (fr)
Chinese (zh)
Inventor
Chen Xiaoshuai (陈小帅)
Original Assignee
Tencent Technology (Shenzhen) Co., Ltd. (腾讯科技(深圳)有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology (Shenzhen) Co., Ltd. (腾讯科技(深圳)有限公司)
Publication of WO2023207463A1 publication Critical patent/WO2023207463A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition
    • G06V 30/14 Image acquisition
    • G06V 30/148 Segmentation of character regions

Definitions

  • the embodiments of the present application relate to the field of computer technology, and in particular to a voting information generation method, a voting information display method, and corresponding apparatuses.
  • Voting is a common interaction method that is widely used in various scenarios.
  • a method of voting in videos is currently provided: the operator of a video website or the producer of a video manually creates voting information for the video, and the voting information is displayed during playback to entice viewers to vote.
  • because this method requires manually creating voting information for each video, it is very inefficient.
  • it is also difficult for this method to cover a large number of videos, so voting interaction in videos is insufficient.
  • the embodiments of this application provide a voting information generation method, a voting information display method, and corresponding apparatuses, which eliminate the need to manually create voting information, improve operating efficiency, save time, and can effectively cover a large number of videos, improving the coverage of voting interaction.
  • the technical solutions are as follows:
  • a voting information generation method includes:
  • the computer device obtains text content associated with a video clip in the video, where the text content includes at least one of first text content or second text content; the first text content is text content contained in the video clip, and the second text content is text content contained in the barrage of the video clip;
  • the computer device generates a voting topic for the video clip based on keywords in the text content
  • the computer device generates a plurality of voting candidates for the video clip based on the keywords and the voting topic;
  • the computer device generates voting information for the video clip based on the voting topic and the plurality of voting candidates.
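The four generation steps above can be sketched end-to-end. Everything below is a toy illustration: the keyword extractor, topic generator, and candidate generator are trivial frequency-based stand-ins for the trained models the application describes, and all function names are hypothetical.

```python
from collections import Counter

def extract_keywords(text, top_k=3):
    # Hypothetical stand-in for a keyword extraction model:
    # rank words longer than three characters by frequency.
    words = [w.strip(".,!?").lower() for w in text.split()]
    counts = Counter(w for w in words if len(w) > 3)
    return [w for w, _ in counts.most_common(top_k)]

def generate_voting_info(first_text, second_text):
    # Step 1: text content associated with the clip (subtitles/ASR + barrage).
    text = " ".join(t for t in (first_text, second_text) if t)
    keywords = extract_keywords(text)
    # Step 2: a voting topic from the keywords (a real system would use a
    # sequence-to-sequence generation model here).
    topic = "Which of these caught your attention: " + ", ".join(keywords) + "?"
    # Step 3: voting candidates from the keywords and the topic.
    candidates = [kw.capitalize() for kw in keywords]
    # Step 4: assemble the voting information.
    return {"topic": topic, "candidates": candidates}

info = generate_voting_info(
    "The detective finally reveals the identity of the culprit",
    "I knew the detective suspected the butler all along",
)
print(info["topic"])
print(info["candidates"])
```

The dictionary returned in step 4 mirrors the claim structure: one voting topic plus multiple voting candidates, both derived from the same keywords.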
  • a voting information display method includes:
  • the computer device obtains voting information of the video clip based on the video clip, where the voting information is generated based on a voting topic and a plurality of voting candidates, the voting topic is generated based on keywords in text content associated with the video clip, and the plurality of voting candidates are generated based on the keywords and the voting topic;
  • the computer device determines interaction parameters based on the interest tag of the currently logged-in account and the voting information, where the interaction parameters represent the possibility of the account performing a voting operation based on the voting information;
  • the computer device displays the voting information when playing the video clip, in a case that the interaction parameters meet the interaction conditions;
  • the text content includes at least one of first text content or second text content
  • the first text content is the text content contained in the video clip
  • the second text content is the text content contained in the barrage of the video clip.
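The display-side decision above can be sketched as a threshold check. The interaction-parameter scoring here (overlap between the account's interest tags and the voting information text) is a hypothetical placeholder for the trained interaction model, and the 0.5 threshold is an assumed interaction condition.

```python
def interaction_parameter(interest_tags, voting_info):
    # Hypothetical score: the fraction of the account's interest tags
    # that appear in the voting topic or its candidates.
    text = (voting_info["topic"] + " " + " ".join(voting_info["candidates"])).lower()
    if not interest_tags:
        return 0.0
    hits = sum(1 for tag in interest_tags if tag.lower() in text)
    return hits / len(interest_tags)

def should_display(interest_tags, voting_info, threshold=0.5):
    # The interaction condition: display only if the parameter,
    # i.e. the likelihood the account would vote, meets a threshold.
    return interaction_parameter(interest_tags, voting_info) >= threshold

info = {"topic": "Who is the culprit?", "candidates": ["The butler", "The gardener"]}
print(should_display(["mystery", "butler"], info))
```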
  • a voting information generating device which is provided in a computer device, and the device includes:
  • a text content acquisition module configured to acquire text content associated with a video clip in the video, where the text content includes at least one of first text content or second text content; the first text content is text content contained in the video clip, and the second text content is text content contained in the barrage of the video clip;
  • a topic generation module configured to generate voting topics for the video clips based on keywords in the text content
  • a candidate generation module configured to generate multiple voting candidates for the video clip based on the keywords and the voting topic
  • a voting information generation module configured to generate voting information for the video clip based on the voting topic and the plurality of voting candidates.
  • a voting information display device includes:
  • an information acquisition module configured to obtain voting information of a video clip based on the video clip in the target video, where the voting information is generated based on a voting topic and a plurality of voting candidates, the voting topic is generated based on keywords in text content associated with the video clip, and the plurality of voting candidates are generated based on the keywords and the voting topic;
  • a parameter determination module configured to determine interaction parameters based on the interest tag of the currently logged-in account and the voting information, where the interaction parameters represent the possibility of the account performing a voting operation based on the voting information;
  • an information display module configured to display the voting information when the video clip is played, in a case that the interaction parameters meet the interaction conditions;
  • the text content includes at least one of first text content or second text content
  • the first text content is the text content contained in the video clip
  • the second text content is the text content contained in the barrage of the video clip.
  • a computer device includes a processor and a memory, where at least one computer program is stored in the memory; the at least one computer program is loaded and executed by the processor to implement the operations performed by the voting information generation method described in the above aspect, or to implement the operations performed by the voting information display method described in the above aspect.
  • a computer-readable storage medium is provided, in which at least one computer program is stored; the at least one computer program is loaded and executed by a processor to implement the operations performed by the voting information generation method described in the above aspect, or the operations performed by the voting information display method described in the above aspect.
  • a computer program product includes a computer program that, when executed by a processor, implements the operations performed by the voting information generation method described in the above aspect, or implements the operations performed by the voting information display method described in the above aspect.
  • Embodiments of the present application provide a method for automatically generating voting information for video clips based on the text content associated with the video clips, eliminating the need to manually create voting information, improving operating efficiency, and saving time; this method can effectively cover a large number of videos and improve the coverage of voting interaction.
  • Figure 1 is a schematic diagram of an implementation environment provided by an embodiment of the present application.
  • FIG. 2 is a flow chart of a voting information generation method provided by an embodiment of the present application.
  • FIG. 3 is a flow chart of another voting information generation method provided by an embodiment of the present application.
  • Figure 4 is a schematic diagram of voting information provided by an embodiment of the present application.
  • Figure 5 is a schematic flowchart of generating voting topics based on the first generation model provided by an embodiment of the present application.
  • Figure 6 is a schematic diagram of a first generation model provided by an embodiment of the present application.
  • FIG. 7 is a schematic flowchart of generating voting candidates based on the second generation model provided by an embodiment of the present application.
  • Figure 8 is a schematic diagram of a second generation model provided by an embodiment of the present application.
  • Figure 9 is a schematic diagram of an overall process for generating voting information provided by an embodiment of the present application.
  • Figure 10 is a flow chart of a voting information generation method provided by an embodiment of the present application.
  • Figure 11 is a schematic diagram of a voting decision model provided by an embodiment of the present application.
  • Figure 12 is a flow chart of a voting information display method provided by an embodiment of the present application.
  • FIG. 13 is a flow chart of another voting information display method provided by an embodiment of the present application.
  • Figure 14 is a schematic diagram of a voting interaction model provided by an embodiment of the present application.
  • FIG. 15 is a schematic structural diagram of a voting information generation device provided by an embodiment of the present application.
  • Figure 16 is a schematic structural diagram of a voting information display device provided by an embodiment of the present application.
  • Figure 17 is a schematic structural diagram of a terminal provided by an embodiment of the present application.
  • Figure 18 is a schematic structural diagram of a server provided by an embodiment of the present application.
  • the terms first, second, etc. used in this application may be used to describe various concepts herein, but these concepts are not limited by these terms unless otherwise specified; these terms are only used to distinguish one concept from another.
  • first text content may be called second text content
  • second text content may be called first text content
  • At least one includes one, two, or more than two; multiple includes two or more; each refers to each of the corresponding multiple; and any refers to any one of the multiple.
  • for example, if multiple keywords include 3 keywords, each keyword refers to each of these 3 keywords, and any one refers to any one of these 3 keywords, which can be the first, the second, or the third.
  • Artificial Intelligence is a theory, method, technology, and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
  • artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can respond in a similar way to human intelligence.
  • Artificial intelligence is the study of the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
  • Artificial intelligence technology is a comprehensive subject that covers a wide range of fields, including both hardware-level technology and software-level technology.
  • Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics.
  • Artificial intelligence software technology mainly includes computer vision, speech processing, natural language processing, machine learning/deep learning, autonomous driving, and smart transportation.
  • Computer Vision (CV) technology is a science that studies how to make machines "see". More specifically, it refers to machine vision, such as using cameras and computers instead of human eyes to identify and measure targets, and further performing graphics processing so that the processed images are more suitable for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies related theories and technologies, attempting to build artificial intelligence systems that can obtain information from images or multi-dimensional data.
  • Computer vision technology usually includes image processing, image recognition, image semantic understanding, image retrieval, OCR (Optical Character Recognition), video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, simultaneous localization and mapping, autonomous driving, and smart transportation, as well as common biometric recognition technologies such as face recognition and fingerprint recognition.
  • Machine Learning is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It specializes in studying how computers can simulate or implement human learning behavior to acquire new knowledge or skills, and reorganize existing knowledge structures to continuously improve their performance.
  • Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent. Its applications cover all fields of artificial intelligence.
  • Machine learning and deep learning usually include artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, teaching learning and other technologies.
  • artificial intelligence technology has been researched and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, autonomous driving, drones, robots, smart medical care, smart customer service, the Internet of Vehicles, and smart transportation. With the development of technology, artificial intelligence technology will be applied in more fields and play an increasingly important role.
  • the voting information generation method and voting information display method provided by the embodiments of this application utilize computer vision technology and machine learning technology in artificial intelligence to generate voting information for video clips and display the voting information when the video clips are played.
  • the execution subject of the voting information generation method and voting information display method provided by the embodiments of this application is a computer device, and the computer device is a terminal or a server.
  • the voting information generation method is executed by the server, and the voting information display method is executed by the terminal.
  • the embodiment of the present application provides an implementation environment as shown in Figure 1 below.
  • FIG 1 is a schematic diagram of an implementation environment provided by an embodiment of the present application.
  • the implementation environment includes: a server 101 and a terminal 102.
  • the server 101 and the terminal 102 can be connected directly or indirectly through wired or wireless communication methods, which is not limited in this application.
  • the server 101 is used to store or deliver videos, and is also used to automatically generate voting information for video clips in the video; the terminal 102 is used to access the server 101, play the video delivered by the server 101, and display the voting information of the current video clip when playing the video, thereby initiating a voting interaction for the current video clip and attracting users to perform voting operations and participate in the voting interaction.
  • the server 101 is an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, CDN (Content Delivery Network), big data, and artificial intelligence platforms.
  • the terminal 102 is a smartphone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, a smart TV, a smart car terminal, etc., but is not limited thereto.
  • a target application provided by the server 101 is installed on the terminal 102, and the terminal 102 can implement functions such as video playback and voting through the target application.
  • the target application is a video sharing application, which has the function of video sharing.
  • the video sharing application can also have other functions, such as the function of posting barrages, the function of voting, etc.
  • FIG. 2 is a flow chart of a voting information generation method provided by an embodiment of the present application.
  • the execution subject of the embodiment of the present application is a computer device, and the computer device is a terminal or a server.
  • the embodiment of the present application explains the process of generating voting information of a video clip. Referring to Figure 2, the method includes:
  • the computer device obtains text content associated with the video clips in the video.
  • the video is any video in the computer device.
  • the computer device is a terminal
  • the video is any video downloaded by the terminal or any video shot, etc.
  • the computer device is a server
  • the server has a video sharing function and can store a video uploaded by any device and send the video to any device for playback; the video is any video stored by the server.
  • the video includes one or more video clips, and the playing time of the video clips is not greater than the total playing time of the video.
  • the video is divided into multiple video segments according to a fixed duration, and the playback duration of each video segment is equal to the fixed duration.
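The fixed-duration segmentation described above reduces to simple arithmetic over the playback timeline. This is a minimal sketch; it allows the last segment to be shorter when the total duration is not an exact multiple of the fixed duration.

```python
def split_into_clips(total_seconds, clip_seconds):
    # Divide a video into consecutive fixed-duration segments,
    # returned as (start, end) pairs in seconds. The final segment
    # may be shorter if the total duration is not a multiple.
    clips = []
    start = 0
    while start < total_seconds:
        end = min(start + clip_seconds, total_seconds)
        clips.append((start, end))
        start = end
    return clips

print(split_into_clips(130, 60))
```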
  • the text content associated with the video clip includes at least one of first text content or second text content.
  • the first text content is text content contained in the video clip.
  • the first text content includes the subtitle text content in the video clip, or text content recognized from the voice data in the video clip.
  • the first text content can represent the content contained in the video clip itself, such as the characters and things that appear in the video clip or the plot that occurs in the video clip.
  • the second text content is the text content included in the barrage of the video clip, which may be called the barrage text content.
  • the terminal playing the video clip can publish a barrage for the video clip.
  • the text content contained in the barrage can express the end user's views or opinions on the video clip, so the barrage text content can be viewed as interaction data for the video clip.
  • each video clip in the target video has a corresponding playback time period; if the playback time period of a video clip includes the release time point of a barrage, the barrage is a barrage of that video clip.
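The barrage-to-clip assignment above is an interval-containment check. This sketch assumes each barrage is a (release-time, text) pair and uses a half-open interval [start, end), which is one possible convention for avoiding double-assignment at clip boundaries.

```python
def barrages_for_clip(clip, barrages):
    # A barrage belongs to the clip whose playback time period
    # contains the barrage's release time point.
    start, end = clip
    return [text for t, text in barrages if start <= t < end]

barrages = [(12.5, "great opening"), (75.0, "plot twist!"), (118.0, "who is that?")]
print(barrages_for_clip((60, 120), barrages))
```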
  • after the computer device obtains the text content, it can automatically generate the voting information of the video clip based on the text content, without requiring technical personnel to generate it manually, and display the voting information when the video clip is played; this can also attract users to participate in voting interaction, which helps improve the interactive coverage of the video.
  • the process of generating voting information is detailed in steps 202-204 below.
  • the computer device generates a voting topic for the video clip based on the keywords in the text content.
  • the computer device generates multiple voting candidates for the video clip based on the keywords and voting topics.
  • the voting information includes a voting topic and multiple voting candidates.
  • the voting topic represents the question asked of the user, and the multiple voting candidates represent the candidate answers provided to the user.
  • when the voting information is displayed, the user understands the question by viewing the voting topic, and selects one voting candidate from the multiple voting candidates as their own answer, which is performing the voting operation.
  • the text content includes at least one word
  • the keywords in the text content may include every word in the text content, or only the words extracted from the text content through a keyword extraction algorithm. Since the keywords can represent the content of the video clip, the voting topic generated based on the keywords is related to the content of the video clip. Moreover, the multiple voting candidates generated based on the keywords and the voting topic are also related to the content of the video clip and consistent with the generated voting topic.
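Where a keyword extraction algorithm is used rather than taking every word, a later passage notes the extraction model may be a TextRank model. A compact TextRank-style sketch over a pre-tokenized word list (window size, damping factor, and iteration count are assumed values, and the frequency-free graph construction is a simplification):

```python
from collections import defaultdict

def textrank_keywords(words, top_k=3, window=2, damping=0.85, iters=20):
    # Words co-occurring within a small window vote for each other;
    # scores are then iterated PageRank-style over the co-occurrence graph.
    graph = defaultdict(set)
    for i in range(len(words)):
        for j in range(i + 1, min(i + window + 1, len(words))):
            if words[i] != words[j]:
                graph[words[i]].add(words[j])
                graph[words[j]].add(words[i])
    scores = {w: 1.0 for w in graph}
    for _ in range(iters):
        scores = {
            w: (1 - damping) + damping * sum(
                scores[n] / len(graph[n]) for n in graph[w])
            for w in graph
        }
    return [w for w, _ in sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]]

words = "detective reveals culprit detective suspects butler culprit".split()
print(textrank_keywords(words))
```

In a real pipeline the input would first be stopword-filtered and restricted to nouns and adjectives, which this sketch omits.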
  • the computer device generates voting information of the video clip based on the voting topic and multiple voting candidates.
  • the embodiment of the present application only takes one video clip in a video as an example, and the process of generating voting information for other video clips is similar to the embodiment of the present application and will not be described again here.
  • in the related art, voting information in videos is mainly created by operators of video websites or creators of videos, which takes a long time and has low operating efficiency. Moreover, this method is difficult to apply to a large number of videos, so the voting interaction in videos is insufficient, which discourages users from participating in voting interaction.
  • Embodiments of the present application provide a method for automatically generating voting information for video clips based on the text content associated with the video clips, eliminating the need to manually create voting information, improving operating efficiency, and saving time; this method can effectively cover a large number of videos and improve the coverage of voting interaction.
  • FIG. 3 is a flow chart of another voting information generation method provided by an embodiment of the present application.
  • the execution subject of the embodiment of the present application is a computer device, and the computer device is a terminal or a server. Referring to Figure 3, the method includes:
  • the computer device obtains the text content associated with the video clips in the video.
  • the text content includes first text content, and the first text content is text content included in the video clip.
  • the process of obtaining the first text content includes at least one of the following:
  • one or more video frames are extracted from the video clip, and an OCR (Optical Character Recognition) algorithm is used to extract the subtitle text content from the one or more video frames.
  • speech data is extracted from the video clip, and an ASR (Automatic Speech Recognition) algorithm is used to recognize the speech data and obtain the text content corresponding to the speech data, which is the text content of the dialogue in the video clip.
  • the text content includes second text content
  • the second text content is the text content contained in the barrage of the video clip.
  • the process of obtaining the second text content includes: extracting the barrage of the video clip from the barrage collection of the video, and extracting the text content from the barrage of the video clip.
  • the voting information includes a voting topic and multiple voting candidates; after obtaining the text content, the computer device first generates the voting topic. For the specific steps of generating a voting topic, refer to steps 302-303 below.
  • the computer device encodes the keywords in the text content and obtains the keyword characteristics of the keywords.
  • the keyword feature is used to describe the keyword; converting keywords into keyword features facilitates subsequent processing based on quantifiable features, and the generation of voting information related to the keywords.
  • the encoding model can be a Transformer model (a model based on a self-attention mechanism) or other types of models.
  • the keyword extraction model may be a TextRank model (a model that extracts keywords based on text ranking) or other types of models.
  • the computer device decodes the keyword features to obtain the voting topic, which is composed of multiple voting topic words.
  • the computer device decodes the keyword features each time to obtain one voting topic word, and then continues to decode the keyword features together with the previously determined voting topic words to obtain the next voting topic word, until the target number of voting topic words is obtained; the target number of voting topic words constitute the voting topic.
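The word-by-word decoding described above can be sketched as a greedy autoregressive loop. The scoring function below is a toy stand-in for the real decoder network (which would condition on the keyword features and the words emitted so far), and all names are hypothetical.

```python
def decode_topic(keyword_features, vocab, score_fn, target_len=4):
    # Greedy autoregressive decoding: each step conditions on the
    # keyword features plus the topic words already produced, emits
    # one word, and stops at the target number of words.
    topic_words = []
    for _ in range(target_len):
        next_word = max(vocab, key=lambda w: score_fn(keyword_features, topic_words, w))
        topic_words.append(next_word)
    return topic_words

def toy_score(features, produced, word):
    # Toy scorer: penalize repeats, otherwise count characters the
    # word shares with the keyword features (a neural decoder would
    # compute this score instead).
    if word in produced:
        return -1.0
    return sum(1 for ch in set(word) if ch in features)

words = decode_topic("culprit", ["who", "is", "the", "culprit", "banana"], toy_score)
print(words)
```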
  • the computer device obtains the first keyword, which is a keyword in the first text content.
  • the computer device obtains the first text content and the second text content, where the keywords in the first text content are called first keywords, and the keywords in the second text content are called second keywords.
  • the step of obtaining the first keyword in the first text content is similar to the method of extracting the keyword in the above-mentioned step 302, and will not be described again here.
  • the computer device clusters the second text content to obtain multiple text categories, each text category contains at least one piece of second text content, and extracts the second keyword from each text category.
  • the video clip may include multiple barrages; accordingly, the computer device obtains the second text content included in each of the multiple barrages, and extracts the second keywords from the multiple pieces of second text content.
  • the multiple pieces of second text content can be clustered first, so that semantically related second text content is divided into one text category, and the second keyword of each text category is then extracted separately.
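The cluster-then-extract step can be illustrated with a greedy word-overlap (Jaccard) clustering. A real system would cluster on semantic embeddings, so this lexical version and its 0.3 threshold are only an assumed stand-in.

```python
def cluster_barrages(texts, threshold=0.3):
    # Greedy single-pass clustering: each barrage joins the first
    # category whose accumulated word set is similar enough
    # (Jaccard similarity), otherwise it starts a new category.
    categories = []
    for text in texts:
        words = set(text.lower().split())
        for cat in categories:
            union = cat["words"] | words
            if union and len(cat["words"] & words) / len(union) >= threshold:
                cat["texts"].append(text)
                cat["words"] |= words
                break
        else:
            categories.append({"texts": [text], "words": set(words)})
    return categories

barrages = ["the butler did it", "I think the butler did it", "great soundtrack"]
cats = cluster_barrages(barrages)
print(len(cats))
```

Each resulting category contains at least one piece of second text content, matching the description above; keyword extraction would then run per category.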
  • the computer device generates voting candidates for each text category based on the first keyword, the voting topic, and the second keyword of each text category.
  • i is a positive integer and i is not greater than the number of text categories.
  • the process of generating the i-th voting candidate includes: encoding the first keyword, the voting topic, and the second keyword of the i-th text category to obtain keyword features, and decoding the keyword features to obtain the i-th voting candidate.
  • the i-th voting candidate is composed of multiple voting candidate words.
  • the specific process is similar to steps 302-303 above; the difference is that the keyword features determined this time describe the first keyword, the keywords in the voting topic, and the second keyword, and multiple voting candidate words are obtained based on these keyword features.
  • the voting candidate words thus comprehensively consider the influence of the first keyword, the keywords in the voting topic, and the second keyword, ensuring that the voting candidates are related to all three.
  • the computer device generates voting information of the video clip based on the voting topic and multiple voting candidates.
  • the voting topic and multiple voting candidates constitute the voting information of the video clip.
  • the voting topic, multiple voting candidates, and associated information constitute the voting information of the video clip.
  • the associated information includes text or images used to prompt users to vote, etc., and may also include other types of information.
  • the video screen of the video clip and voting information are displayed in the playback interface.
• the voting information is divided into two parts: one part is the voting topic "What did you come to see?", and the other part contains three voting candidates for users to choose from.
• After generating the voting information, the computer device stores the video clip in association with the voting information, displays the voting information when playing the video clip, or delivers the voting information each time the video clip is delivered to other devices. Alternatively, the computer device adds the voting information to the video clip so that the voting information is displayed when the video clip is played.
  • the specific process of displaying voting information is detailed in the embodiment shown in FIG. 12 and FIG. 13 below, and will not be described here.
  • the embodiment of the present application only takes voting information of a video clip as an example for explanation.
• the computer device can generate multiple voting topics and the voting candidates corresponding to each voting topic by repeatedly executing the above steps, constituting multiple pieces of voting information; the multiple pieces of voting information can then all be displayed when the video clip is played, or one or more of them can be displayed.
• the embodiment of the present application does not limit this.
• Embodiments of the present application provide a method for automatically generating voting information for video clips, which can automatically generate voting information for a video clip based on the text content associated with the video clip, eliminating the need to manually create voting information, improving operating efficiency, and saving time. This method can effectively cover a large number of videos and improve the coverage of voting interaction.
  • the generated voting information is related to the text content of the video clip, which meets the interactive function requirements of the video, helps to increase the enthusiasm of users to participate in voting interaction while watching the video clip, and thereby enhances the interactive atmosphere.
  • Using the above method makes it easy to cover a large number of videos, improves the interactive coverage and richness of the videos, and increases the user interaction activity on the video platform.
• the process of generating voting topics in the above steps 302-303 can be performed based on a first generation model. The first generation model includes a first encoding sub-model and a first decoding sub-model, wherein the first encoding sub-model is used to encode keywords into keyword features, and the first decoding sub-model is used to decode keyword features into voting topic words.
  • the process of generating voting topics based on the first generation model includes:
  • the computer device calls the first encoding sub-model, encodes the N keywords, and obtains the keyword characteristics of the N keywords, where N is an integer greater than 1.
  • the first encoding sub-model is a Transformer Encoder model (the encoder in the Transformer model) or other types of encoding models.
• the N keywords include the first keywords in the first text content, such as dialogue text keywords or subtitle text keywords of the video clip, and also include the second keywords in the second text content, such as text keywords in the video clip's barrage.
• the computer device obtains N keywords of the video clip and needs to comprehensively consider these N keywords to generate a voting topic. The N keywords are input into the first encoding sub-model of the first generation model, so that the N keywords are encoded respectively based on the first encoding sub-model to obtain N keyword features, where each keyword corresponds to one keyword feature.
  • the computer device calls the first decoding sub-model, decodes the N keyword features, obtains the first decoding feature, and determines the first voting topic word based on the first decoding feature and the N keyword features.
• the first decoding sub-model is a Transformer Decoder model (the decoder in the Transformer model) or other types of decoding models.
• the computer device calls the first decoding sub-model, decodes the N keyword features and the first voting topic word, obtains the second decoding feature, and determines the reference voting topic based on the second decoding feature and the N keyword features. The reference voting topic includes the first voting topic word and the second voting topic word, until after N times of decoding, the reference voting topic obtained by the N-th decoding is determined as the voting topic of the video clip.
  • a method of sequential decoding is used to generate voting topic words.
• during each decoding, the current voting topic word is determined, and the current voting topic word is combined in order with the previously determined voting topic words to obtain the current reference voting topic. As decoding is performed multiple times, the number of voting topic words contained in the reference voting topic gradually increases until, after N decodings, the reference voting topic obtained by the N-th decoding contains N voting topic words, thus yielding a voting topic containing N voting topic words.
• not only are the N keyword features considered during each decoding, but the previously determined voting topic words are also considered. This ensures that the voting topic word determined this time is associated with the previously determined voting topic words, so that the determined N voting topic words can be combined into a voting topic that forms a sentence with clear semantics.
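The sequential decoding just described can be sketched as a loop in which each step consumes the keyword features together with all previously determined voting topic words. Here `decode_step` and `select_word` are hypothetical stand-ins for the first decoding sub-model and its word-selection logic, not the actual patented model:

```python
def generate_voting_topic(keyword_features, decode_step, select_word):
    """Autoregressive decoding sketch: the reference voting topic grows by one
    voting topic word per decoding, and each decoding sees both the keyword
    features and the words decoded so far, so adjacent words stay associated."""
    topic_words = []
    for _ in range(len(keyword_features)):  # N decodings for N keyword features
        decoding_feature = decode_step(keyword_features, topic_words)
        topic_words.append(select_word(decoding_feature, keyword_features))
    return topic_words  # the reference topic after the N-th decoding
```

Feeding the partial topic back into `decode_step` is what lets each new word depend on the words already chosen.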
  • the first generation model also includes a first classification layer and a preset word library.
• the preset word library includes a plurality of words. Determining the first voting topic word based on the first decoding feature and the N keyword features includes:
• the jth first usage probability is the probability of using the jth keyword in the voting topic, j is a positive integer, and j is not greater than N.
  • the jth first usage probability is also the probability of determining the jth keyword as the first voting topic word in the voting topic.
  • the usage conditions refer to the conditions that need to be met when using keywords in voting topics.
• the first classification layer is called to classify based on the first decoding feature and the preset word library, and the classification probability of each word in the preset word library is obtained; the first voting topic word is then determined based on the classification probability of each word. The classification probability of a word represents the probability of determining the word as a voting topic word.
  • the embodiments of this application provide two ways of determining voting topic words.
• One is to use a copy mechanism to copy keywords into voting topic words. That is, if the first usage probability of a keyword meets the usage condition, the keyword is directly used as the voting topic word. The other is to generate new voting topic words based on the preset word library. Each time the first decoding sub-model performs decoding, the usage probability is first determined, and based on whether the usage probability meets the usage condition, it is determined whether to use the keyword as a voting topic word or to generate a new voting topic word.
  • the usage condition includes a usage probability threshold.
• if the jth first usage probability is greater than the usage probability threshold, the jth keyword is determined as the first voting topic word; if multiple first usage probabilities are greater than the usage probability threshold, the keyword with the largest first usage probability is determined as the first voting topic word; and if no first usage probability is greater than the usage probability threshold, the first voting topic word is determined from the preset word library. The classification probability of each word represents the possibility of determining the word as a voting topic word, so based on the classification probability of each word, the first voting topic word can be selected from multiple words.
• determining the first voting topic word based on the classification probability of each word includes: selecting the word with the highest classification probability from the multiple words in the preset word library as the first voting topic word, or selecting, from the multiple words in the preset word library, a target number of words with the highest classification probabilities as candidate first voting topic words, where the target number is greater than 1.
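The copy-versus-generate decision described above can be sketched as follows. The plain float probabilities and the threshold value are illustrative assumptions; the embodiment leaves the threshold and the classification layer's internals unspecified.

```python
def choose_topic_word(usage_probs, keywords, class_probs, vocab, threshold=0.5):
    """If the largest first usage probability exceeds the threshold, copy that
    keyword as the voting topic word; otherwise generate by picking the preset
    word library entry with the highest classification probability."""
    best = max(range(len(usage_probs)), key=usage_probs.__getitem__)
    if usage_probs[best] > threshold:
        return keywords[best]  # copy mechanism: reuse the keyword directly
    return max(zip(class_probs, vocab))[1]  # generate from the word library
```

When several usage probabilities exceed the threshold, `max` naturally picks the keyword with the largest one, matching the tie-handling described above.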
• the reference voting topic is determined, where the reference voting topic includes the first voting topic word and the second voting topic word, including:
  • the jth second usage probability is the probability of using the jth keyword in the voting topic, j is a positive integer, and j is not greater than N.
  • the j-th second usage probability is also the probability of determining the j-th keyword as the second voting topic word in the voting topic.
  • the usage conditions refer to the conditions that need to be met when using keywords in voting topics.
  • the first classification layer is called to classify based on the second decoding feature and the preset word library to obtain the classification probabilities of multiple candidate voting topics.
• each candidate voting topic includes a first voting topic word and a second voting topic word.
  • the classification probability of a candidate voting topic represents the probability of determining the candidate voting topic as a reference voting topic.
• determining the reference voting topic based on the classification probability of each candidate voting topic includes: selecting the candidate voting topic with the highest classification probability from the multiple candidate voting topics as the reference voting topic, or selecting, from the multiple candidate voting topics, a target number of candidate voting topics with the highest classification probabilities as reference voting topics, where the target number is greater than 1. Subsequently, voting candidates can be generated for each voting topic to form a target number of pieces of voting information.
  • the process of the second decoding is similar to the process of the first decoding.
• the first voting topic word obtained by the first decoding is also input into the first decoding sub-model, so that the first voting topic word and each word in the preset word library constitute a candidate voting topic. The classification probability of a candidate voting topic can also reflect how reasonable the combination of a word from the preset word library with the first voting topic word is, thereby ensuring that voting topics with clear semantics and smooth logic can be generated.
• the following formula is used to determine the i-th usage probability: P_i = e^(X·w_i·sv_i) / Σ_j e^(X_j·w_j·sv_j), where P_i represents the i-th usage probability, X represents the current decoding feature, w_i represents the weight parameter of the i-th keyword feature, sv_i represents the i-th keyword feature, j is any positive integer, j is not greater than N, X_j represents the j-th decoding feature, w_j represents the weight parameter of the j-th keyword feature, and sv_j represents the j-th keyword feature.
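The usage probability is a softmax over keyword-feature scores. The sketch below uses scalar stand-ins for the decoding and keyword feature vectors (a deliberate simplification; the actual features are vectors and the products involve learned projections):

```python
import math

def usage_probabilities(X, weights, keyword_features):
    """Softmax form of the usage probability: each keyword's score is the
    product of the current decoding feature X, its weight parameter w_i, and
    its keyword feature sv_i, normalized over all keywords."""
    scores = [X * w * sv for w, sv in zip(weights, keyword_features)]
    total = sum(math.exp(s) for s in scores)
    return [math.exp(s) / total for s in scores]
```

The probabilities sum to 1, and the keyword with the largest weighted feature score receives the largest usage probability.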
  • Figure 6 is a schematic diagram of a first generation model provided by an embodiment of the present application.
  • the N keywords obtained by the computer device include dialogue text keywords, subtitle text keywords, and barrage text keywords.
• the N keywords are input into the first encoding sub-model, and N keyword features are obtained by calling the first encoding sub-model: keyword feature 1, keyword feature 2, ..., keyword feature N. By calling the first decoding sub-model, during each decoding it can be chosen whether to copy a text keyword or determine a new voting topic word. If it is determined to copy a text keyword, the corresponding text keyword is directly determined as the voting topic word; if it is determined not to copy a text keyword, the current voting topic word is determined based on the last determined voting topic word.
  • the training process of the first generation model includes:
• sample text content associated with a positive sample video clip is obtained, where the positive sample video clip is a video clip that contains sample voting information and the participation rate of the sample voting information reaches the target threshold; the sample voting topic included in the sample voting information is obtained; and the model parameters in the first generation model are adjusted based on the sample text content and the sample voting topic.
• the positive sample video clip is a video clip for which sample voting information has been created, and when the sample voting information is displayed while the positive sample video clip is played, the participation rate of the sample voting information reaches the target threshold, indicating that many users participated in voting while watching the positive sample video clip.
  • the sample voting information has a strong correlation with the positive sample video clip.
• the sample voting topic contained in the sample voting information also has a strong correlation with the positive sample video clip, so the first generation model is trained based on the sample text content associated with the positive sample video clip and the sample voting topic. The similarity between the voting topic generated by the trained first generation model from the sample text content and the sample voting topic thus increases, improving the accuracy of the first generation model and its ability to generate voting topics based on text content.
  • the model parameters in the first generation model may include weight parameters or other parameters in each layer of the first generation model.
• the model parameters include the weight parameter w used to determine the usage probability. Through one or more rounds of training and adjustment of the model parameters, the accuracy of the weight parameter w can be improved, which improves the accuracy of the determined usage probabilities and thereby the accuracy of the finally determined voting topic.
  • negative sample video clips can also be introduced when training the first generation model.
  • Negative sample video clips are video clips for which sample voting information has been created but the participation rate has not reached the target threshold, or video clips for which no sample voting information has been created.
• if sample voting information is displayed when the negative sample video clip is played and the participation rate of the sample voting information does not reach the target threshold, it means that many users did not participate in voting when watching the negative sample video clip.
• the correlation between the sample voting information and the negative sample video clip is not strong, so the sample voting topic contained in the sample voting information is not strongly correlated with the negative sample video clip, and the sample voting topic should not be used as the voting topic corresponding to the negative sample video clip. The first generation model is therefore trained based on the sample text content associated with the negative sample video clip and the sample voting topic, so that the similarity between the voting topic generated by the trained first generation model from the sample text content and the sample voting topic is reduced. This improves the accuracy of the first generation model and prevents it from generating inappropriate voting topics based on text content.
  • Embodiments of the present application provide a method for automatically generating voting topics for video clips.
• a first generation model is trained, and the voting topic is generated based on the first generation model, which enables the content of the video clip to be deeply understood, with the voting topic generated based on deep representation features.
• the generated voting topic is related to the text content of the video clip, which meets the interactive function requirements of the video, helps to increase the enthusiasm of users to participate in voting interaction while watching the video clip, and thereby enhances the interactive atmosphere.
• the process of generating voting candidates may be performed based on a second generation model, which includes a second encoding sub-model and a second decoding sub-model, wherein the second encoding sub-model is used to encode keywords into keyword features, and the second decoding sub-model is used to decode keyword features into voting candidate words.
  • the process of generating voting candidates based on the second generation model includes:
• the computer device obtains the first keyword in the first text content, clusters the second text content to obtain multiple text categories, each of which contains at least one piece of second text content, and extracts the second keyword from each text category.
  • this step 701 is similar to the above-mentioned steps 304-305, and will not be described again here.
  • the computer device calls the second encoding sub-model to encode the first keyword, the voting topic and the second keyword of the i-th text category to obtain keyword features.
  • the second encoding sub-model is a Transformer Encoder model or other types of encoding models.
• the computer device obtains the first keyword of the video clip, the voting topic, and the second keyword of the i-th text category, and needs to comprehensively consider these keywords to generate the i-th voting candidate. The first keyword, the voting topic, and the second keyword of the i-th text category are input into the second encoding sub-model of the second generation model; taking the number of input keywords as M as an example, the M keywords are encoded respectively based on the second encoding sub-model to obtain M keyword features.
  • the computer device calls the second decoding sub-model, decodes the M keyword features, obtains the first decoding feature, and determines the first voting candidate word based on the first decoding feature and the M keyword features.
• the computer device calls the second decoding sub-model, decodes the M keyword features and the first voting candidate word, obtains the second decoding feature, and determines the reference voting candidate based on the second decoding feature and the M keyword features. The reference voting candidate includes the first voting candidate word and the second voting candidate word, until after M times of decoding, the reference voting candidate obtained by the M-th decoding is determined as the i-th voting candidate of the video clip.
  • a method of sequential decoding is used to generate voting candidate words.
• during each decoding, the current voting candidate word is determined, and the current voting candidate word is combined in order with the previously determined voting candidate words to obtain the current reference voting candidate. As decoding is performed multiple times, the number of voting candidate words contained in the reference voting candidate gradually increases, until after M times of decoding, the reference voting candidate obtained by the M-th decoding contains M voting candidate words, thus yielding a voting candidate containing M voting candidate words.
• not only are the M keyword features considered during each decoding, but the previously determined voting candidate words are also considered. This ensures that the voting candidate word determined this time is correlated with the previously determined voting candidate words, so that the determined M voting candidate words can be combined into a voting candidate that forms a sentence with clear semantics.
  • the second generation model also includes a second classification layer and a preset word library.
• the preset word library includes a plurality of words. Determining the first voting candidate word based on the first decoding feature and the M keyword features includes:
• the jth third usage probability is the probability of using the jth keyword in the voting candidate, j is a positive integer, and j is not greater than M.
  • the j-th third usage probability is also the probability of determining the j-th keyword as the first voting candidate word among the voting candidates.
  • the usage conditions refer to the conditions that need to be met when using keywords in voting candidates.
  • the embodiments of this application provide two ways to determine voting candidate words.
• One is to use a copy mechanism to copy keywords into voting candidate words. That is, if the third usage probability of a keyword meets the usage condition, the keyword is directly used as the voting candidate word.
• the other is to generate new voting candidate words based on the preset word library. Each time the second decoding sub-model performs decoding, the usage probability is first determined, and based on whether the usage probability meets the usage condition, it is determined whether to use the keyword as a voting candidate word or to generate a new voting candidate word.
  • the usage condition includes a usage probability threshold.
• if the jth third usage probability is greater than the usage probability threshold, the jth keyword is determined as the first voting candidate word; if multiple third usage probabilities are greater than the usage probability threshold, the keyword with the largest third usage probability is determined as the first voting candidate word; and if no third usage probability is greater than the usage probability threshold, the first voting candidate word is determined from the preset word library. The classification probability of each word represents the possibility of determining the word as a voting candidate word, so based on the classification probability of each word, the first voting candidate word can be selected from multiple words.
• determining the first voting candidate word based on the classification probability of each word includes: selecting the word with the highest classification probability from the multiple words in the preset word library as the first voting candidate word, or selecting, from the multiple words in the preset word library, a target number of words with the highest classification probabilities as first voting candidate words, where the target number is greater than 1.
  • the reference voting candidates are determined, and the reference voting candidates include the first voting candidate words and the second voting candidate words, including:
• the jth fourth usage probability is the probability of using the jth keyword in the voting candidate, j is a positive integer, and j is not greater than M.
  • the j-th fourth usage probability is also the probability of determining the j-th keyword as the second voting candidate word among the voting candidates.
  • the usage conditions refer to the conditions that need to be met when using keywords in voting candidates.
• the second classification layer is called to classify based on the second decoding feature and the preset word library to obtain the classification probabilities of multiple candidate voting candidates.
• each candidate voting candidate includes a first voting candidate word and a second voting candidate word.
  • the classification probability of the voting candidate represents the probability of determining the voting candidate as the reference voting candidate.
• determining the reference voting candidate based on the classification probability of each candidate voting candidate includes: selecting the candidate voting candidate with the highest classification probability from the multiple candidate voting candidates as the reference voting candidate, or selecting, from the multiple candidate voting candidates, a target number of candidate voting candidates with the highest classification probabilities as reference voting candidates, where the target number is greater than 1. Subsequently, a target number of pieces of voting information can be formed.
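The target-number selection described above is a top-k pick over classification probabilities. A short sketch follows; `heapq.nlargest` is an implementation convenience, not part of the described method:

```python
import heapq

def select_reference_candidates(candidates, class_probs, target_number=1):
    """Return the candidate voting candidate with the highest classification
    probability (target_number == 1), or the target number of candidates with
    the highest probabilities, each of which can seed one piece of voting
    information."""
    top = heapq.nlargest(target_number, zip(class_probs, candidates))
    return [candidate for _, candidate in top]
```

With `target_number=1` this reduces to the single-best selection; with a larger target number it yields multiple reference candidates in descending probability order.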
  • the process of the second decoding is similar to the process of the first decoding.
• the first voting candidate word obtained by the first decoding is also input into the second decoding sub-model.
• the first voting candidate word and each word in the preset word library constitute an alternative voting candidate, and the current voting candidate is selected from the multiple alternative voting candidates. This ensures that the correlation between the words in the preset word library and the first voting candidate word is taken into account; the classification probability of an alternative voting candidate can also reflect how reasonable the combination of a word from the preset word library with the first voting candidate word is, ensuring that voting candidates with clear semantics and smooth logic can be generated.
  • the following formula is used to determine the i-th usage probability:
• P_i = e^(X·w_i·sv_i) / Σ_j e^(X_j·w_j·sv_j), where P_i represents the i-th usage probability, X represents the current decoding feature, w_i represents the weight parameter of the i-th keyword feature, sv_i represents the i-th keyword feature, j is any positive integer, j is not greater than M, X_j represents the j-th decoding feature, w_j represents the weight parameter of the j-th keyword feature, and sv_j represents the j-th keyword feature.
  • Figure 8 is a schematic diagram of a second generation model provided by an embodiment of the present application.
• the computer device clusters the text content of the barrage to obtain multiple text categories, and the second keywords extracted from each text category are the barrage text keywords.
• the M keywords obtained by the computer device include dialogue text keywords, subtitle text keywords, keywords in the voting topic, and the barrage text keywords corresponding to the i-th text category. The M keywords are input into the second encoding sub-model, and M keyword features are obtained by calling the second encoding sub-model: keyword feature 1, keyword feature 2, ..., keyword feature M. By calling the second decoding sub-model, during each decoding it can be chosen whether to copy a text keyword or determine a new voting candidate word. If it is determined to copy a text keyword, the corresponding text keyword is directly determined as the voting candidate word; if it is determined not to copy a text keyword, the current voting candidate word is determined based on the last determined voting candidate word.
  • the training process of the second generative model includes:
• sample text content associated with a positive sample video clip is obtained, where the positive sample video clip is a video clip that contains sample voting information and the participation rate of the sample voting information reaches the target threshold; the sample voting topic and multiple sample voting candidates included in the sample voting information are obtained; based on the correlation between each sample text content and each sample voting candidate, the text category corresponding to each sample voting candidate is determined, where the text category includes the sample text content associated with the sample voting candidate; sample keywords are extracted from the text category corresponding to each sample voting candidate; and the model parameters in the second generation model are adjusted based on the sample text content, the sample voting topic, the multiple sample voting candidates, and the sample keywords of each sample voting candidate.
• the positive sample video clip is a video clip for which sample voting information has been created, and when the sample voting information is displayed while the positive sample video clip is played, the participation rate of the sample voting information reaches the target threshold, indicating that many users participated in voting while watching the positive sample video clip and that the sample voting information has a strong correlation with the positive sample video clip. The sample voting topic and the multiple sample voting candidates included in the sample voting information therefore also have a strong correlation with the positive sample video clip, so the second generation model is trained based on the sample text content associated with the positive sample video clip and the sample voting information. The similarity between the voting candidates generated by the trained second generation model from the sample text content and the sample voting candidates thus increases, improving the accuracy of the second generation model and its ability to generate voting candidates based on text content.
  • the sample text content associated with the positive sample video clip contains text content associated with each sample voting candidate.
• each sample text content is divided into the text category of a sample voting candidate, which distinguishes the sample text contents of different text categories, so that the sample keywords corresponding to each sample voting candidate can be extracted according to its text category without interference from other text categories. The second generation model thus gains the function of generating voting candidates based on keywords of different text categories.
  • the model parameters in the second generation model may include weight parameters or other parameters in each layer of the second generation model.
  • negative sample video clips can also be introduced when training the second generation model.
  • Negative sample video clips are video clips for which sample voting information has been created but the participation rate has not reached the target threshold, or video clips for which no sample voting information has been created.
• if sample voting information is displayed when the negative sample video clip is played and the participation rate of the sample voting information does not reach the target threshold, it means that many users did not participate in voting when watching the negative sample video clip.
• the correlation between the sample voting information and the negative sample video clip is not strong, so the sample voting topic and sample voting candidates included in the sample voting information are not strongly correlated with the negative sample video clip, and the sample voting candidates should not be used as the voting candidates corresponding to the negative sample video clip, nor should they be used as candidates for the sample voting topic. The second generation model is therefore trained based on the sample text content associated with the negative sample video clip, the sample voting topic, and the multiple sample voting candidates, so that the similarity between the voting candidates generated by the trained second generation model from the sample text content and the sample voting candidates is reduced. This improves the accuracy of the second generation model and prevents it from generating inappropriate voting candidates based on text content and voting topics.
  • Embodiments of the present application provide a method for automatically generating voting candidates for video clips.
  • a first generation model is trained and voting candidates are generated based on it. The model can deeply understand the content of video clips and generate voting information from deep representation features, improving the accuracy of the voting information.
  • the generated voting information is related to the text content of the video clip, which meets the interactive functional requirements of the video, helps to increase the enthusiasm of users to participate in voting interaction while watching the video clip, and thereby enhances the interactive atmosphere.
  • FIG. 9 is a schematic diagram of an overall process for generating voting information provided by embodiments of the present application. See Figure 9.
  • in the voting information generation method, it can first be determined whether voting information needs to be generated for the video clip; only when this is determined are the steps of generating voting information, including generating a voting topic and generating voting candidates, performed. After the voting information is generated, personalized voting information is displayed based on the logged-in account.
  • the embodiment shown in Figure 10 below explains the process of determining whether voting information needs to be generated, and the process of displaying personalized voting information based on the logged-in account is detailed in the embodiments shown in Figures 12 and 13 below; these are not described further here.
  • FIG 10 is a flow chart of a voting information generation method provided by an embodiment of the present application.
  • the execution subject of the embodiment of the present application is a computer device, and the computer device is a terminal or a server. Referring to Figure 10, the method includes:
  • the computer device acquires the first text content, the second text content, the first popularity parameter and the second popularity parameter.
  • the first text content is the text content contained in the video clip
  • the second text content is the text content contained in the barrage of the video clip.
  • the process of obtaining the first text content and the second text content is detailed in step 301 above, and will not be described again here.
  • the first popularity parameter represents the popularity of the video clip
  • the second popularity parameter represents the popularity of the barrages of the video clip
  • the first popularity parameter may be determined based on the number of times the video clip is played, for example, the first popularity parameter is positively correlated with the number of times the video clip is played.
  • the first popularity parameter of the video clip may be determined based on the number of times the video clip is played and the maximum play count among the multiple video clips on the computer device.
  • Q1 represents the first popularity parameter
  • num1 represents the number of plays of the video clip
  • num2 represents the maximum number of plays
  • R1 represents the maximum value of the first popularity parameter interval.
  • R1 is a preset value.
  • for example, R1 is 0.1, which means that the first popularity parameter interval is divided into 10 levels.
  • the first popularity parameter Q1 determined in this way is the popularity level of the video clip.
  • the second popularity parameter can be determined based on at least one of the number of barrages included in the video clip or the number of likes on those barrages; for example, the second popularity parameter is positively correlated with at least one of them.
  • in some embodiments, the second popularity parameter may be determined based on at least one of the number of barrages or barrage likes included in the video clip, together with at least one of the maximum number of barrages or the maximum number of barrage likes among the multiple video clips on the computer device.
  • the number of barrage likes of the video clip is the sum of the like counts of the clip's barrages, or the maximum like count among those barrages.
  • Q2 represents the second popularity parameter
  • num3 represents the number of barrages included in the video clip
  • num4 represents the number of barrage likes of the video clip
  • num5 represents the maximum number of barrages among the multiple video clips, that is, the barrage count of the clip containing the most barrages
  • num6 represents the maximum number of barrage likes among the multiple video clips
  • R2 represents the maximum value of the second popularity parameter interval.
  • R2 is a preset value; for example, R2 is 0.1, which means that the second popularity parameter interval is divided into 10 levels.
  • the second popularity parameter Q2, determined by dividing by R2, is the barrage popularity level of the video clip.
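The two popularity levels above can be sketched as follows. This is one plausible reading of the embodiment, assuming the level is computed as (count / max_count) / R rounded up, with R = 0.1 yielding 10 levels; the exact rounding and the example counts are assumptions.

```python
import math

def popularity_level(count, max_count, r=0.1):
    # Assumed reading: level = ceil((count / max_count) / R); R = 0.1 gives 10 levels.
    if max_count == 0:
        return 0
    return math.ceil((count / max_count) / r)

# Hypothetical counts for illustration:
q1 = popularity_level(4_500, 10_000)  # play count vs. maximum play count -> level 5
q2 = popularity_level(120, 400)       # barrage count vs. maximum barrage count -> level 3
```

Barrage like counts would be mapped to a level in the same way, using the maximum like count as the denominator.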
  • the computer device determines a voting identifier of the video clip based on the first text content, the second text content, the first popularity parameter, and the second popularity parameter.
  • the voting identifier indicates whether to generate voting information for the video clip.
  • the first text content and the second text content represent the content of the video clip, and the first popularity parameter and the second popularity parameter represent the popularity of the video clip; determining the voting identifier of the video clip based on all four therefore comprehensively considers both the content and the popularity of the video clip.
  • the determined voting identifier can better reflect whether a typical user will participate in the voting interaction of the video clip and perform a voting operation based on the voting information, ensuring that the determined voting identifier accurately represents whether voting information is to be generated for the video clip.
  • step 1002 includes:
  • the first text feature of the first text content and the second text feature of the second text content are obtained to describe the two text contents respectively, and the first popularity feature of the first popularity parameter and the second popularity feature of the second popularity parameter are obtained to describe the two popularity parameters respectively. The first text feature, the second text feature, the first popularity feature, and the second popularity feature are then spliced to obtain the video clip feature, so that the video clip feature accurately describes the video clip; classification is performed based on the video clip feature to obtain the voting identifier.
  • the computer device may first determine a popularity feature table, which includes popularity features corresponding to at least one popularity parameter; the popularity feature table includes a first popularity feature table and a second popularity feature table.
  • the first popularity feature table contains popularity features for the popularity parameters of video clips, so the first popularity feature of the first popularity parameter can be determined by querying the first popularity feature table; the second popularity feature table contains popularity features for the popularity parameters of barrages of video clips, so the second popularity feature of the second popularity parameter can be determined by querying the second popularity feature table.
  • when the voting identifier indicates that voting information is to be generated for the video clip, the computer device generates a voting topic for the video clip based on the keywords in the text content, generates multiple voting candidates for the video clip based on the keywords and the voting topic, and generates the voting information of the video clip based on the voting topic and the multiple voting candidates.
  • the voting identification includes a first voting identification and a second voting identification.
  • the first voting identification indicates that voting information is generated for the video clip
  • the second voting identification indicates that voting information is not generated for the video clip.
  • when the voting identifier is the first voting identifier, the step of generating voting information is performed; for example, the first voting identifier is 1 and the second voting identifier is 0.
  • alternatively, the voting identifier is a probability of generating voting information: if the determined probability is greater than a probability threshold, voting information is generated for the video clip; if it is not greater than the probability threshold, voting information is not generated for the video clip.
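The probability-threshold variant of the voting identifier reduces to a simple comparison. The sketch below is illustrative only; the threshold value 0.5 is an assumption, since the embodiment does not fix one.

```python
def voting_identifier(probability, threshold=0.5):
    # 1 = generate voting information for the clip (first voting identifier),
    # 0 = do not generate (second voting identifier).
    # The 0.5 default threshold is an assumed value, not specified by the embodiment.
    return 1 if probability > threshold else 0
```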
  • the step of generating the voting information of the video clip includes the steps of generating a voting topic, generating a plurality of voting candidates, and generating voting information based on the voting topic and the voting candidates.
  • the specific process can be seen in the above embodiments and will not be described again here.
  • the method provided by the embodiment of the present application determines the voting identifier of the video clip based on the features corresponding to the first text content, the second text content, the first popularity parameter, and the second popularity parameter, taking into account both the content and the popularity of the video clip; that is, it fully captures the content characteristics of the video clip while also considering its interaction data.
  • the determined voting identifier therefore better reflects whether a typical user will participate in the voting interaction of the video clip and perform a voting operation based on the voting information, so it can accurately determine whether to generate voting information for the video clip. This avoids directly generating voting information for every video clip, which improves the interactivity of the video while saving processing and reducing the amount of video data.
  • the process of determining the voting identification can be performed based on the voting decision model.
  • the voting decision model includes a first feature extraction sub-model, a first splicing layer, and a second classification layer. The voting decision model also includes a popularity feature table containing popularity features corresponding to at least one popularity parameter; for example, the popularity feature table includes the above-mentioned first popularity feature table and second popularity feature table.
  • the process of determining voting identification includes:
  • the first feature extraction sub-model is a BERT (Bidirectional Encoder Representations from Transformers) model or other types of models.
  • multiple barrage text contents can be obtained, the barrage text features of these contents extracted, and max pooling performed over the multiple barrage text features to obtain the second text feature; the second text feature is then spliced with the first text feature, the first popularity feature, and the second popularity feature to obtain the video clip feature.
  • FIG. 11 is a schematic diagram of a voting decision model provided by an embodiment of the present application, taking the first feature extraction sub-model being the BERT model as an example.
  • the voting decision model includes the BERT model, the first splicing layer and the second classification layer.
  • the computer device obtains the dialogue text content and subtitle text content of the video clip, as well as the first popularity parameter, and also obtains the K barrage text content of the video clip and the second popularity parameter, where K is an integer greater than 1.
  • the BERT model is used to extract the first text features corresponding to the dialogue text content and subtitle text content, as well as the barrage text features corresponding to K barrage text contents, and perform maximum pooling on the K barrage text features to obtain the second text feature.
  • the voting decision model also includes the first popularity feature table and the second popularity feature table, and the first popularity feature and the second popularity feature are determined by querying the first popularity feature table and the second popularity feature table respectively.
  • the obtained features can then be spliced and classified to determine the voting identifier.
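The forward pass described above (feature extraction, max pooling over barrage features, popularity table lookup, splicing, classification) can be sketched in miniature as follows. This is not the actual model: a deterministic pseudo-embedding stands in for the BERT sub-model, the popularity feature tables are fixed rather than learned, and the dimension and weights are hypothetical.

```python
import math
import random

DIM = 8  # toy feature dimension (the real model's dimension is unspecified here)

def text_feature(text, dim=DIM):
    # Stand-in for the BERT feature extraction sub-model: a deterministic pseudo-embedding.
    rng = random.Random(text)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]

def max_pool(features):
    # Element-wise maximum over the K barrage text features.
    return [max(col) for col in zip(*features)]

# Popularity feature tables: one vector per popularity level (fixed here; learned in training).
heat_table_1 = {lvl: [lvl / 10.0] * DIM for lvl in range(11)}
heat_table_2 = {lvl: [lvl / 10.0] * DIM for lvl in range(11)}

def voting_decision(first_text, barrage_texts, q1, q2, weights):
    f1 = text_feature(first_text)                             # first text feature
    f2 = max_pool([text_feature(t) for t in barrage_texts])   # second text feature
    clip = f1 + f2 + heat_table_1[q1] + heat_table_2[q2]      # first splicing layer
    score = sum(w * x for w, x in zip(weights, clip))         # second classification layer
    return 1.0 / (1.0 + math.exp(-score))                     # probability of generating voting info
```

With all-zero weights the output is exactly 0.5, i.e. an untrained classifier is undecided; training would move this probability toward 1 for positive samples and 0 for negative samples.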
  • the training process of the voting decision model includes:
  • the sample video clips include at least one of positive sample video clips or negative sample video clips. A positive sample video clip is a video clip that contains sample voting information whose participation rate reaches the target threshold; a negative sample video clip is a video clip that contains sample voting information whose participation rate does not reach the target threshold, or a video clip that does not contain sample voting information;
  • the model parameters include a popularity feature table.
  • training the voting decision model in this way enables it to learn the impact of the text content and popularity parameters of video clips on whether to generate voting information, thereby improving the accuracy of the voting decision model, so that the voting identifier can be accurately determined based on it. This avoids directly generating voting information for every video segment of the video, saving processing and reducing the amount of video data.
  • the training objectives of the voting decision model are: the voting identifier obtained for a positive sample video clip indicates that voting information is generated for that clip, and the voting identifier obtained for a negative sample video clip indicates that no voting information is generated for that clip.
  • the popularity feature table is also used as one of the model parameters of the voting decision model; it is adjusted based on the training samples so that it is gradually updated as the voting decision model is trained, improving the accuracy of the popularity feature table.
  • Figure 12 is a flow chart of a voting information display method provided by an embodiment of the present application.
  • the execution subject of the embodiment of the present application is a terminal.
  • the embodiment of the present application describes the process of generating the voting information of the video clip and then displaying the voting information. Referring to Figure 12, the method includes:
  • the terminal obtains the voting information of the video clip based on the video clip.
  • the voting information is generated based on the voting topic and multiple voting candidates
  • the voting topic is generated based on the keywords in the text content associated with the video clip
  • the multiple voting candidates are generated based on the keywords and voting topics.
  • the text content includes at least one of first text content or second text content.
  • the first text content is the text content contained in the video clip
  • the second text content is the text content contained in the barrage of the video clip.
  • the voting information may be generated by the terminal or by other devices other than the terminal.
  • a server used to provide the video generates voting information for one or more video clips.
  • the terminal requests the video from the server, the server sends the video and the generated voting information to the terminal, and the terminal can Get the video to be played and the voting information of one or more video clips.
  • the terminal determines interaction parameters based on the interest tag and voting information of the currently logged-in account.
  • the interaction parameters represent the possibility of the account performing a voting operation based on the voting information.
  • the terminal is currently logged in with an account, and the interest tag of the account indicates the interest of the user to whom the account belongs.
  • the terminal determines the interaction parameter based on the interest tag and the voting information, and can thereby accurately judge whether the current user is interested in the voting information, ensuring that the determined interaction parameter accurately measures the likelihood of the current user participating in voting and whether there is a need to display voting information for this video clip.
  • when the interaction parameter meets the interaction condition, the terminal displays the voting information when playing the video clip.
  • the interaction condition refers to the condition for displaying voting information when playing video clips.
  • if the interaction parameter does not meet the interaction condition, it can be considered that the current user is not interested in the voting information and the probability of participating in voting is low; in this case, to avoid interfering with playback of the video clip, the voting information is not displayed when the clip is played.
  • the interaction condition includes an interaction parameter threshold. If the interaction parameter is greater than the interaction parameter threshold, the voting information is displayed when the video clip is played. When the interaction parameter is not greater than the interaction parameter threshold, the voting information will not be displayed when the video clip is played.
  • the embodiment of this application takes one video clip as an example to illustrate the display process of its voting information; the video may include multiple video clips, each with its own voting information. When the video is played, each video clip is played in sequence according to its playback order, the interaction parameter is determined from the voting information corresponding to the clip currently being played, and if the interaction parameter meets the interaction condition the voting information is displayed; otherwise playback continues until the next video clip is played.
  • in some embodiments, a video clip has multiple pieces of voting information; those whose corresponding interaction parameters satisfy the interaction condition are selected and displayed when the video clip is played, or, when multiple pieces satisfy the condition, the one with the largest interaction parameter is selected and displayed.
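The selection logic for a clip with multiple pieces of voting information can be sketched as follows; the function name and threshold form are hypothetical, illustrating only the rule stated above.

```python
def select_voting_info(voting_infos, interaction_params, threshold):
    # Keep items whose interaction parameter satisfies the condition (here: > threshold),
    # then show the one with the largest parameter. None means display nothing,
    # to avoid interrupting playback for an uninterested user.
    eligible = [(p, v) for v, p in zip(voting_infos, interaction_params) if p > threshold]
    if not eligible:
        return None
    return max(eligible)[1]
```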
  • the method provided by the embodiment of this application determines, based on the interest tag and voting information of the currently logged-in account, an interaction parameter that measures the likelihood of the user participating in voting, and displays the voting information during playback of the video clip only when the interaction parameter meets the interaction condition. This makes it easier to attract users to participate in voting, realizes personalized display of voting information, enhances users' active participation in interaction, improves the interactive experience, and avoids disturbing users who are unlikely to participate in voting.
  • FIG. 13 is a flow chart of another voting information display method provided by an embodiment of the present application.
  • the execution subject of the embodiment of this application is a terminal. Referring to Figure 13, the method includes:
  • the terminal obtains the voting information of the video clip based on the video clip.
  • the terminal obtains the interest characteristics of the account's interest tag, the voting topic characteristics of the voting topic, and the voting candidate characteristics of multiple voting candidates.
  • the interest feature is used to describe the interest tag
  • the voting topic feature is used to describe the voting subject
  • the voting candidate feature is used to describe the voting candidate.
  • each of the above features can be in the form of vectors, matrices or other forms.
  • the terminal splices the interest features, voting topic features and multiple voting candidate features to obtain interactive features.
  • the terminal classifies based on interaction features and obtains interaction parameters.
  • the interaction feature is formed by splicing the feature of the interest tag, the feature of the voting topic, and the features of the multiple voting candidates, integrating the information of all three; the interaction parameter obtained by classifying based on the interaction feature therefore takes the interest tag, the voting topic, and the multiple voting candidates into account, and can more accurately measure the user's interest in the voting information, thereby accurately determining the user's likelihood of participating in voting.
  • when the interaction parameter meets the interaction condition, the terminal displays the voting information when playing the video clip.
  • This step is similar to the above-mentioned step 1203 and will not be described again.
  • the method provided by the embodiment of this application determines, based on the interest tag and voting information of the currently logged-in account, an interaction parameter that measures the likelihood of the user participating in voting, and displays the voting information during playback of the video clip only when the interaction parameter meets the interaction condition, thereby attracting users to participate in voting, realizing personalized display of voting information, improving interaction coverage, and avoiding disturbing users who are unlikely to participate in voting.
  • the process of determining interaction parameters in the above steps 1302-1304 can be performed based on a voting interaction model, which includes a second feature extraction sub-model, a second splicing layer, and a third classification layer. Accordingly, the process of determining interaction parameters includes:
  • call the second feature extraction sub-model to obtain the interest feature of the interest tag, the voting topic feature of the voting topic, and the voting candidate features corresponding to the multiple voting candidates; call the second splicing layer to splice the interest feature, the voting topic feature, and the multiple voting candidate features into the interaction feature; and call the third classification layer to classify based on the interaction feature to obtain the interaction parameter.
  • the interest feature, the voting topic feature, and the multiple voting candidate features describe the current user's interest tag, the voting topic, and the multiple voting candidates respectively; obtaining the interaction parameter by classifying based on the interaction feature therefore comprehensively accounts for the influence of all three, improving the accuracy of the interaction parameter.
  • the second feature extraction sub-model is a BERT model or other types of models, which is not limited in the embodiments of the present application.
  • FIG 14 is a schematic diagram of a voting interaction model provided by an embodiment of the present application.
  • the voting interaction model includes the BERT model, the second splicing layer and the third classification layer.
  • the computer device obtains the account's interest tag, voting topic, and M voting candidates, where M is an integer greater than 1.
  • the BERT model is used to extract the features corresponding to the interest tags, voting topics and M voting candidates, and then splicing and classifying are performed through the second splicing layer and the third classification layer respectively to obtain the interaction parameters.
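The interaction-parameter computation described above can be sketched in miniature as follows. As before, a deterministic pseudo-embedding stands in for the BERT sub-model, and the dimension and weights are hypothetical; this is a sketch of the splice-then-classify structure, not the actual trained model.

```python
import math
import random

DIM = 8  # toy feature dimension

def feature(text, dim=DIM):
    # Stand-in for the second feature extraction sub-model (e.g. BERT).
    rng = random.Random(text)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]

def interaction_parameter(interest_tag, voting_topic, candidates, weights):
    spliced = feature(interest_tag) + feature(voting_topic)  # second splicing layer
    for cand in candidates:                                  # M voting candidate features
        spliced += feature(cand)
    score = sum(w * x for w, x in zip(weights, spliced))     # third classification layer
    return 1.0 / (1.0 + math.exp(-score))                    # likelihood of a voting operation
```

The output is a probability in (0, 1) that can be compared against the interaction parameter threshold from the display step.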
  • the training process of the voting interaction model includes:
  • obtain the sample interest tag of a sample account that has performed a voting operation based on the sample voting information; based on the sample interest tag, the sample voting topic, and the multiple sample voting candidates in the sample voting information, adjust the model parameters in the voting interaction model.
  • the sample account is any account
  • the sample interest tag indicates the interest of the user to whom the sample account belongs
  • the sample video clip includes sample voting information, and the sample account has performed a voting operation based on that sample voting information, indicating that the sample voting information was displayed while the sample video clip was playing and that the user to whom the sample account belongs was interested in it and participated in the voting interaction. Therefore, adjusting the model parameters in the voting interaction model based on the sample interest tag, the sample voting topic in the sample voting information, and the multiple sample voting candidates enables the voting interaction model to learn the association between sample interest tags and sample voting information, giving it the ability to determine the corresponding interaction parameter from any interest tag and any voting information, which improves the accuracy of the voting interaction model.
  • the training goal of the voting interaction model is to increase the interaction parameter it outputs, that is, to enable the model to predict, based on the sample interest tag, the sample voting topic, and the multiple sample voting candidates, that the sample account is more likely to perform a voting operation.
  • the interaction parameter represents the probability of the account performing a voting operation based on voting information
  • the training goal is to make the interaction parameter obtained by the voting interaction model equal to 1.
  • each set of sample data includes the sample interest tag corresponding to a sample account and the sample voting information in a sample video clip.
  • the voting interaction model is iteratively trained based on the multiple sets of sample data, thereby improving the accuracy of the voting interaction model, and thereby improving the accuracy of the interaction parameters determined by the voting interaction model.
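The training objective (push the interaction parameter for positive samples toward 1) corresponds to a standard logistic update. The sketch below shows one such gradient step under that assumption; the learning rate, feature vector, and linear classifier are all hypothetical stand-ins for the model's actual parameters.

```python
import math

def train_step(spliced_feature, weights, lr=0.1):
    # One gradient step on a positive sample (the account did vote, so the target is 1).
    # For a logistic classifier, the binary cross-entropy gradient w.r.t. each weight
    # is (p - 1) * x when the target is 1, so this step increases the predicted parameter.
    score = sum(w * x for w, x in zip(weights, spliced_feature))
    p = 1.0 / (1.0 + math.exp(-score))
    return [w - lr * (p - 1.0) * x for w, x in zip(weights, spliced_feature)]
```

Iterating such steps over the multiple sets of sample data drives the interaction parameter for voting accounts toward 1, which matches the stated training goal.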
  • FIG 15 is a schematic structural diagram of a voting information generation device provided by an embodiment of the present application.
  • the device is installed in a computer device.
  • the device includes:
  • the text content acquisition module 1501 is used to acquire text content associated with video clips in the video.
  • the text content includes at least one of first text content or second text content.
  • the first text content is the text content contained in the video clip
  • the second text content is the text content contained in the barrage of the video clip;
  • the topic generation module 1502 is used to generate voting topics for video clips based on keywords in the text content
  • the candidate generation module 1503 is used to generate multiple voting candidates for video clips based on keywords and voting topics;
  • the voting information generation module 1504 is used to generate voting information for video clips based on the voting topic and multiple voting candidates.
  • the topic generation module 1502 includes:
  • the encoding unit is used to encode keywords and obtain the keyword characteristics of the keywords
  • the decoding unit is used to decode the keyword features to obtain the voting topic.
  • the voting topic is composed of multiple voting topic words.
  • the first generation model includes a first encoding sub-model and a first decoding sub-model
  • the encoding unit is used to: call the first encoding sub-model, encode the N keywords, and obtain the keyword features of the N keywords, where N is an integer greater than 1;
  • Decoding unit for:
  • call the first decoding sub-model to decode the N keyword features and the first voting topic word to obtain the second decoding feature; based on the second decoding feature and the N keyword features, determine a reference voting topic that includes the first voting topic word and a second voting topic word; and so on until, after N decodings, the reference voting topic obtained by the N-th decoding is determined as the voting topic of the video clip.
  • the first generation model also includes a first classification layer and a preset word library, the preset word library including a plurality of words; the decoding unit is used to:
  • the jth first usage probability is the probability of using the jth keyword in the voting topic. j is a positive integer and j is not greater than N;
  • the jth keyword is determined as the first voting topic word
  • otherwise, the first classification layer is called to classify based on the first decoding feature and the preset word library to obtain the classification probability of each word in the preset word library, and the first voting topic word is determined based on these classification probabilities.
  • a decoding unit is used for:
  • the jth second usage probability is the probability of using the jth keyword in the voting topic.
  • j is a positive integer, and j is not greater than N;
  • the j-th keyword is determined as the second voting topic word
  • the first classification layer is called to classify based on the second decoding feature and the preset word library to obtain the classification probabilities of multiple candidate voting topics.
  • each candidate voting topic includes one first voting topic word and one second voting topic word; the reference voting topic is determined based on the classification probabilities of the candidate voting topics.
  • the training process of the first generation model includes:
  • the positive sample video clip is a video clip that contains sample voting information and the participation rate of the sample voting information reaches the target threshold;
  • model parameters in the first generative model are adjusted.
  • the text content includes first text content and second text content
  • the candidate generation module 1503 includes:
  • the first acquisition unit is used to acquire the first keyword, which is a keyword in the first text content
  • a clustering unit is used to cluster the second text content to obtain multiple text categories, each text category containing at least one piece of second text content;
  • the second acquisition unit is used to extract the second keyword from each text category respectively;
  • the generation unit is used to generate the voting candidate of each text category based on the first keyword, the voting topic, and the second keyword of that text category.
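A minimal, self-contained sketch of the clustering and per-category keyword extraction described above (the patent does not fix a clustering algorithm for the barrage text; the greedy word-overlap grouping and the frequency-based keyword below are illustrative stand-ins):

```python
from collections import Counter

def cluster_texts(texts, threshold=0.2):
    """Greedily group texts whose word sets overlap (Jaccard similarity).
    A simplified stand-in for clustering the second text content."""
    clusters = []
    for text in texts:
        words = set(text.split())
        for cluster in clusters:
            rep = set(cluster[0].split())
            union = words | rep
            if union and len(words & rep) / len(union) >= threshold:
                cluster.append(text)
                break
        else:
            clusters.append([text])
    return clusters

def top_keyword(cluster):
    """Take the most frequent word in a text category as its second keyword."""
    counts = Counter(word for text in cluster for word in text.split())
    return counts.most_common(1)[0][0]

barrage = ["what a goal", "great goal great shot",
           "the referee is blind", "blind referee again"]
categories = cluster_texts(barrage)          # each category: >= 1 barrage text
keywords = [top_keyword(c) for c in categories]
```

With this toy data the barrage splits into a "goal" category and a "referee" category, each contributing one second keyword.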
  • the keyword features are decoded to obtain the i-th voting candidate, which consists of multiple voting candidate words.
  • the second generation model includes a second encoding sub-model and a second decoding sub-model; a generation unit, used for:
  • the second decoding sub-model is called to decode the M keyword features and the first voting candidate word to obtain the second decoding feature; the reference voting candidate is determined based on the second decoding feature and the M keyword features, where the reference voting candidate includes the first voting candidate word and the second voting candidate word; after M rounds of decoding, the reference voting candidate obtained by the M-th decoding is determined as the i-th voting candidate of the video clip.
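The M-round decoding loop described above can be sketched as follows (the scoring function is a toy stand-in for the second decoding sub-model; in the patent the scores come from trained decoding features):

```python
def decode_candidate(keyword_features, vocab, score_fn, steps):
    """Step-by-step greedy decoding: at each of the `steps` rounds, the
    scorer rates every vocabulary word given the keyword features and the
    words decoded so far, and the best-scoring word is appended."""
    decoded = []
    for _ in range(steps):
        scores = [score_fn(keyword_features, decoded, word) for word in vocab]
        decoded.append(vocab[scores.index(max(scores))])
    return " ".join(decoded)

def toy_score(features, decoded, word):
    """Toy stand-in for the second decoding sub-model: favor words that
    match the keyword features and never repeat a word."""
    if word in decoded:
        return -1.0
    return 1.0 if word in features else 0.0

# M = 2 decoding rounds over a toy vocabulary.
candidate = decode_candidate({"amazing", "goal"},
                             ["goal", "amazing", "pass"], toy_score, steps=2)
```

After the second round, the reference voting candidate ("goal amazing" here) is taken as the i-th voting candidate of the video clip.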
  • the training process of the second generative model includes:
  • the positive sample video clip is a video clip that contains sample voting information and the participation rate of the sample voting information reaches the target threshold;
  • based on the correlation between each sample text content and each sample voting candidate, the text category of each sample voting candidate is determined, where the text category includes the sample text content associated with the sample voting candidate;
  • the model parameters in the second generation model are adjusted.
  • the text content acquisition module 1501 is used for:
  • the first popularity parameter represents the popularity of the video clip
  • the second popularity parameter represents the popularity of the barrage of the video clip
  • Topic generation module 1502 used for:
  • based on the first text content, the second text content, the first popularity parameter and the second popularity parameter, the voting identification of the video clip is determined, where the voting identification indicates whether to generate voting information for the video clip;
  • a voting topic of the video clip is generated based on the keywords in the obtained text content.
  • topic generation module 1502 is used for:
  • the voting decision model includes a first feature extraction layer, a first splicing layer, and a second classification layer; the voting decision model also includes a popularity feature table, and the popularity feature table includes popularity features of at least one popularity parameter;
  • Topic generation module 1502 used for:
  • the second classification layer is called to classify based on the characteristics of the video clips and obtain the voting identification.
  • the training process of the voting decision model includes:
  • the sample video clip includes at least one of a positive sample video clip or a negative sample video clip.
  • the positive sample video clip is a video clip that contains sample voting information and whose participation rate of the sample voting information reaches the target threshold;
  • the negative sample video clip is a video clip that contains sample voting information whose participation rate does not reach the target threshold, or a video clip that does not contain sample voting information;
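A sketch of the positive/negative sample labeling rule above (the field names and the threshold value are illustrative assumptions, not a schema fixed by the patent):

```python
def label_sample(clip, target_threshold=0.1):
    """Label a sample video clip for training: 1 (positive) only when the
    clip carries sample voting information whose participation rate reaches
    the target threshold; otherwise 0 (negative)."""
    if clip.get("voting_info") is not None and clip["participation_rate"] >= target_threshold:
        return 1
    return 0

clips = [
    {"voting_info": {"topic": "best moment"}, "participation_rate": 0.25},
    {"voting_info": {"topic": "best moment"}, "participation_rate": 0.02},
    {"voting_info": None, "participation_rate": 0.0},
]
labels = [label_sample(clip) for clip in clips]
```

Only the first clip is labeled positive; a clip with low participation or no voting information becomes a negative sample.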
  • the model parameters in the voting decision model include a popularity feature table.
  • the device also includes:
  • the interaction parameter determination module is used to determine the interaction parameters based on the account's interest tags and voting information.
  • the interaction parameters represent the possibility of the account performing voting operations based on the voting information;
  • the sending module is used to send voting information to the terminal of the account when the interaction parameters meet the interaction conditions.
  • the voting information is used for display when the terminal plays the video clip.
  • FIG 16 is a schematic structural diagram of a voting information display device provided by an embodiment of the present application.
  • the device includes:
  • the information acquisition module 1601 is used to obtain the voting information of the video clip based on the video clip in the video.
  • the voting information is generated based on the voting topic and multiple voting candidates.
  • the voting topic is generated based on the keywords in the text content associated with the video clip, and the multiple voting candidates are generated based on the keywords and the voting topic;
  • the parameter determination module 1602 is used to determine interaction parameters based on the interest tags and voting information of the currently logged-in account.
  • the interaction parameters represent the possibility of the account performing voting operations based on the voting information;
  • the text content includes at least one of first text content or second text content.
  • the first text content is the text content contained in the video clip
  • the second text content is the text content contained in the barrage of the video clip.
  • the voting information includes a voting topic and multiple voting candidates.
  • the parameter determination module 1602 includes:
  • the feature acquisition unit is used to acquire the interest features of the account's interest tag, the voting topic features of the voting topic, and the voting candidate features of multiple voting candidates;
  • the splicing unit is used to splice interest features, voting topic features and multiple voting candidate features to obtain interactive features;
  • Classification unit is used to classify based on interaction features and obtain interaction parameters.
  • the voting interaction model includes a second feature extraction sub-model, a second splicing layer and a third classification layer;
  • the feature acquisition unit is used for: calling the second feature extraction sub-model to obtain the interest features of the interest tag, the voting topic features of the voting topic, and the voting candidate features of multiple voting candidates;
  • the splicing unit is used to: call the second splicing layer to splice the interest features, voting topic features and multiple voting candidate features to obtain interactive features;
  • Classification unit is used to: call the third classification layer, classify based on interaction features, and obtain interaction parameters.
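As a hedged sketch of the splice-then-classify flow above, with a toy logistic layer standing in for the trained voting interaction model (all weights and feature values below are made up for illustration):

```python
import math

def interaction_parameter(interest_feat, topic_feat, candidate_feats, weights, bias=0.0):
    """Splice (concatenate) the interest, voting-topic, and voting-candidate
    features, then apply a toy logistic layer to get an interaction
    parameter in (0, 1)."""
    spliced = interest_feat + topic_feat + [x for feat in candidate_feats for x in feat]
    z = sum(w * x for w, x in zip(weights, spliced)) + bias
    return 1.0 / (1.0 + math.exp(-z))

p = interaction_parameter(
    interest_feat=[1.0, 0.0],
    topic_feat=[0.5, 0.5],
    candidate_feats=[[1.0], [0.0]],
    weights=[0.8, -0.2, 0.4, 0.4, 0.6, 0.1],  # made-up weights
)
meets_condition = p >= 0.5  # e.g. display the vote only above a threshold
```

The final threshold check mirrors the interaction condition: the voting information is sent for display only when the interaction parameter is high enough.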
  • the training process of the voting interaction model includes:
  • based on the sample interest tags, the sample voting topics in the sample voting information, and the multiple sample voting candidates, the model parameters in the voting interaction model are adjusted.
  • when the voting information generation device or the voting information display device provided in the above embodiments generates or displays voting information, the division into the above functional modules is only used as an example;
  • in practical applications, the above functions can be allocated to different functional modules as needed, that is, the internal structure of the computer device is divided into different functional modules to complete all or part of the functions described above.
  • the voting information generation device provided in the above embodiments and the voting information generation method embodiments belong to the same concept.
  • the voting information display device provided in the above embodiments and the voting information display method embodiments belong to the same concept.
  • details are not repeated here.
  • Embodiments of the present application also provide a computer device.
  • the computer device includes a processor and a memory; at least one computer program is stored in the memory, and the at least one computer program is loaded and executed by the processor to implement the operations performed by the voting information generation method or the voting information display method of the above embodiments.
  • the computer device is provided as a terminal.
  • Figure 17 shows a schematic structural diagram of a terminal 1700 provided by an exemplary embodiment of the present application.
  • the terminal 1700 includes: a processor 1701 and a memory 1702.
  • the processor 1701 may include one or more processing cores, such as a 4-core processor, an 8-core processor, etc.
  • the processor 1701 can be implemented using at least one hardware form among DSP (Digital Signal Processing, digital signal processing), FPGA (Field Programmable Gate Array, field programmable gate array), and PLA (Programmable Logic Array, programmable logic array).
  • the processor 1701 can also include a main processor and a co-processor.
  • the main processor is a processor used to process data in the wake-up state, also called a CPU (Central Processing Unit); the co-processor is a low-power processor used to process data in the standby state.
  • the processor 1701 may be integrated with a GPU (Graphics Processing Unit), and the GPU is responsible for rendering and drawing the content that needs to be displayed on the display screen.
  • the processor 1701 may also include an AI (Artificial Intelligence, artificial intelligence) processor, which is used to process computing operations related to machine learning.
  • Memory 1702 may include one or more computer-readable storage media, which may be non-transitory. Memory 1702 may also include high-speed random access memory, as well as non-volatile memory, such as one or more disk storage devices or flash memory storage devices. In some embodiments, the non-transitory computer-readable storage medium in the memory 1702 is used to store at least one computer program, and the at least one computer program is used to be executed by the processor 1701 to implement the voting information generation method provided by the method embodiments in this application.
  • the terminal 1700 optionally further includes: a peripheral device interface 1703 and at least one peripheral device.
  • the processor 1701, the memory 1702 and the peripheral device interface 1703 may be connected through a bus or a signal line.
  • Each peripheral device can be connected to the peripheral device interface 1703 through a bus, a signal line, or a circuit board.
  • the peripheral device includes: at least one of a radio frequency circuit 1704, a display screen 1705, and a camera assembly 1706.
  • the peripheral device interface 1703 may be used to connect at least one I/O (Input/Output) related peripheral device to the processor 1701 and the memory 1702 .
  • in some embodiments, the processor 1701, the memory 1702, and the peripheral device interface 1703 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1701, the memory 1702, and the peripheral device interface 1703 can be implemented on a separate chip or circuit board, which is not limited in this embodiment.
  • the radio frequency circuit 1704 is used to receive and transmit RF (Radio Frequency, radio frequency) signals, also called electromagnetic signals.
  • Radio frequency circuitry 1704 communicates with communication networks and other communication devices through electromagnetic signals.
  • the radio frequency circuit 1704 converts electrical signals into electromagnetic signals for transmission, or converts received electromagnetic signals into electrical signals.
  • the radio frequency circuit 1704 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a user identity module card, and the like.
  • Radio frequency circuitry 1704 can communicate with other devices through at least one wireless communication protocol.
  • the wireless communication protocol includes but is not limited to: metropolitan area network, mobile communication networks of all generations (2G, 3G, 4G and 5G), wireless LAN and/or WiFi (Wireless Fidelity, wireless fidelity) network.
  • the radio frequency circuit 1704 may also include NFC (Near Field Communication) related circuits, which is not limited in this application.
  • the display screen 1705 is used to display UI (User Interface, user interface).
  • the UI can include graphics, text, icons, videos, and any combination thereof.
  • display screen 1705 also has the ability to collect touch signals on or above the surface of display screen 1705 .
  • the touch signal can be input to the processor 1701 as a control signal for processing.
  • the display screen 1705 can also be used to provide virtual buttons and/or virtual keyboards, also called soft buttons and/or soft keyboards.
  • in some embodiments, there may be one display screen 1705, which is provided on the front panel of the terminal 1700; in other embodiments, there may be at least two display screens 1705, which are respectively provided on different surfaces of the terminal 1700 or adopt a folding design; in still other embodiments, the display screen 1705 may be a flexible display screen disposed on a curved surface or a folding surface of the terminal 1700. The display screen 1705 can even be set into a non-rectangular irregular shape, that is, a special-shaped screen.
  • the display screen 1705 can be made of materials such as LCD (Liquid Crystal Display) and OLED (Organic Light-Emitting Diode).
  • the camera component 1706 is used to capture images or videos.
  • camera assembly 1706 includes a front camera and a rear camera.
  • the front camera is installed on the front panel of the terminal 1700
  • the rear camera is installed on the back of the terminal 1700 .
  • in some embodiments, there are at least two rear cameras, each of which is any one of a main camera, a depth-of-field camera, a wide-angle camera, or a telephoto camera, so that the main camera and the depth-of-field camera can be fused to realize the background blur function.
  • camera assembly 1706 may also include a flash.
  • the flash can be a single color temperature flash or a dual color temperature flash. Dual color temperature flash refers to a combination of warm light flash and cold light flash, which can be used for light compensation under different color temperatures.
  • the structure shown in FIG. 17 does not constitute a limitation on the terminal 1700, which may include more or fewer components than shown, combine certain components, or adopt a different component arrangement.
  • the computer device is provided as a server.
  • Figure 18 is a schematic structural diagram of a server provided by an embodiment of the present application.
  • the server 1800 may vary greatly due to different configurations or performance, and may include one or more processors (Central Processing Units, CPU) 1801 and one or more memories 1802, wherein at least one computer program is stored in the memory 1802, and the at least one computer program is loaded and executed by the processor 1801 to implement the methods provided by the above method embodiments.
  • the server may also have components such as wired or wireless network interfaces and input and output interfaces to facilitate input and output.
  • the server may also include other components for implementing device functions, which will not be described in detail here.
  • Embodiments of the present application also provide a computer-readable storage medium.
  • the computer-readable storage medium stores at least one computer program.
  • the at least one computer program is loaded and executed by the processor to implement the operations performed by the voting information generation method of the above embodiments.
  • Embodiments of the present application also provide a computer program product.
  • the computer program product includes a computer program.
  • when the computer program is executed by a processor, the operations performed by the voting information generation method or the voting information display method of the above embodiments are implemented.
  • the computer program involved in the embodiments of the present application may be deployed and executed on one computer device, executed on multiple computer devices located at one location, or executed on multiple computer devices distributed in multiple locations and interconnected through a communication network; the multiple computer devices distributed in multiple locations and interconnected through a communication network can form a blockchain system.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Data Mining & Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

A voting information generation method and apparatus, and a voting information display method and apparatus, belonging to the technical field of computers. The voting information generation method comprises: acquiring text content associated with a video clip in a video, wherein the text content comprises at least one of first text content or second text content, the first text content is text content included in the video clip, and the second text content is text content included in bullet-screen comments of the video clip (201); generating a voting topic of the video clip on the basis of keywords in the text content (202); generating a plurality of voting candidates of the video clip on the basis of the keywords and the voting topic (203); and generating voting information of the video clip on the basis of the voting topic and the plurality of voting candidates (204). Provided in the embodiments of the present application is a method for automatically generating voting information for a video clip; it is not necessary to manually create the voting information, thereby improving operation efficiency and saving time.

Description

Voting information generation method, voting information display method and device

This application claims priority to the Chinese patent application filed on April 29, 2022 with application number 202210473861.6 and entitled "Voting Information Generation Method, Voting Information Display Method and Device", the entire content of which is incorporated into this application by reference.

Technical Field

The embodiments of the present application relate to the field of computer technology, and in particular to a voting information generation method, a voting information display method, and a device.

Background

Voting is a common interaction method that is widely used in various scenarios. With the widespread distribution of videos, a method of voting in videos is currently provided: an operator of a video website or a producer of a video manually creates voting information in the video, and the voting information is displayed while the video is playing to attract viewers to vote. However, this approach requires manually creating voting information in the video, which is very inefficient, and it is difficult to cover a large number of videos in this way, resulting in insufficient voting interaction in videos.

Summary

The embodiments of this application provide a voting information generation method, a voting information display method, and a device, which eliminate the need to manually create voting information, improve operating efficiency, and save time; moreover, this method can effectively cover a large number of videos and improve the coverage of voting interaction. The technical solutions are as follows:
In one aspect, a voting information generation method is provided, and the method includes:

a computer device obtaining text content associated with a video clip in a video, where the text content includes at least one of first text content or second text content, the first text content is the text content contained in the video clip, and the second text content is the text content contained in the barrage (bullet-screen comments) of the video clip;

the computer device generating a voting topic of the video clip based on keywords in the text content;

the computer device generating multiple voting candidates of the video clip based on the keywords and the voting topic;

the computer device generating voting information of the video clip based on the voting topic and the multiple voting candidates.
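The claimed steps above can be sketched end to end as follows (every heuristic below is a toy placeholder for the patent's trained generation models, and all names are illustrative):

```python
def generate_voting_info(first_text, barrage_texts):
    """End-to-end sketch of the claimed steps: text content -> keywords ->
    voting topic -> voting candidates -> voting information."""
    # Step 1: text content = first text (in-clip) + second text (barrage).
    texts = [first_text] + list(barrage_texts)
    # Step 2: keywords = longer words that recur across the text content.
    words = [w for t in texts for w in t.split()]
    keywords = sorted({w for w in words if words.count(w) > 1 and len(w) > 2})
    # Step 3: a voting topic built from the keywords.
    topic = "Which was better: " + " or ".join(keywords) + "?"
    # Step 4: voting candidates derived from the keywords and the topic.
    candidates = keywords
    # Step 5: assemble the voting information.
    return {"topic": topic, "candidates": candidates}

info = generate_voting_info("an amazing goal and a great save",
                            ["what a goal", "incredible save", "goal of the year"])
```

With this toy input, the recurring keywords "goal" and "save" become both the substance of the voting topic and the voting candidates.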
In another aspect, a voting information display method is provided, and the method includes:

a computer device obtaining, based on a video clip in a video, voting information of the video clip, where the voting information is generated based on a voting topic and multiple voting candidates, the voting topic is generated based on keywords in text content associated with the video clip, and the multiple voting candidates are generated based on the keywords and the voting topic;

the computer device determining interaction parameters based on an interest tag of a currently logged-in account and the voting information, where the interaction parameters represent the possibility of the account performing a voting operation based on the voting information;

the computer device displaying the voting information when playing the video clip if the interaction parameters meet an interaction condition;

where the text content includes at least one of first text content or second text content, the first text content is the text content contained in the video clip, and the second text content is the text content contained in the barrage of the video clip.
In another aspect, a voting information generation device is provided, which is provided in a computer device, and the device includes:

a text content acquisition module, configured to acquire text content associated with a video clip in a video, where the text content includes at least one of first text content or second text content, the first text content is the text content contained in the video clip, and the second text content is the text content contained in the barrage of the video clip;

a topic generation module, configured to generate a voting topic of the video clip based on keywords in the text content;

a candidate generation module, configured to generate multiple voting candidates of the video clip based on the keywords and the voting topic;

a voting information generation module, configured to generate voting information of the video clip based on the voting topic and the multiple voting candidates.
In another aspect, a voting information display device is provided, and the device includes:

an information acquisition module, configured to obtain, based on a video clip in a target video, voting information of the video clip, where the voting information is generated based on a voting topic and multiple voting candidates, the voting topic is generated based on keywords in text content associated with the video clip, and the multiple voting candidates are generated based on the keywords and the voting topic;

a parameter determination module, configured to determine interaction parameters based on an interest tag of a currently logged-in account and the voting information, where the interaction parameters represent the possibility of the account performing a voting operation based on the voting information;

an information display module, configured to display the voting information when playing the video clip if the interaction parameters meet an interaction condition;

where the text content includes at least one of first text content or second text content, the first text content is the text content contained in the video clip, and the second text content is the text content contained in the barrage of the video clip.
In another aspect, a computer device is provided. The computer device includes a processor and a memory; at least one computer program is stored in the memory, and the at least one computer program is loaded and executed by the processor to implement the operations performed by the voting information generation method described in the above aspect, or to implement the operations performed by the voting information display method described in the above aspect.

In another aspect, a computer-readable storage medium is provided. At least one computer program is stored in the computer-readable storage medium, and the at least one computer program is loaded and executed by a processor to implement the operations performed by the voting information generation method described in the above aspect, or to implement the operations performed by the voting information display method described in the above aspect.

In another aspect, a computer program product is provided, including a computer program. When the computer program is executed by a processor, the operations performed by the voting information generation method described in the above aspect, or the operations performed by the voting information display method described in the above aspect, are implemented.

The embodiments of the present application provide a method for automatically generating voting information for video clips, which can automatically generate voting information for a video clip based on the text content associated with the video clip, eliminating the need to manually create voting information, improving operating efficiency, and saving time; moreover, this method can effectively cover a large number of videos and improve the coverage of voting interaction.
Description of the Drawings

Figure 1 is a schematic diagram of an implementation environment provided by an embodiment of the present application.

Figure 2 is a flow chart of a voting information generation method provided by an embodiment of the present application.

Figure 3 is a flow chart of another voting information generation method provided by an embodiment of the present application.

Figure 4 is a schematic diagram of voting information provided by an embodiment of the present application.

Figure 5 is a schematic flowchart of generating a voting topic based on a first generation model provided by an embodiment of the present application.

Figure 6 is a schematic diagram of a first generation model provided by an embodiment of the present application.

Figure 7 is a schematic flowchart of generating voting candidates based on a second generation model provided by an embodiment of the present application.

Figure 8 is a schematic diagram of a second generation model provided by an embodiment of the present application.

Figure 9 is a schematic diagram of an overall process of generating voting information provided by an embodiment of the present application.

Figure 10 is a flow chart of a voting information generation method provided by an embodiment of the present application.

Figure 11 is a schematic diagram of a voting decision model provided by an embodiment of the present application.

Figure 12 is a flow chart of a voting information display method provided by an embodiment of the present application.

Figure 13 is a flow chart of another voting information display method provided by an embodiment of the present application.

Figure 14 is a schematic diagram of a voting interaction model provided by an embodiment of the present application.

Figure 15 is a schematic structural diagram of a voting information generation device provided by an embodiment of the present application.

Figure 16 is a schematic structural diagram of a voting information display device provided by an embodiment of the present application.

Figure 17 is a schematic structural diagram of a terminal provided by an embodiment of the present application.

Figure 18 is a schematic structural diagram of a server provided by an embodiment of the present application.
具体实施方式Detailed ways
To make the objectives, technical solutions, and advantages of the embodiments of the present application clearer, the embodiments of the present application are described in further detail below with reference to the accompanying drawings.
It will be understood that the terms "first", "second", and the like used in this application may describe various concepts herein, but unless otherwise specified, these concepts are not limited by the terms; the terms serve only to distinguish one concept from another. For example, without departing from the scope of the present application, first text content may be referred to as second text content, and second text content may be referred to as first text content.
As used in this application, "at least one" includes one, two, or more; "multiple" includes two or more; "each" refers to every one of the corresponding plurality; and "any" refers to any one of the plurality. For example, if multiple keywords include three keywords, "each keyword" refers to every one of the three keywords, while "any keyword" refers to any single one of them: the first, the second, or the third.
Artificial intelligence (AI) is a theory, method, technology, and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can respond in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines have the capabilities of perception, reasoning, and decision-making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic AI technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. AI software technologies mainly include computer vision, speech processing, natural language processing, machine learning/deep learning, autonomous driving, and intelligent transportation.
Computer vision (CV) is a science that studies how to make machines "see": using cameras and computers in place of human eyes to identify and measure targets, and further performing graphics processing so that the processed images are better suited for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies related theories and technologies in an attempt to build artificial intelligence systems that can obtain information from images or multi-dimensional data. Computer vision technology typically includes image processing, image recognition, image semantic understanding, image retrieval, OCR (Optical Character Recognition), video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, simultaneous localization and mapping, autonomous driving, and intelligent transportation, as well as common biometric technologies such as face recognition and fingerprint recognition.
Machine learning (ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, and algorithmic complexity theory. It studies how computers simulate or implement human learning behavior in order to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve their own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; its applications span all fields of AI. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
With the research and progress of artificial intelligence technology, AI has been researched and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, driverless vehicles, autonomous driving, drones, robots, smart healthcare, smart customer service, the Internet of Vehicles, and intelligent transportation. It is believed that, as technology develops, AI will be applied in more fields and play an increasingly important role.
The voting information generation method and voting information display method provided by the embodiments of this application use computer vision and machine learning techniques from artificial intelligence to generate voting information for video clips and to display that voting information when the video clips are played.
The voting information generation method and voting information display method provided by the embodiments of this application are executed by a computer device, which is a terminal or a server. In one possible implementation, the voting information generation method is executed by a server and the voting information display method is executed by a terminal; for that case, an embodiment of this application provides the implementation environment shown in Figure 1 below.
Figure 1 is a schematic diagram of an implementation environment provided by an embodiment of the present application. Referring to Figure 1, the implementation environment includes a server 101 and a terminal 102. The server 101 and the terminal 102 can be connected directly or indirectly through wired or wireless communication, which is not limited in this application.
The server 101 is configured to store or deliver videos, and also to automatically generate voting information for the video clips in those videos. The terminal 102 is configured to access the server 101, play the videos delivered by the server 101, and display the voting information of the current video clip during playback, thereby initiating a voting interaction for that clip and attracting users to perform voting operations and participate in the interaction.
In one possible implementation, the server 101 is an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms. The terminal 102 is, but is not limited to, a smartphone, tablet computer, notebook computer, desktop computer, smart speaker, smart watch, smart TV, or smart in-vehicle terminal.
In another possible implementation, a target application served by the server 101 is installed on the terminal 102, and through this target application the terminal 102 can implement functions such as video playback and voting. For example, the target application is a video sharing application with a video sharing function; of course, it can also have other functions, such as posting bullet comments (danmaku) and voting.
Figure 2 is a flowchart of a voting information generation method provided by an embodiment of the present application. The method is executed by a computer device, which is a terminal or a server; this embodiment describes the process of generating voting information for a video clip. Referring to Figure 2, the method includes:
201. The computer device obtains text content associated with a video clip in a video.
The video is any video on the computer device. For example, if the computer device is a terminal, the video is any video downloaded or shot by the terminal; if the computer device is a server with a video sharing function, the server can store a video uploaded by any device and send that video to any device for playback, and the video is any video stored on the server.
The video includes one or more video clips, and the playback duration of each clip is no greater than the total playback duration of the video. For example, the video is divided into multiple clips of a fixed duration, so that the playback duration of each clip equals that fixed duration.
The text content associated with the video clip includes at least one of first text content or second text content. The first text content is text contained in the video clip itself: for example, subtitle text in the clip, or text recognized from the speech data in the clip. The first text content represents what the clip itself contains, such as the people and things that appear in it or the plot events that occur in it.
The second text content is text contained in the bullet comments (danmaku) of the video clip, and may be called bullet-comment text content. While the clip is playing, a terminal playing it can post bullet comments for it; the text of those comments expresses end users' views or opinions on the clip, so bullet-comment text can be regarded as interaction data for the clip. Each video clip in the target video has a corresponding playback time period; if a clip's playback time period includes the posting time of a bullet comment, that comment belongs to the clip.
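The time-window rule just described, combined with the fixed-duration segmentation mentioned earlier, can be sketched as follows. This is a minimal illustration, not the application's implementation; all function names are invented, and timestamps are assumed to be in seconds.

```python
def clip_windows(total_duration: float, clip_duration: float):
    """Split a video into fixed-duration clips, returned as (start, end) pairs."""
    windows = []
    start = 0.0
    while start < total_duration:
        windows.append((start, min(start + clip_duration, total_duration)))
        start += clip_duration
    return windows

def assign_comments(comments, windows):
    """Map clip index -> texts of bullet comments posted inside that clip's window."""
    by_clip = {i: [] for i in range(len(windows))}
    for text, t in comments:  # each comment is (text, posting_time)
        for i, (s, e) in enumerate(windows):
            if s <= t < e:  # the clip's window contains the posting time
                by_clip[i].append(text)
                break
    return by_clip

windows = clip_windows(total_duration=90, clip_duration=30)  # three 30 s clips
comments = [("great scene!", 12.0), ("who is she?", 45.5), ("ending soon", 88.0)]
print(assign_comments(comments, windows))
```

Each comment lands in exactly the clip whose playback window contains its posting time, which is the sense in which bullet-comment text serves as per-clip interaction data.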
After the computer device obtains the text content, it can automatically generate voting information for the video clip based on that text content, without requiring a technician to create it manually. Displaying this voting information while the clip plays can also attract users to participate in the voting interaction, which helps improve the interaction coverage of the video. The process of generating the voting information is detailed in steps 202-204 below.
202. The computer device generates a voting topic for the video clip based on keywords in the text content.
203. The computer device generates multiple voting candidates for the video clip based on the keywords and the voting topic.
The voting information includes a voting topic and multiple voting candidates. The voting topic represents a question posed to the user, and the voting candidates represent the candidate answers offered to the user. When the voting information is displayed, the user learns the question by viewing the voting topic and chooses an answer by selecting one of the voting candidates, which completes the voting operation.
The text content includes at least one word. The keywords of the text content may include every word in it, or only the words extracted from it by a keyword extraction algorithm. Since the keywords represent the content of the video clip, a voting topic generated from them is related to the clip's content; likewise, the voting candidates generated from the keywords and the voting topic are related to the clip's content and consistent with the generated topic.
204. The computer device generates the voting information of the video clip based on the voting topic and the multiple voting candidates.
It should be noted that the embodiments of this application use a single clip of a single video as an example; the process of generating voting information for other video clips is similar and is not repeated here.
In the related art, voting information in videos is mainly created by operators of video sites or by the creators of the videos. This takes a long time, is inefficient, and is difficult to scale to a large number of videos, so voting interaction in videos is insufficient and users' participation in voting interaction is constrained.
The embodiments of this application provide a method for automatically generating voting information for video clips. Voting information can be generated automatically for a clip based on the text content associated with it, with no need to create voting information manually, which improves operating efficiency and saves time. The method can also effectively cover a large number of videos, improving the coverage of voting interaction.
On the basis of the embodiment shown in Figure 2 above, an embodiment of this application further provides another voting information generation method, describing the specific process of generating the voting topic and the voting candidates. Figure 3 is a flowchart of this other voting information generation method. It is executed by a computer device, which is a terminal or a server. Referring to Figure 3, the method includes:
301. The computer device obtains text content associated with a video clip in a video.
Optionally, the text content includes first text content, i.e., text contained in the video clip. Obtaining the first text content includes at least one of the following:
(1) Extracting subtitle text from the video clip.
For example, one or more video frames are extracted from the clip, and an OCR (Optical Character Recognition) algorithm is applied to extract the subtitle text from those frames.
(2) Extracting speech data from the video clip and performing text recognition on it to obtain the corresponding text content.
For example, an ASR (Automatic Speech Recognition) algorithm is applied to the speech data to obtain the corresponding text, which is the text of the dialogue in the clip.
Optionally, the text content includes second text content, i.e., the text contained in the bullet comments of the video clip. Obtaining the second text content includes: extracting the clip's bullet comments from the video's bullet-comment collection, and extracting the text content from those comments.
The voting information includes a voting topic and multiple voting candidates. After obtaining the text content, the computer device first needs to generate the voting topic; the specific steps are described in steps 302-303 below.
302. The computer device encodes the keywords in the text content to obtain keyword features.
A keyword feature describes a keyword. Converting keywords into feature form makes subsequent processing over quantifiable keyword features convenient and enables the generation of voting information related to the keywords.
Optionally, an encoding algorithm is applied to a keyword to obtain its keyword feature, or an encoding model is called to encode the keyword. The encoding model may be a Transformer model (a model based on the self-attention mechanism) or another type of model.
Before the keywords can be encoded, they must first be extracted from the text content. Optionally, a keyword extraction algorithm is used, or a keyword extraction model is called. The keyword extraction model may be a TextRank model (a model that extracts keywords based on text ranking) or another type of model.
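As a concrete illustration of the TextRank option mentioned above, the sketch below builds a co-occurrence graph over the words of the text and ranks the nodes with a few PageRank iterations. The tokenization, window size, damping factor, and iteration count are illustrative assumptions rather than details from this application.

```python
from collections import defaultdict

def textrank_keywords(words, window=2, d=0.85, iters=30, top_k=3):
    """Rank words by PageRank over a co-occurrence graph (TextRank sketch)."""
    # Link each word to the words that co-occur within a sliding window.
    neighbors = defaultdict(set)
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window + 1, len(words))):
            if words[j] != w:
                neighbors[w].add(words[j])
                neighbors[words[j]].add(w)
    nodes = list(neighbors)
    score = {w: 1.0 for w in nodes}
    for _ in range(iters):  # standard PageRank update with damping
        score = {
            w: (1 - d) + d * sum(score[v] / len(neighbors[v]) for v in neighbors[w])
            for w in nodes
        }
    return sorted(nodes, key=score.get, reverse=True)[:top_k]

words = "hero saves city hero fights villain villain burns city".split()
print(textrank_keywords(words))
```

Words that co-occur with many well-connected words rank highest, so frequent, central content words surface as the keywords.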
303. The computer device decodes the keyword features to obtain the voting topic, which consists of multiple voting-topic words.
Optionally, the computer device decodes the keyword features to obtain one voting-topic word at a time; it then continues decoding the keyword features together with the last determined voting-topic word to obtain the next voting-topic word, until a target number of voting-topic words has been obtained, and these words constitute the voting topic.
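The word-by-word generation just described amounts to greedy autoregressive decoding. In the sketch below, the scoring function is a toy stand-in (a fixed word chain) for the real decoder submodel; only the loop structure, which feeds the last emitted word back into the next decoding step, mirrors the text.

```python
def decode_topic(keyword_features, score_next, vocab, target_len=4):
    """Greedily emit topic words one at a time, feeding back the last word."""
    topic = []
    last = None  # no previously generated word on the first step
    for _ in range(target_len):
        # Score every vocabulary word given the keyword features and last word.
        scores = {w: score_next(keyword_features, last, w) for w in vocab}
        last = max(scores, key=scores.get)  # greedy choice
        topic.append(last)
    return topic

# Toy stand-in scorer: always prefers a fixed chain, ignoring the features.
CHAIN = {None: "what", "what": "did", "did": "you", "you": "see"}
def toy_scorer(features, last, word):
    return 1.0 if CHAIN.get(last) == word else 0.0

vocab = ["what", "did", "you", "see"]
print(decode_topic([0.1, 0.2], toy_scorer, vocab))  # builds the chain word by word
```

Because each step conditions on the previously emitted word, consecutive topic words stay linked, which is the property the following embodiment relies on to produce a coherent topic sentence.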
The specific steps for the computer device to generate the multiple voting candidates are described in steps 304-306 below.
304. The computer device obtains first keywords, which are keywords in the first text content.
In this embodiment, the computer device obtains the first text content and the second text content; keywords in the first text content are called first keywords, and keywords in the second text content are called second keywords.
The step of obtaining the first keywords from the first text content is similar to the keyword extraction described in step 302 above and is not repeated here.
305. The computer device clusters the second text content to obtain multiple text categories, each containing at least one piece of second text content, and extracts second keywords from each text category.
A video may include multiple bullet comments, so the computer device obtains the second text content of each of them and extracts second keywords from these multiple pieces of second text content. To avoid extracting duplicate keywords, and to reduce the processing load, the multiple pieces of second text content can first be clustered so that semantically related pieces fall into the same text category, and the second keywords of each category are then extracted category by category.
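A minimal sketch of this cluster-then-extract step follows, using Jaccard similarity over word sets as a stand-in for whatever semantic similarity measure the real system uses; the greedy clustering scheme, the threshold, and the frequency-based keyword pick are all illustrative assumptions.

```python
from collections import Counter

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b)

def cluster_comments(comments, threshold=0.3):
    """Greedy single-pass clustering: join the first cluster similar enough."""
    clusters = []  # each cluster is a list of comment strings
    for text in comments:
        words = set(text.split())
        for cluster in clusters:
            if jaccard(words, set(cluster[0].split())) >= threshold:
                cluster.append(text)
                break
        else:  # no cluster was similar enough: start a new text category
            clusters.append([text])
    return clusters

def second_keywords(cluster, top_k=2):
    """Pick the most frequent words of one cluster as its second keywords."""
    counts = Counter(w for text in cluster for w in text.split())
    return [w for w, _ in counts.most_common(top_k)]

comments = ["the villain is scary", "scary villain indeed", "love this soundtrack"]
clusters = cluster_comments(comments)
print([second_keywords(c) for c in clusters])
```

The two comments about the villain fall into one text category and the soundtrack comment into another, so each category yields its own, non-duplicated second keywords.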
306. The computer device generates a voting candidate for each text category based on the first keywords, the voting topic, and the second keywords of that category.
After the text categories are divided, the second keywords of different categories differ considerably in semantics while those of the same category are semantically close, so a different voting candidate can be generated for each category. The number of generated voting candidates then equals the number of text categories obtained by clustering.
Take generating the i-th voting candidate as an example, where i is a positive integer no greater than the number of text categories. The process includes: encoding the first keywords, the voting topic, and the second keywords of the i-th text category to obtain keyword features, and decoding those keyword features to obtain the i-th voting candidate, which consists of multiple candidate words. The specific process is similar to steps 302-303 above, the difference being that the keyword features determined here describe the first keywords, the keywords of the voting topic, and the second keywords, so the candidate words obtained from these features jointly reflect the influence of all three, ensuring that the candidate words are related to the first keywords, the voting topic, and the second keywords.
307. The computer device generates the voting information of the video clip based on the voting topic and the multiple voting candidates.
Optionally, the voting topic and the multiple voting candidates constitute the voting information of the video clip; alternatively, the voting topic, the multiple voting candidates, and associated information together constitute the voting information. The associated information includes text or images prompting the user to vote, and may also include other types of information.
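Tying the pieces together, the assembled voting information might be represented as a simple record like the following; the field names and the prompt text are illustrative assumptions, not a format defined by this application.

```python
def build_voting_info(topic_words, candidates, prompt=None):
    """Assemble topic words and per-category candidates into one voting record."""
    info = {
        "topic": " ".join(topic_words),  # the question shown to the user
        "candidates": list(candidates),  # one candidate per text category
    }
    if prompt is not None:
        info["prompt"] = prompt          # optional associated information
    return info

info = build_voting_info(
    ["what", "did", "you", "come", "to", "see?"],
    ["the villain", "the soundtrack", "the plot"],
    prompt="Tap an option to vote",
)
print(info["topic"])
```

Such a record can then be stored in association with the clip, or attached to it, so a player can render the topic and candidates during playback.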
As shown in Figure 4, when the video clip is played, the playback interface displays the clip's video picture and the voting information. The voting information has two parts: the voting topic "What did you come to see?" and three voting candidates for the user to choose from.
After generating the voting information, the computer device stores the video clip in association with the voting information, displays the voting information when the clip is played, or delivers the voting information whenever the clip is delivered to another device. Alternatively, the computer device adds the voting information to the video clip so that the voting information is displayed during playback. The specific process of displaying the voting information is detailed in the embodiments shown in Figures 12 and 13 below and is not described here.
It should be noted that this embodiment uses a single piece of voting information for a video clip as an example. In another embodiment, by repeating the above steps the computer device can generate multiple voting topics, each with its corresponding voting candidates, forming multiple pieces of voting information; when the clip is played, all of these pieces, or one or more of them, may be displayed, which is not limited in the embodiments of this application.
The embodiments of this application provide a method for automatically generating voting information for video clips based on the associated text content, with no manual creation of voting information required, which improves operating efficiency and saves time; the method can also effectively cover a large number of videos, improving the coverage of voting interaction. Moreover, the generated voting information is related to the text content of the clip, satisfying the video's interaction requirements and helping to increase users' enthusiasm for participating in voting while watching the clip, thereby improving the interaction atmosphere. This approach readily covers a large number of videos, improving the interaction coverage and richness of videos and increasing user interaction activity on the video platform.
On the basis of the embodiment shown in Figure 3, in one possible implementation, the process of generating the voting topic in steps 302-303 can be performed by a first generation model that includes a first encoding submodel and a first decoding submodel, where the first encoding submodel encodes keywords into keyword features and the first decoding submodel decodes keyword features into voting-topic words. As shown in Figure 5, generating a voting topic based on the first generation model includes:
501. The computer device calls the first encoding submodel to encode N keywords, obtaining the keyword features of the N keywords, where N is an integer greater than 1.
The first encoding submodel is a Transformer encoder or another type of encoding model.
The N keywords include the first keywords from the first text content, such as dialogue-text keywords or subtitle-text keywords of the video clip, and also include the second keywords from the second text content, such as keywords from the text of the clip's bullet comments.
In this embodiment, the computer device has obtained N keywords of the video clip and needs to consider all N of them to generate the voting topic. The N keywords are input into the first encoding submodel of the first generation model, which encodes each of the N keywords to obtain N keyword features, one keyword feature per keyword.
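To make the "N keywords in, N keyword features out" shape of step 501 concrete, here is a toy stand-in for the encoding submodel; it is not the Transformer encoder the text names, just a deterministic hash-based embedding invented for illustration.

```python
import hashlib

def encode_keyword(word: str, dim: int = 4):
    """Map a keyword to a deterministic pseudo-feature vector with values in [0, 1]."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    return [digest[i] / 255.0 for i in range(dim)]

def encode_keywords(words):
    """One feature vector per keyword, mirroring step 501's N-in, N-out shape."""
    return [encode_keyword(w) for w in words]

features = encode_keywords(["villain", "city", "soundtrack"])
print(len(features), len(features[0]))  # prints: 3 4
```

A real encoder would of course produce learned, context-aware vectors; the point here is only the one-feature-per-keyword correspondence that the decoding steps below consume.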
502、计算机设备调用第一解码子模型,对N个关键词特征进行解码,得到第1个解码特征,基于第1个解码特征和N个关键词特征,确定第一投票主题词语。502. The computer device calls the first decoding sub-model, decodes the N keyword features, obtains the first decoding feature, and determines the first voting topic word based on the first decoding feature and the N keyword features.
其中,第一解码子模型为Transformer Decoder模型(Transformer模型中的解码器)或者其他类型的编码模型。Among them, the first decoding sub-model is a Transformer Decoder model (the decoder in the Transformer model) or other types of coding models.
503、计算机设备调用第一解码子模型,对N个关键词特征和第一投票主题词语进行解码,得到第2个解码特征,基于第2个解码特征和N个关键词特征,确定参考投票主题,参考投票主题包括第一投票主题词语和第二投票主题词语,直至通过N次解码后,将第N次解码得到的参考投票主题确定为视频片段的投票主题。503. The computer device calls the first decoding sub-model, decodes the N keyword features and the first voting topic words, obtains the second decoding feature, and determines the reference voting topic based on the second decoding feature and the N keyword features. , the reference voting topic includes the first voting topic word and the second voting topic word, until after N times of decoding, the reference voting topic obtained by the Nth decoding is determined as the voting topic of the video clip.
在本申请实施例中,采用了逐次解码生成投票主题词语的方式,每进行一次解码,确定当前的投票主题词语,当前的投票主题词语与之前确定的投票主题词语按照顺序组合,即可得到当前的参考投票主题。随着解码的多次进行,参考投票主题中包含的投票主题词语逐渐增多,直至进行N次解码后,第N次解码得到的参考投票主题包含了N个投票主题词语,从而得到包含N个投票主题词语的投票主题。In the embodiment of this application, a method of sequential decoding is used to generate voting topic words. Each time decoding is performed, the current voting topic words are determined. The current voting topic words are combined with the previously determined voting topic words in order to obtain the current voting topic words. Reference voting topic. As decoding is performed multiple times, the number of voting topic words contained in the reference voting topic gradually increases until, after N decodings, the reference voting topic obtained by decoding for the Nth time contains N voting topic words, thus obtaining the result containing N voting topic words. Poll topic for topic words.
本申请实施例中,每次解码时不仅考虑N个关键词特征,还会考虑之前确定的投票主题词语,这样能保证本次确定的投票主题词语与之前确定的投票主题词语之间相关联,从而保证基于所确定的N个投票主题词语构成的投票主题中的不同投票主题词语关联,能够组合构成一条语义清楚的语句。In the embodiment of this application, not only the N keyword features are considered during each decoding, but also the previously determined voting topic words are considered. This can ensure that the voting topic words determined this time are associated with the previously determined voting topic words. This ensures that the associations of different voting topic words in the voting topic composed of the determined N voting topic words can be combined to form a sentence with clear semantics.
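The round-by-round decoding described in steps 502-503 can be sketched as a plain loop. This is a toy illustration, not the patent's model: `decode_step` and `pick_word` are hypothetical stand-ins for the first decoding sub-model and for the word-selection logic, and the inputs are made up.

```python
# Toy sketch of the sequential decoding in steps 502-503.
# decode_step and pick_word are hypothetical placeholders: a real system
# would run one Transformer-decoder pass over the keyword features plus
# the previously decided topic words.

def decode_step(keyword_feats, prev_words):
    # dummy "decoding feature": depends on the keyword features and on
    # how many topic words have already been decided
    return sum(keyword_feats) + len(prev_words)

def pick_word(decode_feat, keywords, prev_words):
    # stand-in for the copy-vs-generate selection: here we simply copy
    # the keyword whose index matches the current decoding round
    return keywords[len(prev_words) % len(keywords)]

def generate_topic(keywords, keyword_feats):
    topic_words = []
    for _ in range(len(keywords)):          # N decoding rounds in total
        feat = decode_step(keyword_feats, topic_words)
        topic_words.append(pick_word(feat, keywords, topic_words))
    return topic_words                      # reference topic after round N

topic = generate_topic(["who", "wins", "fight"], [0.2, 0.5, 0.3])
```

Each round sees both the keyword features and the words already chosen, mirroring how the reference voting topic grows by one word per round.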
Optionally, the first generative model further includes a first classification layer and a preset vocabulary containing multiple words, and determining the first voting topic word based on the 1st decoding feature and the N keyword features includes:

(1) Determining N first usage probabilities based on the 1st decoding feature and the N keyword features, where the jth first usage probability is the probability of using the jth keyword in the voting topic, j being a positive integer not greater than N. The jth first usage probability is thus the probability of determining the jth keyword as the first voting topic word of the voting topic.

(2) When the jth first usage probability satisfies the usage condition, determining the jth keyword as the first voting topic word. The usage condition is the condition that must be met for a keyword to be used in the voting topic.

(3) When none of the first usage probabilities satisfies the usage condition, calling the first classification layer to classify based on the 1st decoding feature and the preset vocabulary, obtaining a classification probability for each word in the preset vocabulary, and determining the first voting topic word based on each word's classification probability. A word's classification probability represents the probability of determining that word as a voting topic word.

This embodiment provides two ways of determining a voting topic word. One uses a copy mechanism, copying a keyword as the voting topic word: if a keyword's first usage probability satisfies the usage condition, the keyword is used directly as the voting topic word. The other generates a new voting topic word from the preset vocabulary. Each time the first decoding sub-model decodes, the usage probabilities are determined first, and whether they satisfy the usage condition decides whether a keyword is used as the voting topic word or a new voting topic word is generated.

Optionally, the usage condition includes a usage probability threshold. When the jth first usage probability is greater than the threshold, the jth keyword is determined as the first voting topic word; when multiple first usage probabilities exceed the threshold, the keyword with the largest first usage probability is chosen. When no first usage probability exceeds the threshold, the classification probability of each word in the preset vocabulary is determined instead; this probability represents the likelihood of the word being a voting topic word, and the first voting topic word is selected from the multiple words based on these classification probabilities.

Optionally, determining the first voting topic word based on each word's classification probability includes: selecting, from the words in the preset vocabulary, the word with the largest classification probability as the first voting topic word; or selecting the target number of words with the largest classification probabilities as candidate first voting topic words, where the target number is greater than 1.
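The copy-vs-generate decision in (1)-(3) above can be sketched as a single selection function. The threshold value, scores, keywords, and function names below are all illustrative, not taken from the patent:

```python
# Sketch of the copy-vs-generate choice: copy the best-scoring keyword
# when its usage probability clears the threshold, otherwise fall back
# to classifying over the preset vocabulary. All values are toy inputs.

def choose_topic_word(use_probs, keywords, vocab_probs, vocab, threshold=0.5):
    best = max(range(len(use_probs)), key=lambda j: use_probs[j])
    if use_probs[best] > threshold:
        return keywords[best]                       # copy mechanism
    # no keyword qualifies: generate from the preset vocabulary by
    # taking the word with the highest classification probability
    return max(vocab, key=lambda w: vocab_probs[w])

# keyword "ending" clears the threshold, so it is copied directly
w1 = choose_topic_word([0.2, 0.7], ["hero", "ending"],
                       {"plot": 0.6, "twist": 0.4}, ["plot", "twist"])
# no keyword qualifies, so the preset vocabulary is used instead
w2 = choose_topic_word([0.2, 0.3], ["hero", "ending"],
                       {"plot": 0.6, "twist": 0.4}, ["plot", "twist"])
```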
Optionally, determining the reference voting topic, which includes the first voting topic word and the second voting topic word, based on the 2nd decoding feature and the N keyword features includes:

(1) Determining N second usage probabilities based on the 2nd decoding feature and the N keyword features, where the jth second usage probability is the probability of using the jth keyword in the voting topic, j being a positive integer not greater than N. The jth second usage probability is thus the probability of determining the jth keyword as the second voting topic word of the voting topic.

(2) When the jth second usage probability satisfies the usage condition, determining the jth keyword as the second voting topic word. The usage condition is the condition that must be met for a keyword to be used in the voting topic.

(3) When none of the second usage probabilities satisfies the usage condition, calling the first classification layer to classify based on the 2nd decoding feature and the preset vocabulary, obtaining classification probabilities for multiple candidate voting topics, each candidate voting topic including one first voting topic word and one second voting topic word. A candidate voting topic's classification probability represents the probability of determining it as the reference voting topic.

(4) Determining the reference voting topic based on each candidate voting topic's classification probability.

Optionally, determining the reference voting topic based on each candidate voting topic's classification probability includes: selecting, from the multiple candidate voting topics, the one with the largest classification probability as the reference voting topic; or selecting the target number of candidate voting topics with the largest classification probabilities as reference voting topics, where the target number is greater than 1. Voting candidates can subsequently be generated for each voting topic, forming the target number of pieces of voting information.
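The selection just described, either the single highest-probability candidate topic or the top target number of them, amounts to a simple ranking step. A sketch with made-up topics and probabilities:

```python
# Rank candidate voting topics by classification probability and keep
# either the best one or the top-k ("target number"). The topics and
# probabilities below are illustrative only.

def top_candidates(cands_with_probs, k=1):
    ranked = sorted(cands_with_probs, key=lambda cp: cp[1], reverse=True)
    return [cand for cand, _prob in ranked[:k]]

scored = [("who wins", 0.5), ("ending good", 0.3), ("hero dies", 0.2)]
best = top_candidates(scored)          # single reference voting topic
top2 = top_candidates(scored, k=2)     # target number of topics (k > 1)
```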
The second decoding round is similar to the first, with this difference: during the second round, the first voting topic word obtained in the first round is also input into the first decoding sub-model, so that the first voting topic word combines with each word in the preset vocabulary to form candidate voting topics. By determining the classification probabilities of the multiple candidate voting topics, the current voting topic is determined from among them. This ensures that the association between the vocabulary words and the first voting topic word is taken into account; a candidate topic's classification probability also reflects how reasonable the combination of a vocabulary word with the first voting topic word is, guaranteeing that voting topics with clear semantics and coherent logic can be generated.
It should be noted that this embodiment only describes the first and second decoding rounds; when N is greater than 2, each round after the second proceeds similarly to the second round and is not described again here.
During decoding, based on the current decoding feature and the N keyword features, the ith usage probability is determined by the following formula:

P_i = e^(X*w_i*sv_i) / Σ_j e^(X*w_j*sv_j), where P_i is the ith usage probability, X is the current decoding feature, w_i is the weight parameter of the ith keyword feature, sv_i is the ith keyword feature, and j ranges over the positive integers not greater than N, with w_j and sv_j denoting the weight parameter and the jth keyword feature, respectively.
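The usage-probability formula is a softmax over the N per-keyword scores X*w_j*sv_j. A small numerical check with toy values (not from the patent):

```python
import math

# Softmax form of the usage probability: the i-th probability
# normalizes e^(X*w_i*sv_i) over all j = 1..N. Toy values below.

def usage_probs(X, w, sv):
    scores = [math.exp(X * w[j] * sv[j]) for j in range(len(w))]
    total = sum(scores)
    return [s / total for s in scores]

probs = usage_probs(X=1.0, w=[0.5, 1.0, 1.5], sv=[0.4, 0.4, 0.4])
```

Because the denominator sums over all keywords, the N usage probabilities always sum to 1, and a keyword with a larger weighted score receives a larger probability.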
Figure 6 is a schematic diagram of a first generative model provided by an embodiment of this application. Referring to Figure 6, the N keywords obtained by the computer device include dialogue text keywords, subtitle text keywords, and bullet-comment text keywords. The N keywords are input into the first encoding sub-model, and calling it yields N keyword features: keyword feature 1, keyword feature 2, ..., keyword feature N. At each decoding round, the model can either copy a text keyword or determine a new voting topic word. As shown in the right half of Figure 6, if it decides to copy a text keyword, the corresponding text keyword is used directly as the voting topic word; if it decides not to copy, the current voting topic word is determined based on the previously determined voting topic words.
Optionally, the training process of the first generative model includes:

Obtaining sample text content associated with a positive-sample video clip, where a positive-sample video clip is one that contains sample voting information whose participation rate reaches a target threshold; obtaining the sample voting topic contained in the sample voting information; and adjusting the model parameters of the first generative model based on the sample text content and the sample voting topic.

A positive-sample video clip is one for which sample voting information has been created and, when that information is displayed while the clip plays, its participation rate reaches the target threshold, indicating that many users voted while watching the clip. The sample voting information is therefore strongly related to the clip, and so is the sample voting topic it contains. Training the first generative model on the sample text content associated with the positive-sample clip and on this sample voting topic increases the similarity between the topic the trained model produces from the sample text content and the sample voting topic, improving the model's accuracy and its ability to generate voting topics from text content.

The model parameters of the first generative model may include the weight parameters or other parameters of its layers; for example, they include the weight parameter w used to determine the usage probabilities. One or more rounds of training that adjust the model parameters improve the accuracy of w and of the determined usage probabilities, and hence the accuracy of the final voting topic.

Optionally, negative-sample video clips may also be introduced when training the first generative model. A negative-sample video clip is one for which sample voting information was created but whose participation rate did not reach the target threshold, or one for which no sample voting information was created.

If the sample voting information is displayed while the negative-sample clip plays but its participation rate does not reach the target threshold, many users did not vote while watching the clip, so the sample voting information is weakly related to the clip, and so is the sample voting topic it contains; that topic should not serve as the clip's voting topic. Training the first generative model on the sample text content associated with the negative-sample clip and on this sample voting topic decreases the similarity between the topic the trained model produces from that text content and the sample voting topic, which improves the model's accuracy and prevents it from generating unsuitable voting topics from text content.
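The patent does not specify the loss function used with positive and negative samples. One hedged way to read the description above (increase topic similarity on positive samples, decrease it on negative ones) is a hinge-style per-sample loss; the margin value and function shape below are assumptions for illustration only:

```python
# Illustrative per-sample training objective, NOT the patent's actual loss:
# reward high similarity between the generated topic and the sample topic
# for positive samples, and penalize high similarity for negative samples.

def sample_loss(similarity, is_positive, margin=0.5):
    if is_positive:
        return max(0.0, 1.0 - similarity)    # loss shrinks as similarity rises
    return max(0.0, similarity - margin)     # loss grows as similarity rises

pos_loss = sample_loss(0.9, is_positive=True)    # well-matched positive pair
neg_loss = sample_loss(0.9, is_positive=False)   # negative pair scored too high
```

Minimizing such a loss over both sample types pushes the model in the two directions the text describes: toward the sample topic for positive clips and away from it for negative clips.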
This embodiment of the application provides a method for automatically generating a voting topic for a video clip. By modeling the text content and voting topics of video clips, a first generative model is trained, and voting topics are generated based on it. This enables a deep understanding of the clip's content, generating the voting topic from deeply represented features. The generated topic is related to the clip's text content, meets the video's interactive-function requirements, and helps increase users' willingness to participate in voting while watching the clip, thereby improving the interactive atmosphere.
Building on the embodiment shown in Figure 3 above, in one possible implementation the process of generating voting candidates in steps 304-306 can be performed based on a second generative model, which includes a second encoding sub-model and a second decoding sub-model. The second encoding sub-model encodes keywords into keyword features, and the second decoding sub-model decodes keyword features into voting candidate words. As shown in Figure 7, the process of generating voting candidates based on the second generative model includes:
701. The computer device obtains the first keywords from the first text content, clusters the second text content into multiple text categories, each containing at least one piece of second text content, and extracts second keywords from each text category.

The specific process of step 701 is similar to steps 304-305 above and is not repeated here.

702. The computer device calls the second encoding sub-model to encode the first keywords, the voting topic, and the second keywords of the ith text category, obtaining keyword features.

The second encoding sub-model is a Transformer Encoder model or another type of encoding model.

In this embodiment, the computer device obtains the first keywords of the video clip, the voting topic, and the second keywords of the ith text category, and needs to consider these keywords together to generate the ith voting candidate. The first keywords, the voting topic, and the second keywords of the ith text category are input into the second encoding sub-model of the second generative model. Taking the number of input keywords as M, each of the M keywords is encoded based on the second encoding sub-model to obtain M keyword features.

703. The computer device calls the second decoding sub-model to decode the M keyword features, obtains the 1st decoding feature, and determines the first voting candidate word based on the 1st decoding feature and the M keyword features.

704. The computer device calls the second decoding sub-model to decode the M keyword features and the first voting candidate word, obtains the 2nd decoding feature, and determines a reference voting candidate based on the 2nd decoding feature and the M keyword features, the reference voting candidate including the first voting candidate word and a second voting candidate word. This continues until, after M rounds of decoding, the reference voting candidate obtained in the Mth round is determined as the ith voting candidate of the video clip.

In this embodiment, the voting candidate words are generated by decoding round by round. Each round determines the current voting candidate word, which is appended in order to the previously determined voting candidate words to obtain the current reference voting candidate. As decoding proceeds, the voting candidate contains more and more voting candidate words; after M rounds, the reference voting candidate obtained in the Mth round contains M voting candidate words, yielding a voting candidate consisting of M words.

In this embodiment, each decoding round considers not only the M keyword features but also the previously determined voting candidate words, ensuring that the word determined in the current round is related to those determined earlier, so that the different words in the resulting voting candidate are associated with one another and combine into a semantically clear sentence.
Optionally, the second generative model further includes a second classification layer and a preset vocabulary containing multiple words, and determining the first voting candidate word based on the 1st decoding feature and the M keyword features includes:

(1) Determining M third usage probabilities based on the 1st decoding feature and the M keyword features, where the jth third usage probability is the probability of using the jth keyword in the voting candidate, j being a positive integer not greater than M. The jth third usage probability is thus the probability of determining the jth keyword as the first voting candidate word.

(2) When the jth third usage probability satisfies the usage condition, determining the jth keyword as the first voting candidate word. The usage condition is the condition that must be met for a keyword to be used in the voting candidate.

(3) When none of the third usage probabilities satisfies the usage condition, calling the second classification layer to classify based on the 1st decoding feature and the preset vocabulary, obtaining a classification probability for each word in the preset vocabulary, and determining the first voting candidate word based on each word's classification probability. A word's classification probability represents the probability of determining that word as the first voting candidate word.

This embodiment provides two ways of determining a voting candidate word. One uses a copy mechanism, copying a keyword as the voting candidate word: if a keyword's third usage probability satisfies the usage condition, the keyword is used directly as the voting candidate word. The other generates a new voting candidate word from the preset vocabulary. Each time the second decoding sub-model decodes, the usage probabilities are determined first, and whether they satisfy the usage condition decides whether a keyword is used as the voting candidate word or a new voting candidate word is generated.

Optionally, the usage condition includes a usage probability threshold. When the jth third usage probability is greater than the threshold, the jth keyword is determined as the first voting candidate word; when multiple third usage probabilities exceed the threshold, the keyword with the largest third usage probability is chosen. When no third usage probability exceeds the threshold, the classification probability of each word in the preset vocabulary is determined instead; this probability represents the likelihood of the word being a voting candidate word, and the first voting candidate word is selected from the multiple words based on these classification probabilities.

Optionally, determining the first voting candidate word based on each word's classification probability includes: selecting, from the words in the preset vocabulary, the word with the largest classification probability as the first voting candidate word; or selecting the target number of words with the largest classification probabilities as candidate first voting candidate words, where the target number is greater than 1.
Optionally, determining the reference voting candidate, which includes the first voting candidate word and the second voting candidate word, based on the 2nd decoding feature and the M keyword features includes:

(1) Determining M fourth usage probabilities based on the 2nd decoding feature and the M keyword features, where the jth fourth usage probability is the probability of using the jth keyword in the voting candidate, j being a positive integer not greater than M. The jth fourth usage probability is thus the probability of determining the jth keyword as the second voting candidate word.

(2) When the jth fourth usage probability satisfies the usage condition, determining the jth keyword as the second voting candidate word. The usage condition is the condition that must be met for a keyword to be used in the voting candidate.

(3) When none of the fourth usage probabilities satisfies the usage condition, calling the second classification layer to classify based on the 2nd decoding feature and the preset vocabulary, obtaining classification probabilities for multiple candidate voting candidates, each including one first voting candidate word and one second voting candidate word. A candidate voting candidate's classification probability represents the probability of determining it as the reference voting candidate.

(4) Determining the reference voting candidate based on each candidate voting candidate's classification probability.

Optionally, determining the reference voting candidate based on each candidate voting candidate's classification probability includes: selecting, from the multiple candidate voting candidates, the one with the largest classification probability as the reference voting candidate; or selecting the target number of candidate voting candidates with the largest classification probabilities as reference voting candidates, where the target number is greater than 1. Corresponding voting information can subsequently be formed for each reference voting candidate, yielding the target number of pieces of voting information.

The second decoding round is similar to the first, with this difference: during the second round, the first voting candidate word obtained in the first round is also input into the second decoding sub-model, so that the first voting candidate word combines with each word in the preset vocabulary to form alternative voting candidates. By determining the classification probabilities of the multiple alternatives, the current voting candidate is determined from among them. This ensures that the association between the vocabulary words and the first voting candidate word is taken into account; an alternative's classification probability also reflects how reasonable the combination of a vocabulary word with the first voting candidate word is, guaranteeing that voting candidates with clear semantics and coherent logic can be generated.
It should be noted that this embodiment only describes the first and second decoding rounds; when M is greater than 2, each round after the second proceeds similarly to the second round and is not described again here.
During decoding, based on the current decoding feature and the M keyword features, the ith usage probability is determined by the following formula:

P_i = e^(X*w_i*sv_i) / Σ_j e^(X*w_j*sv_j), where P_i is the ith usage probability, X is the current decoding feature, w_i is the weight parameter of the ith keyword feature, sv_i is the ith keyword feature, and j ranges over the positive integers not greater than M, with w_j and sv_j denoting the weight parameter and the jth keyword feature, respectively.
Figure 8 is a schematic diagram of the second generation model provided by an embodiment of the present application. Referring to Figure 8, the computer device clusters the barrage text content into multiple text categories; the second keywords extracted from each text category are the barrage-text keywords. The M keywords obtained by the computer device include the dialogue-text keywords, the subtitle-text keywords, the keywords in the voting topic, and the barrage-text keywords corresponding to the i-th text category. These M keywords are input into the second encoding sub-model, and calling the second encoding sub-model yields M keyword features: keyword feature 1, keyword feature 2, ..., keyword feature M. By calling the second decoding sub-model, at each decoding step the model can choose either to copy a text keyword or to determine a new voting-candidate word. As shown in the right half of Figure 8, if it decides to copy a text keyword, the corresponding text keyword is taken directly as the voting-candidate word; if it decides not to copy, the current voting-candidate word is determined based on the previously determined voting-candidate word.
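The copy-or-generate decision described above can be sketched as follows (a simplified illustration with hypothetical names and values; the real model derives both decisions from learned probabilities):

```python
def decode_step(copy_prob, keyword_probs, keywords, generated_word, threshold=0.5):
    """One decoding step: either copy the most probable text keyword,
    or emit the word newly determined from the previous candidate word."""
    if copy_prob >= threshold:
        # Copy branch: take the keyword with the highest usage probability.
        best = max(range(len(keywords)), key=lambda i: keyword_probs[i])
        return keywords[best]
    # Generate branch: use the freshly determined voting-candidate word.
    return generated_word

# Hypothetical step where the model prefers copying, and "box_office"
# has the top usage probability among the keywords.
copied = decode_step(0.8, [0.1, 0.7, 0.2],
                     ["plot", "box_office", "ending"], "exciting")
# Hypothetical step where the model prefers generating a new word.
generated = decode_step(0.3, [0.1, 0.7, 0.2],
                        ["plot", "box_office", "ending"], "exciting")
```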
Optionally, the training process of the second generation model includes:
obtaining the sample text content associated with a positive-sample video clip, the positive-sample video clip being a video clip that contains sample voting information whose participation rate reaches a target threshold; obtaining the sample voting topic and the multiple sample voting candidates contained in the sample voting information; determining, based on the relatedness between each piece of sample text content and each sample voting candidate, the text category corresponding to each sample voting candidate, a text category comprising the sample text content associated with that sample voting candidate; extracting sample keywords from the text category corresponding to each sample voting candidate; and adjusting the model parameters of the second generation model based on the sample text content, the sample voting topic, the multiple sample voting candidates, and the sample keywords of each sample voting candidate.
Here, a positive-sample video clip is a video clip for which sample voting information has already been created. When that sample voting information is displayed while the positive-sample video clip plays and its participation rate reaches the target threshold, many users voted while watching the clip, so the sample voting information is strongly related to the clip, and the sample voting topic and multiple sample voting candidates it contains are likewise strongly related to the clip. Training the second generation model on the sample text content associated with the positive-sample video clip and on this sample voting information therefore increases the similarity between the voting candidates the trained model derives from the sample text content and the sample voting candidates, improving the accuracy of the second generation model and thus its ability to generate voting candidates from text content.
Moreover, the sample text content associated with the positive-sample video clip contains text content associated with each sample voting candidate. By determining the relatedness between each piece of sample text content and each sample voting candidate, each piece of sample text content is assigned to the text category of one sample voting candidate, so that sample text content belonging to different text categories can be distinguished and the sample keywords corresponding to each sample voting candidate can be extracted per category, free of interference from the other text categories. The second generation model thereby gains the ability to generate voting candidates from the keywords of different text categories.
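As an illustration of this category-building step, the sketch below assigns each piece of sample text to the most related sample voting candidate, using word overlap as a stand-in for the learned relatedness measure (all strings are hypothetical):

```python
def build_text_categories(texts, candidates):
    # Stand-in relatedness: number of words a text shares with a candidate.
    def relatedness(text, cand):
        return len(set(text.split()) & set(cand.split()))

    # One text category per sample voting candidate.
    categories = {cand: [] for cand in candidates}
    for text in texts:
        best = max(candidates, key=lambda c: relatedness(text, c))
        categories[best].append(text)
    return categories

cats = build_text_categories(
    texts=["the hero wins the fight", "the villain escapes again"],
    candidates=["hero wins", "villain escapes"],
)
```

Keyword extraction can then run independently inside each category, which is what keeps one candidate's keywords from being polluted by text belonging to another candidate.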
The model parameters of the second generation model may include the weight parameters or other parameters of the individual layers of the second generation model.
Optionally, negative-sample video clips may also be introduced when training the second generation model. A negative-sample video clip is a video clip for which sample voting information has been created but whose participation rate did not reach the target threshold, or a video clip for which no sample voting information has been created.
If the sample voting information is displayed while the negative-sample video clip plays but its participation rate does not reach the target threshold, many users did not vote while watching the clip, so the sample voting information is only weakly related to the clip; the sample voting topic and sample voting candidates it contains are then also only weakly related to the clip, and the sample voting candidate should not serve as a voting candidate for that clip, or should not serve as a candidate under that sample voting topic. Training the second generation model on the sample text content associated with the negative-sample video clip, the sample voting topic, and the multiple sample voting candidates therefore decreases the similarity between the voting candidates the trained model derives from the sample text content and the sample voting candidates, improving the accuracy of the second generation model and preventing it from generating unsuitable voting candidates from the text content and the voting topic.
Embodiments of the present application provide a method for automatically generating the voting candidates of a video clip. By modeling the text content and the voting candidates of video clips, the first generation model is trained, and voting candidates are generated based on the first generation model; the content of the video clip can thus be deeply understood and the voting information generated from deep-representation features, improving the accuracy of the voting information. The generated voting information is related to the text content of the video clip, meets the video's need for interactive functionality, and helps raise users' enthusiasm for participating in voting interaction while watching the clip, thereby enhancing the interactive atmosphere.
Building on the above embodiments, embodiments of the present application further provide another voting-information generation method. Figure 9 is a schematic diagram of an overall flow for generating voting information provided by an embodiment of the present application. Referring to Figure 9, this method may first determine whether voting information needs to be generated for a video clip; only when it is determined that voting information needs to be generated are the generation steps performed, including generating the voting topic and generating the voting candidates. After the voting information is generated, personalized voting-information display is performed based on the logged-in account. The embodiment shown in Figure 10 below describes the process of determining whether voting information needs to be generated, while the process of personalized display based on the logged-in account is detailed in the embodiments shown in Figures 12 and 13 below and is not described in the embodiment of Figure 10.
Figure 10 is a flowchart of a voting-information generation method provided by an embodiment of the present application. The execution subject of this embodiment is a computer device, which is a terminal or a server. Referring to Figure 10, the method includes:
1001. The computer device obtains first text content, second text content, a first popularity parameter, and a second popularity parameter.
The first text content is the text content contained in the video clip, and the second text content is the text content contained in the barrages of the video clip. The process of obtaining the first and second text content is detailed in step 301 above and is not repeated here.
The first popularity parameter represents the popularity of the video clip, and the second popularity parameter represents the popularity of the barrages of the video clip.
The first popularity parameter may be determined from the play count of the video clip; for example, it is positively correlated with the play count. Alternatively, to measure the clip's popularity accurately among multiple video clips, the first popularity parameter may be determined from the clip's play count and the maximum play count among the computer device's multiple video clips.
For example, the first popularity parameter is determined using the following formula:
Q1 = log(1.0+num1)/log(1.0+num2), or Q1 = log(1.0+num1)/log(1.0+num2)/R1;
where Q1 denotes the first popularity parameter, num1 denotes the play count of the video clip, num2 denotes the maximum play count, and R1 denotes the maximum value of one interval of the first popularity parameter. R1 is a preset value; for example, R1 = 0.1 means the popularity-parameter range is divided into 10 levels, and the first popularity parameter Q1 determined by the formula Q1 = log(1.0+num1)/log(1.0+num2)/R1 is the popularity level of the video clip.
The second popularity parameter may be determined from at least one of the number of barrages in the video clip or the number of barrage likes; for example, it is positively correlated with at least one of those counts. Alternatively, to measure the clip's popularity accurately among multiple video clips, the second popularity parameter may be determined from at least one of the clip's barrage count or barrage like count, together with at least one of the maximum barrage count or maximum barrage like count among the computer device's multiple video clips. The barrage like count of a video clip is the sum of the like counts of the clip's barrages, or the maximum among those like counts.
For example, the second popularity parameter is determined using the following formula:
Q2 = log(1.0+num3+num4)/log(1.0+num5+num6);
or, Q2 = log(1.0+num3+num4)/log(1.0+num5+num6)/R2;
where Q2 denotes the second popularity parameter, num3 denotes the number of barrages in the video clip, num4 denotes the clip's barrage like count, num5 denotes the maximum barrage count among the multiple video clips (that is, the barrage count of the clip containing the most barrages), num6 denotes the maximum barrage like count among the multiple video clips, and R2 denotes the maximum value of one interval of the second popularity parameter. R2 is a preset value; for example, R2 = 0.1 means the second-popularity-parameter range is divided into 10 levels, and the Q2 determined by the formula Q2 = log(1.0+num3+num4)/log(1.0+num5+num6)/R2 is the barrage popularity level of the video clip.
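Both popularity parameters share the same log-ratio form, so a single helper covers them. The sketch below uses hypothetical counts; dividing by the interval size R (e.g. 0.1) turns the ratio into a popularity level:

```python
import math

def popularity_level(count, max_count, r=0.1):
    # Q = log(1.0 + count) / log(1.0 + max_count); dividing by the
    # preset interval size r maps Q onto one of 1/r levels.
    q = math.log(1.0 + count) / math.log(1.0 + max_count)
    return q / r

# First popularity parameter: this clip's play count vs. the maximum play count.
q1 = popularity_level(count=999, max_count=999_999)
# Second popularity parameter: barrage count + like count vs. the corresponding
# maxima across the multiple video clips.
q2 = popularity_level(count=49 + 150, max_count=999 + 9_000)
```

The logarithms compress the long-tailed count distributions, so a clip with a thousandth of the maximum play count still lands at level 5 of 10 rather than near zero.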
1002. The computer device determines a voting identifier for the video clip based on the first text content, the second text content, the first popularity parameter, and the second popularity parameter; the voting identifier indicates whether voting information is to be generated for the video clip.
In this embodiment of the present application, the first and second text content represent the content of the video clip, and the first and second popularity parameters represent its popularity. Determining the clip's voting identifier from the first text content, the second text content, the first popularity parameter, and the second popularity parameter jointly considers the clip's content and popularity, so the determined identifier better reflects whether typical users would participate in the clip's voting interaction and vote based on the voting information, ensuring that the identifier accurately indicates whether voting information should be generated for the clip.
Optionally, step 1002 includes:
obtaining a first text feature of the first text content and a second text feature of the second text content, the first and second text features describing the first and second text content respectively; then obtaining a first popularity feature of the first popularity parameter and a second popularity feature of the second popularity parameter, the first and second popularity features describing the first and second popularity parameters respectively; concatenating the first text feature, the second text feature, the first popularity feature, and the second popularity feature into a video-clip feature, so that the video-clip feature accurately describes the video clip; and classifying based on the video-clip feature to obtain the voting identifier.
The computer device may first determine a popularity feature table containing the popularity features corresponding to at least one popularity parameter; the first popularity feature of the first popularity parameter and the second popularity feature of the second popularity parameter can then be determined by querying this table. For example, the popularity feature table includes a first popularity feature table and a second popularity feature table: the first contains the popularity features of the video clips' popularity parameters, so the first popularity feature is determined by querying it, and the second contains the popularity features of the popularity parameters of the clips' barrages, so the second popularity feature is determined by querying it.
1003. When the voting identifier indicates that voting information is to be generated for the video clip, the computer device generates a voting topic for the video clip based on the keywords in the text content, generates multiple voting candidates for the clip based on the keywords and the voting topic, and generates the clip's voting information based on the voting topic and the multiple voting candidates.
Optionally, the voting identifier includes a first voting identifier, indicating that voting information is to be generated for the video clip, and a second voting identifier, indicating that it is not; the steps for generating voting information are performed when the determined identifier is the first voting identifier. For example, the first voting identifier is 1 and the second is 0. Alternatively, the voting identifier is the probability of generating voting information: a determined probability greater than a probability threshold indicates that voting information is to be generated for the clip, while a probability not greater than the threshold indicates that it is not.
The steps for generating the clip's voting information include generating the voting topic, generating the multiple voting candidates, and generating the voting information based on the voting topic and the voting candidates; the specific process is detailed in the above embodiments and is not repeated here.
In the method provided by this embodiment of the present application, the voting identifier of the video clip is determined from the features corresponding to the first text content, the second text content, the first popularity parameter, and the second popularity parameter, jointly considering the clip's content and popularity: the clip's content features are fully captured while its interaction data is also taken into account. The determined identifier thus better reflects whether typical users would participate in the clip's voting interaction and vote based on the voting information, so whether to generate voting information for the clip can be determined effectively. This avoids directly generating voting information for every clip of a video and, beyond improving the video's interactivity, saves processing and reduces the video's data volume.
Building on the embodiment shown in Figure 10 above, the process of determining the voting identifier may be performed with a voting decision model, which includes a first feature-extraction sub-model, a first concatenation layer, and a second classification layer, and further includes a popularity feature table containing the popularity features corresponding to at least one popularity parameter. Optionally, the popularity feature table includes the first and second popularity feature tables described above.
Accordingly, the process of determining the voting identifier includes:
calling the first feature-extraction sub-model to obtain the first text feature of the first text content and the second text feature of the second text content; querying the popularity feature table based on the first and second popularity parameters to obtain the first and second popularity features; calling the first concatenation layer to concatenate the first text feature, the second text feature, the first popularity feature, and the second popularity feature into the video-clip feature; and calling the second classification layer to classify based on the video-clip feature and obtain the voting identifier.
Optionally, the first feature-extraction sub-model is a BERT (Bidirectional Encoder Representations from Transformers) model or another type of model.
Optionally, when the video clip includes multiple barrages, multiple pieces of barrage text content can be obtained, and thus the barrage text features of those pieces. Max pooling is applied to the multiple barrage text features to obtain the second text feature, which is then concatenated with the first text feature, the first popularity feature, and the second popularity feature to obtain the video-clip feature.
Figure 11 is a schematic diagram of a voting decision model provided by an embodiment of the present application. Referring to Figure 11, and taking the first feature-extraction sub-model being a BERT model as an example, the voting decision model includes the BERT model, the first concatenation layer, and the second classification layer. The computer device obtains the dialogue text content and subtitle text content of the video clip together with the first popularity parameter, and also obtains K pieces of barrage text content of the clip together with the second popularity parameter, K being an integer greater than 1. The BERT model extracts the first text feature corresponding to the dialogue and subtitle text content, as well as the barrage text features corresponding to the K pieces of barrage text content, and max pooling over the K barrage text features yields the second text feature. The voting decision model further includes the first and second popularity feature tables, which are queried to determine the first and second popularity features respectively. The obtained features are then concatenated and classified to determine the voting identifier.
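The feature-fusion step in Figure 11 can be sketched as follows. This is a pure-Python stand-in: the text features are supplied directly instead of coming from BERT, and all shapes, table entries, and classifier weights are hypothetical:

```python
def max_pool(vectors):
    # Element-wise maximum over the K barrage text features.
    return [max(col) for col in zip(*vectors)]

def decide_vote(first_text_feat, barrage_feats, heat_table_1, heat_table_2,
                level_1, level_2, weights, bias=0.0):
    second_text_feat = max_pool(barrage_feats)
    heat_feat_1 = heat_table_1[level_1]          # popularity-feature table lookup
    heat_feat_2 = heat_table_2[level_2]
    # Concatenation layer: one fused video-clip feature.
    fused = first_text_feat + second_text_feat + heat_feat_1 + heat_feat_2
    # Classification layer: a linear score; 1 = generate voting info, 0 = do not.
    score = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1 if score > 0 else 0

flag = decide_vote(
    first_text_feat=[0.2, 0.4],
    barrage_feats=[[0.1, 0.9], [0.3, 0.5]],      # K = 2 barrages
    heat_table_1={5: [0.7]}, heat_table_2={8: [0.6]},
    level_1=5, level_2=8,
    weights=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
)
```

Max pooling keeps the fused feature a fixed length no matter how many barrages the clip has, which is why the classification layer can use a single weight vector.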
The training process of the voting decision model includes:
obtaining a sample video clip, the sample text content associated with it, and the popularity parameters of that sample text content, the sample video clip including at least one of a positive-sample video clip or a negative-sample video clip, a positive-sample video clip being a video clip that contains sample voting information whose participation rate reaches the target threshold, and a negative-sample video clip being at least one of a video clip that contains sample voting information whose participation rate does not reach the target threshold, or a video clip that contains no sample voting information; and adjusting the model parameters of the voting decision model, which include the popularity feature table, based on the sample video clip, its associated sample text content, and the popularity parameters of that sample text content.
Training the voting decision model by modeling the text content and popularity parameters of video clips enables the model to learn how a clip's text content and popularity parameters affect whether voting information should be generated, improving the accuracy of the voting decision model so that voting identifiers can be determined accurately on its basis. This avoids directly generating voting information for every clip of a video and, beyond improving the video's interactivity, saves processing and reduces the video's data volume. The training objective of the voting decision model is that the voting identifier it produces for a positive-sample video clip indicates that voting information is to be generated for that clip, while the identifier it produces for a negative-sample video clip indicates that it is not.
It should be noted that the popularity feature table is itself one of the model parameters of the voting decision model; during training, the table is adjusted based on the training samples, so it is gradually updated as the model trains, improving the accuracy of the popularity feature table.
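To illustrate that the popularity feature table is itself a trainable parameter, the toy gradient step below updates both the classifier weight and the looked-up table entry under a logistic loss. Scalar features and all values are hypothetical, and the update rule is just standard logistic-regression SGD used as a stand-in:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_step(heat_table, weight, level, label, lr=0.1):
    feat = heat_table[level]
    pred = sigmoid(weight * feat)      # probability of "generate voting info"
    grad = pred - label                # d(logistic loss)/d(score)
    # Both the table entry and the classifier weight receive gradients,
    # so the popularity feature table is updated along with the model.
    heat_table[level] -= lr * grad * weight
    weight -= lr * grad * feat
    return heat_table, weight, pred

table = {3: 0.5}                       # popularity level -> feature
table, w, pred = train_step(table, weight=0.8, level=3, label=1.0)
```

With a positive-sample label of 1.0, one step pushes both the table entry for level 3 and the classifier weight upward, mirroring how the table entries drift toward values that make positive samples score higher.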
Figure 12 is a flowchart of a voting-information display method provided by an embodiment of the present application. The execution subject of this embodiment is a terminal; this embodiment describes the process of displaying voting information after the voting information of a video clip has been generated. Referring to Figure 12, the method includes:
1201. The terminal obtains the voting information of a video clip of a video.
The voting information is generated based on a voting topic and multiple voting candidates; the voting topic is generated based on keywords in the text content associated with the video clip, and the multiple voting candidates are generated based on the keywords and the voting topic. The text content includes at least one of first text content or second text content, the first text content being the text content contained in the video clip and the second text content being the text content contained in the clip's barrages. The process of generating the clip's voting information is detailed in the above embodiments of the voting-information generation method and is not repeated here.
The voting information may be generated by the terminal, or by a device other than the terminal. For example, the server providing the video generates the voting information of one or more video clips; when the terminal requests the video from the server, the server delivers the video together with the generated voting information, so the terminal obtains the video to be played and the voting information of one or more of its clips.
1202. The terminal determines an interaction parameter based on the interest tag of the currently logged-in account and the voting information; the interaction parameter represents the likelihood that the account will perform a voting operation based on the voting information.
An account is currently logged in on the terminal, and the account's interest tag represents the interests of the user the account belongs to. Determining the interaction parameter from the interest tag and the voting information allows an accurate judgment of whether the current user is interested in the voting information, ensuring that the interaction parameter accurately measures the likelihood that the current user will participate in voting and whether there is a need to generate voting information for the video clip.
1203. When the interaction parameter satisfies an interaction condition, the terminal displays the voting information while playing the video clip.
只有在互动参数满足互动条件的情况下,即在当前用户对该投票信息感兴趣,可能会参 与投票的情况下,终端才会在播放视频片段时显示投票信息。其中,该互动条件是指播放视频片段时显示投票信息的条件。Only when the interaction parameters meet the interaction conditions, that is, when the current user is interested in the voting information, he may participate In the case of voting, the terminal will only display voting information when playing video clips. Among them, the interaction condition refers to the condition for displaying voting information when playing video clips.
而在互动参数不满足互动条件的情况下,即可认为当前用户对该投票信息不感兴趣,参与投票的概率较低,此时为避免对视频片段的播放过程造成干扰,在播放视频片段时不再显示投票信息。When the interaction parameters do not meet the interaction conditions, it can be considered that the current user is not interested in the voting information and the probability of participating in voting is low. At this time, in order to avoid causing interference to the playback process of the video clip, do not play the video clip. Show voting information again.
可选地,该互动条件包括互动参数阈值,在互动参数大于互动参数阈值的情况下,在播放视频片段时显示投票信息。而在互动参数不大于互动参数阈值的情况下,在播放该视频片段时不显示投票信息。Optionally, the interaction condition includes an interaction parameter threshold. If the interaction parameter is greater than the interaction parameter threshold, the voting information is displayed when the video clip is played. When the interaction parameter is not greater than the interaction parameter threshold, the voting information will not be displayed when the video clip is played.
It should be noted that this embodiment takes a single video clip as an example to describe the display process of its voting information. When the video includes multiple video clips and multiple clips have voting information, the terminal plays each clip in turn according to the playback order during playback of the video. If the currently playing clip has corresponding voting information, the terminal determines an interaction parameter based on the interest tag of the account and that voting information; the voting information is displayed if the interaction parameter satisfies the interaction condition, and otherwise playback continues until the next video clip is played.
It should also be noted that this embodiment takes a video clip with a single piece of voting information as an example. In another embodiment, a video clip has multiple pieces of voting information, in which case the voting information whose interaction parameter satisfies the interaction condition is selected and displayed when the clip is played; if multiple pieces of voting information satisfy the interaction condition, the one with the largest interaction parameter is selected for display.
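The selection logic described above can be sketched as follows. This is an illustrative sketch only: the function name, the 0.5 threshold, and the data shapes are assumptions for the example, not part of the claimed method.

```python
# Hypothetical sketch of step 1203 for a clip with several pieces of voting
# information: keep only those whose interaction parameter satisfies the
# interaction condition (modelled here as exceeding a fixed threshold), then
# display the one with the largest interaction parameter.

def select_voting_info(voting_infos, interaction_params, threshold=0.5):
    """voting_infos: one voting-info object per candidate;
    interaction_params: the interaction parameter of each voting info.
    Returns the voting info to display, or None to leave playback undisturbed."""
    eligible = [
        (p, info)
        for p, info in zip(interaction_params, voting_infos)
        if p > threshold  # the interaction condition: parameter > threshold
    ]
    if not eligible:
        return None  # no display: the user is unlikely to vote
    # several candidates satisfy the condition: pick the largest parameter
    return max(eligible, key=lambda pair: pair[0])[1]
```

For example, with interaction parameters 0.2, 0.9, and 0.7 for three pieces of voting information, only the second and third pass the threshold, and the second (0.9) would be displayed.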
With the method provided by this embodiment, an interaction parameter that measures the likelihood of the user participating in voting is determined based on the interest tag of the currently logged-in account and the voting information, and the voting information is displayed during playback of the video clip only when the interaction parameter satisfies the interaction condition. This makes it easier to attract users to participate in voting, realizes personalized display of voting information, increases the user's active participation in interaction, improves the interactive experience, and avoids disturbing users who are unlikely to participate in voting.
Based on the embodiment shown in Figure 12, an embodiment of this application further provides another voting information display method that describes the process of determining the interaction parameter in detail. Figure 13 is a flowchart of this method, which is performed by a terminal. Referring to Figure 13, the method includes the following steps:
1301. The terminal obtains voting information of a video clip in a video.
The process of generating the voting information of a video clip is described in detail in the foregoing embodiments of the voting information generation method and is not repeated here.
1302. The terminal obtains an interest feature of the interest tag of the account, a voting topic feature of the voting topic, and voting candidate features of the multiple voting candidates.
The interest feature describes the interest tag, the voting topic feature describes the voting topic, and each voting candidate feature describes a voting candidate; each of these features may be a vector, a matrix, or another form. Obtaining features of the interest tag, the voting topic, and the multiple voting candidates quantifies them accurately, so that the interaction parameter can subsequently be computed from the quantified features.
1303. The terminal concatenates the interest feature, the voting topic feature, and the multiple voting candidate features to obtain an interaction feature.
1304. The terminal performs classification based on the interaction feature to obtain the interaction parameter.
Because the interaction feature is obtained by concatenating the feature of the interest tag, the feature of the voting topic, and the features of the multiple voting candidates, it fuses the information of all three. The interaction parameter obtained by classifying the interaction feature therefore takes the interest tag, the voting topic, and the multiple voting candidates into account, measures the user's interest in the voting information more accurately, and thus accurately determines the likelihood that the user will participate in voting.
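Steps 1302-1304 can be sketched in a few lines. The embodiment does not specify a classifier, so a plain linear layer with a sigmoid stands in for the classification step; the feature dimensions and weights are illustrative assumptions.

```python
import math

# Illustrative sketch of steps 1302-1304: concatenate the interest feature,
# the voting topic feature, and the voting candidate features into a single
# interaction feature, then classify it into an interaction parameter in
# (0, 1). The logistic classifier and all numbers are assumptions.

def interaction_parameter(interest_feat, topic_feat, candidate_feats,
                          weights, bias=0.0):
    # Step 1303: concatenate all feature vectors into the interaction feature
    interaction_feat = list(interest_feat) + list(topic_feat)
    for feat in candidate_feats:
        interaction_feat.extend(feat)
    # Step 1304: classification over the fused feature
    score = sum(w * x for w, x in zip(weights, interaction_feat)) + bias
    return 1.0 / (1.0 + math.exp(-score))  # probability-like parameter
```

With all-zero weights the parameter is exactly 0.5; positive weights aligned with the features push it toward 1, mirroring how a trained classifier would score an interested user higher.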
1305. If the interaction parameter satisfies the interaction condition, the terminal displays the voting information when playing the video clip.
This step is similar to step 1203 above and is not repeated here.
With the method provided by this embodiment, an interaction parameter that measures the likelihood of the user participating in voting is determined based on the interest tag of the currently logged-in account and the voting information, and the voting information is displayed during playback of the video clip only when the interaction parameter satisfies the interaction condition, thereby attracting users to participate in voting, realizing personalized display of voting information, improving interaction coverage, and avoiding disturbing users who are unlikely to participate in voting.
In a possible implementation, the process of determining the interaction parameter in steps 1302-1304 may be performed based on a voting interaction model that includes a second feature extraction sub-model, a second concatenation layer, and a third classification layer. Accordingly, the process of determining the interaction parameter includes:
calling the second feature extraction sub-model to obtain the interest feature of the interest tag, the voting topic feature of the voting topic, and the voting candidate features corresponding to the multiple voting candidates; calling the second concatenation layer to concatenate the interest feature, the voting topic feature, and the multiple voting candidate features into the interaction feature; and calling the third classification layer to perform classification based on the interaction feature to obtain the interaction parameter.
The interest feature, the voting topic feature, and the multiple voting candidate features describe the current user's interest tag, the voting topic, and the multiple voting candidates, respectively, so the interaction feature jointly accounts for the influence of all three; classifying based on the interaction feature to obtain the interaction parameter therefore improves the accuracy of the interaction parameter.
The second feature extraction sub-model may be a BERT model or another type of model, which is not limited in the embodiments of this application.
Figure 14 is a schematic diagram of a voting interaction model provided by an embodiment of this application. Referring to Figure 14, taking the second feature extraction sub-model being a BERT model as an example, the voting interaction model includes the BERT model, the second concatenation layer, and the third classification layer. The computer device obtains the interest tag of the account, the voting topic, and M voting candidates, where M is an integer greater than 1. The BERT model extracts the features corresponding to the interest tag, the voting topic, and the M voting candidates, which are then concatenated by the second concatenation layer and classified by the third classification layer to obtain the interaction parameter.
Optionally, the training process of the voting interaction model includes:
obtaining a sample interest tag of a sample account and sample voting information in a sample video clip, where the sample account has performed a voting operation based on the sample voting information; and adjusting model parameters of the voting interaction model based on the sample interest tag and on the sample voting topic and multiple sample voting candidates in the sample voting information.
The sample account is any account, and the sample interest tag indicates the interests of the user to whom the sample account belongs. The sample video clip includes the sample voting information, and the fact that the sample account has performed a voting operation based on the sample voting information means that the sample voting information was displayed when the sample video clip was played, and that the user of the sample account was interested in it and participated in the voting interaction. Adjusting the model parameters of the voting interaction model based on the sample interest tag, the sample voting topic, and the multiple sample voting candidates therefore enables the model to learn the association between the sample interest tag and the sample voting information, giving it the ability to determine the interaction parameter corresponding to any interest tag and any voting information and improving its accuracy. The training objective of the voting interaction model is to increase the interaction parameter it outputs, that is, to make the model predict, based on the sample interest tag, the sample voting topic, and the multiple sample voting candidates, that the sample account is likely to perform the voting operation. For example, if the interaction parameter represents the probability that the account performs a voting operation based on the voting information, the training objective is to make the interaction parameter output by the voting interaction model equal to 1.
It should be noted that the above optional solution takes a single training pass as an example. In an actual training process, multiple sets of sample data are obtained, each set including the sample interest tag corresponding to a sample account and the sample voting information in a sample video clip, and the voting interaction model is iteratively trained on these sets, thereby improving the accuracy of the voting interaction model and of the interaction parameters it determines.
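The training objective above — pushing the interaction parameter toward 1 on samples where the account actually voted — can be sketched with a toy one-weight logistic model in place of the real voting interaction model. Everything here (the single weight, the learning rate, the step count) is an illustrative assumption.

```python
import math

# Minimal sketch of the stated training objective: for each positive sample
# (an account that performed the voting operation), update the model so its
# interaction parameter moves toward 1, using cross-entropy with label 1.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_step(w, feature, lr=0.5):
    """One gradient-descent update on a positive sample (target = 1)."""
    p = sigmoid(w * feature)       # current interaction parameter
    grad = (p - 1.0) * feature     # d(cross-entropy, label 1)/dw
    return w - lr * grad

w = 0.0
for _ in range(200):               # iterative training over sample data
    w = train_step(w, feature=1.0)
param_after = sigmoid(w * 1.0)     # the parameter has been pushed toward 1
```

After the iterations the toy model's interaction parameter on the positive sample is close to 1, mirroring the training goal described for the voting interaction model.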
Figure 15 is a schematic structural diagram of a voting information generation apparatus provided by an embodiment of this application. The apparatus is deployed in a computer device. Referring to Figure 15, the apparatus includes:
a text content acquisition module 1501, configured to acquire text content associated with a video clip in a video, where the text content includes at least one of first text content or second text content, the first text content being the text contained in the video clip and the second text content being the text contained in the barrage of the video clip;
a topic generation module 1502, configured to generate a voting topic of the video clip based on keywords in the text content;
a candidate generation module 1503, configured to generate multiple voting candidates of the video clip based on the keywords and the voting topic; and
a voting information generation module 1504, configured to generate voting information of the video clip based on the voting topic and the multiple voting candidates.
Optionally, the topic generation module 1502 includes:
an encoding unit, configured to encode the keywords to obtain keyword features of the keywords; and
a decoding unit, configured to decode the keyword features to obtain the voting topic, where the voting topic consists of multiple voting topic words.
Optionally, a first generation model includes a first encoding sub-model and a first decoding sub-model;
the encoding unit is configured to call the first encoding sub-model to encode N keywords to obtain keyword features of the N keywords, where N is an integer greater than 1; and
the decoding unit is configured to:
call the first decoding sub-model to decode the N keyword features to obtain a first decoded feature, and determine a first voting topic word based on the first decoded feature and the N keyword features; and
call the first decoding sub-model to decode the N keyword features and the first voting topic word to obtain a second decoded feature, and determine a reference voting topic based on the second decoded feature and the N keyword features, where the reference voting topic includes the first voting topic word and a second voting topic word; and so on, until after N rounds of decoding, the reference voting topic obtained in the N-th round is determined as the voting topic of the video clip.
Optionally, the first generation model further includes a first classification layer and a preset word library containing multiple words, and the decoding unit is configured to:
determine N first usage probabilities based on the first decoded feature and the N keyword features, where the j-th first usage probability is the probability of using the j-th keyword in the voting topic, j being a positive integer not greater than N;
when the j-th first usage probability satisfies a usage condition, determine the j-th keyword as the first voting topic word; and
when none of the first usage probabilities satisfies the usage condition, call the first classification layer to perform classification based on the first decoded feature and the preset word library to obtain a classification probability of each word in the preset word library, and determine the first voting topic word based on the classification probabilities of the words.
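The copy-or-generate choice described in this optional configuration can be sketched as follows. The usage condition is modelled as a fixed probability threshold, which, like the function name and data shapes, is an assumption for illustration rather than anything the embodiment specifies.

```python
# Hypothetical sketch of the decoding unit's choice: copy a keyword into the
# voting topic if its usage probability satisfies the usage condition
# (modelled as exceeding a threshold); otherwise generate the word with the
# highest classification probability from the preset word library.

def pick_topic_word(keywords, usage_probs, vocab, vocab_probs, threshold=0.5):
    best_j = max(range(len(keywords)), key=lambda j: usage_probs[j])
    if usage_probs[best_j] > threshold:   # usage condition met: copy keyword
        return keywords[best_j]
    # no keyword qualifies: classify over the preset word library instead
    best_v = max(range(len(vocab)), key=lambda v: vocab_probs[v])
    return vocab[best_v]
```

This mirrors pointer-style copy mechanisms in sequence generation: the keyword path keeps topic words grounded in the clip's text, while the word-library path supplies connective words the keywords cannot provide.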
Optionally, the decoding unit is configured to:
determine N second usage probabilities based on the second decoded feature and the N keyword features, where the j-th second usage probability is the probability of using the j-th keyword in the voting topic, j being a positive integer not greater than N;
when the j-th second usage probability satisfies the usage condition, determine the j-th keyword as the second voting topic word;
when none of the second usage probabilities satisfies the usage condition, call the first classification layer to perform classification based on the second decoded feature and the preset word library to obtain classification probabilities of multiple candidate voting topics, where each candidate voting topic includes one first voting topic word and one second voting topic word; and
determine the reference voting topic based on the classification probability of each candidate voting topic.
Optionally, the training process of the first generation model includes:
obtaining sample text content associated with a positive sample video clip, where the positive sample video clip is a video clip that contains sample voting information and whose sample voting information has a participation rate reaching a target threshold;
obtaining a sample voting topic contained in the sample voting information; and
adjusting model parameters of the first generation model based on the sample text content and the sample voting topic.
Optionally, the text content includes the first text content and the second text content, and the candidate generation module 1503 includes:
a first acquisition unit, configured to acquire first keywords, which are keywords in the first text content;
a clustering unit, configured to cluster the second text content into multiple text categories, each containing at least one piece of second text content;
a second acquisition unit, configured to extract second keywords from each text category; and
a generation unit, configured to generate a voting candidate of each text category based on the first keywords, the voting topic, and the second keywords of that text category.
Optionally, the generation unit is configured to:
encode the first keywords, the voting topic, and the second keywords of the i-th text category to obtain keyword features, where i is a positive integer not greater than the number of text categories; and
decode the keyword features to obtain the i-th voting candidate, which consists of multiple voting candidate words.
Optionally, a second generation model includes a second encoding sub-model and a second decoding sub-model, and the generation unit is configured to:
call the second encoding sub-model to encode the first keywords, the voting topic, and the second keywords of the i-th text category to obtain keyword features;
call the second decoding sub-model to decode M keyword features to obtain a first decoded feature, and determine a first voting candidate word based on the first decoded feature and the M keyword features, where M is an integer greater than 1; and
call the second decoding sub-model to decode the M keyword features and the first voting candidate word to obtain a second decoded feature, and determine a reference voting candidate based on the second decoded feature and the M keyword features, where the reference voting candidate includes the first voting candidate word and a second voting candidate word; and so on, until after M rounds of decoding, the reference voting candidate obtained in the M-th round is determined as the i-th voting candidate of the video clip.
Optionally, the training process of the second generation model includes:
obtaining sample text content associated with a positive sample video clip, where the positive sample video clip is a video clip that contains sample voting information and whose sample voting information has a participation rate reaching a target threshold;
obtaining a sample voting topic and multiple sample voting candidates contained in the sample voting information;
determining a text category of each sample voting candidate based on the degree of association between each piece of sample text content and each sample voting candidate, where the text category includes the sample text content associated with that sample voting candidate;
extracting sample keywords from the text category of each sample voting candidate; and
adjusting model parameters of the second generation model based on the sample text content, the sample voting topic, the multiple sample voting candidates, and the sample keywords of each sample voting candidate.
Optionally, the text content acquisition module 1501 is configured to:
acquire the first text content, the second text content, a first popularity parameter, and a second popularity parameter, where the first popularity parameter indicates the popularity of the video clip and the second popularity parameter indicates the popularity of the barrage of the video clip;
and the topic generation module 1502 is configured to:
determine a voting flag of the video clip based on the first text content, the second text content, the first popularity parameter, and the second popularity parameter, where the voting flag indicates whether to generate voting information for the video clip; and
when the voting flag indicates that voting information is to be generated for the video clip, generate the voting topic of the video clip based on the keywords in the acquired text content.
Optionally, the topic generation module 1502 is configured to:
obtain a first text feature of the first text content and a second text feature of the second text content;
obtain a first popularity feature of the first popularity parameter and a second popularity feature of the second popularity parameter;
concatenate the first text feature, the second text feature, the first popularity feature, and the second popularity feature to obtain a video clip feature; and
perform classification based on the video clip feature to obtain the voting flag.
Optionally, a voting decision model includes a first feature extraction sub-model, a first concatenation layer, and a second classification layer, and further includes a popularity feature table containing popularity features of at least one popularity parameter;
and the topic generation module 1502 is configured to:
call the first feature extraction sub-model to obtain the first text feature of the first text content and the second text feature of the second text content;
query the popularity feature table based on the first popularity parameter and the second popularity parameter to obtain the first popularity feature and the second popularity feature;
call the first concatenation layer to concatenate the first text feature, the second text feature, the first popularity feature, and the second popularity feature into the video clip feature; and
call the second classification layer to perform classification based on the video clip feature to obtain the voting flag.
Optionally, the training process of the voting decision model includes:
obtaining a sample video clip, sample text content associated with the sample video clip, and popularity parameters of the sample text content, where the sample video clip includes at least one of a positive sample video clip or a negative sample video clip, the positive sample video clip being a video clip that contains sample voting information and whose sample voting information has a participation rate reaching a target threshold, and the negative sample video clip being at least one of a video clip containing sample voting information whose participation rate does not reach the target threshold, or a video clip containing no sample voting information; and
adjusting model parameters of the voting decision model, the model parameters including the popularity feature table, based on the sample video clip, the sample text content associated with the sample video clip, and the popularity parameters of the sample text content.
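The positive/negative labelling rule above can be sketched in a few lines. The field names and the 0.3 participation-rate threshold are assumptions chosen for the example; the embodiment only requires that some target threshold exists.

```python
# Illustrative labelling of training samples for the voting decision model:
# a clip whose voting information reached the target participation-rate
# threshold is positive; a clip whose voting information fell short, or that
# has no voting information at all, is negative.

def label_sample(has_voting_info, participation_rate, target_threshold=0.3):
    if has_voting_info and participation_rate >= target_threshold:
        return 1   # positive sample video clip
    return 0       # negative sample video clip

labels = [
    label_sample(True, 0.45),   # well-received vote  -> positive
    label_sample(True, 0.05),   # largely ignored vote -> negative
    label_sample(False, 0.0),   # no voting information -> negative
]
```

Labelling both low-participation clips and no-vote clips as negative teaches the model not only where votes can be attached, but where attaching one would be pointless.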
Optionally, the apparatus further includes:
an interaction parameter determination module, configured to determine an interaction parameter based on the interest tag of an account and the voting information, where the interaction parameter represents the likelihood that the account will perform a voting operation based on the voting information; and
a sending module, configured to send the voting information to the terminal of the account when the interaction parameter satisfies an interaction condition, where the voting information is displayed when the terminal plays the video clip.
图16是本申请实施例提供的一种投票信息显示装置的结构示意图。参见图16,该装置包括:Figure 16 is a schematic structural diagram of a voting information display device provided by an embodiment of the present application. Referring to Figure 16, the device includes:
信息获取模块1601,用于基于视频中的视频片段,获取视频片段的投票信息,投票信息基于投票主题和多个投票候选项生成,投票主题基于视频片段关联的文本内容中的关键词生成,多个投票候选项基于关键词和投票主题生成;The information acquisition module 1601 is used to obtain the voting information of the video clip based on the video clip in the video. The voting information is generated based on the voting topic and multiple voting candidates. The voting topic is generated based on the keywords in the text content associated with the video clip, and more Voting candidates are generated based on keywords and voting topics;
参数确定模块1602,用于基于当前登录的账号的兴趣标签和投票信息,确定互动参数,互动参数表示账号基于投票信息进行投票操作的可能性;The parameter determination module 1602 is used to determine interaction parameters based on the interest tags and voting information of the currently logged-in account. The interaction parameters represent the possibility of the account performing voting operations based on the voting information;
信息显示模块1603,用于在互动参数满足互动条件的情况下,在播放视频片段时显示投票信息;Information display module 1603, used to display voting information when playing video clips when the interaction parameters meet the interaction conditions;
其中,文本内容包括第一文本内容或第二文本内容的至少一种,第一文本内容为视频片段包含的文本内容,第二文本内容为视频片段的弹幕包含的文本内容。The text content includes at least one of first text content or second text content. The first text content is the text content contained in the video clip, and the second text content is the text content contained in the barrage of the video clip.
Optionally, the voting information includes a voting topic and a plurality of voting candidates, and the parameter determination module 1602 includes:
a feature acquisition unit, configured to obtain an interest feature of the account's interest tag, a voting topic feature of the voting topic, and voting candidate features of the plurality of voting candidates;
a concatenation unit, configured to concatenate the interest feature, the voting topic feature, and the plurality of voting candidate features to obtain an interaction feature;
a classification unit, configured to perform classification based on the interaction feature to obtain the interaction parameter.
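The flow above — concatenate the interest, topic, and candidate features, then classify to obtain an interaction parameter — can be sketched as a toy logistic classifier. This is an illustrative sketch only; the feature dimensions, weights, and the `interaction_parameter` helper are hypothetical and are not part of the disclosed model.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def interaction_parameter(interest_vec, topic_vec, candidate_vecs, weights, bias=0.0):
    """Concatenate interest, topic, and all candidate features, then apply a
    logistic classifier to obtain an interaction parameter in (0, 1)."""
    feat = list(interest_vec) + list(topic_vec)
    for c in candidate_vecs:
        feat.extend(c)
    assert len(feat) == len(weights), "weights must match the concatenated feature"
    score = sum(w * f for w, f in zip(weights, feat)) + bias
    return sigmoid(score)
```

An interaction condition could then be as simple as `interaction_parameter(...) > 0.5`, with the voting information sent or displayed only when the condition holds.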
Optionally, a voting interaction model includes a second feature extraction sub-model, a second concatenation layer, and a third classification layer;
the feature acquisition unit is configured to invoke the second feature extraction sub-model to obtain the interest feature of the interest tag, the voting topic feature of the voting topic, and the voting candidate features of the plurality of voting candidates;
the concatenation unit is configured to invoke the second concatenation layer to concatenate the interest feature, the voting topic feature, and the plurality of voting candidate features to obtain the interaction feature;
the classification unit is configured to invoke the third classification layer to perform classification based on the interaction feature to obtain the interaction parameter.
Optionally, the training process of the voting interaction model includes:
obtaining a sample interest tag of a sample account and sample voting information in a sample video clip, where the sample account has performed a voting operation based on the sample voting information;
adjusting model parameters of the voting interaction model based on the sample interest tag, the sample voting topic in the sample voting information, and a plurality of sample voting candidates.
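The parameter adjustment described above — learning from a sample account that actually voted — can be sketched as a single logistic-regression update, where the concatenated sample features carry a positive label. The `train_step` helper and the learning rate are hypothetical simplifications, not the disclosed training procedure.

```python
import math

def train_step(weights, features, label, lr=0.1):
    """One gradient step of logistic regression on a training sample.
    label is 1.0 when the sample account voted on the sample voting info."""
    score = sum(w * f for w, f in zip(weights, features))
    pred = 1.0 / (1.0 + math.exp(-score))
    grad = pred - label  # gradient of the logistic loss w.r.t. the score
    return [w - lr * grad * f for w, f in zip(weights, features)]
```

Repeated steps push the predicted interaction parameter for voting accounts toward 1, which is the intuition behind adjusting the model parameters from observed voting operations.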
It should be noted that when the voting information generation apparatus or the voting information display apparatus provided in the above embodiments generates or displays voting information, the division into the above functional modules is only an example. In practical applications, these functions may be allocated to different functional modules as needed; that is, the internal structure of the computer device may be divided into different functional modules to complete all or part of the functions described above. In addition, the voting information generation apparatus provided by the above embodiments shares the same concept as the voting information generation method embodiments, and the voting information display apparatus shares the same concept as the voting information display method embodiments; see the method embodiments for the specific implementation process, which is not repeated here.
An embodiment of this application further provides a computer device that includes a processor and a memory. The memory stores at least one computer program, which is loaded and executed by the processor to implement the operations performed by the voting information generation method or the voting information display method of the above embodiments.
Optionally, the computer device is provided as a terminal. Figure 17 shows a schematic structural diagram of a terminal 1700 provided by an exemplary embodiment of this application.
The terminal 1700 includes a processor 1701 and a memory 1702.
The processor 1701 may include one or more processing cores, for example a 4-core or 8-core processor. The processor 1701 may be implemented in at least one hardware form among DSP (Digital Signal Processing), FPGA (Field Programmable Gate Array), and PLA (Programmable Logic Array). The processor 1701 may also include a main processor and a coprocessor: the main processor, also called a CPU (Central Processing Unit), processes data in the awake state, and the coprocessor is a low-power processor that processes data in the standby state. In some embodiments, the processor 1701 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 1701 may further include an AI (Artificial Intelligence) processor for handling computing operations related to machine learning.
The memory 1702 may include one or more computer-readable storage media, which may be non-transitory. The memory 1702 may also include high-speed random access memory and non-volatile memory, such as one or more disk storage devices or flash storage devices. In some embodiments, the non-transitory computer-readable storage medium in the memory 1702 stores at least one computer program, which is executed by the processor 1701 to implement the voting information generation method provided by the method embodiments of this application.
In some embodiments, the terminal 1700 may optionally further include a peripheral device interface 1703 and at least one peripheral device. The processor 1701, the memory 1702, and the peripheral device interface 1703 may be connected by a bus or signal lines. Each peripheral device may be connected to the peripheral device interface 1703 by a bus, a signal line, or a circuit board. Optionally, the peripheral devices include at least one of a radio frequency circuit 1704, a display screen 1705, and a camera assembly 1706.
The peripheral device interface 1703 may be used to connect at least one I/O (Input/Output) related peripheral device to the processor 1701 and the memory 1702. In some embodiments, the processor 1701, the memory 1702, and the peripheral device interface 1703 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1701, the memory 1702, and the peripheral device interface 1703 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 1704 is used to receive and transmit RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 1704 communicates with communication networks and other communication devices through electromagnetic signals, converting electrical signals into electromagnetic signals for transmission and converting received electromagnetic signals into electrical signals. Optionally, the radio frequency circuit 1704 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and the like. The radio frequency circuit 1704 can communicate with other devices through at least one wireless communication protocol, including but not limited to metropolitan area networks, mobile communication networks of various generations (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 1704 may also include NFC (Near Field Communication) related circuits, which is not limited in this application.
The display screen 1705 is used to display a UI (User Interface), which may include graphics, text, icons, video, and any combination thereof. When the display screen 1705 is a touch display screen, it can also collect touch signals on or above its surface; the touch signal may be input to the processor 1701 as a control signal for processing. In this case, the display screen 1705 may also provide virtual buttons and/or a virtual keyboard, also called soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 1705, arranged on the front panel of the terminal 1700; in other embodiments, there may be at least two display screens 1705, arranged on different surfaces of the terminal 1700 or in a folding design; in still other embodiments, the display screen 1705 may be a flexible display screen arranged on a curved or folding surface of the terminal 1700. The display screen 1705 may even be set to a non-rectangular irregular shape, that is, a shaped screen. The display screen 1705 may be made of materials such as LCD (Liquid Crystal Display) or OLED (Organic Light-Emitting Diode).
The camera assembly 1706 is used to capture images or video. Optionally, the camera assembly 1706 includes a front camera and a rear camera; the front camera is arranged on the front panel of the terminal 1700, and the rear camera is arranged on the back of the terminal 1700. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth-of-field camera can be fused to realize a background blur function, and the main camera and the wide-angle camera can be fused to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fused shooting functions. In some embodiments, the camera assembly 1706 may also include a flash, which may be a single color temperature flash or a dual color temperature flash. A dual color temperature flash is a combination of a warm-light flash and a cold-light flash, and can be used for light compensation at different color temperatures.
Those skilled in the art can understand that the structure shown in Figure 17 does not limit the terminal 1700, which may include more or fewer components than shown, combine certain components, or adopt a different component arrangement.
Optionally, the computer device is provided as a server. Figure 18 is a schematic structural diagram of a server provided by an embodiment of this application. The server 1800 may vary greatly with configuration or performance, and may include one or more processors (Central Processing Units, CPU) 1801 and one or more memories 1802, where the memory 1802 stores at least one computer program, which is loaded and executed by the processor 1801 to implement the methods provided by the above method embodiments. Of course, the server may also have components such as wired or wireless network interfaces and input/output interfaces, and may further include other components for implementing device functions, which are not described in detail here.
An embodiment of this application further provides a computer-readable storage medium storing at least one computer program, which is loaded and executed by a processor to implement the operations performed by the voting information generation method of the above embodiments.
An embodiment of this application further provides a computer program product including a computer program; when the computer program is executed by a processor, the operations performed by the voting information generation method or the voting information display method of the above embodiments are implemented.
In some embodiments, the computer program involved in the embodiments of this application may be deployed and executed on one computer device, on multiple computer devices located at one site, or on multiple computer devices distributed across multiple sites and interconnected by a communication network; multiple computer devices distributed across multiple sites and interconnected by a communication network can form a blockchain system.
It can be understood that the specific implementations of this application involve user-related data such as user information. When the above embodiments of this application are applied to specific products or technologies, user permission or consent must be obtained, and the collection, use, and processing of the relevant data must comply with the relevant laws, regulations, and standards of the relevant countries and regions. For example, the interest tags involved in this application are all obtained with full authorization.
Those of ordinary skill in the art can understand that all or part of the steps for implementing the above embodiments may be completed by hardware, or by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disc, or the like.
The above are only optional embodiments of the embodiments of this application and are not intended to limit the embodiments of this application. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the embodiments of this application shall fall within the protection scope of this application.

Claims (24)

  1. A voting information generation method, the method comprising:
    obtaining, by a computer device, text content associated with a video clip in a video, the text content comprising at least one of first text content or second text content, the first text content being text contained in the video clip, and the second text content being text contained in bullet comments of the video clip;
    generating, by the computer device, a voting topic for the video clip based on keywords in the text content;
    generating, by the computer device, a plurality of voting candidates for the video clip based on the keywords and the voting topic;
    generating, by the computer device, voting information for the video clip based on the voting topic and the plurality of voting candidates.
  2. The method according to claim 1, wherein generating, by the computer device, the voting topic for the video clip based on the keywords in the text content comprises:
    encoding, by the computer device, the keywords to obtain keyword features of the keywords;
    decoding, by the computer device, the keyword features to obtain the voting topic, the voting topic being composed of a plurality of voting topic words.
  3. The method according to claim 2, wherein a first generation model comprises a first encoding sub-model and a first decoding sub-model;
    encoding, by the computer device, the keywords to obtain the keyword features of the keywords comprises:
    invoking, by the computer device, the first encoding sub-model to encode N keywords to obtain keyword features of the N keywords, N being an integer greater than 1;
    decoding, by the computer device, the keyword features to obtain the voting topic comprises:
    invoking, by the computer device, the first decoding sub-model to decode the N keyword features to obtain a first decoded feature, and determining a first voting topic word based on the first decoded feature and the N keyword features;
    invoking, by the computer device, the first decoding sub-model to decode the N keyword features and the first voting topic word to obtain a second decoded feature, and determining a reference voting topic based on the second decoded feature and the N keyword features, the reference voting topic comprising the first voting topic word and a second voting topic word, until, after N rounds of decoding, the reference voting topic obtained by the N-th decoding is determined as the voting topic of the video clip.
  4. The method according to claim 3, wherein the first generation model further comprises a first classification layer and a preset word library, the preset word library comprising a plurality of words, and determining the first voting topic word based on the first decoded feature and the N keyword features comprises:
    determining N first use probabilities based on the first decoded feature and the N keyword features, the j-th first use probability being the probability of using the j-th keyword in the voting topic, j being a positive integer not greater than N;
    when the j-th first use probability satisfies a use condition, determining the j-th keyword as the first voting topic word;
    when none of the first use probabilities satisfies the use condition, invoking the first classification layer to perform classification based on the first decoded feature and the preset word library to obtain a classification probability for each word in the preset word library, and determining the first voting topic word based on the classification probability of each word.
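As a reading aid (not part of the claim itself), the copy-versus-generate choice in claim 4 can be sketched as follows: copy a keyword whose use probability satisfies the use condition, otherwise fall back to the classification distribution over the preset word library. The `pick_topic_word` helper, the threshold value, and the dictionary-based vocabulary distribution are hypothetical simplifications.

```python
def pick_topic_word(keywords, use_probs, vocab_probs, threshold=0.5):
    """Copy mechanism sketch: prefer copying a keyword whose use probability
    satisfies the use condition; otherwise pick the highest-probability word
    from the preset vocabulary's classification distribution."""
    for kw, p in zip(keywords, use_probs):
        if p >= threshold:
            return kw
    # no keyword qualifies: generate from the preset word library instead
    return max(vocab_probs, key=vocab_probs.get)
```

For example, with keywords from a football clip, a high use probability copies the keyword directly into the voting topic, while low probabilities let the model generate a connective word such as "who" from the vocabulary.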
  5. The method according to claim 3, wherein determining the reference voting topic based on the second decoded feature and the N keyword features comprises:
    determining N second use probabilities based on the second decoded feature and the N keyword features, the j-th second use probability being the probability of using the j-th keyword in the voting topic, j being a positive integer not greater than N;
    when the j-th second use probability satisfies the use condition, determining the j-th keyword as the second voting topic word;
    when none of the second use probabilities satisfies the use condition, invoking the first classification layer to perform classification based on the second decoded feature and the preset word library to obtain classification probabilities of a plurality of candidate voting topics, each candidate voting topic comprising one first voting topic word and one second voting topic word;
    determining the reference voting topic based on the classification probability of each candidate voting topic.
  6. The method according to claim 3, wherein the training process of the first generation model comprises:
    obtaining, by the computer device, sample text content associated with a positive sample video clip, the positive sample video clip being a video clip that contains sample voting information whose participation rate reaches a target threshold;
    obtaining, by the computer device, a sample voting topic contained in the sample voting information;
    adjusting, by the computer device, model parameters of the first generation model based on the sample text content and the sample voting topic.
  7. The method according to claim 1, wherein the text content comprises the first text content and the second text content, and generating, by the computer device, the plurality of voting candidates for the video clip based on the keywords and the voting topic comprises:
    obtaining, by the computer device, a first keyword, the first keyword being a keyword in the first text content;
    clustering, by the computer device, the second text content to obtain a plurality of text categories, each text category containing at least one piece of the second text content;
    extracting, by the computer device, a second keyword from each text category;
    generating, by the computer device, a voting candidate for each text category based on the first keyword, the voting topic, and the second keyword of that text category.
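To illustrate the clustering step above (grouping bullet-comment text into categories and extracting a second keyword per category), here is a deliberately simple sketch; a real system would cluster text embeddings, and the `cluster_comments` helper is a hypothetical toy that groups comments by their most frequent token and treats that token as the cluster's second keyword.

```python
from collections import Counter, defaultdict

def cluster_comments(comments):
    """Toy stand-in for clustering the second text content (bullet comments):
    group each comment under its most common token; the cluster key then
    serves as that text category's second keyword."""
    clusters = defaultdict(list)
    for c in comments:
        tokens = c.split()
        top = Counter(tokens).most_common(1)[0][0]
        clusters[top].append(c)
    return clusters
```

Each resulting category would then be combined with the first keyword and the voting topic to generate one voting candidate per category.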
  8. The method according to claim 7, wherein generating, by the computer device, the voting candidate for each text category based on the first keyword, the voting topic, and the second keyword of each text category comprises:
    encoding, by the computer device, the first keyword, the voting topic, and the second keyword of the i-th text category to obtain keyword features, i being a positive integer not greater than the number of text categories;
    decoding, by the computer device, the keyword features to obtain the i-th voting candidate, the i-th voting candidate being composed of a plurality of voting candidate words.
  9. The method according to claim 8, wherein a second generation model comprises a second encoding sub-model and a second decoding sub-model;
    encoding, by the computer device, the first keyword, the voting topic, and the second keyword of the i-th text category to obtain the keyword features comprises:
    invoking, by the computer device, the second encoding sub-model to encode the first keyword, the voting topic, and the second keyword of the i-th text category to obtain the keyword features;
    decoding, by the computer device, the keyword features to obtain the i-th voting candidate comprises:
    invoking, by the computer device, the second decoding sub-model to decode M keyword features to obtain a first decoded feature, and determining a first voting candidate word based on the first decoded feature and the M keyword features, M being an integer greater than 1;
    invoking, by the computer device, the second decoding sub-model to decode the M keyword features and the first voting candidate word to obtain a second decoded feature, and determining a reference voting candidate based on the second decoded feature and the M keyword features, the reference voting candidate comprising the first voting candidate word and a second voting candidate word, until, after M rounds of decoding, the reference voting candidate obtained by the M-th decoding is determined as the i-th voting candidate of the video clip.
  10. The method according to claim 9, wherein the training process of the second generation model comprises:
    obtaining, by the computer device, sample text content associated with a positive sample video clip, the positive sample video clip being a video clip that contains sample voting information whose participation rate reaches a target threshold;
    obtaining, by the computer device, a sample voting topic and a plurality of sample voting candidates contained in the sample voting information;
    determining, by the computer device, a text category for each sample voting candidate based on the degree of association between each piece of sample text content and each sample voting candidate, the text category comprising the sample text content associated with that sample voting candidate;
    extracting, by the computer device, a sample keyword from the text category of each sample voting candidate;
    adjusting, by the computer device, model parameters of the second generation model based on the sample text content, the sample voting topic, the plurality of sample voting candidates, and the sample keyword of each sample voting candidate.
  11. The method according to any one of claims 1-10, wherein obtaining, by the computer device, the text content associated with the video clip in the video comprises:
    obtaining, by the computer device, the first text content, the second text content, a first popularity parameter, and a second popularity parameter, the first popularity parameter representing the popularity of the video clip, and the second popularity parameter representing the popularity of the bullet comments of the video clip;
    generating, by the computer device, the voting topic for the video clip based on the keywords in the text content comprises:
    determining, by the computer device, a voting flag of the video clip based on the first text content, the second text content, the first popularity parameter, and the second popularity parameter, the voting flag indicating whether to generate voting information for the video clip;
    when the voting flag indicates that voting information is to be generated for the video clip, generating, by the computer device, the voting topic based on the keywords in the text content.
  12. The method according to claim 11, wherein determining, by the computer device, the voting flag of the video clip based on the first text content, the second text content, the first popularity parameter, and the second popularity parameter comprises:
    obtaining, by the computer device, a first text feature of the first text content and a second text feature of the second text content;
    obtaining, by the computer device, a first popularity feature of the first popularity parameter and a second popularity feature of the second popularity parameter;
    concatenating, by the computer device, the first text feature, the second text feature, the first popularity feature, and the second popularity feature to obtain a video clip feature;
    performing, by the computer device, classification based on the video clip feature to obtain the voting flag.
  13. The method according to claim 12, wherein a voting decision model comprises a first feature extraction sub-model, a first concatenation layer, and a second classification layer, the voting decision model further comprising a popularity feature table that contains popularity features of at least one popularity parameter;
    the computer device acquiring the first text feature of the first text content and the second text feature of the second text content comprises: the computer device invokes the first feature extraction sub-model to acquire the first text feature of the first text content and the second text feature of the second text content;
    the computer device acquiring the first popularity feature of the first popularity parameter and the second popularity feature of the second popularity parameter comprises: the computer device queries the popularity feature table based on the first popularity parameter and the second popularity parameter to obtain the first popularity feature and the second popularity feature;
    the computer device concatenating the first text feature, the second text feature, the first popularity feature, and the second popularity feature to obtain the video clip feature comprises: the computer device invokes the first concatenation layer to concatenate the first text feature, the second text feature, the first popularity feature, and the second popularity feature to obtain the video clip feature;
    the computer device performing classification based on the video clip feature to obtain the voting identifier comprises: the computer device invokes the second classification layer to perform classification based on the video clip feature to obtain the voting identifier.
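Claims 12 and 13 describe a four-stage pipeline: extract text features, look popularity features up in a table, concatenate everything into one video clip feature, and classify it into a voting identifier. The sketch below illustrates that data flow only; the hashing-based text encoder, the feature dimensions, the bucketed popularity table, and the mean-threshold "classifier" are all hypothetical stand-ins, not details from the application:

```python
import hashlib

def text_feature(text, dim=8):
    """Toy text encoder: hash the text into a fixed-size vector.
    Stands in for the first feature extraction sub-model."""
    digest = hashlib.md5(text.encode("utf-8")).digest()
    return [digest[i % len(digest)] / 255.0 for i in range(dim)]

# Popularity feature table: maps a bucketed popularity parameter to a
# feature vector. In the claimed model this table is learned; here the
# values are fixed toy numbers.
POPULARITY_TABLE = {bucket: [bucket / 10.0] * 4 for bucket in range(11)}

def popularity_feature(param):
    """Look up the popularity feature for a parameter in [0, 1]."""
    bucket = min(10, max(0, int(param * 10)))
    return POPULARITY_TABLE[bucket]

def voting_identifier(first_text, second_text, pop1, pop2, threshold=0.5):
    """Concatenate the four features, then classify the result."""
    clip_feature = (text_feature(first_text) + text_feature(second_text)
                    + popularity_feature(pop1) + popularity_feature(pop2))
    # Stand-in classification layer: mean activation vs. a threshold.
    score = sum(clip_feature) / len(clip_feature)
    return score >= threshold  # True: generate voting information
```

In a production model each stand-in would be a trained component (e.g. a text encoder, an embedding table, and a classification head), but the concatenate-then-classify structure is the same.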
  14. The method according to claim 13, wherein the training process of the voting decision model comprises:
    the computer device acquires a sample video clip, sample text content associated with the sample video clip, and a popularity parameter of the sample text content, the sample video clip comprising at least one of a positive sample video clip or a negative sample video clip, the positive sample video clip being a video clip that contains sample voting information whose participation rate reaches a target threshold, and the negative sample video clip being at least one of: a video clip that contains sample voting information whose participation rate does not reach the target threshold, or a video clip that does not contain sample voting information;
    the computer device adjusts model parameters in the voting decision model based on the sample video clip, the sample text content associated with the sample video clip, and the popularity parameter of the sample text content, the model parameters including the popularity feature table.
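The positive/negative labeling scheme in claim 14 can be sketched as a small function; the concrete threshold value 0.2 is hypothetical, as the application only names a "target threshold":

```python
def label_sample(has_voting_info, participation_rate=None, target_threshold=0.2):
    """Label a sample video clip following the scheme in claim 14.

    Positive (1): the clip contains sample voting information whose
    participation rate reaches the target threshold.
    Negative (0): the clip contains voting information below the
    threshold, or contains no voting information at all.
    """
    if has_voting_info and participation_rate is not None:
        return 1 if participation_rate >= target_threshold else 0
    return 0
```

The resulting (clip, label) pairs would then drive updates to the voting decision model's parameters, which per claim 14 include the popularity feature table itself.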
  15. The method according to any one of claims 1 to 10, wherein after the computer device generates the voting information of the video clip based on the voting topic and the plurality of voting candidates, the method further comprises:
    the computer device determines an interaction parameter based on an interest tag of an account and the voting information, the interaction parameter representing the likelihood that the account performs a voting operation based on the voting information;
    the computer device sends the voting information to a terminal of the account when the interaction parameter satisfies an interaction condition, the voting information being used for display when the terminal plays the video clip.
  16. A voting information display method, the method comprising:
    a computer device acquires, based on a video clip in a video, voting information of the video clip, the voting information being generated based on a voting topic and a plurality of voting candidates, the voting topic being generated based on keywords in text content associated with the video clip, and the plurality of voting candidates being generated based on the keywords and the voting topic;
    the computer device determines an interaction parameter based on an interest tag of a currently logged-in account and the voting information, the interaction parameter representing the likelihood that the account performs a voting operation based on the voting information;
    the computer device displays the voting information while playing the video clip when the interaction parameter satisfies an interaction condition;
    wherein the text content comprises at least one of first text content or second text content, the first text content being text content contained in the video clip, and the second text content being text content contained in the bullet comments of the video clip.
  17. The method according to claim 16, wherein the computer device determining the interaction parameter based on the interest tag of the currently logged-in account and the voting information comprises:
    the computer device acquires an interest feature of the interest tag, a voting topic feature of the voting topic, and voting candidate features of the plurality of voting candidates;
    the computer device concatenates the interest feature, the voting topic feature, and the plurality of voting candidate features to obtain an interaction feature;
    the computer device performs classification based on the interaction feature to obtain the interaction parameter.
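Claims 16 and 17 mirror the earlier pipeline: concatenate account-interest, voting-topic, and candidate features, classify the result into an interaction parameter, and display the voting information only when that parameter satisfies an interaction condition. A minimal sketch, in which the logistic scoring rule and the 0.5 display threshold are invented stand-ins:

```python
import math

def interaction_parameter(interest_feat, topic_feat, candidate_feats):
    """Concatenate interest, topic, and per-candidate features, then
    classify to a probability-like interaction parameter."""
    interaction_feat = list(interest_feat) + list(topic_feat)
    for feat in candidate_feats:
        interaction_feat.extend(feat)
    # Stand-in classification layer: logistic of the mean activation.
    mean = sum(interaction_feat) / len(interaction_feat)
    return 1.0 / (1.0 + math.exp(-mean))

def should_display(param, condition=0.5):
    """Interaction condition: show the voting information during
    playback only if the parameter clears a threshold."""
    return param >= condition
```

The same structure appears server-side in claim 15 (send the voting information to the terminal) and client-side in claim 16 (display it during playback); only the action gated by the condition differs.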
  18. The method according to claim 17, wherein a voting interaction model comprises a second feature extraction sub-model, a second concatenation layer, and a third classification layer;
    the computer device acquiring the interest feature of the interest tag, the voting topic feature of the voting topic, and the voting candidate features of the plurality of voting candidates comprises: the computer device invokes the second feature extraction sub-model to acquire the interest feature of the interest tag, the voting topic feature of the voting topic, and the voting candidate features of the plurality of voting candidates;
    the computer device concatenating the interest feature, the voting topic feature, and the plurality of voting candidate features to obtain the interaction feature comprises: the computer device invokes the second concatenation layer to concatenate the interest feature, the voting topic feature, and the plurality of voting candidate features to obtain the interaction feature;
    the computer device performing classification based on the interaction feature to obtain the interaction parameter comprises: the computer device invokes the third classification layer to perform classification based on the interaction feature to obtain the interaction parameter.
  19. The method according to claim 18, wherein the training process of the voting interaction model comprises:
    the computer device acquires a sample interest tag of a sample account and sample voting information in a sample video clip, wherein the sample account has performed a voting operation based on the sample voting information;
    the computer device adjusts model parameters in the voting interaction model based on the sample interest tag, a sample voting topic in the sample voting information, and a plurality of sample voting candidates.
  20. A voting information generation apparatus, provided in a computer device, the apparatus comprising:
    a text content acquisition module, configured to acquire text content associated with a video clip in a video, the text content comprising at least one of first text content or second text content, the first text content being text content contained in the video clip, and the second text content being text content contained in the bullet comments of the video clip;
    a topic generation module, configured to generate a voting topic for the video clip based on keywords in the text content;
    a candidate generation module, configured to generate a plurality of voting candidates for the video clip based on the keywords and the voting topic;
    a voting information generation module, configured to generate voting information of the video clip based on the voting topic and the plurality of voting candidates.
  21. A voting information display apparatus, provided in a computer device, the apparatus comprising:
    an information acquisition module, configured to acquire, based on a video clip in a video, voting information of the video clip, the voting information being generated based on a voting topic and a plurality of voting candidates, the voting topic being generated based on keywords in text content associated with the video clip, and the plurality of voting candidates being generated based on the keywords and the voting topic;
    a parameter determination module, configured to determine an interaction parameter based on an interest tag of a currently logged-in account and the voting information, the interaction parameter representing the likelihood that the account performs a voting operation based on the voting information;
    an information display module, configured to display the voting information while the video clip is played when the interaction parameter satisfies an interaction condition;
    wherein the text content comprises at least one of first text content or second text content, the first text content being text content contained in the video clip, and the second text content being text content contained in the bullet comments of the video clip.
  22. A computer device, comprising a processor and a memory, the memory storing at least one computer program, the at least one computer program being loaded and executed by the processor to implement the operations performed by the voting information generation method according to any one of claims 1 to 15, or to implement the operations performed by the voting information display method according to any one of claims 16 to 19.
  23. A computer-readable storage medium, storing at least one computer program, the at least one computer program being loaded and executed by a processor to implement the operations performed by the voting information generation method according to any one of claims 1 to 15, or to implement the operations performed by the voting information display method according to any one of claims 16 to 19.
  24. A computer program product, comprising a computer program that, when executed by a processor, implements the operations performed by the voting information generation method according to any one of claims 1 to 15, or implements the operations performed by the voting information display method according to any one of claims 16 to 19.
PCT/CN2023/083979 2022-04-29 2023-03-27 Voting information generation method and apparatus, and voting information display method and apparatus WO2023207463A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210473861.6A CN117037012A (en) 2022-04-29 2022-04-29 Voting information generation method, voting information display method and device
CN202210473861.6 2022-04-29

Publications (1)

Publication Number Publication Date
WO2023207463A1 true WO2023207463A1 (en) 2023-11-02

Family

ID=88517319

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/083979 WO2023207463A1 (en) 2022-04-29 2023-03-27 Voting information generation method and apparatus, and voting information display method and apparatus

Country Status (2)

Country Link
CN (1) CN117037012A (en)
WO (1) WO2023207463A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104317804A (en) * 2014-09-23 2015-01-28 小米科技有限责任公司 Voting information publishing method and device
CN108174247A (en) * 2017-12-27 2018-06-15 优酷网络技术(北京)有限公司 Video interaction method and device
US20180211467A1 (en) * 2017-01-23 2018-07-26 Smartmatic International Corporation Means to create a physical audit trail verifiable by remote voters in electronic elections
CN111708915A (en) * 2020-06-12 2020-09-25 腾讯科技(深圳)有限公司 Content recommendation method and device, computer equipment and storage medium
CN112800339A (en) * 2021-04-06 2021-05-14 腾讯科技(深圳)有限公司 Information stream searching method, device and equipment
CN112866787A (en) * 2021-04-12 2021-05-28 上海哔哩哔哩科技有限公司 Bullet screen setting method, device and system


Also Published As

Publication number Publication date
CN117037012A (en) 2023-11-10

Similar Documents

Publication Publication Date Title
CN111444357B (en) Content information determination method, device, computer equipment and storage medium
US20150243279A1 (en) Systems and methods for recommending responses
CN110234018B (en) Multimedia content description generation method, training method, device, equipment and medium
CN112215171B (en) Target detection method, device, equipment and computer readable storage medium
CN103970791B (en) A kind of method, apparatus for recommending video from video library
CN111414506B (en) Emotion processing method and device based on artificial intelligence, electronic equipment and storage medium
US20200012862A1 (en) Multi-model Techniques to Generate Video Metadata
CN111615002B (en) Video background playing control method, device and system and electronic equipment
CN111432282B (en) Video recommendation method and device
CN114390217A (en) Video synthesis method and device, computer equipment and storage medium
CN115510194A (en) Question and answer sentence retrieval method and device, electronic equipment and storage medium
CN114969282A (en) Intelligent interaction method based on rich media knowledge graph multi-modal emotion analysis model
CN112702613A (en) Live video recording method and device, storage medium and electronic equipment
CN110990632B (en) Video processing method and device
CN116977701A (en) Video classification model training method, video classification method and device
WO2023207463A1 (en) Voting information generation method and apparatus, and voting information display method and apparatus
CN116955591A (en) Recommendation language generation method, related device and medium for content recommendation
CN116956183A (en) Multimedia resource recommendation method, model training method, device and storage medium
CN117009577A (en) Video data processing method, device, equipment and readable storage medium
CN116955707A (en) Content tag determination method, device, equipment, medium and program product
CN112242140A (en) Intelligent device control method and device, electronic device and storage medium
CN117093785B (en) Method, system, equipment and storage medium for guiding user based on social contact
CN111159472A (en) Multi-modal chat techniques
US20220198141A1 (en) System and method for identifying and displaying information related to an off screen plot element or character in a media stream
CN116994169A (en) Label prediction method, label prediction device, computer equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23794898

Country of ref document: EP

Kind code of ref document: A1