CN113055741B - Video abstract generation method, electronic equipment and computer readable storage medium - Google Patents

Video abstract generation method, electronic equipment and computer readable storage medium

Info

Publication number
CN113055741B
CN113055741B
Authority
CN
China
Prior art keywords
video
bullet screen
barrage
unit
key
Prior art date
Legal status
Active
Application number
CN202011622336.3A
Other languages
Chinese (zh)
Other versions
CN113055741A (en)
Inventor
詹长静
周维
陈志刚
Current Assignee
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date
Filing date
Publication date
Application filed by iFlytek Co Ltd filed Critical iFlytek Co Ltd
Priority to CN202011622336.3A priority Critical patent/CN113055741B/en
Publication of CN113055741A publication Critical patent/CN113055741A/en
Application granted granted Critical
Publication of CN113055741B publication Critical patent/CN113055741B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H04N21/44016: Processing of video elementary streams involving splicing one content stream with another, e.g. for substituting a video clip
    • H04N21/4728: End-user interface for selecting a Region Of Interest [ROI], e.g. for requesting a higher resolution version of a selected region
    • H04N21/4884: Data services, e.g. news ticker, for displaying subtitles
    • H04N21/8549: Creating video summaries, e.g. movie trailer
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Computer Security & Cryptography (AREA)
  • Television Signal Processing For Recording (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Studio Circuits (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a video summary generation method, an electronic device and a computer-readable storage medium. The method includes: acquiring a source video and dividing it into a plurality of unit video clips; selecting several of the unit video clips as key video clips according to the bullet screen information corresponding to each unit video clip; and splicing all the key video clips in chronological order to generate a video summary corresponding to the source video. With this scheme, a personalized video summary can be generated.

Description

Video abstract generation method, electronic equipment and computer readable storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method for generating a video abstract, an electronic device, and a computer readable storage medium.
Background
With the development of internet and multimedia technology, digital video has grown explosively in forms such as news, advertisements, television, movies and webcasts. Whether for learning or for social entertainment, users are surrounded by massive amounts of video, and quickly finding videos of interest among them is not easy. The video summary was therefore created: as the name implies, it is a brief representation of the video content, intended to help the user quickly understand the content and decide whether to watch it in detail, and it is also useful for indexing and querying video databases.
Video summarization in the broad sense is generally divided into two types. The first directly extracts key frames from the video and combines them into a new video, similar to a movie trailer. The second is video condensation, which is more complex than the former and involves the design and implementation of a series of algorithms, such as moving object detection, object trajectory extraction, trajectory optimization and generation of the condensed video.
One existing video summary generation method samples the video at fixed time points, i.e. extracts one frame or one segment at regular intervals; this is easy to implement but pays no attention to the video content. Another method combines visual information in the video (such as color, shape and motion direction) with other multimedia information (such as audio and subtitles), applies pattern recognition together with video and image processing techniques, and finally generates a key frame sequence or an abridged video. However, this method ignores the needs of the user, lacks any characterization of the interaction between the user and the video, and cannot reflect the video content that the user actually focuses on.
Disclosure of Invention
The technical problem mainly solved by this application is to provide a video summary generation method, an electronic device and a computer-readable storage medium that can produce a personalized video summary.
In order to solve the above problem, a first aspect of the present application provides a method for generating a video summary, the method comprising: acquiring a source video and dividing it into a plurality of unit video clips; selecting several of the unit video clips as key video clips according to the bullet screen information corresponding to each unit video clip; and splicing all the key video clips in chronological order to generate a video summary corresponding to the source video.
In order to solve the above problem, a second aspect of the present application provides an electronic device, including a memory and a processor coupled to each other, where the processor is configured to execute program instructions stored in the memory, so as to implement the method for generating a video summary of the first aspect.
In order to solve the above-mentioned problem, a third aspect of the present application provides a computer-readable storage medium having stored thereon program instructions that, when executed by a processor, implement the method for generating a video summary of the first aspect.
The beneficial effects of the invention are as follows. Unlike the prior art, after the source video is acquired it can be divided into a plurality of unit video clips; several of the unit video clips are then selected as key video clips according to the bullet screen information corresponding to each unit video clip, and all the key video clips are spliced in chronological order to generate a video summary corresponding to the source video. Because the key video clips are selected from the unit video clips according to bullet screen information, the characteristics of user interaction are incorporated and the clips the user is interested in can be captured more accurately, so the generated video summary reflects the content the user focuses on, i.e. a personalized video summary can be generated.
Drawings
Fig. 1 is a flowchart of a first embodiment of a method for generating a video summary of the present application;
FIG. 2 is a flowchart of an embodiment of step S12 in FIG. 1;
FIG. 3 is a flowchart illustrating an embodiment of step S122 in FIG. 2;
FIG. 4 is a flowchart of a second embodiment of a method for generating a video summary of the present application;
FIG. 5 is a flowchart illustrating an embodiment of step S45 in FIG. 4;
FIG. 6 is a flowchart of a third embodiment of a method for generating a video summary of the present application;
FIG. 7 is a schematic diagram of a frame of an embodiment of an electronic device of the present application;
FIG. 8 is a schematic diagram of a framework of one embodiment of a computer readable storage medium of the present application.
Detailed Description
The following describes the embodiments of the present application in detail with reference to the drawings.
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, interfaces, techniques, etc., in order to provide a thorough understanding of the present application.
The terms "system" and "network" are often used interchangeably herein. The term "and/or" is herein merely an association relationship describing an associated object, meaning that there may be three relationships, e.g., a and/or B, may represent: a exists alone, A and B exist together, and B exists alone. In addition, the character "/" herein generally indicates that the front and rear associated objects are an "or" relationship. Further, "a plurality" herein means two or more than two.
Referring to fig. 1, fig. 1 is a flowchart of a first embodiment of a method for generating a video summary of the present application. Specifically, the method for generating a video summary of the present embodiment may include the following steps:
step S11: and acquiring a source video, and dividing the source video into a plurality of unit video clips.
The video summary is a brief representation of the video content that lets a user quickly understand the content and decide whether to watch the video in detail; therefore, some video frames can be selected from the source video to compose the video summary corresponding to the source video. Specifically, after the source video is acquired, it may be divided into a plurality of unit video clips along its time line, and the unit video clips that can be used to compose the video summary may then be selected from them. In this embodiment, the source video may be divided according to a preset time length to obtain all the unit video clips, where the preset time length may be set according to actual needs, for example 1 second or 2 seconds.
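As a minimal sketch of this division step (the 1-second unit length and the (start, end) tuple representation are assumptions made only for illustration, not part of the disclosed method), the unit clips could be enumerated as follows:

```python
def split_into_unit_clips(video_duration_s: float, unit_len_s: float = 1.0):
    """Divide a source video of the given duration into consecutive unit clips.

    Each clip is represented as a (start, end) pair in seconds; the last clip
    may be shorter than unit_len_s when the duration is not an exact multiple.
    """
    clips = []
    start = 0.0
    while start < video_duration_s:
        end = min(start + unit_len_s, video_duration_s)
        clips.append((start, end))
        start = end
    return clips

# Example: a 125.4-second source video split into 1-second unit clips.
unit_clips = split_into_unit_clips(125.4, unit_len_s=1.0)
print(len(unit_clips), unit_clips[:2], unit_clips[-1])   # 126 clips, the last one 0.4 s long
```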
Step S12: and screening a plurality of unit video clips from the plurality of unit video clips to serve as key video clips according to bullet screen information corresponding to each unit video clip.
A bullet screen (danmaku) is a comment subtitle posted by a user while watching a video. Its comment object is a short fragment of the video, its form is short and concise, and its content ranges from admiration, exclamation and amusement to spoilers, jokes and complaints, so it records the user's instant reaction and emotion and reflects the user's preference for the video content. A bullet screen can be regarded as an interaction between a user and the video content itself; users watching the same video can also discuss the content or converse with each other through bullet screens, for example "the previous … … is not right, I feel … …" expresses disagreement with other views together with the user's own view. Therefore, although the moments at which bullet screens are actually sent may differ, on the playback time line of the source video the bullet screens sent at the same position tend to share the same theme or feature. Accordingly, all bullet screens sent during the playback time of a unit video clip are the bullet screens corresponding to that clip, and together they reflect whether users pay attention to or are interested in it; if users pay attention to or are interested in a unit video clip, it can be used as a key video clip. Several unit video clips can therefore be selected from the plurality of unit video clips as key video clips according to the bullet screen information corresponding to each unit video clip.
Step S13: and splicing all the key video fragments according to the time sequence to generate a video abstract corresponding to the source video.
It can be understood that after all the key video clips are obtained, they can be spliced according to the time order in which each key video clip appears in the source video, thereby generating a video summary corresponding to the source video.
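One way to picture the splicing step is as a cut list: the key clips are ordered by their position in the source video and temporally adjacent clips are merged into continuous ranges. The sketch below assumes clips are given as (start, end) pairs in seconds; the actual cutting and re-encoding of the video stream is outside its scope.

```python
def build_splice_plan(key_clips):
    """Order key clips by their position in the source video and merge
    temporally adjacent clips into continuous cut ranges."""
    ordered = sorted(key_clips)                      # (start, end) pairs in seconds
    merged = []
    for start, end in ordered:
        if merged and abs(start - merged[-1][1]) < 1e-6:
            merged[-1] = (merged[-1][0], end)        # extend the previous range
        else:
            merged.append((start, end))
    return merged

# Key clips covering seconds 3-6 and 10-12 of the source video collapse into two cuts.
print(build_splice_plan([(10, 11), (3, 4), (4, 5), (5, 6), (11, 12)]))
# -> [(3, 6), (10, 12)]
```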
With the above scheme, after the source video is acquired it can be divided into a plurality of unit video clips; several of them are then selected as key video clips according to the bullet screen information corresponding to each unit video clip, and all the key video clips are spliced in chronological order to generate a video summary corresponding to the source video. Because the key video clips are selected from the unit video clips according to bullet screen information, the characteristics of user interaction are incorporated and the clips the user is interested in can be captured more accurately, so the generated video summary reflects the content the user focuses on and a personalized video summary can be generated.
Further, referring to fig. 2, fig. 2 is a flowchart of an embodiment of step S12 in fig. 1. In an embodiment, the step S12 may specifically include:
step S121: and acquiring bullet screen information corresponding to each unit video segment, and carrying out type division on each piece of bullet screen information.
It can be understood that, in order to judge through all the bullet screens corresponding to a unit video clip whether users pay attention to or are interested in it, the instant reaction and emotion recorded by each bullet screen need to be analyzed. Therefore, after the bullet screen information corresponding to each unit video clip is obtained, each piece of bullet screen information needs to be classified, and the degree of user attention to, or interest in, the content of the corresponding unit video clip can then be analyzed from the classification result.
In one embodiment, the bullet screen information includes the user group of the bullet screen, the bullet screen type and the emotional tendency of the bullet screen. Step S121 may specifically include: acquiring the bullet screen texts of all bullet screen information corresponding to the unit video clip, dividing all the bullet screen texts according to user group, bullet screen type and emotional tendency, and counting the number of bullet screens of each user group, of each bullet screen type and of each emotional tendency.
The bullet screen information of a bullet screen may reflect the user group to which the sender belongs. On many video playing platforms the color of the bullet screen is used as a clear way to distinguish user groups; taking a typical bullet-screen video platform as an example, a white bullet screen represents an ordinary user, and only when an ordinary user reaches a certain level and becomes a high-level user can colored bullet screens be sent. The bullet screen information of a bullet screen may also reflect its bullet screen type; common types include, but are not limited to: comment (criticism) bullet screens, star-chasing bullet screens, science-popularization bullet screens, translation bullet screens, spoiler bullet screens, symbol bullet screens, random-abuse bullet screens and the like. The bullet screen information may further reflect the emotional tendency of the bullet screen, which is generally positive, negative or neutral. Therefore, after the bullet screen texts of all bullet screen information corresponding to a unit video clip are obtained, all the texts can be divided according to user group, bullet screen type and emotional tendency, and the number of bullet screens of each user group, of each bullet screen type and of each emotional tendency can then be counted.
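A minimal sketch of these per-clip statistics is given below; the dictionary field names (user_group, type, sentiment) and the category labels are assumptions introduced only for this example.

```python
from collections import Counter

def count_bullet_categories(bullets):
    """Count, for one unit video clip, how many bullet screens fall into each
    user group, each bullet screen type and each emotional tendency."""
    group_counts = Counter(b["user_group"] for b in bullets)
    type_counts = Counter(b["type"] for b in bullets)
    sentiment_counts = Counter(b["sentiment"] for b in bullets)
    return group_counts, type_counts, sentiment_counts

bullets = [
    {"user_group": "ordinary", "type": "comment", "sentiment": "positive"},
    {"user_group": "high_level", "type": "star_chasing", "sentiment": "positive"},
    {"user_group": "ordinary", "type": "symbol", "sentiment": "neutral"},
]
print(count_bullet_categories(bullets))
```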
Further, the step of obtaining all bullet screen texts corresponding to the unit video clip in step S121 may specifically include: according to the first time period corresponding to the unit video clip, acquiring all bullet screen texts whose posting time falls in a second time period as all the bullet screen texts corresponding to that unit video clip, where the start time of the first time period and the start time of the second time period differ by a preset duration and the two time periods have the same length.
It can be appreciated that, because there is a delay in sending a bullet screen, a bullet screen text posted at a second moment is in fact a comment on the video at an earlier first moment, and the preset duration between the two moments can be set according to the actual situation, for example taking into account the length of the bullet screen text or the state of the network. For example, a bullet screen at time t can be traced back 2 seconds, i.e. the bullet screen at time t corresponds to the video at time t-2. Spoiler bullet screens are special: they describe video content at a future moment, so a spoiler bullet screen at time t can instead be made to correspond to the video at time t+2. Therefore, according to the first time period corresponding to a unit video clip, acquiring all bullet screen texts whose posting time falls in the second time period as the bullet screen texts of that clip allows the bullet screen information corresponding to the clip to be collected more accurately.
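The delay compensation can be sketched as below. The 2-second backward offset for ordinary bullet screens and the forward offset for spoiler bullet screens follow the example given above; the data layout and the is_spoiler flag are assumptions for illustration.

```python
def assign_bullets_to_clips(bullets, unit_len_s=1.0, delay_s=2.0):
    """Map each bullet screen to the unit clip it actually comments on.

    An ordinary bullet screen posted at time t is attributed to the video at
    t - delay_s; a spoiler bullet screen describes future content and is
    attributed to t + delay_s.
    """
    clip_bullets = {}
    for b in bullets:
        t = b["post_time"]
        video_time = t + delay_s if b.get("is_spoiler") else t - delay_s
        if video_time < 0:
            continue                                  # posted before the video starts
        clip_index = int(video_time // unit_len_s)
        clip_bullets.setdefault(clip_index, []).append(b)
    return clip_bullets

bullets = [
    {"post_time": 12.3, "text": "great shot", "is_spoiler": False},
    {"post_time": 12.3, "text": "wait for what comes next", "is_spoiler": True},
]
print(assign_bullets_to_clips(bullets))               # assigned to clips 10 and 14
```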
Step S122: and carrying out weighted summation on the barrage information based on the type of each barrage information and the weighting coefficient of each type to obtain the criticality of the unit video clip.
It can be understood that bullet screen information of different types may reflect different degrees of user attention to, or interest in, the video, so different weighting coefficients can be set for different types of bullet screen information. For a given unit video clip, its criticality can be obtained by weighting and summing all of its bullet screen information based on the type of each piece of information and the weighting coefficient of each type.
Referring to fig. 3, fig. 3 is a flowchart illustrating an embodiment of step S122 in fig. 2. In an embodiment, the step S122 may specifically include:
step S1221: and calculating the user group score of the unit video segment according to the bullet screen number of each user group and the preset weight of each user group.
It will be appreciated that, besides being more visually eye-catching, colored bullet screens have a higher threshold than white ones, and the high-level users behind them are also the preferred audience that a video creator wants to attract. Different weights can therefore be set for bullet screens of different colors, i.e. preset weights are set for the different categories of user group; for example, when the preferred audience of a certain source video is high-level users, the preset weight of high-level users can be set higher. Then, according to the number of bullet screens of each user group in a unit video clip and the preset weight of each user group, the user group score of that clip can be calculated. The user group score indicates the criticality of the unit video clip with respect to different user groups; the higher the score, the higher the criticality.
Step S1222: and calculating to obtain the bullet screen type score of the unit video segment according to the bullet screen quantity of each bullet screen type and the preset weight of each bullet screen type.
Common video content types include, but are not limited to: television series, web series, movies, variety shows, sporting events, animation, documentaries, news, music videos, game videos, comedy videos, lifestyle videos, travel videos, short videos and the like. Common bullet screen types include, but are not limited to: comment (criticism) bullet screens, star-chasing bullet screens, science-popularization bullet screens, translation bullet screens, spoiler bullet screens, symbol bullet screens, random-abuse bullet screens and the like, and the mix of bullet screen types differs slightly between different types of video, as shown in the following table:
[Table: descriptions of the various bullet screen types; rendered as images (Figure GDA0003073782790000071 and Figure GDA0003073782790000081) in the original publication.]
The above table shows descriptions of the various bullet screen types, and it can be seen that the prevalence of each type differs across video types, so the preset weights of the different bullet screen types need to be set according to the type of the source video. For example, when the source video is a movie or a documentary, the weights of the bullet screen types in the table above may be set as follows:
film making apparatus Recording sheet
Comment type bullet screen 0.4 0.4
Star-chasing bullet screen 0.2 0.05
Bullet screen for science popularization 0.05 0.2
Translation type barrage 0.05 0.05
Bullet screen 0.1 0.1
Symbol barrage 0.1 0.1
Others 0.1 0.1
It can be understood that when the source video is a movie, users mainly use bullet screens to criticize the film and comment on the starring actors, so higher weights can be set for the comment and star-chasing bullet screens; when the source video is a documentary, users mainly use bullet screens to comment on the documentary and popularize related knowledge, so higher weights can be set for the comment and science-popularization bullet screens. In addition, random-abuse bullet screens are usually unrelated to the video, so such bullet screens can be removed first. Then, according to the number of bullet screens of each type in a unit video clip and the preset weight of each type, the bullet screen type score of that clip can be calculated. The bullet screen type score indicates the criticality of the unit video clip with respect to different bullet screen types; the higher the score, the higher the criticality.
Step S1223: and calculating to obtain the emotion tendency score of the unit video segment according to the bullet screen number of each emotion tendency and the preset weight of each emotion tendency.
It will be appreciated that a bullet screen is a comment sent immediately while the user is watching the video and contains the user's current emotional feeling, whether appreciation or dissatisfaction. Different weights can therefore be set for bullet screens of different emotional tendencies. For example, a sentiment classifier can be trained on bullet screen texts and then used to judge the emotional tendency of each newly sent bullet screen, with the weight of a positive bullet screen set to 1, the weight of a negative one set to -1 and the weight of a neutral one set to 0. According to the number of bullet screens of each emotional tendency in a unit video clip and the preset weight of each emotional tendency, the emotional tendency score of that clip can be calculated. The emotional tendency score indicates the criticality of the unit video clip with respect to different emotional tendencies; the higher the score, the higher the criticality.
Step S1224: and adding the user population score, the barrage type score and the emotion tendency score to obtain the key degree of the unit video segment.
Specifically, the source video is divided into a plurality of unit video clips using the preset time length as the unit. For each unit video clip, the bullet screens on that clip are obtained, the numbers of bullet screens of each color and of each type are counted to obtain the number of bullet screens of each user group and of each bullet screen type, and these counts are weighted and summed according to the preset weights of the user groups and of the bullet screen types to obtain the user group score and the bullet screen type score of the clip. Meanwhile, a sentiment classifier is used to judge the emotional tendency of each bullet screen, the number of bullet screens of each emotional tendency is counted, and the counts are weighted and summed according to the preset weights of the emotional tendencies to obtain the emotional tendency score of the clip. The user group score, the bullet screen type score and the emotional tendency score are then added to obtain the total score S_t of the source video at each unit time:

S_t = Σ_i w_i c_i + Σ_j w_j e_j + Σ_k w_k h_k

where S_t is the score of the video at time t; c_i is the number of bullet screens of the i-th color and w_i ∈ (0, 1) is the weight of the i-th bullet screen color; e_j is the number of bullet screens of the j-th type and w_j ∈ (0, 1) is the weight of the j-th bullet screen type; h_k is the number of bullet screens of the k-th emotional tendency and w_k ∈ {1, 0, -1} is the weight of the k-th emotional tendency. With this formula the total score of every unit video clip of the source video can be obtained, and the total score reflects the criticality of the clip.
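The per-clip score can then be computed as a straightforward weighted sum over the three groups of counts. The sketch below mirrors the formula above; the concrete weight tables are illustrative placeholders based loosely on the examples in this description, not prescribed values.

```python
def clip_score(group_counts, type_counts, sentiment_counts,
               group_weights, type_weights, sentiment_weights):
    """S_t = sum_i w_i*c_i + sum_j w_j*e_j + sum_k w_k*h_k for one unit clip."""
    group_score = sum(group_weights.get(g, 0.0) * n for g, n in group_counts.items())
    type_score = sum(type_weights.get(t, 0.0) * n for t, n in type_counts.items())
    sentiment_score = sum(sentiment_weights.get(s, 0) * n
                          for s, n in sentiment_counts.items())
    return group_score + type_score + sentiment_score

# Illustrative weights: colored (high-level-user) bullet screens count more than
# white ones; sentiment weights follow the +1 / 0 / -1 convention described above.
group_weights = {"white": 0.3, "colored": 0.7}
type_weights = {"comment": 0.4, "star_chasing": 0.2, "science": 0.05,
                "translation": 0.05, "spoiler": 0.1, "symbol": 0.1, "other": 0.1}
sentiment_weights = {"positive": 1, "neutral": 0, "negative": -1}

print(clip_score({"white": 5, "colored": 2}, {"comment": 4, "symbol": 3},
                 {"positive": 5, "negative": 2},
                 group_weights, type_weights, sentiment_weights))
```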
Step S123: and selecting the unit video clips with the highest key degree as the key video clips.
It can be understood that the length of the video summary to be generated can be set according to actual requirements, for example set directly by the creator of the source video or determined from the duration of the source video. In one embodiment the length of the video summary is set to N seconds, for example 60 s or 300 s, and the length of each unit video clip is 1 second. The first N unit video clips with the highest criticality can then be selected from all the unit video clips as key video clips, i.e. the clips users are most interested in; the total duration of the N key video clips equals the length of the video summary, so the N key video clips can be spliced according to their time order in the source video to generate the video summary corresponding to the source video.
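Selecting the N highest-scoring unit clips is a partial sort; in the sketch below the scores are assumed to be kept in a dict keyed by clip index, and ties are broken arbitrarily.

```python
import heapq

def select_key_clips(scores, n):
    """Return the indices of the n unit clips with the highest scores,
    re-ordered by their position in the source video for chronological splicing."""
    top_n = heapq.nlargest(n, scores.items(), key=lambda kv: kv[1])
    return sorted(idx for idx, _ in top_n)

scores = {0: 1.2, 1: 7.8, 2: 0.4, 3: 5.1, 4: 6.0}
print(select_key_clips(scores, 3))                    # -> [1, 3, 4]
```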
Because the user group score of each unit video clip is taken into account when selecting key video clips, a video summary matching the preferences of a specific user group can be generated for that group. Because the emotional tendency score of each unit video clip is also taken into account, the video creator can learn user preferences through sentiment analysis of the bullet screens.
Referring to fig. 4, fig. 4 is a flowchart of a second embodiment of a method for generating a video summary of the present application. Specifically, the method for generating a video summary of the present embodiment may include the following steps:
step S41: and acquiring a source video, and dividing the source video into a plurality of unit video clips.
Step S42: and screening a plurality of unit video clips from the plurality of unit video clips to serve as key video clips according to bullet screen information corresponding to each unit video clip.
Step S43: and splicing all the key video fragments according to the time sequence to generate a video abstract corresponding to the source video.
In this embodiment scenario, steps S41 to S43 provided in this embodiment are substantially similar to steps S11 to S13 in the previous embodiment, and are not described here again.
Step S44: and taking the key video clips with continuous corresponding time periods as a video clip group.
It can be understood that after the first N unit video clips with the highest criticality are obtained as key video clips, the N key video clips can be divided according to whether their corresponding time periods are continuous, and key video clips with continuous time periods are taken as one video clip group. For example, if the time periods of the N key video clips in the source video are t_1, t_2, t_3, t_10, t_11, t_20, t_26, t_27, t_35, t_36, …, t_N, then [t_1, t_2, t_3] can be taken as one video clip group, [t_10, t_11] as another video clip group, and so on.
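Grouping key clips whose unit-time indices are consecutive can be sketched as follows; clip positions are assumed to be integer second indices, matching the 1-second unit length used above.

```python
def group_contiguous_clips(key_clip_indices):
    """Group key clips whose unit-time indices are consecutive,
    e.g. [1, 2, 3, 10, 11, 20] -> [[1, 2, 3], [10, 11], [20]]."""
    groups = []
    for idx in sorted(key_clip_indices):
        if groups and idx == groups[-1][-1] + 1:
            groups[-1].append(idx)
        else:
            groups.append([idx])
    return groups

print(group_contiguous_clips([1, 2, 3, 10, 11, 20, 26, 27, 35, 36]))
# -> [[1, 2, 3], [10, 11], [20], [26, 27], [35, 36]]
```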
Step S45: and obtaining candidate keywords of the video clip group according to all barrage texts corresponding to the video clip group.
Each video clip group consists of temporally continuous unit video clips, so the bullet screen information corresponding to all the unit video clips in a group is highly correlated. Candidate keywords of the group can therefore be extracted from all the bullet screen texts corresponding to the group, and the label of the group can then be obtained from these candidate keywords.
Further, please refer to fig. 5, fig. 5 is a flowchart illustrating an embodiment of step S45 in fig. 4. In an embodiment, the step S45 may specifically include:
step S451: and obtaining all barrage texts corresponding to the video segment group, and removing invalid barrage texts to obtain effective barrage texts corresponding to the video segment group.
Step S452: and performing word segmentation on the effective barrage text corresponding to the video segment group to obtain a word set after word segmentation.
Step S453: and screening the word set after word segmentation according to a preset stop word bank and a keyword bank to obtain the candidate keywords.
It can be understood that after all the bullet screen texts corresponding to a video clip group are obtained, invalid bullet screens such as symbol-only bullet screens must first be removed, leaving all the valid bullet screen texts of the group. Word segmentation can then be performed on these valid texts to obtain the segmented word set, and the set can be filtered against a preset stop-word library and keyword library, removing stop words and keeping keywords, to obtain the candidate keywords from which the label of the video clip group is derived. Note that different stop-word libraries and keyword libraries need to be preset for different types of source video, because the prevalence of each bullet screen type differs across video types; the stop-word library generally contains words that cannot occur, or that users do not care about, for the current type of source video, while the keyword library generally contains words that users are interested in for that type.
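A rough sketch of this filtering pipeline for one video clip group is shown below. The jieba segmenter, the symbol-only test, and the tiny stop-word and keyword libraries are all assumptions; any tokenizer and any domain-specific word lists could be substituted.

```python
import re
import jieba  # assumed third-party Chinese word segmenter; any tokenizer would do

def candidate_keywords(bullet_texts, stop_words, keyword_vocab):
    """Drop symbol-only (invalid) bullet screens, segment the remaining texts,
    and keep only words that are in the keyword library and not stop words."""
    valid = [t for t in bullet_texts if re.search(r"\w", t)]   # discard pure-symbol bullets
    words = []
    for text in valid:
        words.extend(w.strip() for w in jieba.lcut(text) if w.strip())
    return [w for w in words if w in keyword_vocab and w not in stop_words]

stop_words = {"了", "的", "是"}                 # assumed stop-word library
keyword_vocab = {"精彩", "高能", "名场面"}        # assumed keyword library
print(candidate_keywords(["太精彩了!!!", "~~~", "2333"], stop_words, keyword_vocab))
```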
Step S46: and calculating the importance degree of each candidate keyword by adopting a preset statistical analysis method.
Step S47: and selecting the candidate keywords with the highest importance as labels of the video segment groups, and displaying the labels of the video segment groups on a progress bar of the video abstract.
The preset statistical analysis method performs statistical analysis on the keywords to evaluate how important a word or phrase is to a corpus. It may be the TF-IDF (term frequency-inverse document frequency) algorithm, a common weighting technique in information retrieval and data mining. According to the TF-IDF principle, the importance of each candidate keyword can be calculated: if a word or phrase appears with high frequency in one video clip group and rarely appears in the other video clip groups, it is considered to have good discriminating ability and is suitable as the label of that group. Each candidate keyword is therefore scored with the TF-IDF algorithm; the higher the score, the more important the candidate keyword, so the highest-scoring candidate keyword can be selected as the label of the video clip group and displayed on the progress bar of the video summary.
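Treating each clip group's concatenated bullet screen text as one document, the TF-IDF scoring and label selection might be sketched with scikit-learn as follows; the library choice, the tokenization pattern and the English toy texts are assumptions, not the patent's required implementation.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

def pick_group_labels(group_texts):
    """Score candidate words of every clip group with TF-IDF and return,
    for each group, its highest-scoring word as the group label."""
    vectorizer = TfidfVectorizer(token_pattern=r"(?u)\b\w+\b")
    tfidf = vectorizer.fit_transform(group_texts)      # rows: groups, columns: vocabulary
    vocab = vectorizer.get_feature_names_out()
    labels = []
    for row in tfidf.toarray():
        labels.append(vocab[row.argmax()] if row.max() > 0 else "")
    return labels

groups = ["goal amazing goal replay", "interview coach interview tactics"]
print(pick_group_labels(groups))                       # -> ['goal', 'interview']
```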
Referring to fig. 6, fig. 6 is a flowchart of a third embodiment of a method for generating a video summary of the present application. Specifically, the method for generating a video summary of the present embodiment may include the following steps:
step S61: and acquiring a source video, and dividing the source video into a plurality of unit video clips.
Step S62: and screening a plurality of unit video clips from the plurality of unit video clips to serve as key video clips according to bullet screen information corresponding to each unit video clip.
Step S63: and splicing all the key video fragments according to the time sequence to generate a video abstract corresponding to the source video.
In this embodiment scenario, steps S61 to S63 provided in this embodiment are substantially similar to steps S11 to S13 in the above embodiment, and are not repeated here.
Step S64: and obtaining effective barrage texts corresponding to all the key video clips, and performing de-duplication processing to obtain the rest barrage texts.
Step S65: and carrying out theme clustering on the residual barrage texts by adopting a preset clustering algorithm, and selecting the theme with the largest barrage number as a candidate theme.
It can be understood that after all the bullet screen texts corresponding to the key video clips are obtained, invalid bullet screens such as symbol-only bullet screens must first be removed to obtain the valid bullet screen texts of all the key video clips. Among all these bullet screens, the same user may have sent several bullet screens on the same topic, so the valid bullet screen texts need to be de-duplicated to obtain the remaining bullet screen texts. A preset clustering algorithm is then used to cluster the remaining texts into topics, the number of bullet screens contained in each topic is counted, and the topic containing the most bullet screens is selected as the candidate topic. Specifically, the preset clustering algorithm may be the K-means clustering algorithm or another clustering algorithm.
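The de-duplication, K-means topic clustering and selection of the largest topic could look roughly like the sketch below; representing the bullet screen texts as TF-IDF vectors and the choice of five clusters are assumptions made only for the example.

```python
from collections import Counter
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def largest_topic(bullet_texts, n_topics=5, seed=0):
    """Cluster de-duplicated bullet screen texts into topics and return the
    texts belonging to the topic that contains the most bullet screens."""
    unique_texts = list(dict.fromkeys(bullet_texts))   # de-duplication, order preserved
    vectors = TfidfVectorizer().fit_transform(unique_texts)
    kmeans = KMeans(n_clusters=min(n_topics, len(unique_texts)),
                    random_state=seed, n_init=10)
    labels = kmeans.fit_predict(vectors)
    biggest = Counter(labels).most_common(1)[0][0]
    return [t for t, label in zip(unique_texts, labels) if label == biggest]
```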
Step S66: and selecting the barrage meeting the preset condition from the barrages corresponding to the candidate subjects as the title of the video abstract.
It will be appreciated that the candidate topic corresponds to multiple bullet screens, so a suitable one must be selected as the title of the generated video summary. Because the number of bullet screens corresponding to the candidate topic is large, a preset condition is needed to screen out a suitable one. In a specific embodiment, the preset condition is that the posting time of the bullet screen is the earliest and the length of the bullet screen text meets a preset length. Since the bullet screen texts under the same candidate topic are very similar to one another, a bullet screen that is too short cannot cover the complete meaning and one that is too long is redundant; therefore, among the bullet screens corresponding to the candidate topic, the one with the earliest posting time and an appropriate length (for example, 15 characters may be set) can be selected as the title of the video summary. The video summary and the title can then be combined to generate a video summary with a title.
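The title selection under the stated preset condition might be sketched as follows; the 15-character limit is the example value mentioned above, and the dictionary layout of the candidate-topic bullet screens is an assumption.

```python
def choose_title(candidate_topic_bullets, max_len=15):
    """Among the bullet screens on the candidate topic, pick as the title the
    earliest-posted bullet screen whose text does not exceed max_len characters."""
    suitable = [b for b in candidate_topic_bullets if len(b["text"]) <= max_len]
    if not suitable:
        return None
    return min(suitable, key=lambda b: b["post_time"])["text"]

bullets = [
    {"text": "这段太好笑了", "post_time": 30.2},
    {"text": "前方高能", "post_time": 12.5},
]
print(choose_title(bullets))                           # -> "前方高能"
```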
It can be understood that because the labels and the title of the video summary are extracted from the bullet screen content, the generated video summary can more accurately reflect the users' points of interest and allow users to understand the video content quickly, thereby attracting the attention of more users.
In addition, the video summary generated by this method can be updated synchronously as the bullet screens are updated. Since users' points of interest in the same source video may change over time, the evolution of user interest can be tracked in real time through the bullet screens they send, and a video summary that meets current user needs can then be generated.
Referring to fig. 7, fig. 7 is a schematic frame diagram of an embodiment of an electronic device of the present application. The electronic device 70 comprises a memory 701 and a processor 702 coupled to each other, the processor 702 being configured to execute program instructions stored in the memory 701 to implement the steps of any of the video summary generation method embodiments described above. In one specific implementation scenario, the electronic device 70 may include, but is not limited to, a microcomputer or a server.
In particular, the processor 702 is configured to control itself and the memory 701 to implement the steps of any of the video summary generation method embodiments described above. The processor 702 may also be referred to as a CPU (Central Processing Unit). The processor 702 may be an integrated circuit chip with signal processing capabilities. The processor 702 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. In addition, the processor 702 may be implemented jointly by a plurality of integrated circuit chips.
In the above scheme, after acquiring the source video the processor 702 may divide it into a plurality of unit video clips, select several of them as key video clips according to the bullet screen information corresponding to each unit video clip, and splice all the key video clips in chronological order to generate a video summary corresponding to the source video. Because the key video clips are selected from the unit video clips according to bullet screen information, the characteristics of user interaction are incorporated and the clips the user is interested in can be captured more accurately, so the generated video summary reflects the content the user focuses on, i.e. a personalized video summary can be generated. Moreover, because the user group score of each unit video clip is taken into account when selecting key video clips, a video summary matching the preferences of a specific user group can be generated; because the emotional tendency score of each unit video clip is taken into account, the video creator can learn user preferences through sentiment analysis of the bullet screens; because the labels and the title of the video summary are extracted from the bullet screen content, the generated video summary can more accurately reflect the users' points of interest and allow users to understand the video content quickly, thereby attracting more users; and the generated video summary can be updated synchronously as the bullet screens are updated, so the evolution of user interest can be tracked in real time through the bullet screens sent by users and a video summary that meets current user needs can be generated.
Referring to fig. 8, fig. 8 is a schematic diagram illustrating an embodiment of a computer readable storage medium according to the present application. The computer readable storage medium 80 stores program instructions 800 that can be executed by a processor, the program instructions 800 being configured to implement the steps of any of the video summary generation method embodiments described above.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other manners. For example, the embodiments described above are merely illustrative: the division into modules or units is only a logical functional division, and there may be other divisions in actual implementation, for example several units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections via some interfaces, devices or units, and may be electrical, mechanical or in other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium and including several instructions for causing a computer device (which may be a personal computer, a server, a network device or the like) or a processor to perform all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.

Claims (11)

1. A method for generating a video summary, the method comprising:
acquiring a source video, and dividing the source video into a plurality of unit video clips;
screening a plurality of unit video clips from the plurality of unit video clips as key video clips according to bullet screen information corresponding to each unit video clip;
splicing all the key video clips according to the time sequence to generate a video abstract corresponding to the source video;
the bullet screen information comprises user groups of bullet screens, bullet screen types of bullet screens and emotional tendency of bullet screens; screening a plurality of unit video clips as key video clips according to bullet screen information corresponding to each unit video clip, including:
acquiring bullet screen texts of all bullet screen information corresponding to the unit video clips, dividing all bullet screen texts according to user groups of bullet screens, bullet screen types of bullet screens and emotion tendencies of bullet screens, and counting the bullet screen numbers of all user groups, bullet screen numbers of all bullet screen types and bullet screen numbers of all emotion tendencies;
based on the type of each piece of barrage information and the weighting coefficient of each type, carrying out weighted summation on the barrage information to obtain the key degree of the unit video segment;
and selecting the unit video clips with the highest key degree as the key video clips.
2. The method of generating according to claim 1, wherein the obtaining all bullet screen text corresponding to the unit video clip includes:
according to the first time period corresponding to the unit video clip, acquiring all barrage texts with the publication time in the second time period as all barrage texts corresponding to the unit video clip; the starting time of the first time period and the starting time of the second time period differ by a preset duration, and the duration of the first time period is the same as the duration of the second time period.
3. The method according to claim 1, wherein the step of weighting and summing the bullet screen information based on the type of each bullet screen information and the weighting coefficients of the respective types to obtain the criticality of the unit video clip includes:
calculating to obtain the user group score of the unit video segment according to the bullet screen number of each user group and the preset weight of each user group;
calculating to obtain the bullet screen type score of the unit video segment according to the bullet screen quantity of each bullet screen type and the preset weight of each bullet screen type;
calculating to obtain emotion tendency scores of the unit video segments according to the bullet screen quantity of each emotion tendency and preset weights of each emotion tendency;
and adding the user population score, the barrage type score and the emotion tendency score to obtain the key degree of the unit video segment.
4. The method of generating according to claim 3, wherein the preset weights of the types of the bullet screen are set according to the type of the source video.
5. The generation method according to claim 1, characterized in that the generation method further comprises:
taking the key video clips with continuous corresponding time periods as a video clip group;
obtaining candidate keywords of the video clip group according to all barrage texts corresponding to the video clip group;
calculating the importance degree of each candidate keyword by adopting a preset statistical analysis method;
and selecting the candidate keywords with the highest importance as labels of the video segment groups, and displaying the labels of the video segment groups on a progress bar of the video abstract.
6. The method of generating according to claim 5, wherein the obtaining candidate keywords of the video clip group according to all bullet screen texts corresponding to the video clip group includes:
acquiring all barrage texts corresponding to the video segment group, and removing invalid barrage texts to obtain effective barrage texts corresponding to the video segment group;
word segmentation is carried out on the effective barrage texts corresponding to the video segment groups, and word sets after the word segmentation are obtained;
and screening the word set after word segmentation according to a preset stop word bank and a keyword bank to obtain the candidate keywords.
7. The method of generating according to claim 6, wherein the preset deactivated word stock and keyword stock are set according to the type of the source video.
8. The generation method according to claim 1, characterized in that the generation method further comprises:
obtaining effective barrage texts corresponding to all the key video clips, and performing duplicate removal processing to obtain residual barrage texts;
performing theme clustering on the residual barrage texts by adopting a preset clustering algorithm, and selecting the theme with the largest barrage number as a candidate theme;
and selecting the barrage meeting the preset condition from the barrages corresponding to the candidate subjects as the title of the video abstract.
9. The method of claim 8, wherein the preset condition includes that a publication time of the bullet screen is earliest and a length of the bullet screen text satisfies a preset length.
10. An electronic device comprising a memory and a processor coupled to each other, the processor configured to execute program instructions stored in the memory to implement the method of generating a video summary according to any one of claims 1 to 9.
11. A computer readable storage medium having stored thereon program instructions, which when executed by a processor, implement the method of generating a video summary according to any of claims 1 to 9.
CN202011622336.3A 2020-12-31 2020-12-31 Video abstract generation method, electronic equipment and computer readable storage medium Active CN113055741B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011622336.3A CN113055741B (en) 2020-12-31 2020-12-31 Video abstract generation method, electronic equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011622336.3A CN113055741B (en) 2020-12-31 2020-12-31 Video abstract generation method, electronic equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN113055741A CN113055741A (en) 2021-06-29
CN113055741B true CN113055741B (en) 2023-05-30

Family

ID=76508922

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011622336.3A Active CN113055741B (en) 2020-12-31 2020-12-31 Video abstract generation method, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN113055741B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113987264A (en) * 2021-10-28 2022-01-28 北京中科闻歌科技股份有限公司 Video abstract generation method, device, equipment, system and medium
CN115171014B (en) * 2022-06-30 2024-02-13 腾讯科技(深圳)有限公司 Video processing method, video processing device, electronic equipment and computer readable storage medium
CN115767204A (en) * 2022-11-10 2023-03-07 北京奇艺世纪科技有限公司 Video processing method, electronic equipment and storage medium
CN116033207A (en) * 2022-12-09 2023-04-28 北京奇艺世纪科技有限公司 Video title generation method and device, electronic equipment and readable storage medium
CN116896654B (en) * 2023-09-11 2024-01-30 腾讯科技(深圳)有限公司 Video processing method and related device


Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1894964A (en) * 2003-12-18 2007-01-10 皇家飞利浦电子股份有限公司 Method and circuit for creating a multimedia summary of a stream of audiovisual data
JP5243366B2 (en) * 2009-08-18 2013-07-24 日本電信電話株式会社 Video summarization method and video summarization program
US20150046371A1 (en) * 2011-04-29 2015-02-12 Cbs Interactive Inc. System and method for determining sentiment from text content
US10255361B2 (en) * 2015-08-19 2019-04-09 International Business Machines Corporation Video clips generation system
KR101777242B1 (en) * 2015-09-08 2017-09-11 네이버 주식회사 Method, system and recording medium for extracting and providing highlight image of video content
WO2018049254A1 (en) * 2016-09-09 2018-03-15 Cayke, Inc. System and method of creating, analyzing, and categorizing media
CN107105318B (en) * 2017-03-21 2021-01-29 华为技术有限公司 Video hotspot segment extraction method, user equipment and server
CN107197368B (en) * 2017-05-05 2019-10-18 中广热点云科技有限公司 Determine user to the method and system of multimedia content degree of concern
CN109729435A (en) * 2017-10-27 2019-05-07 优酷网络技术(北京)有限公司 The extracting method and device of video clip
CN108537139B (en) * 2018-03-20 2021-02-19 校宝在线(杭州)科技股份有限公司 Online video highlight analysis method based on bullet screen information
CN110427897B (en) * 2019-08-07 2022-03-08 北京奇艺世纪科技有限公司 Video precision analysis method and device and server
CN112115707A (en) * 2020-09-08 2020-12-22 九江学院 Emotion dictionary construction method for bullet screen emotion analysis and based on expressions and tone

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8984405B1 (en) * 2013-06-26 2015-03-17 R3 Collaboratives, Inc. Categorized and tagged video annotation
CN104469508A (en) * 2013-09-13 2015-03-25 中国电信股份有限公司 Method, server and system for performing video positioning based on bullet screen information content
CN106210902A (en) * 2016-07-06 2016-12-07 华东师范大学 A kind of cameo shot clipping method based on barrage comment data
CN107071587A (en) * 2017-04-25 2017-08-18 腾讯科技(深圳)有限公司 The acquisition methods and device of video segment
CN109089127A (en) * 2018-07-10 2018-12-25 武汉斗鱼网络科技有限公司 A kind of video-splicing method, apparatus, equipment and medium
CN109104642A (en) * 2018-09-26 2018-12-28 北京搜狗科技发展有限公司 A kind of video generation method and device

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Xian Y, et al. Video highlight shot extraction with time-sync comment. Proceedings of the 7th International Workshop on Hot Topics in Planet-scale Mobile Computing and Online Social Networking. New York: ACM, 2015. *
洪庆; 王思尧; 赵钦佩; 李江峰; 饶卫雄. Video user group classification based on bullet-screen sentiment analysis and clustering algorithms. Computer Engineering and Science, 2018, (06). *
邓扬; 张晨曦; 李江峰. A video clip recommendation model based on bullet-screen sentiment analysis. Journal of Computer Applications, 2017, (04). *
高旭. Bullet-screen-based video climax detection and analysis. Computer Knowledge and Technology, 2020, (06). *

Also Published As

Publication number Publication date
CN113055741A (en) 2021-06-29

Similar Documents

Publication Publication Date Title
CN113055741B (en) Video abstract generation method, electronic equipment and computer readable storage medium
Hsu A personalized English learning recommender system for ESL students
CN108028962B (en) Processing video usage information to deliver advertisements
Shardanand Social information filtering for music recommendation
Nie et al. Multimedia answering: enriching text QA with media information
US8126763B2 (en) Automatic generation of trailers containing product placements
US8521818B2 (en) Methods and apparatus for recognizing and acting upon user intentions expressed in on-line conversations and similar environments
US20080109391A1 (en) Classifying content based on mood
Tapaswi et al. Aligning plot synopses to videos for story-based retrieval
Jin et al. MySpace video recommendation with map-reduce on qizmt
KR20100095924A (en) Advertizement keyword extracting apparatus and method using situation of video
US20110093343A1 (en) System and Method of Content Generation
Irie et al. Automatic trailer generation
KR102340963B1 (en) Method and Apparatus for Producing Video Based on Artificial Intelligence
Ellouze et al. IM (S) 2: Interactive movie summarization system
EP3340069A1 (en) Automated characterization of scripted narratives
Gajanayake et al. Trending pattern identification of youtube gaming channels using sentiment analysis
Larson et al. Overview of VideoCLEF 2009: New perspectives on speech-based multimedia content enrichment
Tarvainen et al. Film mood and its quantitative determinants in different types of scenes
KR102183957B1 (en) Advertisemnet mediation server, and method for operating the same
WO2016125166A1 (en) Systems and methods for analyzing video and making recommendations
CN113282789B (en) Content display method and device, electronic equipment and readable storage medium
Hong et al. Multimodal PLSA for movie genre classification
Li et al. Movie summarization based on alignment of plot and shots
CN110309415B (en) News information generation method and device and readable storage medium of electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant