CN109729425B - Method and system for predicting key segments

Method and system for predicting key segments

Info

Publication number
CN109729425B
Authority
CN
China
Prior art keywords
video
segment
target
determining
time period
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711021216.6A
Other languages
Chinese (zh)
Other versions
CN109729425A (en)
Inventor
王往
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Youku Network Technology Beijing Co Ltd
Original Assignee
Youku Network Technology Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Youku Network Technology Beijing Co Ltd
Priority to CN201711021216.6A
Publication of CN109729425A
Application granted
Publication of CN109729425B

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the present application discloses a method and a system for predicting key segments. The method includes: acquiring operation data and user behavior data of a played episode video, wherein the operation data is used for specifying theoretical key segments in the played episode video, and the user behavior data is used for determining target key segments in the played episode video that are of interest to users; comparing the operation data with the user behavior data, and determining actual key segments of the played episode video according to the comparison result; and determining predicted key segments of an unplayed episode video according to the actual key segments of the played episode video. The technical solution can improve the accuracy with which key segments of unplayed episodes are predicted.

Description

Method and system for predicting key segments
Technical Field
The present application relates to the field of internet technologies, and in particular, to a method and a system for predicting a key segment.
Background
Currently, more and more episodes are broadcast while still being shot. For such series, part of the content is typically produced after broadcasting begins, and the storyline of the unplayed episodes is decided according to users' discussion of the episodes already played. For example, a hotly disputed plot line attracts user discussion and raises users' attention to the series, so such plot lines are generally given more weight in the unplayed episodes.
Currently, when collecting users' discussion of already-played episodes, staff usually browse the comment sections of major websites quickly, screen out the content discussed most, and summarize it to obtain users' feedback on the played episodes. This manual collection, however, has two drawbacks. On one hand, it is inefficient and slow, and useful information may not be obtained within a short time. On the other hand, some users may comment casually without having watched the episodes, so the collected feedback may be inaccurate. As a result, under the current collection approach, the scenario of the unplayed episodes may not be revised in time, and the revisions that are made may be inaccurate.
Disclosure of Invention
The embodiments of the present application aim to provide a method and a system for predicting key segments, which can improve the prediction accuracy for unplayed episodes.
In order to achieve the above object, an embodiment of the present application provides a method for predicting key segments, the method including: acquiring operation data and user behavior data of a played episode video, wherein the operation data is used for specifying theoretical key segments in the played episode video, and the user behavior data is used for determining target key segments in the played episode video that are of interest to users; comparing the operation data with the user behavior data, and determining actual key segments of the played episode video according to the comparison result; and determining predicted key segments of an unplayed episode video according to the actual key segments of the played episode video.
In order to achieve the above object, an embodiment of the present application further provides a system for predicting key segments, the system including: a data acquisition unit, configured to acquire operation data and user behavior data of a played episode video, wherein the operation data is used for specifying theoretical key segments in the played episode video, and the user behavior data is used for determining target key segments in the played episode video that are of interest to users; a data comparison unit, configured to compare the operation data with the user behavior data and determine actual key segments of the played episode video according to the comparison result; and a prediction unit, configured to determine predicted key segments of an unplayed episode video according to the actual key segments of the played episode video.
As can be seen from the above, according to the technical solution provided by the present application, operation data and user behavior data of a played episode video may be obtained first, where the operation data can represent the key segments of the played episodes as judged by the scenario planners or the operators of a video playing website, and the user behavior data can be used to determine the key segments of the played episodes as judged by users. In the present application, the two kinds of data can be compared, so that the actual key segments of the played episode video are obtained comprehensively from two different sources. The actual key segments thus reflect the content that both the operators and the users care about, from which the key segments of the unplayed episode video can be predicted effectively. On one hand, because the operation data and the user behavior data are analyzed automatically, the efficiency of data collection is improved; on the other hand, user behavior data can only be produced by users who actually watch the video, which ensures the correctness of the collected data and thereby the accuracy of the key-segment prediction for the unplayed episode video.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only some embodiments described in the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a flowchart of a method for predicting key segments according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram illustrating a comparison of time periods in an embodiment of the present application;
FIG. 3 is a flowchart illustrating the generation of a cover image according to an embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of a key-segment prediction system in an embodiment of the present application.
Detailed Description
In order to make those skilled in the art better understand the technical solutions in the present application, the technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only a part of the embodiments of the present application, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without any inventive work shall fall within the scope of protection of the present application.
The application provides a method for predicting key segments, which can be applied to terminal equipment with a data processing function. The terminal device may be, for example, a desktop computer, a notebook computer, a tablet computer, a workstation, etc. In addition, the method can also be applied to a server of a video playing website. The server may be an independent server or a server cluster composed of a plurality of servers.
Referring to fig. 1, the method for predicting a key segment provided in the present application may include the following steps.
S1: acquiring operation data and user behavior data of a played episode video; wherein the operation data is used for specifying theoretical key segments in the played episode video, and the user behavior data is used for determining target key segments in the played episode video that are of interest to users.
In the present embodiment, the content presented in the played episode video is typically arranged according to the scenario of the episode. The scenario is often written according to a certain rule, and scenario conflict events among key characters can be generated in the scenario. In this way, according to the content displayed in the scenario, the key segments capable of embodying scenario conflicts can be determined in the played episode video, and the time periods of the key segments in the played episode video can constitute the operation data. In addition, the operation data can be determined by the manager of the video playing website after the video content is quickly browsed. The operation data may include a plurality of time periods, and the video content corresponding to the time periods may be a designated theoretical key segment.
In this embodiment, the theoretical key segments specified by the operation data are key segments analyzed only from a professional perspective. A user watching the episode, however, may be influenced by actors, current events, moods and other factors, so the segments the user is actually interested in during viewing may not coincide with the segments specified by the operation data. While watching the video, the user usually generates user behavior data, which may include, for example, at least one of bullet-screen posting behavior data, comment posting behavior data, and video-progress-bar dragging behavior data. Each of these can characterize a user's opinion of the scenario. For example, if the number of bullet screens posted by users in a certain time period is particularly large, the video content displayed in that time period is attracting relatively strong attention. For another example, if a user repeatedly watches one segment of the same video, that segment evidently arouses the user's interest.
In this embodiment, the operation data may be stored in a server of the video playing website after being determined from the scenario, or may be edited in the server in advance by a manager of the video playing website. The user behavior data may be collected from the user's client, for instance in a buried-point (event tracking) manner. Specifically, the page the user is currently viewing may contain a number of interaction controls, such as a "send" button for posting a bullet screen or the slider of the video progress bar. These interaction controls may be bound in advance to code for capturing user behavior; once an interaction control is triggered, the code bound to it is executed. The executed code may record the user's click or slide operation together with the corresponding time node. After the user's client records this behavior data, it can send the data to the server of the video playing website, so that the server obtains the user's behavior data (a sketch of such an event record follows).
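By way of illustration only, a minimal sketch of the server side of such buried-point collection; the event fields, names and values below are hypothetical assumptions for this sketch, not data structures specified by the application:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class BehaviorEvent:
    user_id: str
    video_id: str
    action: str            # e.g. "send_bullet_screen", "drag_progress_bar"
    time_node_s: float     # playback position when the control was triggered
    extra: Optional[dict] = None  # e.g. {"drag_start": 710.0, "drag_end": 745.5}

EVENT_LOG: list = []

def record_event(event: BehaviorEvent) -> None:
    """Server-side handler: store the event reported by the client."""
    EVENT_LOG.append(event)

record_event(BehaviorEvent("u42", "ep07", "send_bullet_screen", 736.2))
print(len(EVENT_LOG))  # 1
```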
In this embodiment, the operation data may include time periods specified in advance by an editor or an administrator, and the video content corresponding to these time periods constitutes the specified theoretical key segments. By analyzing the user behavior data, the time periods corresponding to the content the user is interested in can likewise be determined. Specifically, a time period in which the number of bullet screens in the played episode video is greater than or equal to a specified number threshold may be determined, and the video segment corresponding to that time period may be taken as a target key segment of user interest. The specified number threshold may be a fixed value obtained by analyzing historical bullet-screen data. To determine the time periods, the video may be divided in advance into a plurality of time periods at fixed intervals; the number of bullet screens posted in each period is then counted, and the periods whose counts are greater than or equal to the threshold are treated as target key segments (see the sketch after this paragraph). In addition, time periods mentioned in the comment information of the played episode video may be extracted, and the video segments corresponding to them may be taken as target key segments: when a user posts a comment in the comment area of a video, the comment may mention a time period, and a time period appearing in comment information is likely one the user cares about. Finally, a time period whose number of viewings is greater than or equal to a specified number-of-times threshold may be determined from the progress-bar dragging behavior data, and the corresponding video segment taken as a target key segment. Specifically, when the user drags the progress bar, the client can capture the start and end time nodes of the drag in the buried-point manner, thereby determining the time period the user is interested in. For the same time period, or for time periods whose gap is smaller than or equal to a specified time-difference threshold, the number of viewings can be counted; when that number reaches the threshold, the user is evidently interested in the content of the period, so the corresponding video segment may be taken as a target key segment.
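The bullet-screen criterion above can be illustrated with a minimal sketch. The function name, the 30-second period width and the count threshold are hypothetical choices for illustration, not values taken from this application:

```python
from collections import Counter

def target_key_segments(bullet_times_s, video_len_s, bin_s=30, count_threshold=100):
    """Divide the video into fixed-width time periods, count the bullet
    screens posted in each period, and keep the periods whose count is
    greater than or equal to the specified number threshold."""
    counts = Counter(int(t // bin_s) for t in bullet_times_s if 0 <= t < video_len_s)
    return [(i * bin_s, min((i + 1) * bin_s, video_len_s))
            for i, c in sorted(counts.items()) if c >= count_threshold]

# Synthetic timestamps (seconds): a burst of bullet screens around minutes 12-25
bullets = [t / 10.0 for t in range(7200, 15000, 2)]
print(target_key_segments(bullets, video_len_s=2700))
```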
S3: comparing the operation data with the user behavior data, and determining the actual key segments of the played episode video according to the comparison result.
In this embodiment, so that the follow-up scenario neither deviates from the main storyline nor disregards the users' interest, the operation data and the user behavior data may be combined for comprehensive analysis. Specifically, the operation data and the user behavior data may be compared. Both contain time periods, so when they are compared, a first time period corresponding to the theoretical key segment and a second time period corresponding to the target key segment may be determined and compared against each other. Note that there may be several first time periods and several second time periods; the first time periods characterize the periods determined from the theoretical key segments, and the second time periods characterize the periods determined from the target key segments. Referring to fig. 2, when the first and second time periods are compared, the periods in which they overlap may be determined. An overlap indicates that the overlapping content both fits the trend of the scenario and matches the users' interest, so the video segment corresponding to the overlapping time period may be taken as an actual key segment of the played episode video (a sketch of this intersection follows).
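As an illustration, a minimal sketch of intersecting the two sets of time periods; the function and variable names are hypothetical:

```python
def actual_key_periods(first_periods, second_periods):
    """Intersect operator-specified periods (operation data) with
    user-derived periods (user behavior data); the overlapping parts
    correspond to the actual key segments."""
    overlaps = []
    for a_start, a_end in first_periods:
        for b_start, b_end in second_periods:
            start, end = max(a_start, b_start), min(a_end, b_end)
            if start < end:
                overlaps.append((start, end))
    return overlaps

# Theoretical key segments vs. target key segments (seconds)
print(actual_key_periods([(300, 420), (900, 1020)], [(390, 480), (960, 990)]))
# -> [(390, 420), (960, 990)]
```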
S5: determining the predicted key segments of the unplayed episode video according to the actual key segments of the played episode video.
In this embodiment, the actual key segments so determined may be used to predict the plot trend of the unplayed episodes. Specifically, the content of each actual key segment may be identified to determine the content features corresponding to it. In practice, a plot is usually advanced by the characters in it. Therefore, when the content of an actual key segment is identified, the faces of the persons appearing in it can be recognized through face recognition technology, and the names of the persons corresponding to those faces can be used as the content features of the actual key segment.
In this embodiment, the leading actors of an episode and their facial images may be acquired in advance, and these pre-acquired facial images may be used as preset facial-makeup samples. After a person's facial makeup is identified from an actual key segment, it may be compared with the preset facial-makeup samples to calculate the similarity between them. Specifically, the identified facial makeup and the preset samples can each be represented by a digitized feature vector. The feature vector may be constructed from the pixel values of the pixels in the face picture, where a pixel value is a number within a specified interval, for example any value from 0 to 255, whose magnitude indicates the shade of the color. The pixel values of the pixels in the face image may be read out and arranged into the feature vector of the image. For example, for a face image of 9 × 9 = 81 pixels, the pixel values can be read out in order from left to right and from top to bottom, forming an 81-dimensional vector that serves as the feature vector of the image. Alternatively, the feature vector may be constructed from CNN (Convolutional Neural Network) features of the face image: the image is input into a convolutional neural network, which outputs the corresponding feature vector. The similarity between two feature vectors can then be calculated via the vector angle or the Pearson correlation coefficient. The similarity of the identified facial makeup to every preset sample is calculated; the greater the similarity, the more alike the identified facial makeup and the sample. In this way, the target preset facial-makeup sample with the largest similarity can be determined, and the name of the person corresponding to that sample is used as the content feature of the actual key segment (see the sketch after this paragraph).
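A minimal sketch of this matching step, using the pixel-value feature vectors and the vector-angle (cosine) similarity mentioned above; the person names and the 9 × 9 image size are hypothetical:

```python
import numpy as np

def face_similarity(face_img, sample_img):
    """Cosine of the angle between two equal-size grayscale face images,
    each flattened into a pixel-value feature vector (0-255 per pixel)."""
    a = np.asarray(face_img, dtype=float).ravel()
    b = np.asarray(sample_img, dtype=float).ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_person(face_img, samples):
    """Return the person name whose preset facial-makeup sample
    has the largest similarity to the identified face."""
    return max(samples, key=lambda name: face_similarity(face_img, samples[name]))

rng = np.random.default_rng(0)
samples = {"Zhang San": rng.integers(0, 256, (9, 9)),
           "Li Si": rng.integers(0, 256, (9, 9))}
face = samples["Li Si"] + rng.integers(-5, 6, (9, 9))  # a noisy shot of Li Si
print(match_person(face, samples))  # Li Si
```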
In this embodiment, after the content features corresponding to the actual key segments are determined, the content features may be clustered according to the names of the persons they contain, yielding at least one feature set. Specifically, content features containing the same person names are placed in the same feature set. Considering that some actual key segments may show more than one person, a content feature obtained from such a segment may contain several person names; during clustering these names are treated as a whole, and two content features are placed in the same feature set only when the same group of names appears in both. After clustering, the more content features a feature set contains, the more likely the person named in them is a main character. Therefore, the target feature set containing the most content features can be determined, and the target person names contained in its content features can be extracted. For example, if the largest feature set contains 10 content features and each of them contains the person name "Li Si", it can be concluded that the plot line involving "Li Si" is the main plot line of the series. In subsequent unplayed episodes, the plot involving this target person should likewise match the users' interest and the scenario trend, so the plot containing the target person may be taken as the predicted key segment of the unplayed episode video (a sketch of the clustering follows).
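A minimal sketch of this clustering, where each content feature is reduced to the set of person names it contains; the names and data are hypothetical:

```python
from collections import defaultdict

def predict_main_characters(content_features):
    """Group content features whose person-name sets are identical and
    return the names from the feature set with the most content features."""
    clusters = defaultdict(list)
    for names in content_features:
        clusters[frozenset(names)].append(names)
    largest = max(clusters.values(), key=len)  # the target feature set
    return set(largest[0])

features = [{"Li Si"}, {"Li Si"}, {"Zhang San", "Li Si"},
            {"Li Si"}, {"Wang Wu"}]
print(predict_main_characters(features))  # {'Li Si'}
```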
In one embodiment, after the main plot line that both fits the scenario trend and satisfies the users' interest has been predicted, the video playing website may produce a cover image associated with that main plot line and use it to attract users. Considering that different users may be attracted by different cover images, in this embodiment the cover image presented to a user may be generated in combination with that user's behavior data.
Specifically, referring to fig. 3, the present embodiment may include the following two steps when generating the cover image.
S7: extracting picture frames matching the user behavior data from the actual key segments.
In this embodiment, in order to produce a cover image that meets the user's preferences, picture frames matching the user behavior data may be extracted from the actual key segments, where a matching picture frame is one whose content the user is interested in. Specifically, visual features may be included in the user behavior data. A visual feature characterizes a target object appearing in video content the user cares about; it may be, for example, a character's face or a character's action. A character's face can indicate the actors the user follows, while a character's action can indicate the kinds of movement the user follows (dancing, fighting, and so on). For example, suppose a user frequently watched variety shows over the past week and, among them, especially liked the dance performances of Zhang San. By analyzing the user behavior data over this period, a visual feature such as "Zhang San + dance" can be obtained. A picture frame matching the user behavior data is then a frame whose picture contains this visual feature; for a user who prefers "Zhang San + dance", frames showing Zhang San dancing can be extracted from the actual key segments.
In this embodiment, since the number of picture frames containing a visual feature may be large, the extracted frames may be further screened, the principle being to keep frames in which the visual feature appears clearly and completely. To this end, each visual feature may be associated with a decision policy, which defines the form in which the visual feature should be presented in the picture. Take a character's face as an example: a face may appear in the picture in many orientations, facing the viewer frontally or sideways. So that the face in the final cover image remains highly recognizable, the decision policy may define an effective rotation range for the face, covering a set of rotation angles, where a rotation angle is a combination of azimuth and pitch. For example, taking a face looking straight at the viewer as 0° pitch and 0° azimuth, the decision policy may define an effective rotation range between −45° and +45° in pitch and between −45° and +45° in azimuth.
In this embodiment, after the picture frames are extracted from the actual key segments, their content may be judged, and the target picture frames whose content meets the decision policy may be identified among them. Specifically, where the visual feature is a character's face, the decision policy associated with it defines the effective rotation range as described above, and each rotation angle in that range may be associated with a face template. A face template may be a simplified facial makeup that highlights the outline of the facial features while ignoring other details; it is used to judge the orientation of a face. When target picture frames are selected, the faces shown in the picture frames can first be detected with a mature face recognition algorithm, and the similarity between each detected face and the face templates can then be calculated. The detected face and the face templates can be represented by digitized feature vectors constructed in the same ways described above for facial-makeup matching, that is, either from the pixel values of the face picture or from CNN (Convolutional Neural Network) features output by a convolutional neural network.
In this embodiment, the similarity between the detected face and a face template is obtained by calculating the vector angle or the Pearson correlation coefficient between the two feature vectors. The similarity between the detected face and each face template can be calculated in turn, yielding several similarities; the greater a similarity, the more alike the face and the template. When any of the calculated similarities is greater than or equal to a specified threshold, the detected face is close to one of the allowed orientations, and the picture frame containing that face can be taken as a target picture frame conforming to the decision policy.
In this embodiment, the visual features may further include character actions, which are mainly reflected in the positions of the character's head and limbs. The decision policy associated with a character action defines an action template presented by the character. An action template reflects the activity the character is engaged in; it may be, for example, a simplified diagram of dancing, fighting, or some fixed posture, in which the face is ignored and only the positions of the head, limbs and torso matter. When target picture frames are selected, the action shown by the character in a picture frame can be recognized: the character as a whole can be detected from the frame using mature human-figure detection technology, so that the recognized image includes the character's action. After the action is recognized, it is judged whether the recognized action is contained in the action templates; if so, the action is one the user is interested in, and the frame can be taken as a target picture frame conforming to the decision policy. For this judgment, the action templates may be digitized in advance. For example, a template may be divided into the head, the torso, and the four limbs, with each limb further divided into an upper half and a lower half, giving 10 dimensions in all; according to where each body part is located, a value is assigned to each dimension, producing a 10-dimensional vector. A vector of the same form is generated for the recognized action, and whether the recognized action is similar to a template can be judged by calculating the vector angle or the Pearson correlation coefficient, as in the sketch below.
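A minimal sketch of the action-template comparison, using the 10-dimensional encoding and the Pearson correlation mentioned above; the encoding values and the 0.9 threshold are hypothetical:

```python
import numpy as np

def pearson(u, v):
    """Pearson correlation coefficient between two pose vectors."""
    return float(np.corrcoef(np.asarray(u, float), np.asarray(v, float))[0, 1])

def matches_action_template(pose_vec, templates, threshold=0.9):
    """pose_vec: 10 values encoding head, torso, and the upper/lower halves
    of the four limbs. True if similar enough to any action template."""
    return any(pearson(pose_vec, t) >= threshold for t in templates.values())

dance = [0.2, 0.5, 0.9, 0.4, 0.8, 0.3, 0.6, 0.7, 0.5, 0.6]     # template
pose  = [0.25, 0.5, 0.85, 0.45, 0.8, 0.3, 0.55, 0.7, 0.5, 0.6]  # recognized
print(matches_action_template(pose, {"dance": dance}))  # True
```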
S9: generating a cover image of the played episode video based on the extracted picture frames.
In this embodiment, after the picture frames are extracted, a cover image of the played episode video may be generated from them. Specifically, if only one picture frame was extracted, it can be used directly as the cover image. If at least two picture frames were extracted, the region image containing the visual feature can be cut out of each frame, and the cropped region images can be integrated into a single picture. For example, suppose two frames are extracted, one showing Zhang San dancing and the other showing Li Si singing. The region showing Zhang San dancing and the region showing Li Si singing can be cropped from the respective frames and stitched into one picture, and the integrated picture is used as the cover image of the played episode video (a sketch of this stitching follows).
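A minimal sketch of the cropping-and-stitching step using the Pillow imaging library; the crop boxes and file handling are hypothetical:

```python
from PIL import Image

def compose_cover(region_images):
    """Stitch cropped region images (each containing one visual feature)
    side by side into a single cover image."""
    if len(region_images) == 1:
        return region_images[0]
    height = min(img.height for img in region_images)
    resized = [img.resize((max(1, round(img.width * height / img.height)), height))
               for img in region_images]
    cover = Image.new("RGB", (sum(img.width for img in resized), height))
    x = 0
    for img in resized:
        cover.paste(img, (x, 0))
        x += img.width
    return cover

# e.g. crop the "Zhang San dancing" and "Li Si singing" regions, then stitch:
# cover = compose_cover([frame1.crop((60, 40, 460, 640)),
#                        frame2.crop((100, 30, 500, 630))])
# cover.save("cover.jpg")
```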
It should be noted that, because different users' behavior data may differ, the cover images produced after the video playing website receives page-loading requests from different users may also differ. Thus user A and user B, whose behavior data differ, may see different cover images for the same episode on the current page. In this way, the technical solution provided by the present application can provide different cover images for different users, dynamically adjusting the cover image according to the identity of the currently logged-in user.
The present application also provides a system for predicting key segments; referring to fig. 4, the system includes the following units.
The data acquisition unit is used for acquiring operation data and user behavior data of the played episode video; wherein the operation data is used for specifying theoretical key segments in the played episode video, and the user behavior data is used for determining target key segments in the played episode video that are of interest to users.
And the data comparison unit is used for comparing the operation data with the user behavior data and determining the actual key segments of the played episode video according to the comparison result.
And the prediction unit is used for determining the predicted key segments of the unplayed episode video according to the actual key segments of the played episode video.
In one embodiment, the data comparison unit includes:
a time period determining module, configured to determine a first time period corresponding to the theoretical key segment and a second time period corresponding to the target key segment, and compare the first time period and the second time period;
and the key segment determining module is used for determining a time segment in which the first time segment and the second time segment are overlapped, and taking a video segment corresponding to the overlapped time segment as an actual key segment of the played episode video.
In one embodiment, the prediction unit comprises:
a content feature determination module, configured to identify content of the actual key segment to determine a content feature corresponding to the actual key segment;
the clustering module is used for clustering the content characteristics according to the names of the characters contained in the content characteristics to obtain at least one characteristic set;
the extraction module is used for determining a target feature set containing the most content features and extracting the names of target persons contained in the content features in the target feature set;
and the prediction module is used for taking the scenario containing the target person as a prediction key segment of the unplayed episode video.
In one embodiment, the system further comprises:
the picture frame extraction unit is used for extracting picture frames matched with the user behavior data from the actual key fragments;
a cover image generating unit for generating a cover image of the played episode video based on the extracted picture frame.
The specific functions implemented by each unit module of the prediction system of the key segment provided in the embodiment of the present specification may be explained in comparison with the foregoing embodiments in the present specification, and can achieve the technical effects of the foregoing embodiments, and thus, no further description is provided here.
As can be seen from the above, according to the technical solution provided by the present application, operation data and user behavior data of a played episode video may be obtained first, where the operation data can represent the key segments of the played episodes as judged by the scenario planners or the operators of a video playing website, and the user behavior data can be used to determine the key segments of the played episodes as judged by users. In the present application, the two kinds of data can be compared, so that the actual key segments of the played episode video are obtained comprehensively from two different sources. The actual key segments thus reflect the content that both the operators and the users care about, from which the key segments of the unplayed episode video can be predicted effectively. On one hand, because the operation data and the user behavior data are analyzed automatically, the efficiency of data collection is improved; on the other hand, user behavior data can only be produced by users who actually watch the video, which ensures the correctness of the collected data and thereby the accuracy of the key-segment prediction for the unplayed episode video.
In the 1990s, it was clear whether an improvement to a technology was an improvement in hardware (e.g., an improvement to a circuit structure such as a diode, a transistor, or a switch) or an improvement in software (an improvement to a method flow). With the development of technology, however, many of today's improvements to method flows can be regarded as direct improvements to hardware circuit structures: designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be realized with hardware entity modules. For example, a Programmable Logic Device (PLD), such as a Field Programmable Gate Array (FPGA), is an integrated circuit whose logic functions are determined by the user's programming of the device. A designer "integrates" a digital system onto a single PLD by programming it, without asking a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, nowadays this programming is mostly implemented with "logic compiler" software rather than by making integrated circuit chips manually. Such software is similar to the compilers used in program development, and the source code to be compiled must likewise be written in a specific programming language, called a Hardware Description Language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language), among which VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used. It will also be apparent to those skilled in the art that a hardware circuit implementing a logic method flow can readily be obtained merely by slightly logic-programming the method flow into an integrated circuit in one of the above hardware description languages.
Those skilled in the art will also appreciate that, instead of implementing the key-segment prediction system purely as computer-readable program code, the same functions can be implemented entirely by logic-programming the method steps, so that the key-segment prediction system takes the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a key-segment prediction system may therefore be regarded as a hardware component, and the means included in it for realizing the various functions may be regarded as structures within the hardware component; such means may even be regarded as both software modules for performing the method and structures within the hardware component.
From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present application may be essentially or partially implemented in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the embodiments or some parts of the embodiments of the present application.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments can be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, embodiments of the prediction system for key fragments may be explained with reference to the introduction of embodiments of the method described above.
The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
Although the present application has been described through embodiments, those of ordinary skill in the art will recognize that the present application admits many variations and modifications without departing from its spirit, and it is intended that the appended claims cover such variations and modifications.

Claims (16)

1. A method for predicting a key segment, the method comprising:
acquiring operation data and user behavior data of a played episode video; wherein the operation data is used for specifying theoretical key segments in the played episode video, and the user behavior data is used for determining target key segments in the played episode video that are of interest to users;
comparing the operation data with the user behavior data, and determining the actual key segments of the played episode video according to the comparison result; the actual key segment is a video segment corresponding to a time period in which the theoretical key segment is overlapped with the target key segment;
determining a predicted key segment of the unplayed episode video according to the actual key segment of the played episode video; wherein this includes: identifying the content of the actual key segment to determine the content feature corresponding to the actual key segment; clustering the content features according to the names of the persons contained in the content features to obtain at least one feature set; determining a target feature set containing the most content features, and extracting the name of a target person contained in the content features in the target feature set; and taking the scenario containing the target person as a predicted key segment of the unplayed episode video.
2. The method of claim 1, wherein the user behavior data comprises at least one of: bullet-screen posting behavior data, comment posting behavior data, and video-progress-bar dragging behavior data.
3. The method of claim 1 or 2, wherein determining target key segments of user interest in the played episode video comprises:
determining a time period in which the number of bullet screens in the played episode video is greater than or equal to a specified number threshold, and taking the video segment corresponding to the time period as the target key segment;
or
extracting time periods appearing in the comment information of the played episode video, and taking the video segments corresponding to the time periods as the target key segments;
or
determining, according to the behavior data of dragging the video progress bar, a time period whose number of viewings is greater than or equal to a specified number-of-times threshold, and taking the video segment corresponding to the time period as the target key segment.
4. The method of claim 1, wherein comparing the operational data and user behavior data comprises:
determining a first time period corresponding to the theoretical key segment and a second time period corresponding to the target key segment, and comparing the first time period with the second time period;
accordingly, determining the actual key segments of the played episode video includes:
determining a time period in which the first time period and the second time period overlap, and taking a video segment corresponding to the overlapped time period as an actual key segment of the played episode video.
5. The method of claim 1, wherein determining the content feature corresponding to the actual key segment comprises:
comparing the person's facial makeup identified from the actual key segment with preset facial-makeup samples to calculate the similarity between the identified facial makeup and each preset facial-makeup sample;
determining the target preset facial-makeup sample with the largest similarity;
and taking the name of the person corresponding to the target preset facial-makeup sample as the content feature corresponding to the actual key segment.
6. The method of claim 1, wherein clustering the content features according to the names of the persons contained in the content features comprises:
dividing content features containing the same person names into the same feature set.
7. The method of claim 1, further comprising:
extracting picture frames matching the user behavior data from the actual key segments;
and generating a cover image of the played episode video based on the extracted picture frames.
8. The method of claim 7, wherein visual features are included in the user behavior data; accordingly, a picture frame matching the user behavior data comprises a picture frame whose current picture contains the visual feature.
9. The method of claim 8, wherein the visual feature is further associated with a decision policy; accordingly, after extracting the picture frames matching the user behavior data from the actual key segments, the method further comprises:
judging the content of the picture frames, and determining, from the picture frames, a target picture frame whose content meets the decision policy;
and generating a cover image of the played episode video based on the target picture frame.
10. The method of claim 8, wherein the visual features include a face of a person; correspondingly, the decision policy associated with the face of the person is used for limiting the effective rotation range corresponding to the face of the person; wherein the effective rotation range comprises a plurality of rotation angles, and the rotation angles are associated with face templates.
11. The method of claim 10, wherein determining, from the picture frames, a target picture frame whose content meets the decision policy comprises:
detecting the face of the person shown in the picture frame, and calculating the similarity between the face of the person in the picture frame and the face templates;
and when a similarity greater than or equal to a specified threshold exists among the calculated similarities, taking the picture frame as a target picture frame conforming to the decision policy.
12. The method of claim 8, wherein the visual features include a character action, and wherein the decision policy associated with the character action is used to define an action template presented by the character;
accordingly, determining, from the picture frames, a target picture frame whose content meets the decision policy comprises:
identifying the action shown by the character in the picture frame, judging whether the identified action is contained in the action template, and if so, taking the picture frame as a target picture frame conforming to the decision policy.
13. The method of claim 8, wherein, if the number of extracted picture frames is at least two, generating a cover image of the played episode video based on the extracted picture frames comprises:
cutting out a region image containing the visual feature from each picture frame, and integrating the plurality of cropped region images into one picture;
and taking the picture obtained by integration as a cover image of the played episode video.
14. A system for predicting a key segment, the system comprising:
the data acquisition unit is used for acquiring operation data and user behavior data of the played episode video; wherein the operation data is used for specifying theoretical key segments in the played episode video, and the user behavior data is used for determining target key segments in the played episode video that are of interest to users;
the data comparison unit is used for comparing the operation data with the user behavior data and determining the actual key segments of the played episode video according to the comparison result; the actual key segment is a video segment corresponding to a time period in which the theoretical key segment overlaps the target key segment;
the prediction unit is used for determining the predicted key segments of the unplayed episode video according to the actual key segments of the played episode video;
wherein the prediction unit includes: a content feature determination module, configured to identify the content of the actual key segment to determine the content feature corresponding to the actual key segment; a clustering module, configured to cluster the content features according to the names of the persons contained in the content features to obtain at least one feature set; an extraction module, configured to determine a target feature set containing the most content features and extract the names of the target persons contained in the content features in the target feature set; and a prediction module, configured to take the scenario containing the target person as a predicted key segment of the unplayed episode video.
15. The system of claim 14, wherein the data comparison unit comprises:
a time period determining module, configured to determine a first time period corresponding to the theoretical key segment and a second time period corresponding to the target key segment, and compare the first time period and the second time period;
and the key segment determining module is used for determining a time segment in which the first time segment and the second time segment are overlapped, and taking a video segment corresponding to the overlapped time segment as an actual key segment of the played episode video.
16. The system of claim 14, further comprising:
the picture frame extraction unit is used for extracting picture frames matching the user behavior data from the actual key segments;
a cover image generating unit for generating a cover image of the played episode video based on the extracted picture frame.
CN201711021216.6A — filed 2017-10-27 (priority 2017-10-27) — Method and system for predicting key segments — Active — CN109729425B (en)

Priority Applications (1)

CN201711021216.6A — priority and filing date 2017-10-27 — Method and system for predicting key segments

Publications (2)

Publication Number — Publication Date
CN109729425A — 2019-05-07
CN109729425B — 2021-05-18

Family

ID=66290788

Family Applications (1)

CN201711021216.6A — Active — granted as CN109729425B

Country Status (1)

Country Link
CN (1) CN109729425B (en)


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1058446A3 (en) * 1999-06-03 2003-07-09 Lucent Technologies Inc. Key segment spotting in voice messages

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7895625B1 (en) * 2003-12-24 2011-02-22 Time Warner, Inc. System and method for recommending programming to television viewing communities
CN103024607A (en) * 2011-09-20 2013-04-03 三星电子株式会社 Method and apparatus for displaying summary video
WO2013097245A1 (en) * 2011-12-31 2013-07-04 华为技术有限公司 Method and device for determining focus content of user
CN105323634A (en) * 2014-06-27 2016-02-10 Tcl集团股份有限公司 Method and system for generating thumbnail of video
CN104410920A (en) * 2014-12-31 2015-03-11 合一网络技术(北京)有限公司 Video segment playback amount-based method for labeling highlights
CN105302906A (en) * 2015-10-29 2016-02-03 小米科技有限责任公司 Information labeling method and apparatus
CN106921867A (en) * 2015-12-25 2017-07-04 北京奇虎科技有限公司 A kind of video representativeness picture, fragment determine method and apparatus
CN106095804A (en) * 2016-05-30 2016-11-09 维沃移动通信有限公司 The processing method of a kind of video segment, localization method and terminal
CN106888407A (en) * 2017-03-28 2017-06-23 腾讯科技(深圳)有限公司 A kind of video abstraction generating method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Detecting Highlight Shots in Soccer Video Using Replay Scenes and Emotional Incentives (利用回放场景和情感激励检测足球视频精彩镜头); Yu Junqing et al.; Chinese Journal of Computers (计算机学报); 2014-06-30; full text *

Also Published As

Publication number Publication date
CN109729425A (en) 2019-05-07

Similar Documents

Publication Publication Date Title
US10979761B2 (en) Intelligent video interaction method
US10650861B2 (en) Video summarization and collaboration systems and methods
CN109729426B (en) Method and device for generating video cover image
KR102028198B1 (en) Device for authoring video scene and metadata
US10424341B2 (en) Dynamic video summarization
CN101300567B (en) Method for media sharing and authoring on the web
JP6920475B2 (en) Modify digital video content
US10643667B2 (en) Bounding box doubling as redaction boundary
US20170257653A1 (en) Shot structure of online video as a predictor of success
US8856636B1 (en) Methods and systems for trimming video footage
KR102171657B1 (en) Method and system for editing moving picture based on context understanding using artificial intelligence
JP6641949B2 (en) Method, system and program for detecting, classifying and visualizing user interaction
US11126682B1 (en) Hyperlink based multimedia processing
US11037604B2 (en) Method for video investigation
Höferlin et al. Interactive schematic summaries for exploration of surveillance video
CN114372172A (en) Method and device for generating video cover image, computer equipment and storage medium
Guerrini et al. Interactive film recombination
CN109729425B (en) Method and system for predicting key segments
Niu et al. Real-time generation of personalized home video summaries on mobile devices
Sundaram et al. Video analysis and summarization at structural and semantic levels
KR102129691B1 (en) Method and system for moving ficture service based on image
Redaelli et al. Automated Intro Detection ForTV Series
CN118042186A (en) Method, apparatus, electronic device and computer readable medium for providing video cover
CN115619901A (en) Material editing method and device, electronic equipment and storage medium
CN116980718A (en) Scenario recomposition method and device for video, electronic equipment and storage medium

Legal Events

Code — Description
PB01 — Publication
SE01 — Entry into force of request for substantive examination
GR01 — Patent grant