WO2020143156A1 - Hotspot video annotation processing method and apparatus, computer device and storage medium


Info

Publication number: WO2020143156A1
Authority: WO (WIPO/PCT)
Prior art keywords: video, emotion, image, recognized, probability
Application number: PCT/CN2019/088957
Other languages: French (fr), Chinese (zh)
Inventors: 刘建华, 徐小方
Original Assignee: 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2020143156A1

Classifications

    • G06F 18/00: Pattern recognition
    • H04N 21/232: Content retrieval operation locally within server, e.g. reading video streams from disk arrays
    • H04N 21/258: Client or end-user data management, e.g. managing client capabilities, user preferences or demographics, processing of multiple end-users preferences to derive collaborative data
    • H04N 21/433: Content storage operation, e.g. storage operation in response to a pause request, caching operations
    • H04N 21/442: Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
    • H04N 21/466: Learning process for intelligent management, e.g. learning user preferences for recommending movies
    • H04N 21/84: Generation or processing of descriptive data, e.g. content descriptors
    • H04N 21/8547: Content authoring involving timestamps for synchronizing content
    • H04N 5/76: Television signal recording

Definitions

  • The present application relates to the technical field of micro-expression recognition, and in particular to a hotspot video annotation processing method and apparatus, computer device, and storage medium.
  • Video (especially online video) is the largest and fastest-growing category of mobile data traffic.
  • So-called online video refers to audio-visual files provided by an online video service provider (for example, Baidu iQiyi), played as streaming media and available for live broadcast or on demand.
  • Online video generally requires a dedicated player; the dominant file format is FLV (Flash Video, a streaming media format), which is distributed using P2P (Peer to Peer) technology and occupies few client resources.
  • Embodiments of the present application provide a hotspot video annotation processing method, apparatus, computer device, and storage medium to address the low efficiency of the current manual annotation of original video segment attributes.
  • A hotspot video annotation processing method, including:
  • acquiring the user's recorded video collected while a client plays the original video, where the original video includes at least one frame of original video image, the recorded video includes at least one frame of image to be recognized, and the recording timestamp of each image to be recognized is associated with a playback timestamp of an original video image;
  • recognizing each image to be recognized using a micro-expression recognition model to obtain the instantaneous emotion value corresponding to the image to be recognized;
  • determining the intense emotion probability of the original video image corresponding to a playback timestamp according to the instantaneous emotion values of the images to be recognized corresponding to all recording timestamps associated with that playback timestamp;
  • if the intense emotion probability is greater than a first probability threshold, determining the original video image to be a hotspot video image;
  • annotating hotspots in the original video based on the hotspot video images to obtain hotspot video clips.
  • A hotspot video annotation processing device, including:
  • a recorded video acquisition module, used to acquire the user's recorded video collected by the client while the client plays the original video, where the original video includes at least one frame of original video image, the recorded video includes at least one frame of image to be recognized, and the recording timestamp of each image to be recognized is associated with a playback timestamp of an original video image;
  • an instantaneous emotion value acquisition module, used to recognize each image to be recognized using a micro-expression recognition model to obtain the instantaneous emotion value corresponding to the image to be recognized;
  • an intense emotion probability determination module, used to determine the intense emotion probability of the original video image corresponding to a playback timestamp according to the instantaneous emotion values of the images to be recognized corresponding to all recording timestamps associated with that playback timestamp;
  • a hotspot video image determination module, used to determine the original video image to be a hotspot video image if the intense emotion probability is greater than a first probability threshold;
  • a hotspot video clip acquisition module, used to annotate hotspots in the original video based on the hotspot video images to obtain hotspot video clips.
  • A computer device includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, where the processor implements the following steps when executing the computer-readable instructions:
  • acquiring the user's recorded video collected while the client plays the original video, where the original video includes at least one frame of original video image, the recorded video includes at least one frame of image to be recognized, and the recording timestamp of each image to be recognized is associated with a playback timestamp of an original video image;
  • recognizing each image to be recognized using a micro-expression recognition model to obtain the corresponding instantaneous emotion value;
  • determining the intense emotion probability of the original video image corresponding to a playback timestamp according to the instantaneous emotion values of the images to be recognized corresponding to all recording timestamps associated with that playback timestamp;
  • if the intense emotion probability is greater than a first probability threshold, determining the original video image to be a hotspot video image;
  • annotating hotspots in the original video based on the hotspot video images to obtain hotspot video clips.
  • One or more non-volatile readable storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
  • acquiring the user's recorded video collected while the client plays the original video, where the original video includes at least one frame of original video image, the recorded video includes at least one frame of image to be recognized, and the recording timestamp of each image to be recognized is associated with a playback timestamp of an original video image;
  • recognizing each image to be recognized using a micro-expression recognition model to obtain the corresponding instantaneous emotion value;
  • determining the intense emotion probability of the original video image corresponding to a playback timestamp according to the instantaneous emotion values of the images to be recognized corresponding to all recording timestamps associated with that playback timestamp;
  • if the intense emotion probability is greater than a first probability threshold, determining the original video image to be a hotspot video image;
  • annotating hotspots in the original video based on the hotspot video images to obtain hotspot video clips.
  • FIG. 1 is a schematic diagram of an application environment of a hotspot video annotation processing method in an embodiment of the present application.
  • FIG. 2 is a flowchart of a hotspot video annotation processing method in an embodiment of the present application.
  • FIG. 3 is another flowchart of a hotspot video annotation processing method in an embodiment of the present application.
  • FIG. 4 is another flowchart of a hotspot video annotation processing method in an embodiment of the present application.
  • FIG. 5 is another flowchart of a hotspot video annotation processing method in an embodiment of the present application.
  • FIG. 6 is another flowchart of a hotspot video annotation processing method in an embodiment of the present application.
  • FIG. 7 is another flowchart of a hotspot video annotation processing method in an embodiment of the present application.
  • FIG. 8 is another flowchart of a hotspot video annotation processing method in an embodiment of the present application.
  • FIG. 9 is another flowchart of a hotspot video annotation processing method in an embodiment of the present application.
  • FIG. 10 is another flowchart of a hotspot video annotation processing method in an embodiment of the present application.
  • FIG. 11 is a schematic diagram of a hotspot video annotation processing device in an embodiment of the present application.
  • FIG. 12 is a schematic diagram of a computer device in an embodiment of the present application.
  • the hotspot video annotation processing method provided by the embodiment of the present application can be applied in the application environment shown in FIG. 1.
  • the hotspot video annotation processing method is applied in a video playback system.
  • The video playback system includes a client and a server, as shown in FIG. 1; the client and the server communicate through a network to implement automatic annotation of the hotspot video clips of the original video, improve the efficiency of hotspot video clip annotation, and implement personalized recommendation and sorted display of hotspot video clips.
  • The client, also called the user end, refers to the program that corresponds to the server and provides local services for the user.
  • the client can be installed on but not limited to various personal computers, notebook computers, smart phones, tablets and portable wearable devices.
  • the server can be implemented by an independent server or a server cluster composed of multiple servers.
  • a hotspot video annotation processing method is provided.
  • The method is described by taking its application to the server in FIG. 1 as an example, and includes the following steps:
  • S201 Obtain the user's recorded video collected while the client plays the original video.
  • the original video includes at least one original video image
  • the recorded video includes at least one image to be recognized.
  • The original video refers to a video played for the user by a video playback program (that is, a client) installed on a terminal device such as the user's mobile phone or computer.
  • The recorded video refers to video of the user's facial expression changes while watching the original video, shot in real time through the shooting module (such as a built-in camera) of the terminal device on which the video playback program is installed.
  • the original video includes at least one frame of original video image, and the original video image is a single frame image forming the original video, that is, a single image frame of the smallest unit in the original video.
  • Each original video image carries a playback timestamp, which is the timestamp of that original video image within the original video; for example, the original video image at the 100th second of a 10-minute original video carries a playback timestamp of 100 s.
  • the recorded video includes at least one frame of image to be recognized, and the image to be recognized is a single frame image that forms the recorded video, that is, a single image screen of the smallest unit in the recorded video.
  • Each image to be recognized corresponds to a recording timestamp, which is the timestamp of that image within the recorded video; for example, the image to be recognized at the 100th second of a 10-minute recorded video has a recording timestamp of 100 s.
  • the recording timestamp is associated with the playback timestamp carried by the original video image, so that the image to be recognized corresponds one-to-one with the original video image, which is convenient for accurately determining the user's emotion when watching the original video.
  • Each original video carries a unique video identifier used to uniquely identify it; for example, the original video corresponding to episode XX of "XX" carries the unique video identifier XX0001, so that the server can retrieve that original video by the video ID XX0001.
  • the playback timestamp carried by each original video image is the timestamp of the original video image in the original video.
  • While the client plays the same original video, the server acquires, through the shooting module (such as a built-in camera) of the terminal device on which the client is installed, the recorded videos capturing the expression changes of all users watching the original video.
  • The recorded video includes at least one frame of image to be recognized, and each image to be recognized corresponds to a recording timestamp that is associated with the playback timestamp carried by an original video image. Understandably, collecting the recorded videos of different users watching the original video makes it easier to determine whether the original video attracts the audience, thereby helping to automatically annotate the hotspot video clips in the original video and improving the efficiency of hotspot video clip annotation.
  • In an embodiment, obtaining the user's recorded video collected while the client plays the original video includes: (1) controlling the client to play the original video so that the playback timestamp of each original video image in the original video is associated with the current system time; (2) obtaining the user's recorded video collected while the client plays the original video so that the recording timestamp of each image to be recognized in the recorded video is associated with the current system time; (3) based on the current system time, associating the recording timestamp of each image to be recognized with the playback timestamp of an original video image.
  • The current system time is the system's current time at any given moment; for example, it can be obtained with the currentTimeMillis method of the Java System class.
  • The playback timestamps of the original video correspond to the recording timestamps of the recorded video, that is, the first frame of original video image corresponds to the first frame of image to be recognized, so that each image to be recognized reflects the user's micro-expression when viewing the corresponding original video image.
  • Because the playback time of the original video is not synchronized with the recording time of the recorded video, the playback timestamps of the original video must be correlated with the recording timestamps of the recorded video through the current system time, so that each associated image to be recognized reflects the user's micro-expression when viewing the corresponding original video image.
  • The playback of the original video and the recording of the recorded video are both related to the current system time: for example, if the 1000th frame of original video image is played at 10:05:10 and the 10th frame of image to be recognized is recorded at 10:05:10, then the playback timestamp of the 1000th frame of original video image is associated with the recording timestamp of the 10th frame of image to be recognized.
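  • The following is a minimal sketch of this system-time association, assuming the client reports each played frame and each recorded frame together with a system-clock capture time in milliseconds; all names here are illustrative rather than taken from the patent.

```python
from bisect import bisect_left

def associate_frames(playback_events, recording_events):
    """Associate each recorded frame with the original video frame shown
    at (nearly) the same system time.

    playback_events  -- list of (system_time_ms, playback_timestamp),
                        sorted by system time, assumed non-empty
    recording_events -- list of (system_time_ms, recording_timestamp)
    Returns a dict mapping recording_timestamp -> playback_timestamp.
    """
    times = [t for t, _ in playback_events]
    mapping = {}
    for sys_t, rec_ts in recording_events:
        # Locate the playback frame whose system time is closest to sys_t.
        i = bisect_left(times, sys_t)
        candidates = [j for j in (i - 1, i) if 0 <= j < len(times)]
        nearest = min(candidates, key=lambda j: abs(times[j] - sys_t))
        mapping[rec_ts] = playback_events[nearest][1]
    return mapping
```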
  • S202 Recognize each image to be recognized using a micro-expression recognition model to obtain the instantaneous emotion value corresponding to the image to be recognized.
  • the micro-expression recognition model is a model for recognizing the micro-expression of the human face in the image to be recognized.
  • The micro-expression recognition model captures local features of the user's face in the image to be recognized, determines each target facial action unit of the face according to those local features, and then determines the micro-expression from the recognized target facial action units.
  • The instantaneous emotion value corresponding to an image to be recognized is the emotion value corresponding to the micro-expression type recognized for the face in that image by the micro-expression recognition model.
  • Specifically, the server first uses the micro-expression recognition model to perform micro-expression recognition on each image to be recognized to determine its corresponding micro-expression type, and then queries the emotion value comparison table according to the micro-expression type to obtain the corresponding instantaneous emotion value.
  • The micro-expression types include, but are not limited to, love, interest, surprise, expectation... aggressiveness, conflict, insult, suspicion, and fear.
  • In this way, the instantaneous emotion value of the face in each image to be recognized is obtained.
  • The micro-expression recognition model can quickly obtain the instantaneous emotion values of different users watching each original video image in the same original video, so that hotspot video clips can be analyzed from the instantaneous emotion values and automatically annotated.
  • the micro-expression recognition model may be a neural network recognition model based on deep learning, a local recognition model based on classification, or a local emotion recognition model based on a local binary pattern (LBP).
  • the micro-expression recognition model is a local recognition model based on classification.
  • The training image data includes positive samples and negative samples of each facial action unit, and the training image data is trained through classification algorithms to obtain the micro-expression recognition model.
  • a large amount of training image data may be trained through an SVM classification algorithm to obtain SVM classifiers corresponding to multiple facial action units.
  • For example, there may be 39 SVM classifiers corresponding to 39 facial action units, or 54 SVM classifiers corresponding to 54 facial action units; the more facial action units for which the training image data contains positive and negative samples, the more SVM classifiers are obtained. Understandably, among micro-expression recognition models formed from multiple SVM classifiers, the more SVM classifiers a model contains, the more accurately the formed micro-expression recognition model recognizes micro-expression types. Take the micro-expression recognition model formed by the SVM classifiers corresponding to 54 facial action units as an example: this model can identify 54 types of micro-expressions, including love, interest, surprise, expectation... aggressiveness, conflict, insult, doubt, and fear.
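  • As a rough illustration of the two-stage recognition described above, the sketch below uses one binary scikit-learn SVM per facial action unit (AU) and then scores pre-configured AU combinations; the 0.5 threshold matches the example later in this document, while the template-overlap scoring is an assumed stand-in for the patent's "comprehensive evaluation" step.

```python
# Assumes each value in au_classifiers is a fitted
# sklearn.svm.SVC(probability=True) with classes [0, 1].

def detect_action_units(au_classifiers, features, threshold=0.5):
    """Return the set of AU ids whose predicted probability exceeds the
    preset threshold for one image's local-feature vector."""
    active = set()
    for au_id, clf in au_classifiers.items():
        p = clf.predict_proba([features])[0][1]  # P(AU present)
        if p > threshold:
            active.add(au_id)
    return active

def expression_probabilities(active_aus, expression_templates):
    """expression_templates: dict micro-expression type -> set of AU ids
    (assumed pre-configured). Scores each type by the fraction of its
    template AUs that were detected."""
    return {name: len(active_aus & aus) / len(aus)
            for name, aus in expression_templates.items()}
```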
  • S203 Determine the intense emotion probability of the original video image corresponding to the playback timestamp according to the instantaneous emotion values of the images to be identified corresponding to all the recording timestamps associated with the same playback timestamp.
  • The intense emotion probability is a probability used to evaluate how intense the emotions are across the images to be recognized of different users watching the same original video image. Understandably, a high intense emotion probability means that users' moods fluctuate greatly while watching the original video and that the original video is strongly attractive to users.
  • Specifically, according to the playback timestamp corresponding to each original video image, the server first obtains the instantaneous emotion values of the images to be recognized corresponding to all recording timestamps associated with that playback timestamp, determines from each user's instantaneous emotion value whether the emotion is intense, and thereby computes the probability of intense emotion among all users watching that original video image, so that the intense emotion probability objectively reflects the degree to which users watching the same original video like it or resonate with it.
  • The first probability threshold is a preset probability threshold for evaluating whether an original video image is a hotspot video image.
  • For example, the preset probability threshold may be set to 60%. If the intense emotion probability is greater than the first probability threshold, a large proportion of all users who viewed the original video image (that is, a proportion greater than the first probability threshold) experienced strong emotional fluctuations while watching it (that is, the emotion corresponding to their instantaneous emotion value is intense emotion), so the original video image is highly attractive to users and can be determined to be a hotspot video image.
  • S205 Annotate hotspots in the original video based on the hotspot video images to obtain hotspot video clips.
  • Specifically, the server may form an original video clip from any two hotspot video images, and then compare the total number of frames of all original video images in that clip against preset frame number thresholds to determine whether the original video clip is a hotspot video clip, automatically marking the original video images corresponding to hotspot video images and annotating the hotspot video clips in the original video. This automatic annotation of hotspot video clips in the original video improves the efficiency of hotspot video clip annotation.
  • The hotspot video annotation processing method provided in this embodiment collects the user's recorded video while the original video is playing, so that the recording timestamp of each image to be recognized is associated with the playback timestamp of an original video image, ensuring the objectivity of the micro-expression analysis of the original video. The micro-expression recognition model is then used to recognize the images to be recognized; it can quickly identify the user's micro-expression when viewing each original video image and obtain the intense emotion value of the user watching the original video, so that hotspot video annotation is realized on the basis of intense emotion values and the objectivity of hotspot video clip annotation is ensured.
  • Hotspot annotation of the video is subdivided into hotspot analysis of individual original video images to ensure the objectivity and accuracy of the analysis.
  • The original video is then hotspot-annotated and hotspot video clips are obtained by calculating the probability of intense emotion while users watch the original video, so that the server obtains the hotspot video clips and they are annotated automatically. This improves the efficiency and accuracy of hotspot video clip annotation and provides users with a better viewing experience.
  • In an embodiment, step S202, in which a micro-expression recognition model is used to recognize each image to be recognized and the instantaneous emotion value corresponding to each image is obtained, includes:
  • S301 Recognize each image to be recognized by using a micro-expression recognition model to obtain the instantaneous probability corresponding to at least one type of recognized expression.
  • The recognized expression type refers to a pre-configured micro-expression type to which the image to be recognized is found to belong when it is recognized using the micro-expression recognition model.
  • the micro-expression recognition model pre-trained by the server includes multiple SVM classifiers, and each SVM classifier is used to identify a facial action unit.
  • For example, when the micro-expression recognition model includes 54 SVM classifiers, a facial action unit number mapping table is established in which each facial action unit is represented by a predetermined number: for example, AU1 is the inner eyebrow lift, AU2 is the outer eyebrow lift, AU5 is the upper eyelid lift, and AU26 is the lower jaw opening.
  • A corresponding SVM classifier is trained for each facial action unit.
  • For example, the SVM classifier corresponding to the inner eyebrow lift can output the probability that a local feature belongs to the inner eyebrow lift, the SVM classifier corresponding to the outer eyebrow lift can output the probability that a local feature belongs to the outer eyebrow lift, and so on.
  • Specifically, when the server uses the pre-trained micro-expression recognition model to recognize an image to be recognized, it may first perform key point detection and feature extraction on the image to obtain its local features.
  • The face key point algorithm can be, but is not limited to, the Ensemble of Regression Trees (ERT) algorithm, the SIFT (Scale-Invariant Feature Transform) algorithm, the SURF (Speeded Up Robust Features) algorithm, the LBP (Local Binary Patterns) algorithm, or the HOG (Histogram of Oriented Gradients) algorithm.
  • the feature extraction algorithm may be a CNN (Convolutional Neural Network) algorithm.
  • the target facial action unit refers to the facial action unit (Action Unit, AU) obtained by recognizing the image to be recognized according to the micro-expression recognition model.
  • The probability value may specifically be a value between 0 and 1. For example, if the output probability value is 0.6 and the preset threshold is 0.5, then because the probability value 0.6 is greater than the preset threshold 0.5, the facial action unit corresponding to 0.6 is taken as a target facial action unit of the image to be recognized.
  • All acquired target facial action units are then comprehensively evaluated to obtain the probability corresponding to each micro-expression type pre-configured in the micro-expression recognition model, that is, the instantaneous probability of belonging to each recognized expression type.
  • The comprehensive evaluation of all acquired target facial action units specifically means obtaining, from the combination of all target facial action units, the probability that this combination belongs to each pre-configured micro-expression type, so as to determine the instantaneous probability of each recognized expression type.
  • S302 Determine the recognized expression type with the largest instantaneous probability as the micro expression type of the image to be recognized.
  • Specifically, the recognized expression type with the largest instantaneous probability is determined to be the micro-expression type corresponding to the image to be recognized. For example, if the image to be recognized is recognized as belonging to the recognized expression type "love" with an instantaneous probability of 0.9, while the instantaneous probabilities of the two recognized expression types "doubt" and "quiet" are each 0.05, then the recognized expression type corresponding to the instantaneous probability of 0.9 is determined to be the micro-expression type of the image, ensuring the accuracy of the identified micro-expression type.
  • S303 Query the emotion value comparison table based on the micro-expression type to obtain the instantaneous emotion value of the image to be recognized.
  • the emotion value comparison table is a preset data table for recording the emotion attribute corresponding to each micro-expression type.
  • In the emotion value comparison table, the association relationship between micro-expression types and emotion values is stored.
  • the server queries the emotion value comparison table based on the micro-expression type to obtain the corresponding instantaneous emotion value.
  • In this embodiment, the instantaneous emotion value is a value in [-1, 1]: the larger the value, the more the user likes the original video image corresponding to the recording timestamp associated with the image to be recognized; the smaller the value, the more the user dislikes that original video image.
  • For example, the instantaneous emotion value corresponding to each of the 54 micro-expression types identified by the micro-expression recognition model can be set to any one of 1, 0.8, 0.5, 0.3, 0, -0.3, -0.5, -0.8, and -1.
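  • A minimal sketch of steps S302 and S303, using a small illustrative slice of the emotion value comparison table (the types and values shown are examples from this document, not the full 54-type table):

```python
# Illustrative emotion value comparison table: micro-expression type -> value.
EMOTION_VALUE_TABLE = {
    "love": 1.0, "interest": 0.8, "surprise": 0.5, "expectation": 0.3,
    "quiet": 0.0, "doubt": -0.3, "insult": -0.8, "fear": -1.0,
}

def instantaneous_emotion_value(instant_probs):
    """instant_probs: dict recognized expression type -> instantaneous
    probability. Picks the type with the largest probability (S302) and
    looks up its instantaneous emotion value (S303)."""
    micro_expression = max(instant_probs, key=instant_probs.get)
    return EMOTION_VALUE_TABLE[micro_expression]

# e.g. instantaneous_emotion_value({"love": 0.9, "doubt": 0.05,
#                                   "quiet": 0.05}) -> 1.0
```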
  • The hotspot video annotation processing method provided in this embodiment first uses the micro-expression recognition model to recognize the image to be recognized, quickly obtaining the instantaneous probability corresponding to at least one recognized expression type, and selects the recognized expression type with the largest instantaneous probability as the micro-expression type of the image, ensuring the accuracy of the identified micro-expression type. It then queries the emotion value comparison table based on the micro-expression type to obtain the instantaneous emotion value of the image to be recognized, ensuring the efficiency of acquiring instantaneous emotion values.
  • In an embodiment, the server may query the database based on the instantaneous emotion value to obtain the standard volume or standard color tone corresponding to that value, obtain the current volume or current color tone of the client currently playing the image to be recognized, and automatically adjust the current volume and current color tone based on the standard volume and standard color tone. When the current volume and color tone match the user's current mood, the video's volume or tone is more likely to evoke empathy, thereby increasing the appeal of the original video to the user.
  • In an embodiment, step S203, in which the intense emotion probability of the original video image corresponding to the playback timestamp is determined according to the instantaneous emotion values of the images to be recognized corresponding to all recording timestamps associated with the same playback timestamp, includes:
  • S401 Count the total number of images to be identified corresponding to all recording timestamps associated with the same playback timestamp.
  • The total number of images is the number of images to be recognized, collected by the server, that correspond to all users who watched the original video image. Specifically, when annotating hotspot video clips of any original video, all recorded videos corresponding to viewings of that original video must be obtained, and the number of images to be recognized corresponding to all recording timestamps associated with the playback timestamp of the same original video image is counted and determined as the total number of images. For example, for an original video with video ID XX0001, consider the original video image whose playback timestamp is the 10th second: the number of all images to be recognized associated with that original video image is the total number of images.
  • the preset emotion threshold is a preset threshold for evaluating whether the instantaneous emotion value is intense emotion.
  • the preset emotion threshold may be set to 0.6 or other values.
  • Specifically, the server compares the absolute value of the instantaneous emotion value corresponding to each image to be recognized with the preset emotion threshold. If the absolute value is greater than the preset emotion threshold, the emotion attribute of the image to be recognized is intense emotion; otherwise, the emotion attribute is plain emotion. That is, the micro-expression recognition model recognizes, for each image to be recognized, an instantaneous emotion value in [-1, 1].
  • When the absolute value of the instantaneous emotion value is close to 1, the micro-expression emotion can be considered intense; such intense emotions easily resonate with users and have strong appeal.
  • When the absolute value of the instantaneous emotion value is close to 0, the user's liking or dislike of the original video image being watched is weaker, indicating that the original video image does not resonate with the user, and the micro-expression emotion can be regarded as a plain emotion.
  • S403 Among the images to be recognized corresponding to all recording timestamps associated with the same playback timestamp, count the number whose emotion attribute is intense emotion as the intense emotion count.
  • Specifically, from the images to be recognized corresponding to all recording timestamps associated with the same playback timestamp, the server determines the number whose emotion attribute is intense emotion and takes that number as the intense emotion count. For example, if 100 users simultaneously watch the original video image corresponding to a certain playback timestamp in the same original video, the server obtains the 100 images to be recognized corresponding to all recording timestamps associated with that playback timestamp, uses the micro-expression recognition model to obtain the instantaneous emotion values of all 100 images, determines from each instantaneous emotion value whether the emotion is intense, and counts the images whose emotion attribute is intense emotion; in this case the intense emotion count is a value between 0 and 100.
  • S404 Calculate the intense emotion probability of the original video image corresponding to the playback timestamp from the total number of images and the intense emotion count using the intense emotion probability formula.
  • the server may quickly calculate the intense emotion probability using the intense emotion probability formula.
  • The intense emotion probability reflects the probability that the original video image caused strong emotional fluctuations among all users who viewed it, and thus well reflects the attractiveness of the original video image to users or the degree of resonance it causes.
  • The hotspot video annotation processing method provided in this embodiment first obtains the total number of images to be recognized corresponding to the same playback timestamp, determines from those images the number whose emotion attribute is intense emotion, and then calculates the intense emotion probability with the intense emotion probability formula, which makes the acquisition of the intense emotion probability more objective and intuitively shows the attractiveness of the original video image to users.
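  • A minimal sketch of steps S401-S404, assuming the intense emotion probability formula is the intense emotion count divided by the total number of images (the formula is named but not spelled out above); the thresholds follow the examples in this document.

```python
def intense_emotion_probability(instant_values, emotion_threshold=0.6):
    """instant_values: instantaneous emotion values of all images to be
    recognized sharing one playback timestamp (one per viewer).
    An image counts as intense emotion when |value| > emotion_threshold
    (S402); the probability is intense count / total count (S404)."""
    total = len(instant_values)
    intense = sum(1 for v in instant_values if abs(v) > emotion_threshold)
    return intense / total if total else 0.0

# The original video image is a hotspot video image when the probability
# exceeds the first probability threshold (60% in the example above):
is_hotspot = intense_emotion_probability([0.8, -0.9, 0.3, 0.7]) > 0.6  # True
```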
  • In an embodiment, step S205, in which the original video is hotspot-annotated based on the hotspot video images to obtain hotspot video clips, includes:
  • S501 Count the number of frames of the original video clip formed between any two hotspot video images and determine it as the video clip frame number.
  • The video clip frame number refers to the total number of frames of the original video clip formed between two hotspot video images.
  • Specifically, the number of frames of the original video clip formed between any two hotspot video images is counted and determined as the video clip frame number. Because the original video clip contains the two hotspot video images themselves, the video clip frame number is at least two. For example, if the 20th and 40th original video images in the original video are hotspot video images, the video clip frame number of the original video clip formed between them is 21 frames.
  • The first frame number threshold is a preset threshold corresponding to the minimum time interval used to determine whether an original video clip is a hotspot video clip.
  • the first frame number threshold is set independently, and its value is generally relatively small.
  • For example, if the first frame number threshold is set to 120 frames and the playback frame rate of the original video is typically 24 frames/second, the threshold corresponds to an original video clip of 5 seconds. If the video clip frame number is less than the first frame number threshold, the interval between the two adjacent hotspot video images is short, meaning the original video clip provoked intense emotion values in users within a short time and attracted their attention, so the original video clip is determined to be a hotspot video clip.
  • The second frame number threshold is a preset threshold corresponding to the maximum time interval used to determine whether an original video clip is a hotspot video clip.
  • The second frame number threshold is set larger; for example, when the first frame number threshold is set to 120 frames, the second frame number threshold may be set to 1200 frames. At a playback frame rate of 24 frames/second, the original video clip formed between the two hotspot video images is then a 50-second clip.
  • For the playback timestamps corresponding to the original video images in that 50-second original video clip, the emotion fluctuation probability of the images to be recognized associated with those playback timestamps is obtained. If the emotion fluctuation probability is large, the original video clip is more likely to have provoked intense user emotions; conversely, if it is small, the original video clip is less likely to have done so.
  • The emotion fluctuation probability refers to the probability of a large emotional fluctuation occurring while the user watches the original video clip, where a large emotional fluctuation may understandably be a change from great joy to great sorrow or another process of emotional change.
  • The second probability threshold is a probability threshold set for evaluating hotspot video clips based on the fluctuating emotion probability. Understandably, if the fluctuating emotion probability of an original video clip is greater than the second probability threshold, the original video clip caused strong emotional fluctuations in users and attracted their attention, and it can be determined to be a hotspot video clip.
  • In this embodiment, the video clip frame number of the original video clip formed between two hotspot video images is obtained first. If the video clip frame number is less than or equal to the first frame number threshold, the clip is directly determined to be a hotspot video clip. If the video clip frame number is greater than the first frame number threshold and less than or equal to the second frame number threshold, the fluctuating emotion probability of the original video clip must be obtained and compared with the second probability threshold to determine whether the original video clip is a hotspot video clip.
  • In this way, whether the original video clip formed between two hotspot video images is a hotspot video clip is determined from its video clip frame number and fluctuating emotion probability, so that the hotspot video clips in the original video are annotated automatically and the objectivity of the annotated clips is ensured.
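  • A minimal sketch of this decision rule, under the example thresholds above (120 and 1200 frames); the 0.6 value for the second probability threshold is an assumption for illustration:

```python
def is_hotspot_clip(clip_frames, fluctuation_probability=None,
                    first_threshold=120, second_threshold=1200,
                    second_probability_threshold=0.6):
    """clip_frames: video clip frame number of the original video clip
    formed between two hotspot video images."""
    if clip_frames <= first_threshold:
        # Short interval between hotspot images: mark directly.
        return True
    if clip_frames <= second_threshold:
        # Mid-length clip: decide by the fluctuating emotion probability.
        return (fluctuation_probability is not None and
                fluctuation_probability > second_probability_threshold)
    return False
```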
  • In an embodiment, step S503, that is, acquiring the fluctuating emotion probability corresponding to the original video clip based on the playback timestamps corresponding to the original video clip, includes:
  • Specifically, according to the playback timestamps of the original video clip, the server intercepts from each recorded video corresponding to the original video the recorded video clip associated with those playback timestamps, so as to recognize the images to be recognized of the recorded video clip. For example, if the playback timestamps of an original video clip span seconds 10-50 of an original video, then from each recorded video corresponding to that original video, the recorded video clip whose recording timestamps correspond to playback timestamps of 10-50 seconds is intercepted, so that each image to be recognized in the recorded video clip reflects the user's facial expression changes while watching the original video clip.
  • S602 Obtain the instantaneous emotion value corresponding to each image to be recognized in the recorded video segment.
  • Since in step S202 the micro-expression recognition model has already been used to recognize each image to be recognized in all recorded videos and obtain the corresponding instantaneous emotion values, this step can directly obtain the instantaneous emotion value corresponding to each image to be recognized in the recorded video clip without re-recognition, improving the efficiency of acquiring instantaneous emotion values.
  • S603 Calculate the instantaneous emotion values corresponding to all images to be recognized in the recorded video clip using the standard deviation formula to obtain the emotion value standard deviation. The standard deviation formula is $S_N = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2}$, where $S_N$ is the emotion value standard deviation of the recorded video clip, $N$ is the number of images to be recognized in the recorded video clip, $x_i$ is the instantaneous emotion value of each image to be recognized, and $\bar{x}$ is the average of all instantaneous emotion values $x_i$ in the recorded video clip.
  • The emotion value standard deviation is the standard deviation of the instantaneous emotion values recorded while the user watches all images to be recognized for an original video clip, and it objectively reflects the user's mood fluctuation while watching that clip. Understandably, if each individual user's instantaneous emotion values are used to calculate the emotion value standard deviation, the hotspot video clip determined from a standard deviation greater than the preset standard deviation is a hotspot video clip of concern to that user; if the average emotion values of all users who watched the original video clip are used, the hotspot video clip so determined is one of concern to all users.
  • the standard deviation threshold is a value preset by the server, and the standard deviation threshold can be set independently by the user according to requirements.
  • If the emotion value standard deviation of a recorded video clip is greater than the standard deviation threshold, the user's emotional fluctuation while watching the corresponding original video clip was large, possibly changing from great joy to great sorrow or from great sorrow to great joy, and the recorded video clip is a mood swing video clip. Reflecting this emotional fluctuation through the emotion value standard deviation objectively captures the user's emotional changes while watching the original video clip.
  • S605 Calculate the fluctuating emotion probability of the original video clip from the number of mood swing video clips and the number of recorded video clips using the fluctuating emotion probability formula, that is, P = C/D.
  • The fluctuating emotion probability intuitively expresses users' mood fluctuations while watching the original video clip: the greater the number of mood swing video clips among the users who watched the original video, the greater the fluctuating emotion probability, meaning the original video clip resonates with users' emotions.
  • The number of recorded video clips D is the number of recorded video clips, intercepted from the recorded videos of all users, in which the same original video clip is watched; it can be understood as the number of users who watched the original video clip and whose facial expression changes were recorded.
  • The number of mood swing video clips C is the number, among the D recorded video clips, whose emotion value standard deviation is greater than the standard deviation threshold.
  • In the hotspot video annotation processing method provided in this embodiment, the instantaneous emotion value corresponding to each image to be recognized in the recorded video clip is obtained, and the emotion value standard deviation is calculated with the standard deviation formula to determine whether each recorded video clip is a mood swing video clip, that is, a clip reflecting strong emotional fluctuation; the number of mood swing video clips and the number of recorded video clips are then used to calculate the fluctuating emotion probability of the original video clip, so that this probability reflects the emotional fluctuation of all users watching the original video clip.
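  • The computation sketch below implements the standard deviation formula of step S603 and the P = C/D formula of step S605; the 0.5 standard deviation threshold is an assumed value, since the document leaves it user-configurable.

```python
import math

def emotion_std(values):
    """S_N for one recorded video clip's instantaneous emotion values
    (assumed non-empty), per the formula in step S603."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)

def fluctuating_emotion_probability(clips, std_threshold=0.5):
    """clips: one list of instantaneous emotion values per recorded video
    clip (i.e. per viewer of the same original video clip). A clip is a
    mood swing clip when S_N > std_threshold; the probability is C / D."""
    d = len(clips)
    c = sum(1 for values in clips if emotion_std(values) > std_threshold)
    return c / d if d else 0.0
```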
  • each recorded video is associated with a user ID, which is an identifier used to uniquely identify the user's identity in the video playback system.
  • the hotspot video annotation processing method further includes:
  • S701 Based on the playback timestamp corresponding to the hot video segment, intercept the target video segment corresponding to the playback timestamp from the recorded video corresponding to the user ID.
  • The target video clip is the recorded video clip, in the recorded video corresponding to the user ID, that corresponds to the playback timestamps of the hotspot video clip.
  • Specifically, according to the playback timestamps corresponding to the hotspot video clip, the server obtains from the recorded video corresponding to the user ID the recorded video clip matching those playback timestamps, and determines the acquired recorded video clip to be the target video clip.
  • S702 Acquire the instantaneous emotion value corresponding to each image to be recognized in the target video segment.
  • Since in step S202 the micro-expression recognition model has already been used to recognize each image to be recognized in all recorded videos and obtain the corresponding instantaneous emotion values, this step can directly obtain the instantaneous emotion value corresponding to each image to be recognized in the target video clip without re-recognition, improving the efficiency of acquiring instantaneous emotion values.
  • S703 Query the emotion tag comparison table based on the instantaneous emotion value to obtain a single frame of emotion tags corresponding to the image to be recognized.
  • The emotion tag comparison table is a preset comparison table recording the emotion tag corresponding to each instantaneous emotion value. Since the instantaneous emotion value is set to one of 1, 0.8, 0.5, 0.3, 0, -0.3, -0.5, -0.8, and -1, and each instantaneous emotion value may correspond to at least one micro-expression type, an emotion tag may be determined for each instantaneous emotion value, or determined from the magnitude of the instantaneous emotion value according to a preset rule for dividing emotion tags, so that each instantaneous emotion value corresponds to one emotion tag. For example, the emotion tags can be divided into tags such as joy, anger, ...
  • each level of emotion corresponds to a range of emotion values.
  • The single-frame emotion tag is the emotion tag corresponding, in the emotion tag comparison table, to the instantaneous emotion value of an image to be recognized. That is, according to the user's instantaneous emotion value in each image to be recognized, the corresponding single-frame emotion tag is looked up in the emotion tag comparison table, so as to determine from it the user's degree of preference for the corresponding original video image.
  • S704 Based on the single-frame emotion tag corresponding to the image to be recognized, obtain the segment emotion tag corresponding to the target video segment.
  • Specifically, the single-frame emotion tag corresponding to each image to be recognized can reflect the user's reaction to each original video image in the target video clip, so that the clip emotion tag describing the user's viewing of the target video clip can be obtained.
  • For example, the single-frame emotion tag occurring most frequently among the single-frame emotion tags of all images to be recognized may be selected as the clip emotion tag.
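  • A minimal sketch of that majority-vote selection (names illustrative):

```python
from collections import Counter

def segment_emotion_tag(single_frame_tags):
    """Return the most frequent single-frame emotion tag in the target
    video clip as the clip emotion tag (step S704)."""
    return Counter(single_frame_tags).most_common(1)[0][0]

# e.g. segment_emotion_tag(["joy", "joy", "surprise"]) -> "joy"
```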
  • S705 If the clip emotion tag is a preset emotion tag, query the user portrait database based on the user ID to obtain the user tag corresponding to the user ID, determine target users based on the user tag, and push the hotspot video clip to the clients corresponding to the target users.
  • The user tag is the gender, age, occupation, interest, or other preset tag acquired by querying the user portrait database with the user ID.
  • A target user is a user who, as determined by the server, has the same preferences as the user corresponding to the user ID.
  • Specifically, the user portrait database can be queried with the user ID to obtain the corresponding user tag, and target users can then be quickly obtained from the user tag, facilitating the push of hotspot video clips the target users are likely to enjoy.
  • The preset emotion tags are preconfigured tags usable for video pushing. For example, if the preset emotion tag is the "joy" tag or a level-1 tag, and the server recognizes that the clip emotion tag of a target video clip is a level-1 tag, the corresponding hotspot video clip is deemed highly attractive to the user corresponding to the user ID, and the hotspot video clip can be pushed to target users who share the same user tags (that is, the same preferences) as that user, ensuring its attractiveness to the target users.
  • In the hotspot video annotation processing method provided in this embodiment, the target video clip corresponding to the playback timestamps of the hotspot video clip is intercepted from the recorded video, and the single-frame emotion tag corresponding to each image to be recognized is obtained to determine the clip emotion tag corresponding to the target video clip, which can reflect the preferences of the user corresponding to the user ID while watching the hotspot video clip. The user portrait database is then queried with the user ID to obtain the user's tag, so that target users carrying the same user tag, and therefore the same preferences as the user corresponding to the user ID, can be determined. If the clip emotion tag is a preset emotion tag, the hotspot video clip is pushed to the clients corresponding to the target users.
  • each recorded video is associated with a user ID, which is an identifier used to uniquely identify the user's identity in the video playback system.
  • the hotspot video annotation processing method further includes:
  • The specific implementation process of step S801 is the same as that of step S701; to avoid redundancy, it is not described again here.
  • S802 Obtain the instantaneous emotion value corresponding to each image to be recognized in the target video segment.
  • The specific implementation process of step S802 is the same as that of step S702; to avoid redundancy, it is not described again here.
  • S803 Query the emotion label comparison table based on the instantaneous emotion value to obtain a single frame of emotion labels corresponding to the image to be recognized.
  • The specific implementation process of step S803 is the same as that of step S703; to avoid redundancy, it is not described again here.
  • The specific implementation process of step S804 is the same as that of step S704; to avoid redundancy, it is not described again here.
  • S805 If the clip emotion tag is a preset emotion tag, query the video database based on the playback timestamps corresponding to the hotspot video clip to obtain the content tag corresponding to the hotspot video clip, determine the hotspot video clips corresponding to that content tag as recommended video clips, and push the recommended video clips to the client corresponding to the user ID.
  • the content tag refers to the tag of the content played in the original video.
  • For example, the content may carry category tags such as funny, food, fashion, travel, entertainment, life, information, parent-child, knowledge, games, cars, finance, cute pets, sports, music, anime, technology, and health, or other tags that subdivide specific descriptions of the video content.
  • Specifically, if the server determines that the clip emotion tag of the target video clip is a preset emotion tag, it determines that the hotspot video clip belongs to a video type of particular concern to the user corresponding to the user ID.
  • the recommended video clip is a hotspot video clip, determined based on the content tag, that can be recommended to the user corresponding to the user ID.
  • the server queries the video database according to the content tag, obtains other hotspot video clips corresponding to that content tag, determines them as recommended video clips, and recommends them to the client corresponding to the user ID, thereby automatically recommending hotspot video clips with the same content tag to the user corresponding to the user ID.
  • the target video segment corresponding to the playback timestamp of the hotspot video clip is intercepted from the recorded video, and the single-frame emotion tag corresponding to each image to be recognized is obtained, so as to determine the segment emotion tag corresponding to the target video segment, which reflects the preference of the user corresponding to the user ID while watching the hotspot video clip.
  • the video database is queried based on the playback timestamp of the hotspot video clip to determine the pre-configured content tag of the hotspot video clip, so that other hotspot video clips stored by the server under that content tag are determined as recommended video clips and recommended to the client corresponding to the user ID; in this way, the recommended video clips more easily cater to the preferences of the user corresponding to the user ID, improving their attractiveness to that user.
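  A minimal sketch of this content-tag based recommendation, assuming a simple dictionary layout for the video database (the names video_db, content_tag, and the clip IDs are illustrative, not part of the application):

```python
PRESET_EMOTION_TAGS = {"happy", "level-1"}   # assumed preset tags used for push

def recommend_clips(clip_id, clip_emotion_tag, video_db):
    """Look up the watched clip's content tag, then return other hotspot
    clips that carry the same content tag (step S805 style)."""
    if clip_emotion_tag not in PRESET_EMOTION_TAGS:
        return []
    content_tag = video_db[clip_id]["content_tag"]
    return [cid for cid, meta in video_db.items()
            if cid != clip_id and meta["content_tag"] == content_tag]

video_db = {
    "clip-001": {"playback_span": (120, 180), "content_tag": "funny"},
    "clip-002": {"playback_span": (300, 360), "content_tag": "funny"},
    "clip-003": {"playback_span": (500, 560), "content_tag": "sports"},
}
print(recommend_clips("clip-001", "happy", video_db))  # ['clip-002']
```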
  • the hotspot video annotation processing method further includes:
  • the hotspot video frame rate refers to the proportion of the frames of the entire original video that belong to hotspot video clips, that is, the number of frames in all hotspot video clips of an original video divided by the number of frames in the entire original video.
  • the server obtains the number of frames of an original video, counts the number of frames in all hotspot video clips of that video, and divides the latter by the former to obtain the corresponding hotspot video frame rate.
  • for example, if the number of frames of the original video is 10000 (that is, the original video contains 10000 original video images), the frame number of the first hotspot video clip is 1000, and the frame number of the second hotspot video clip is 2000, then the hotspot video frame rate is (1000 + 2000) / 10000 = 30%.
  • the hotspot video frame rate can therefore objectively reflect the attractiveness of the original video to users.
  • S902 Sort the original videos based on their corresponding hotspot video frame rates, and display them on the client according to the sorting result.
  • the server sorts the display positions of the original videos on the client in descending order of hotspot video frame rate, so that users see the original videos with higher hotspot video frame rates first and can choose what to watch accordingly, thereby increasing the playback volume of the original videos displayed by the video playback system.
  • in this hotspot video annotation processing method, after the hotspot video frame rate of each original video is obtained, the original videos are sorted and displayed on the user's client, so that the user can selectively watch the more attractive original videos, increasing the playback volume of the original videos displayed by the video playback system.
  • in step S901, counting the hotspot video frame rate corresponding to the original video based on the hotspot video clips includes:
  • S1001 Count the number of original video images in each hot video segment, and determine the total number of frames of the hot video segment.
  • the total frame number of the hot video clip refers to the total frame number of all the hot video clips in the same original video.
  • for example, if an original video has 6 hotspot video clips, the server counts the sum of the frame numbers of these 6 hotspot video clips as the total frame number of the hotspot video clips.
  • S1002 Count the number of original video images in the original video and determine the total number of video frames of the original video.
  • the server counts the number of original video images in the original video and determines the total number of video frames of the original video, that is, the total number of video frames is the number of all original video images in the original video.
  • the total number of video frames of the original video may also be determined as the product of the playback frame rate and the playback duration of the original video (for example, 25 frames per second × 400 seconds = 10000 frames), so as to quickly determine the total frame number of the original video.
  • S1003 Use the hotspot video frame rate formula to calculate, from the total frame number of the hotspot video clips and the total video frame number of the original video, the hotspot video frame rate corresponding to the original video.
  • the hotspot video frame rate formula is $Z = \frac{\sum_{j=1}^{m} w_j}{K}$, where Z is the hotspot video frame rate, $w_j$ is the total frame number of the j-th hotspot video clip, m is the number of hotspot video clips, and K is the total video frame number of the original video.
  • the server can thus determine the total frame number of the hotspot video clips and the total video frame number of the original video, and quickly calculate the hotspot video frame rate using the hotspot video frame rate formula; since the hotspot video frame rate reflects the attractiveness of the original video to users, sorting the original videos by it lets users selectively watch the original videos with higher hotspot video frame rates, improving the playback volume of the original videos.
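  The calculation and the sorting of step S902 are small enough to state directly; the sketch below follows the formula above (the video list and its field names are illustrative):

```python
def hotspot_frame_rate(clip_frame_counts, total_frames):
    """Z = (sum of w_j over all m hotspot clips) / K, per the formula above."""
    return sum(clip_frame_counts) / total_frames

# Worked example from the text: clips of 1000 and 2000 frames in a
# 10000-frame original video give Z = 0.3, i.e. 30%.
assert hotspot_frame_rate([1000, 2000], 10000) == 0.3

# Step S902: display original videos in descending order of Z.
videos = [
    {"id": "video-A", "clips": [1000, 2000], "total": 10000},
    {"id": "video-B", "clips": [4000], "total": 8000},
]
videos.sort(key=lambda v: hotspot_frame_rate(v["clips"], v["total"]), reverse=True)
print([v["id"] for v in videos])  # ['video-B', 'video-A']
```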
  • a hotspot video annotation processing device is provided, and the hotspot video annotation processing device corresponds one-to-one to the hotspot video annotation processing method in the foregoing embodiment.
  • the hotspot video annotation processing device includes a recorded video acquisition module 1101, an instant emotion value acquisition module 1102, an intense emotion probability determination module 1103, a hotspot video image determination module 1104, and a hotspot video segment acquisition module 1105.
  • the detailed description of each functional module is as follows:
  • the recorded video obtaining module 1101 is used to obtain the user's recorded video collected while the client plays the original video.
  • the original video includes at least one frame of original video image, and the recorded video includes at least one frame of image to be recognized.
  • the recording timestamp of each image to be recognized is associated with the playback timestamp of an original video image.
  • the instantaneous emotion value acquisition module 1102 is used to identify each image to be recognized by using a micro-expression recognition model to obtain the instantaneous emotion value corresponding to the image to be recognized.
  • the intense emotion probability determination module 1103 is used to determine the intense emotion probability of the original video image corresponding to the playback timestamp according to the instantaneous emotion values of the images to be identified corresponding to all the recording timestamps associated with the same playback timestamp.
  • the hotspot video image determination module 1104 is configured to determine the original video image as a hotspot video image if the intense emotion probability is greater than the first probability threshold.
  • the hotspot video clip acquisition module 1105 is configured to perform hotspot annotation on the original video based on the hotspot video image to obtain hotspot video clips.
  • the instantaneous emotion value acquisition module 1102 includes an instantaneous probability acquisition unit, a micro-expression type determination unit, and an instantaneous emotion value acquisition unit.
  • the instantaneous probability acquisition unit is used to identify each image to be recognized by using a micro-expression recognition model to acquire the instantaneous probability corresponding to at least one type of recognized expression.
  • the micro-expression type determination unit is used to determine the identified expression type with the largest instantaneous probability as the micro-expression type of the image to be recognized.
  • the instantaneous emotion value acquisition unit is used to query an emotion value comparison table based on the micro-expression type to acquire the instantaneous emotion value of the image to be recognized.
  • the intense emotion probability determination module 1103 includes an image total number statistics unit, an intense emotion judgment unit, an intense emotion quantity statistics unit, and an intense emotion probability determination unit.
  • the image total number statistics unit is used to count the total number of images to be recognized corresponding to all the recording timestamps associated with the same playback timestamp.
  • the intense emotion judgment unit is configured to determine that the emotion attribute of an image to be recognized is intense emotion if the absolute value of the instantaneous emotion value corresponding to that image is greater than a preset emotion threshold.
  • the intense emotion quantity statistics unit is used to count, among the images to be recognized corresponding to all the recording timestamps associated with the same playback timestamp, the number of images whose emotion attribute is intense emotion, as the intense emotion quantity.
  • the intense emotion probability determination unit is used to determine the intense emotion probability of the original video image corresponding to the playback timestamp based on the intense emotion quantity and the total number of images to be recognized.
  • the hotspot video clip acquisition module 1105 includes a video clip frame number counting unit, a first hotspot video clip determination unit, a fluctuation emotion probability acquisition unit, and a second hotspot video clip determination unit.
  • the video clip frame number counting unit is used to count the number of frames of the original video clip formed between any two hot-spot video images and determine the frame number of the video clip.
  • the first hotspot video segment determination unit is configured to determine the original video segment as a hotspot video segment if the video segment frame number is less than or equal to the first frame number threshold.
  • the fluctuation emotion probability acquisition unit is used to obtain the fluctuation emotion probability corresponding to the original video clip, based on the playback timestamp corresponding to the original video clip, if the frame number of the video clip is greater than the first frame number threshold and less than or equal to the second frame number threshold.
  • the second hotspot video segment determination unit is configured to determine the original video segment as a hotspot video segment if the fluctuation emotion probability is greater than the second probability threshold.
  • the fluctuation emotion probability acquisition unit includes a recorded video clip interception subunit, an instant emotion value acquisition subunit, an emotion value standard deviation acquisition subunit, an emotion fluctuation video clip determination subunit, and a fluctuation emotion probability calculation subunit.
  • the recorded video clip interception subunit is used to intercept the recorded video clip corresponding to the playback timestamp from the recorded video corresponding to the original video based on the playback timestamp corresponding to the original video clip.
  • the instantaneous emotion value acquisition subunit is used to acquire the instantaneous emotion value corresponding to each image to be identified in the recorded video segment.
  • the emotion value standard deviation acquisition subunit is used to calculate, using the standard deviation formula, the instantaneous emotion values corresponding to all the images to be recognized in the recorded video clip, to obtain the standard deviation of the emotion values.
  • the standard deviation formula is $S_N = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2}$, where $S_N$ is the standard deviation of the emotion values of the recorded video clip, N is the number of images to be recognized in the recorded video clip, $x_i$ is the instantaneous emotion value of each image to be recognized, and $\bar{x}$ is the average of all instantaneous emotion values $x_i$ in the recorded video clip.
  • the emotion fluctuation video clip determination subunit is used to determine the recorded video clip as an emotion fluctuation video clip if the standard deviation of the emotion values is greater than the standard deviation threshold.
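  A minimal sketch of this subunit's computation, following the population form of the standard deviation given above (the 0.4 threshold is an illustrative assumption, not a value fixed by the application):

```python
import math

STD_DEVIATION_THRESHOLD = 0.4   # assumed threshold for emotion fluctuation

def emotion_value_std(instant_values):
    """S_N = sqrt((1/N) * sum((x_i - mean)^2)), per the formula above."""
    n = len(instant_values)
    mean = sum(instant_values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in instant_values) / n)

def is_emotion_fluctuation_clip(instant_values):
    return emotion_value_std(instant_values) > STD_DEVIATION_THRESHOLD

print(is_emotion_fluctuation_clip([0.8, -0.5, 1.0, -0.8]))  # True: wide swings
print(is_emotion_fluctuation_clip([0.3, 0.3, 0.3, 0.3]))    # False: flat emotion
```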
  • each recorded video is associated with a user ID; after the hotspot video clip acquisition module 1105, the hotspot video annotation processing device further includes a target video clip interception module, a target emotion value acquisition module, a single-frame emotion tag acquisition module, a segment emotion tag acquisition module, a target user determination module, and a hotspot video clip pushing module.
  • the target video clip interception module is used to intercept the target video clip corresponding to the playback timestamp from the recorded video corresponding to the user ID based on the playback timestamp corresponding to the hot video clip.
  • the target emotion value acquisition module is used to acquire the instantaneous emotion value corresponding to each image to be identified in the target video segment.
  • the single-frame emotion label acquisition module is used to query the emotion label comparison table based on the instantaneous emotion value and obtain the single-frame emotion label corresponding to the image to be recognized.
  • the segment emotion tag acquisition module is used to acquire the segment emotion tag corresponding to the target video segment based on the single frame emotion tag corresponding to the image to be recognized.
  • the first video clip pushing module is used to, if the segment emotion tag is a preset emotion tag, query the user portrait database based on the user ID, obtain the user tag corresponding to the user ID, determine the target users based on the user tag, and push the hotspot video clip to the clients corresponding to the target users.
  • the second video clip pushing module is used to, if the segment emotion tag is a preset emotion tag, query the video database based on the playback timestamp corresponding to the hotspot video clip, obtain the content tag corresponding to the hotspot video clip, determine the hotspot video clips corresponding to that content tag as recommended video clips, and push the recommended video clips to the client corresponding to the user ID.
  • the hotspot video annotation processing device further includes a hotspot video frame rate statistics module and an original video sorting module.
  • the hotspot video frame rate statistics module is used to calculate the hotspot video frame rate corresponding to the original video based on the hotspot video clips.
  • the original video sorting module is used to sort the original video based on the hot video frame rate corresponding to the original video and display it on the client according to the sorting result.
  • the hotspot video frame rate statistics module includes a clip total frame number determination unit, a video total frame number determination unit, and a hotspot video frame rate acquisition unit.
  • the clip total frame number determination unit is used to count the number of original video images in each hotspot video clip and determine the total frame number of the hotspot video clips.
  • the video total frame number determination unit is used to count the number of original video images in the original video and determine the total number of video frames of the original video.
  • the hotspot video frame rate acquisition unit is used to calculate the hotspot video frame rate corresponding to the original video from the total frame number of the hotspot video clips and the total video frame number of the original video, using the hotspot video frame rate formula $Z = \frac{\sum_{j=1}^{m} w_j}{K}$, where Z is the hotspot video frame rate, $w_j$ is the total frame number of the j-th hotspot video clip, m is the number of hotspot video clips, and K is the total video frame number of the original video.
  • Each module in the above hotspot video annotation processing device may be implemented in whole or in part by software, hardware, and combinations thereof.
  • the above modules may be embedded, in hardware form, in or independent of the processor of the computer device, or may be stored, in software form, in the memory of the computer device, so that the processor can call and execute the operations corresponding to the above modules.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure may be as shown in FIG. 12.
  • the computer device includes a processor, a memory, a network interface, and a database connected by a system bus, where the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, computer-readable instructions, and a database.
  • the internal memory provides an environment for the operation of the operating system and computer-readable instructions in the non-volatile storage medium.
  • the database of the computer device is used to store data used or generated during the execution of the above hot-spot video annotation processing method, such as the number of original video images.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection; when executed by the processor, the computer-readable instructions implement a hotspot video annotation processing method.
  • a computer device is provided, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor.
  • when the processor executes the computer-readable instructions, the hotspot video annotation processing method in the above embodiments is implemented, for example, steps S201-S205 shown in FIG. 2 or the steps shown in FIGS. 3-10; details are not repeated here to avoid repetition.
  • alternatively, when executing the computer-readable instructions, the processor implements the functions of each module/unit in the embodiment of the hotspot video annotation processing device, for example, the functions of the recorded video acquisition module 1101, the instantaneous emotion value acquisition module 1102, the intense emotion probability determination module 1103, the hotspot video image determination module 1104, and the hotspot video clip acquisition module 1105 shown in FIG. 11; details are not repeated here to avoid repetition.
  • a computer-readable storage medium is provided, which stores computer-readable instructions; when executed by a processor, the computer-readable instructions implement the hotspot video annotation processing method in the foregoing embodiments, for example, steps S201-S205 shown in FIG. 2 or the steps shown in FIGS. 3-10; details are not repeated here to avoid repetition.
  • alternatively, when executed by the processor, the computer-readable instructions implement the functions of each module/unit in the embodiment of the above hotspot video annotation processing apparatus, for example, the recorded video acquisition module 1101 and the instantaneous emotion value acquisition module 1102 shown in FIG. 11; details are not repeated here to avoid repetition.
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link (Synchlink) DRAM (SLDRAM), direct RAM (RDRAM), direct memory bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM).

Abstract

Disclosed in the present application are a hotspot video annotation processing method and apparatus, a computer device, and a storage medium. The method comprises: obtaining a recorded video of a user collected while a client plays back an original video, the original video comprising at least one frame of an original video image, and the recorded video comprising at least one frame of an image to be identified; using a micro-expression recognition model to identify each image to be identified; and obtaining instantaneous emotion values corresponding to the images to be identified; according to the instantaneous emotion values, determining the intense emotion probability of an original video image corresponding to a playback timestamp; if the intense emotion probability is greater than a first probability threshold, determining the original video image to be a hotspot video image; and on the basis of the hotspot video image, performing hotspot annotation on the original video to obtain hotspot video clips. The described method may achieve the automatic annotation of hotspot video clips and improve the efficiency of annotating the hotspot video clips.

Description

Hotspot video annotation processing method, apparatus, computer device and storage medium
This application is based on, and claims priority to, Chinese invention application No. 201910025355.9, filed on January 11, 2019 and titled "Hotspot Video Annotation Processing Method, Apparatus, Computer Device and Storage Medium".
Technical Field
The present application relates to the technical field of micro-expression recognition, and in particular to a hotspot video annotation processing method and apparatus, a computer device, and a storage medium.
Background
In the mobile Internet, video (especially online video) is the largest and fastest-growing category of mobile data traffic. Online video refers to audio-visual files that are provided by online video service providers (for example, Baidu iQiyi), played in a streaming format, and available for live or on-demand viewing. Online video generally requires a standalone player, and the file format is mainly FLV (Flash Video), a streaming format based on P2P (peer-to-peer) technology that occupies few client resources.
Smartphone users can watch video streams, movies, TV shows, user-made clips, and video calls in both mobile network and Wi-Fi environments. To keep video users engaged, most video applications have added social elements, geographic information, and business forms based on personalized recommendation. In the prior art, users manually add hotspot annotations while watching a video, enabling real-time review and sharing of the video content; this manual annotation approach is inefficient. With the continuous development of terminal technology and video website design technology, people's requirements for video have grown, along with increasing demands for personalization and convenience while watching videos. Traditional online video service providers usually need dedicated editors to manually label different segments of film and television works with attribute tags, and to edit and push content based on those tags. This manual way of labeling and pushing the attribute tags of original video clips is inefficient and insufficiently precise, far from meeting the demands for personalization and convenience.
Summary of the Invention
Embodiments of the present application provide a hotspot video annotation processing method and apparatus, a computer device, and a storage medium, to solve the problem of low efficiency in the current manual annotation of original video clip attributes.
A hotspot video annotation processing method includes:
obtaining a recorded video of a user collected while a client plays an original video, where the original video includes at least one frame of original video image, the recorded video includes at least one frame of image to be recognized, and the recording timestamp of each image to be recognized is associated with the playback timestamp of an original video image;
recognizing each image to be recognized by using a micro-expression recognition model, to obtain the instantaneous emotion value corresponding to the image to be recognized;
determining the intense emotion probability of the original video image corresponding to a playback timestamp according to the instantaneous emotion values of the images to be recognized corresponding to all the recording timestamps associated with that playback timestamp;
if the intense emotion probability is greater than a first probability threshold, determining the original video image as a hotspot video image; and
performing hotspot annotation on the original video based on the hotspot video image, to obtain hotspot video clips.
A hotspot video annotation processing apparatus includes:
a recorded video acquisition module, configured to obtain a recorded video of a user collected while a client plays an original video, where the original video includes at least one frame of original video image, the recorded video includes at least one frame of image to be recognized, and the recording timestamp of each image to be recognized is associated with the playback timestamp of an original video image;
an instantaneous emotion value acquisition module, configured to recognize each image to be recognized by using a micro-expression recognition model, to obtain the instantaneous emotion value corresponding to the image to be recognized;
an intense emotion probability determination module, configured to determine the intense emotion probability of the original video image corresponding to a playback timestamp according to the instantaneous emotion values of the images to be recognized corresponding to all the recording timestamps associated with that playback timestamp;
a hotspot video image determination module, configured to determine the original video image as a hotspot video image if the intense emotion probability is greater than a first probability threshold; and
a hotspot video clip acquisition module, configured to perform hotspot annotation on the original video based on the hotspot video image, to obtain hotspot video clips.
A computer device includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, where the processor implements the following steps when executing the computer-readable instructions:
obtaining a recorded video of a user collected while a client plays an original video, where the original video includes at least one frame of original video image, the recorded video includes at least one frame of image to be recognized, and the recording timestamp of each image to be recognized is associated with the playback timestamp of an original video image;
recognizing each image to be recognized by using a micro-expression recognition model, to obtain the instantaneous emotion value corresponding to the image to be recognized;
determining the intense emotion probability of the original video image corresponding to a playback timestamp according to the instantaneous emotion values of the images to be recognized corresponding to all the recording timestamps associated with that playback timestamp;
if the intense emotion probability is greater than a first probability threshold, determining the original video image as a hotspot video image; and
performing hotspot annotation on the original video based on the hotspot video image, to obtain hotspot video clips.
One or more non-volatile readable storage media storing computer-readable instructions are provided, where the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
obtaining a recorded video of a user collected while a client plays an original video, where the original video includes at least one frame of original video image, the recorded video includes at least one frame of image to be recognized, and the recording timestamp of each image to be recognized is associated with the playback timestamp of an original video image;
recognizing each image to be recognized by using a micro-expression recognition model, to obtain the instantaneous emotion value corresponding to the image to be recognized;
determining the intense emotion probability of the original video image corresponding to a playback timestamp according to the instantaneous emotion values of the images to be recognized corresponding to all the recording timestamps associated with that playback timestamp;
if the intense emotion probability is greater than a first probability threshold, determining the original video image as a hotspot video image; and
performing hotspot annotation on the original video based on the hotspot video image, to obtain hotspot video clips.
The details of one or more embodiments of the present application are set forth in the drawings and description below. Other features and advantages of the present application will become apparent from the description, the drawings, and the claims.
Brief Description of the Drawings
In order to explain the technical solutions in the embodiments of the present application more clearly, the drawings required for describing the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a schematic diagram of an application environment of a hotspot video annotation processing method in an embodiment of the present application;
FIG. 2 is a flowchart of a hotspot video annotation processing method in an embodiment of the present application;
FIG. 3 is another flowchart of a hotspot video annotation processing method in an embodiment of the present application;
FIG. 4 is another flowchart of a hotspot video annotation processing method in an embodiment of the present application;
FIG. 5 is another flowchart of a hotspot video annotation processing method in an embodiment of the present application;
FIG. 6 is another flowchart of a hotspot video annotation processing method in an embodiment of the present application;
FIG. 7 is another flowchart of a hotspot video annotation processing method in an embodiment of the present application;
FIG. 8 is another flowchart of a hotspot video annotation processing method in an embodiment of the present application;
FIG. 9 is another flowchart of a hotspot video annotation processing method in an embodiment of the present application;
FIG. 10 is another flowchart of a hotspot video annotation processing method in an embodiment of the present application;
FIG. 11 is a schematic diagram of a hotspot video annotation processing apparatus in an embodiment of the present application;
FIG. 12 is a schematic diagram of a computer device in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
The hotspot video annotation processing method provided by the embodiments of the present application can be applied in the application environment shown in FIG. 1. Specifically, the method is applied in a video playback system that includes a client and a server as shown in FIG. 1, where the client and the server communicate through a network, so that hotspot video clips in an original video are automatically annotated, the annotation efficiency of hotspot video clips is improved, and personalized recommendation and sorted display of hotspot video clips are realized. The client, also called the user end, refers to a program that corresponds to the server and provides local services to the user. The client can be installed on, but is not limited to, various personal computers, notebook computers, smartphones, tablets, and portable wearable devices. The server can be implemented by an independent server or by a server cluster composed of multiple servers.
In an embodiment, as shown in FIG. 2, a hotspot video annotation processing method is provided. The method is described by taking its application to the server in FIG. 1 as an example, and includes the following steps:
S201: Obtain the user's recorded video collected while the client plays the original video, where the original video includes at least one frame of original video image, the recorded video includes at least one frame of image to be recognized, and the recording timestamp of each image to be recognized is associated with the playback timestamp of an original video image.
The original video refers to the video played by a video playback program (that is, the client) installed on a terminal device such as the user's mobile phone or computer, for the user to watch. The recorded video refers to the video of the user's changing facial expressions, shot in real time by the shooting module (such as a built-in camera) of the terminal device while the user watches the original video. The original video includes at least one frame of original video image; an original video image is a single frame that forms the original video, that is, the smallest-unit single picture in the original video. Each original video image carries a playback timestamp, which is the timestamp of that image within the original video; for example, in a 10-minute original video, the original video image at the 100th second has a playback timestamp of 100 s. The recorded video includes at least one frame of image to be recognized; an image to be recognized is a single frame that forms the recorded video, that is, the smallest-unit single picture in the recorded video. Each image to be recognized corresponds to a recording timestamp, which is the timestamp of that image within the recorded video; for example, in a 10-minute recorded video, the image to be recognized at the 100th second has a recording timestamp of 100 s. The recording timestamp is associated with the playback timestamp carried by an original video image, so that the images to be recognized correspond one-to-one to the original video images, making it easy to accurately determine the user's emotions while watching the original video.
Specifically, each original video carries a unique video identifier used to uniquely identify the corresponding original video. For example, the original video corresponding to episode XX of "XX" carries the unique video identifier XX0001, so that the server can retrieve the corresponding original video according to the identifier XX0001. The playback timestamp carried by each original video image is the timestamp of that image within the original video. In this embodiment, while the client plays an original video, the server obtains the recorded videos, shot in real time by the shooting modules (such as built-in cameras) of the terminal devices on which the clients are installed, of the expression changes of all users watching that original video. Each recorded video includes at least one frame of image to be recognized, each image to be recognized corresponds to a recording timestamp, and the recording timestamp is associated with the playback timestamp carried by an original video image. Understandably, by collecting the recorded videos of different users watching the same original video, it can be better determined whether the original video attracts the audience, which helps to automatically annotate the hotspot video clips in the original video and improves the annotation efficiency of hotspot video clips.
In a specific implementation, obtaining the user's recorded video collected while the client plays the original video includes: (1) controlling the client to play the original video, so that the playback timestamp of each original video image in the original video is associated with the current system time; (2) obtaining the user's recorded video collected while the client plays the original video, so that the recording timestamp of each image to be recognized in the recorded video is associated with the current system time; and (3) based on the current system time, associating the recording timestamp of each image to be recognized with the playback timestamp of an original video image. The current system time is the system's current time at any moment; for example, it can be obtained through the currentTimeMillis method of the System class. Generally, if the playback of the original video and the recording of the recorded video are synchronized, the playback timestamps of the original video correspond directly to the recording timestamps of the recorded video, that is, the first frame of the original video corresponds to the first frame of the image to be recognized, so that each image to be recognized reflects the user's micro-expression while watching the corresponding original video image. Correspondingly, if the playback and the recording are not synchronized, the playback timestamps of the original video and the recording timestamps of the recorded video need to be associated through the current system time, so that each associated image to be recognized still reflects the user's micro-expression while watching the corresponding original video image. For example, if the user agrees to be recorded and recording starts one minute after the original video starts playing, the playback and recording times are associated through the current system time: if the 1000th original video image is played at 10:05:10 and the 10th image to be recognized is recorded at 10:05:10, then the playback timestamp of the 1000th original video image is associated with the recording timestamp of the 10th image to be recognized.
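A minimal sketch of this association step, assuming both streams are anchored to wall-clock (system) time and constant frame rates (the function and parameter names are illustrative, not defined by this application):

```python
def associate_timestamps(play_start, record_start, play_fps, record_fps, n_record_frames):
    """For each image to be recognized, compute the playback timestamp of the
    original video image shown at the same system time.
    play_start / record_start: system times (seconds) at which playback and
    recording began; returns (recording_timestamp, playback_timestamp) pairs."""
    pairs = []
    for i in range(n_record_frames):
        recording_ts = i / record_fps                # position in recorded video
        system_time = record_start + recording_ts    # wall-clock time of frame i
        playback_ts = system_time - play_start       # position in original video
        # snap to the nearest original video image
        playback_ts = round(playback_ts * play_fps) / play_fps
        pairs.append((recording_ts, playback_ts))
    return pairs

# Example echoing the text: recording starts 60 s after playback begins,
# so recording timestamp 0 s maps to playback timestamp 60 s.
print(associate_timestamps(0.0, 60.0, 25, 25, 2))  # [(0.0, 60.0), (0.04, 60.04)]
```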
S202: Recognize each image to be recognized by using a micro-expression recognition model, to obtain the instantaneous emotion value corresponding to the image to be recognized.
The micro-expression recognition model is a model for recognizing the micro-expression of a human face in an image to be recognized. In this embodiment, the micro-expression recognition model captures local features of the user's face in the image to be recognized, determines each target facial action unit of the face according to the local features, and then determines the micro-expression according to the recognized target facial action units. The instantaneous emotion value corresponding to an image to be recognized is the emotion value corresponding to the micro-expression type of the face recognized in that image. Specifically, the server first uses the micro-expression recognition model to perform micro-expression recognition on each image to be recognized to determine its micro-expression type, and then queries the emotion value comparison table according to that micro-expression type to obtain the instantaneous emotion value corresponding to the image. The micro-expression types include, but are not limited to, those mentioned in this embodiment: love, interest, surprise, expectation, ..., aggressiveness, conflict, insult, doubt, and fear. Based on the micro-expression type, the instantaneous emotion value of the face in the image to be recognized is obtained. Using the micro-expression recognition model, the instantaneous emotion values of different users watching each original video image of the same original video can be obtained quickly, so that hotspot video clip analysis can be performed based on these values, achieving automatic annotation of hotspot video clips.
Specifically, the micro-expression recognition model may be a neural network recognition model based on deep learning, a local recognition model based on classification, or a local emotion recognition model based on Local Binary Patterns (LBP). In this embodiment, the micro-expression recognition model is a local recognition model based on classification. When the model is trained in advance, a large amount of training image data is collected, containing positive and negative samples of each facial action unit, and the training image data is trained through a classification algorithm to obtain the micro-expression recognition model. In this embodiment, a large amount of training image data may be trained through the SVM classification algorithm to obtain SVM classifiers corresponding to multiple facial action units: for example, 39 SVM classifiers corresponding to 39 facial action units, or 54 SVM classifiers corresponding to 54 facial action units; the more positive and negative samples of different facial action units the training image data contains, the more SVM classifiers are obtained. Understandably, when a micro-expression recognition model is formed from multiple SVM classifiers, the more SVM classifiers it contains, the more accurate the micro-expression types recognized by the model. Taking a micro-expression recognition model formed by SVM classifiers corresponding to 54 facial action units as an example, this model can recognize 54 micro-expression types, including love, interest, surprise, expectation, ..., aggressiveness, conflict, insult, doubt, and fear.
S203: Determine the intense emotion probability of the original video image corresponding to a playback timestamp according to the instantaneous emotion values of the images to be recognized corresponding to all the recording timestamps associated with that playback timestamp.
The intense emotion probability is used to evaluate how frequently intense emotions occur across the images to be recognized of different users watching the same original video. Understandably, a high intense emotion probability indicates that users' emotions fluctuate strongly while watching the original video, and that the original video is highly attractive to users. Specifically, for each original video image, the server obtains, according to its playback timestamp, the instantaneous emotion values of the images to be recognized corresponding to all the recording timestamps associated with that playback timestamp, that is, the instantaneous emotion values of all users who watched that original video image. Based on each image's instantaneous emotion value, the server determines whether it expresses intense emotion, and thus computes the intense emotion probability of all users watching that original video image, so that the probability objectively reflects how much the users watching the same original video like it or resonate with it.
S204: If the intense emotion probability is greater than a first probability threshold, determine the original video image as a hotspot video image.
The first probability threshold is a preset probability threshold for evaluating whether an original video image is a hotspot video image. In this embodiment, this preset probability threshold may be set to 60%. If the intense emotion probability is greater than the first probability threshold, a relatively large proportion (that is, greater than the first probability threshold) of all users who watched the original video image experienced strong emotional fluctuation while watching it (that is, the emotion corresponding to their instantaneous emotion value is intense emotion), so the image is highly attractive to users and can be determined as a hotspot video image.
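The computation in S203-S204 reduces to a counting rule; the sketch below assumes the |value| > threshold criterion for intense emotion described for the intense emotion judgment unit, with illustrative values (0.5 is an assumption; 60% is the example value given above):

```python
PRESET_EMOTION_THRESHOLD = 0.5     # assumed |instantaneous emotion value| cutoff
FIRST_PROBABILITY_THRESHOLD = 0.6  # the 60% example value from this embodiment

def intense_emotion_probability(instant_values):
    """Fraction of viewers of one original video image (one playback timestamp)
    whose instantaneous emotion value indicates intense emotion."""
    intense = sum(1 for v in instant_values if abs(v) > PRESET_EMOTION_THRESHOLD)
    return intense / len(instant_values)

def is_hotspot_image(instant_values):
    return intense_emotion_probability(instant_values) > FIRST_PROBABILITY_THRESHOLD

# Example: 4 of 5 viewers react intensely -> probability 0.8 > 0.6 -> hotspot image.
print(is_hotspot_image([1.0, -0.8, 0.8, 0.3, -1.0]))  # True
```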
S205: Perform hotspot annotation on the original video based on the hotspot video images, to obtain hotspot video clips.
Specifically, after obtaining all the hotspot video images in the original video, the server can form an original video clip from any two hotspot video images, and then compare the total number of original video images in that clip against preset frame number thresholds to determine whether the clip is a hotspot video clip. The server automatically marks the original video images corresponding to the hotspot video images and annotates the hotspot video clips in the original video, thereby automatically annotating the hotspot video clips in the original video and improving their annotation efficiency.
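A sketch of this annotation step under the frame-number rules described for the hotspot video clip acquisition module (both thresholds and the fluctuation check are illustrative assumptions; a real implementation would compute the fluctuation emotion probability from the recorded videos):

```python
FIRST_FRAME_THRESHOLD = 100          # assumed first frame number threshold
SECOND_FRAME_THRESHOLD = 500         # assumed second frame number threshold
SECOND_PROBABILITY_THRESHOLD = 0.6   # assumed second probability threshold

def annotate_hotspot_clips(hotspot_frames, fluctuation_probability):
    """Form candidate clips between consecutive hotspot video images and keep
    those satisfying the frame-number rules; `fluctuation_probability(a, b)`
    is a callback returning the fluctuation emotion probability of frames a..b."""
    clips = []
    frames = sorted(hotspot_frames)
    for start, end in zip(frames, frames[1:]):
        n_frames = end - start + 1
        if n_frames <= FIRST_FRAME_THRESHOLD:
            clips.append((start, end))
        elif (n_frames <= SECOND_FRAME_THRESHOLD
              and fluctuation_probability(start, end) > SECOND_PROBABILITY_THRESHOLD):
            clips.append((start, end))
    return clips

# Example: frames 10 and 50 merge directly; 50 -> 400 needs the fluctuation check.
print(annotate_hotspot_clips([10, 50, 400], lambda a, b: 0.7))  # [(10, 50), (50, 400)]
```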
In the hotspot video annotation processing method provided in this embodiment, the user's recorded video is collected while the original video is playing, so that the recording timestamp of each image to be recognized is associated with the playback timestamp of an original video image, ensuring the objectivity of the micro-expression analysis of the original video. The micro-expression recognition model is then used to recognize the images to be recognized; it can quickly identify the user's micro-expression while watching a given original video image, so as to obtain the intense emotion values of the user watching the original video and perform hotspot video annotation based on them, ensuring the objectivity of the hotspot video clip annotation. Next, based on the instantaneous emotion values of the images to be recognized corresponding to all the recording timestamps associated with the same playback timestamp, the intense emotion probability of the original video image corresponding to that playback timestamp is determined, so as to decide whether it is a hotspot video image; hotspot annotation of the original video is thus refined to hotspot analysis of individual original video images, ensuring the objectivity and accuracy of the hotspot analysis. Finally, the original video is hotspot annotated based on the hotspot video images to obtain hotspot video clips, so that the server can automatically annotate hotspot video clips, improving the efficiency and accuracy of hotspot video clip annotation and providing users with a better viewing experience.
In an embodiment, as shown in FIG. 3, in step S202, recognizing each image to be recognized by using a micro-expression recognition model to obtain the instantaneous emotion value corresponding to the image to be recognized includes:
S301: Recognize each image to be recognized by using the micro-expression recognition model, to obtain the instantaneous probability corresponding to at least one recognized expression type.
The recognized expression type refers to one of the pre-configured micro-expression types that the micro-expression recognition model can recognize an image to be recognized as belonging to.
Specifically, the micro-expression recognition model pre-trained by the server includes multiple SVM classifiers, each used to recognize one facial action unit. In this embodiment, the micro-expression recognition model contains 54 SVM classifiers, and a facial action unit number mapping table is established in which each facial action unit is represented by a predefined number: for example, AU1 is inner brow raiser, AU2 is outer brow raiser, AU5 is upper lid raiser, and AU26 is jaw drop. Each facial action unit has a trained corresponding SVM classifier; for example, the SVM classifier for inner brow raiser can output the probability that a local feature belongs to inner brow raiser, and the SVM classifier for outer brow raiser can output the probability that a local feature belongs to outer brow raiser.
In this embodiment, when the server uses the pre-trained micro-expression recognition model to recognize the images to be recognized, it may first perform face keypoint detection and feature extraction on each image to obtain its local features. The face keypoint algorithm may be, but is not limited to, the Ensemble of Regression Trees (ERT) algorithm, the SIFT (scale-invariant feature transform) algorithm, the SURF (Speeded Up Robust Features) algorithm, the LBP (Local Binary Patterns) algorithm, or the HOG (Histogram of Oriented Gradients) algorithm. The feature extraction algorithm may be a CNN (Convolutional Neural Network) algorithm. The local features are then input into the multiple SVM classifiers, which recognize all the input local features; the probability values output by the classifiers for their facial action units are obtained, and the facial action units whose SVM classifiers output probability values greater than a preset threshold are determined as target facial action units. A target facial action unit is a facial action unit (Action Unit, AU) obtained by recognizing the image to be recognized with the micro-expression recognition model. The probability value may be a value between 0 and 1; if an output probability value is 0.6 and the preset threshold is 0.5, then since 0.6 is greater than 0.5, the facial action unit corresponding to 0.6 is taken as a target facial action unit of the image to be recognized. Finally, all the obtained target facial action units are comprehensively evaluated to obtain the probability that their combination belongs to each micro-expression type pre-configured in the micro-expression recognition model, that is, the instantaneous probability of each recognized expression type. Comprehensively evaluating all the target facial action units specifically means obtaining, based on the combination of all target facial action units, the probability that this combination belongs to each pre-configured micro-expression type, so as to determine the instantaneous probability of each recognized expression type.
S302: Determine the recognized expression type with the largest instantaneous probability as the micro-expression type of the image to be recognized.
Specifically, after the instantaneous probability that each image to be recognized belongs to at least one recognized expression type is obtained, the recognized expression type with the largest instantaneous probability is determined as the micro-expression type corresponding to the image to be recognized. For example, if the instantaneous probability that an image to be recognized belongs to the recognized expression type "love" is 0.9, while the instantaneous probabilities that it belongs to the recognized expression types "doubt" and "serenity" are each 0.05, then the recognized expression type with the instantaneous probability of 0.9 is determined as the micro-expression type of the image to be recognized, ensuring the accuracy of the identified micro-expression type.
S303: Query an emotion value comparison table based on the micro-expression type to obtain the instantaneous emotion value of the image to be recognized.
The emotion value comparison table is a preset data table for recording the emotion attribute corresponding to each micro-expression type; it stores the association between micro-expression types and emotion values. After acquiring the micro-expression type of the image to be recognized, the server queries the emotion value comparison table based on that micro-expression type to obtain the corresponding instantaneous emotion value. The instantaneous emotion value is a value in [-1, 1]: the larger the value, the more the user likes the original video image corresponding to the recording timestamp associated with the image to be recognized; the smaller the value, the more the user dislikes that original video image. For example, to facilitate subsequent calculation, the instantaneous emotion values corresponding to the 54 micro-expression types identified by the micro-expression recognition model may each be set to one of 1, 0.8, 0.5, 0.3, 0, -0.3, -0.5, -0.8 and -1.
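A minimal sketch of steps S302 and S303 follows, assuming the model outputs a dict of per-type instantaneous probabilities and the emotion value comparison table is held as an in-memory dict; both data layouts and the specific table entries are illustrative, not specified by the disclosure:

```python
# Illustrative emotion value comparison table: micro-expression type -> value in [-1, 1]
EMOTION_VALUE_TABLE = {
    "love": 1.0,
    "serenity": 0.3,
    "doubt": -0.3,
    # ... entries for the remaining micro-expression types
}

def instantaneous_emotion_value(type_probabilities):
    """type_probabilities: dict mapping recognized expression type -> instantaneous probability."""
    # S302: pick the recognized expression type with the largest instantaneous probability
    micro_expression_type = max(type_probabilities, key=type_probabilities.get)
    # S303: look up its instantaneous emotion value in the comparison table
    return EMOTION_VALUE_TABLE[micro_expression_type]

# e.g. instantaneous_emotion_value({"love": 0.9, "doubt": 0.05, "serenity": 0.05}) -> 1.0
```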
In the hotspot video annotation processing method provided in this embodiment, the micro-expression recognition model is first used to recognize the image to be recognized, so as to quickly obtain the instantaneous probability corresponding to at least one recognized expression type, and the recognized expression type with the largest instantaneous probability is selected as the micro-expression type of the image to be recognized, ensuring the accuracy of the identified micro-expression type. The emotion value comparison table is then queried based on the micro-expression type to obtain the instantaneous emotion value of the image to be recognized, ensuring the efficiency of acquiring that value.
Further, after acquiring the instantaneous emotion value corresponding to each image to be recognized, the server may query a database based on that value to obtain the standard volume or standard color tone corresponding to the instantaneous emotion value, obtain the current volume or current color tone at which the client is playing when that image to be recognized was captured, and automatically adjust the current volume and current color tone based on the standard volume or standard color tone, so that the volume and tone at which the video is played match the user's current mood. A video whose volume or tone matches the user's mood at the time is more likely to evoke empathy, thereby increasing the appeal of the original video to the user.
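As a minimal illustration of this optional adjustment, the following sketch assumes a hypothetical database query standard_playback_settings and a hypothetical client control method set_playback; neither name appears in the disclosure:

```python
def adjust_playback_to_mood(client, emotion_value, standard_playback_settings):
    """Match the client's playback volume and color tone to the user's mood.

    standard_playback_settings: hypothetical database lookup returning the
    (standard_volume, standard_tone) pair associated with an instantaneous
    emotion value.
    """
    standard_volume, standard_tone = standard_playback_settings(emotion_value)
    # Replace the current playback parameters with the standard ones so that
    # playback matches the user's current emotion.
    client.set_playback(volume=standard_volume, tone=standard_tone)
```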
In an embodiment, as shown in FIG. 4, step S203, that is, determining the intense emotion probability of the original video image corresponding to a playback timestamp according to the instantaneous emotion values of the images to be recognized corresponding to all recording timestamps associated with that playback timestamp, includes:
S401: Count the total number of images to be recognized corresponding to all recording timestamps associated with the same playback timestamp.
The total number of images is the sum of the images to be recognized, collected by the server, of all users who have watched the original video image. Specifically, when annotating hotspot video segments for any original video, all recorded videos corresponding to viewings of that original video must be obtained, and the number of images to be recognized corresponding to all recording timestamps associated with the playback timestamp of the same original video image is counted and determined as the total number of images. For example, for an original video with the video identifier XX0001, if a certain original video image has a playback timestamp at the 10th second of the original video, then the number of all images to be recognized associated with that 10th-second original video image is the total number of images.
S402: If the absolute value of the instantaneous emotion value corresponding to an image to be recognized is greater than a preset emotion threshold, the emotion attribute of that image to be recognized is intense emotion.
The preset emotion threshold is a preset threshold for evaluating whether an instantaneous emotion value represents intense emotion; it may be set to 0.6 or another value. Specifically, the server compares the absolute value of the instantaneous emotion value corresponding to the image to be recognized with the preset emotion threshold. If the absolute value is greater than the preset emotion threshold, the emotion attribute of the image to be recognized is intense emotion; otherwise, the emotion attribute is plain emotion. That is, the instantaneous emotion value that the micro-expression recognition model outputs for each image to be recognized is a value in [-1, 1]; the closer its absolute value is to 1, the stronger the user's liking or disliking of the original video image being watched, and the micro-expression can be regarded as an intense emotion. Such intense emotion easily resonates with users and indicates strong appeal. Correspondingly, if the absolute value of the instantaneous emotion value is close to 0, the user's liking or disliking of the original video image is weak, indicating that the original video image did not resonate with the user and has low appeal, and the micro-expression can be regarded as a plain emotion.
S403: Among the images to be recognized corresponding to all recording timestamps associated with the same playback timestamp, count the number of images whose emotion attribute is intense emotion as the intense emotion quantity.
Specifically, from the counted images to be recognized corresponding to all recording timestamps associated with the same playback timestamp, the server determines the number of images whose emotion attribute is intense emotion as the intense emotion quantity. For example, if 100 users watch the original video image corresponding to a certain playback timestamp in the same original video, the 100 images to be recognized corresponding to all recording timestamps associated with that playback timestamp are obtained, the micro-expression recognition model is used to obtain the instantaneous emotion values of all 100 images, whether each image represents intense emotion is determined based on its instantaneous emotion value, and the number of images whose emotion attribute is intense emotion is determined as the intense emotion quantity; in this case the intense emotion quantity is a value between 0 and 100.
S404: Calculate the intense emotion probability of the original video image corresponding to the playback timestamp from the total number of images and the intense emotion quantity using the intense emotion probability formula L = A/B, where L is the intense emotion probability, A is the intense emotion quantity, and B is the total number of images.
Specifically, after acquiring the total number of images and the intense emotion quantity for any original video image, the server can quickly calculate its intense emotion probability using the intense emotion probability formula. The intense emotion probability reflects, among all users who watched the original video image, the probability that the image caused strong emotional fluctuation, and thus well reflects the appeal of the original video image to users or the degree to which it resonates with them.
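A minimal sketch of steps S401 to S404 follows, assuming the instantaneous emotion values of all images sharing one playback timestamp have already been collected into a list (the data layout is illustrative):

```python
EMOTION_THRESHOLD = 0.6  # preset emotion threshold from the embodiment

def intense_emotion_probability(emotion_values):
    """emotion_values: instantaneous emotion values of all images to be
    recognized that share the same playback timestamp, one per viewer.

    Returns L = A / B, where A is the intense emotion quantity and B is the
    total number of images.
    """
    total_images = len(emotion_values)                        # S401: B
    intense_count = sum(                                      # S402/S403: A
        1 for v in emotion_values if abs(v) > EMOTION_THRESHOLD)
    return intense_count / total_images                       # S404: L = A/B

# e.g. intense_emotion_probability([0.9, -0.8, 0.3, 0.0]) -> 0.5
```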
In the hotspot video annotation processing method provided in this embodiment, the total number of all images to be recognized corresponding to the same playback timestamp is first obtained, the intense emotion quantity is determined from those images whose emotion attribute is intense emotion, and the intense emotion probability is calculated using the intense emotion probability formula. This makes the acquisition of the intense emotion probability more objective and intuitively shows the appeal of the original video image to users.
In an embodiment, as shown in FIG. 5, step S205, that is, performing hotspot annotation on the original video based on the hotspot video images to obtain hotspot video segments, includes:
S501: Count the number of frames of the original video segment formed between any two hotspot video images, and determine it as the video segment frame count.
The video segment frame count is the total number of frames of the original video segment formed between two hotspot video images. In this embodiment, after the hotspot video images are obtained, the number of frames of the original video segment formed between any two hotspot video images is counted and determined as the video segment frame count. Since the original video segment contains the two hotspot video images, the video segment frame count is at least two. For example, if the 20th and 40th original video images in the original video are hotspot video images, the video segment frame count of the original video segment formed between these two hotspot video images is 21 frames.
S502: If the video segment frame count is less than or equal to a first frame count threshold, determine the original video segment as a hotspot video segment.
The first frame count threshold is a preset threshold defining the minimum time interval for judging whether an original video segment is a hotspot video segment. This threshold is set autonomously, and its value is generally small. For example, if the first frame count threshold is set to 120 frames and the original video is played at a typical frame rate of 24 frames per second, the corresponding original video segment is a 5-second original video segment. If the video segment frame count is less than or equal to the first frame count threshold, the interval between the two adjacent hotspot video images is short, meaning the original video segment arouses intense emotion in users within a short time and attracts their attention, so the original video segment is determined as a hotspot video segment.
S503: If the video segment frame count is greater than the first frame count threshold and less than or equal to a second frame count threshold, obtain the fluctuating emotion probability corresponding to the original video segment based on the playback timestamps corresponding to the original video segment.
The second frame count threshold is a preset threshold defining the maximum time interval for judging whether a video segment is a hotspot video segment. The second frame count threshold is generally set larger; for example, when the first frame count threshold is set to 120 frames, the second frame count threshold may be set to 1200 frames. At a playback frame rate of 24 frames per second, the original video segment formed between the two hotspot video images is then a 50-second original video segment. According to the playback timestamp corresponding to each original video image in the 50-second original video segment, the emotion fluctuation probability of the images to be recognized associated with those playback timestamps is obtained. If the emotion fluctuation probability is large, the original video segment is likely to arouse intense emotion in users; conversely, if it is small, the original video segment is unlikely to do so. The emotion fluctuation probability is the probability that watching the original video segment causes a large emotional fluctuation in users, where a large emotional fluctuation can be understood as a change from great joy to great sorrow or another emotion change process.
S504: If the fluctuating emotion probability is greater than a second probability threshold, determine the original video segment as a hotspot video segment.
The second probability threshold is a probability-related threshold set for evaluating hotspot video segments based on the fluctuating emotion probability. Understandably, if the fluctuating emotion probability of an original video segment is greater than the second probability threshold, the original video segment causes strong emotional fluctuation in users and attracts their attention, so it can be determined as a hotspot video segment.
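A minimal sketch of steps S501 to S504 follows, with the frame count thresholds taken from the examples above; the fluctuating_emotion_probability callable is assumed to implement steps S601 to S605 (a matching sketch appears after step S605 below), and the second probability threshold value is illustrative:

```python
FIRST_FRAME_THRESHOLD = 120    # e.g. 5 s at 24 fps
SECOND_FRAME_THRESHOLD = 1200  # e.g. 50 s at 24 fps
SECOND_PROB_THRESHOLD = 0.5    # illustrative; the disclosure fixes no value

def is_hotspot_segment(start_frame, end_frame, recorded_segments,
                       fluctuating_emotion_probability):
    """Decide whether the original video segment between two hotspot video
    images (both inclusive) is a hotspot video segment.

    recorded_segments: per-user lists of instantaneous emotion values for the
    recorded video segments matching this original segment, consumed only
    when the fluctuating emotion probability of step S503 is needed.
    """
    frame_count = end_frame - start_frame + 1       # S501: includes both ends
    if frame_count <= FIRST_FRAME_THRESHOLD:        # S502: short segment
        return True
    if frame_count <= SECOND_FRAME_THRESHOLD:       # S503: medium segment
        p = fluctuating_emotion_probability(recorded_segments)
        return p > SECOND_PROB_THRESHOLD            # S504
    return False                                    # beyond the maximum interval
```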
In the hotspot video annotation processing method provided in this embodiment, the video segment frame count of the original video segment formed between two hotspot video images is first obtained. If the video segment frame count is less than or equal to the first frame count threshold, the original video segment is directly determined as a hotspot video segment. If the video segment frame count is greater than the first frame count threshold and less than or equal to the second frame count threshold, the fluctuating emotion probability of the original video segment must be obtained, and whether the original video segment is a hotspot video segment is determined by comparing that probability with the second probability threshold. In this embodiment, whether an original video segment formed between two hotspot video images is a hotspot video segment is determined by its video segment frame count and fluctuating emotion probability, thereby automatically annotating hotspot video segments in the original video and ensuring the objectivity of the annotated hotspot video segments.
In an embodiment, as shown in FIG. 6, step S503, that is, obtaining the fluctuating emotion probability corresponding to the original video segment based on the playback timestamps corresponding to the original video segment, includes:
S601: Based on the playback timestamps corresponding to the original video segment, intercept the recorded video segment corresponding to those playback timestamps from the recorded video corresponding to the original video.
Specifically, according to the playback timestamps of the original video segment, the server intercepts, from the recorded video corresponding to the original video, the recorded video segment associated with those playback timestamps, so as to recognize the images to be recognized in that recorded video segment. For example, if the playback timestamps of an original video segment span seconds 10-50 of an original video, then from the recorded video corresponding to that original video, the recorded video segment whose recording timestamps correspond to the playback timestamps of seconds 10-50 is intercepted, so that each image to be recognized in the recorded video segment reflects the user's facial expression changes while watching the original video segment.
S602: Obtain the instantaneous emotion value corresponding to each image to be recognized in the recorded video segment.
Since in step S202 the micro-expression recognition model has already been used to recognize every image to be recognized in all recorded videos and obtain its corresponding instantaneous emotion value, this step can directly obtain the instantaneous emotion value corresponding to each image to be recognized in the recorded video segment without re-recognition, improving the efficiency of acquiring the instantaneous emotion values.
S603: Calculate the instantaneous emotion values corresponding to all images to be recognized in the recorded video segment using the standard deviation formula to obtain the emotion value standard deviation; the standard deviation formula is
S_N = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2}
where S_N is the emotion value standard deviation of the recorded video segment, N is the number of images to be recognized in the recorded video segment, x_i is the instantaneous emotion value of each image to be recognized, and \bar{x} is the mean of all instantaneous emotion values x_i in the recorded video segment.
The emotion value standard deviation is the standard deviation of the user's instantaneous emotion values across all images to be recognized while watching the original video segment, and it objectively reflects the user's emotional fluctuation while watching that segment. Understandably, if the emotion value standard deviation is calculated from each individual user's instantaneous emotion values, a hotspot video segment determined from that standard deviation being greater than the preset standard deviation is a hotspot video segment of interest to that user. If the standard deviation is calculated from the average emotion values of all users who have watched the original video segment, a hotspot video segment determined from that standard deviation being greater than the preset standard deviation is a hotspot video segment of common interest to all those users.
S604: If the emotion value standard deviation is greater than a standard deviation threshold, the recorded video segment is an emotional fluctuation video segment.
The standard deviation threshold is a value preset on the server and may be set autonomously by the user as needed. In this embodiment, if the emotion value standard deviation of a recorded video segment is greater than the standard deviation threshold, the user's emotion fluctuated greatly while watching the corresponding original video segment, possibly changing from great joy to great sorrow or from great sorrow to great joy, so the recorded video segment is an emotional fluctuation video segment. This emotional fluctuation is embodied by the emotion value standard deviation, which objectively reflects the user's emotional changes while watching the original video segment.
S605: Calculate the fluctuating emotion probability of the original video segment from the number of emotional fluctuation video segments and the number of recorded video segments using the fluctuating emotion probability formula P = C/D, where P is the fluctuating emotion probability, C is the number of emotional fluctuation video segments, and D is the number of recorded video segments.
Specifically, the fluctuating emotion probability intuitively shows users' emotional fluctuation while watching the original video segment: the more emotional fluctuation video segments there are, the greater the fluctuating emotion probability, and the more the original video segment resonates with users' emotions. In the fluctuating emotion probability above, the number of recorded video segments D is the number of recorded video segments intercepted, from the recorded videos of all users, for viewings of the same original video segment; it can be understood as the number of users who watched the original video segment and whose facial expression changes were recorded. The number of emotional fluctuation video segments C is the number of those D recorded video segments whose emotion value standard deviation is greater than the standard deviation threshold.
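A minimal sketch of steps S603 to S605 follows, representing each recorded video segment simply as the list of its per-image instantaneous emotion values; the data layout and threshold value are illustrative:

```python
import statistics

STD_DEV_THRESHOLD = 0.4  # illustrative; the disclosure leaves it configurable

def emotion_std_dev(emotion_values):
    """S603: population standard deviation S_N of one recorded video segment."""
    return statistics.pstdev(emotion_values)

def fluctuating_emotion_probability(recorded_segments):
    """S604/S605: P = C / D over all recorded segments of one original segment.

    recorded_segments: one list of instantaneous emotion values per user who
    watched the original video segment (D segments in total).
    """
    d = len(recorded_segments)
    c = sum(1 for seg in recorded_segments          # emotional fluctuation segments
            if emotion_std_dev(seg) > STD_DEV_THRESHOLD)
    return c / d
```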
In the hotspot video annotation processing method provided in this embodiment, the instantaneous emotion value corresponding to each image to be recognized in the recorded video segment is obtained, and the emotion value standard deviation is calculated using the standard deviation formula, so as to determine whether each recorded video segment is an emotional fluctuation video segment, that is, one that evokes strong emotional fluctuation. The number of emotional fluctuation video segments and the number of recorded video segments are then used to calculate the fluctuating emotion probability of the original video segment, so that this probability reflects the emotional fluctuation of all users watching the original video segment.
In an embodiment, as shown in FIG. 7, each recorded video is associated with a user ID, which is an identifier used to uniquely identify the user's identity in the video playback system. After step S205, the hotspot video annotation processing method further includes:
S701: Based on the playback timestamps corresponding to the hotspot video segment, intercept the target video segment corresponding to those playback timestamps from the recorded video corresponding to the user ID.
The target video segment is the recorded video segment, within the recorded video corresponding to the user ID, that corresponds to the playback timestamps of the hotspot video segment. Specifically, according to the playback timestamps corresponding to the hotspot video segment, the server obtains, from the recorded video corresponding to the user ID, the recorded video segment whose recording timestamps correspond to those playback timestamps, and determines the obtained recorded video segment as the target video segment.
S702: Obtain the instantaneous emotion value corresponding to each image to be recognized in the target video segment.
Since in step S202 the micro-expression recognition model has already been used to recognize every image to be recognized in all recorded videos and obtain its corresponding instantaneous emotion value, this step can directly obtain the instantaneous emotion value corresponding to each image to be recognized in the target video segment without re-recognition, improving the efficiency of acquiring the instantaneous emotion values.
S703: Query an emotion label comparison table based on the instantaneous emotion values to obtain the single-frame emotion label corresponding to each image to be recognized.
The emotion label comparison table is a preset comparison table for recording the emotion label corresponding to each instantaneous emotion value. Since the instantaneous emotion values are each set to one of 1, 0.8, 0.5, 0.3, 0, -0.3, -0.5, -0.8 and -1, and each instantaneous emotion value may correspond to at least one micro-expression type, an emotion label may be determined for each individual instantaneous emotion value, or emotion label division rules may be preset according to the magnitude of the instantaneous emotion value so that each instantaneous emotion value corresponds to one emotion label. For example, the emotion labels may be divided into joy, anger, ..., sorrow, happiness and the like, or, according to emotion label division rules (such as emotion value from large to small), into level-1 emotion, level-2 emotion, ..., level-M emotion, with each level corresponding to a range of emotion values. A single-frame emotion label is the emotion label, in the emotion label comparison table, corresponding to the instantaneous emotion value of one frame of image to be recognized. That is, according to the user's instantaneous emotion value in each image to be recognized, the single-frame emotion label corresponding to that value is queried in the emotion label comparison table, so as to determine, from that label, the user's degree of liking of the corresponding original video image.
S704: Based on the single-frame emotion labels corresponding to the images to be recognized, obtain the segment emotion label corresponding to the target video segment.
Since the images to be recognized in the target video segment are real-time images captured while the user watches the hotspot video segment, the single-frame emotion label corresponding to each image to be recognized reflects the user's emotion for each original video image in the target video segment. After the single-frame emotion labels corresponding to all images to be recognized in the target video segment are obtained, the segment emotion label for the user's viewing of the target video segment can be obtained. Specifically, the most frequent single-frame emotion label among those of all images to be recognized may be selected as the segment emotion label.
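A minimal sketch of steps S703 and S704 follows, assuming a level-style emotion label comparison table keyed by the nine preset emotion values and a most-frequent-label vote over frames; the table entries are illustrative:

```python
from collections import Counter

# Illustrative emotion label comparison table: emotion value -> emotion label
EMOTION_LABEL_TABLE = {
    1.0: "level-1", 0.8: "level-2", 0.5: "level-3", 0.3: "level-4",
    0.0: "level-5", -0.3: "level-6", -0.5: "level-7", -0.8: "level-8",
    -1.0: "level-9",
}

def segment_emotion_label(emotion_values):
    """S703/S704: map each frame's instantaneous emotion value to its
    single-frame emotion label, then take the most frequent label as the
    segment emotion label of the target video segment."""
    single_frame_labels = [EMOTION_LABEL_TABLE[v] for v in emotion_values]
    return Counter(single_frame_labels).most_common(1)[0][0]
```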
S705: If the segment emotion label is a preset emotion label, query a user portrait database based on the user ID to obtain the user labels corresponding to the user ID, determine target users based on the user labels, and push the hotspot video segment to the clients corresponding to the target users.
The user labels are labels such as gender, age, occupation, interests or other labels preset in the user portrait database, obtained by querying the user portrait database based on the user ID. A target user is a user, obtained by the server, who shares the same preferences for the original video as the user corresponding to the user ID. Specifically, the user portrait database may be queried based on the user ID to obtain the user labels corresponding to the user ID, and target users may then be quickly obtained based on those user labels, so that hotspot video segments the target users are likely to enjoy can be pushed to them.
The preset emotion label is a preset label under which video pushing may be performed. For example, if the preset emotion label is the joy label or the level-1 label and the server recognizes that the segment emotion label of a certain target video segment is a level-1 label, the corresponding hotspot video segment is deemed highly attractive to the user corresponding to the user ID, and the hotspot video segment may be pushed to target users who share the same user labels (that is, the same preferences) as that user, ensuring the appeal of the hotspot video segment to the target users.
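A minimal sketch of step S705 follows, with hypothetical helpers query_user_labels, find_users_with_labels and push_to_client standing in for the user portrait database query and the push channel; none of these names are defined by the disclosure:

```python
PRESET_EMOTION_LABELS = {"level-1"}  # labels that trigger pushing

def push_hotspot_segment(user_id, hotspot_segment, segment_label,
                         query_user_labels, find_users_with_labels,
                         push_to_client):
    """S705: push the hotspot video segment to users whose portrait labels
    match those of the user who reacted strongly to it."""
    if segment_label not in PRESET_EMOTION_LABELS:
        return
    user_labels = query_user_labels(user_id)          # user portrait database
    for target_user in find_users_with_labels(user_labels):
        push_to_client(target_user, hotspot_segment)  # target users' clients
```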
In the hotspot video annotation processing method provided in this embodiment, the target video segment corresponding to the playback timestamps of the hotspot video segment is first intercepted from the recorded video, and the single-frame emotion label corresponding to each image to be recognized in it is obtained, so as to determine the segment emotion label corresponding to the target video segment; this segment emotion label reflects the preference of the user corresponding to the user ID while watching the hotspot video segment. Then, the user portrait database is queried based on the user ID to obtain that user's user labels, so as to determine target users who share the same user labels, and thus the same preferences, as the user corresponding to the user ID. When the segment emotion label is a preset emotion label, the hotspot video segment is pushed to the target users to increase its appeal to them, thereby increasing the playback volume of the hotspot video segment and even of the original video containing it.
In an embodiment, each recorded video is associated with a user ID, which is an identifier used to uniquely identify the user's identity in the video playback system. As shown in FIG. 8, after step S205, the hotspot video annotation processing method further includes:
S801: Based on the playback timestamps corresponding to the hotspot video segment, intercept the target video segment corresponding to those playback timestamps from the recorded video corresponding to the user ID.
The specific implementation process of step S801 is the same as that of step S701 and, to avoid repetition, is not detailed here.
S802: Obtain the instantaneous emotion value corresponding to each image to be recognized in the target video segment.
The specific implementation process of step S802 is the same as that of step S702 and, to avoid repetition, is not detailed here.
S803: Query the emotion label comparison table based on the instantaneous emotion values to obtain the single-frame emotion label corresponding to each image to be recognized.
The specific implementation process of step S803 is the same as that of step S703 and, to avoid repetition, is not detailed here.
S804: Based on the single-frame emotion labels corresponding to the images to be recognized, obtain the segment emotion label corresponding to the target video segment.
The specific implementation process of step S804 is the same as that of step S704 and, to avoid repetition, is not detailed here.
S805: If the segment emotion label is a preset emotion label, query a video database based on the playback timestamps corresponding to the hotspot video segment to obtain the content label corresponding to the hotspot video segment, determine the hotspot video segments corresponding to that content label as recommended video segments, and push the recommended video segments to the client corresponding to the user ID.
A content label is a label for the content played in the original video; it may be a category label such as comedy, food, fashion, travel, entertainment, lifestyle, news, parenting, knowledge, games, automobiles, finance, pets, sports, music, animation, technology or health, or another label that further subdivides the specific description of the video content. Specifically, when the server determines that the segment emotion label of the target video segment is a preset emotion label, it determines that the hotspot video segment belongs to a video type that the user corresponding to the user ID pays close attention to. At this time, the video database is queried based on the playback timestamps corresponding to the hotspot video segment to obtain the content label corresponding to that hotspot video segment. Since the user corresponding to the user ID pays close attention to the hotspot video segment, it is inferred by analogy that the user will pay attention to all hotspot video segments corresponding to the same content label.
A recommended video segment is a hotspot video segment, determined based on the content label, that can be recommended to the user corresponding to the user ID. Specifically, the server queries the video database according to the content label, obtains other hotspot video segments corresponding to that content label, determines them as recommended video segments, and recommends them to the client of the user ID, thereby automatically recommending hotspot video segments with the same content label to the user corresponding to the user ID.
In the hotspot video annotation processing method provided in this embodiment, the target video segment corresponding to the playback timestamps of the hotspot video segment is first intercepted from the recorded video, and the single-frame emotion label corresponding to each image to be recognized in it is obtained, so as to determine the segment emotion label corresponding to the target video segment; this segment emotion label reflects the preference of the user corresponding to the user ID while watching the hotspot video segment. Then, based on the playback timestamps of the hotspot video segment, the video database is queried to determine the content label pre-configured for that segment, so that other hotspot video segments stored on the server and corresponding to the same content label can be determined as recommended video segments and recommended to the client corresponding to the user ID. This makes the recommended video segments more likely to match the preferences of the user corresponding to the user ID and increases their appeal to that user.
In an embodiment, as shown in FIG. 9, after step S205, the hotspot video annotation processing method further includes:
S901: Based on the hotspot video segments, count the hotspot video frame rate corresponding to the original video.
The hotspot video frame rate is the proportion of the frames of all hotspot video segments in an original video to the total frames of the entire original video. Specifically, the server obtains the number of frames of an original video, counts the number of frames of all hotspot video segments in that original video, and divides the latter by the former to obtain the hotspot video frame rate corresponding to the original video. For example, if the original video has 10000 frames, that is, it contains 10000 original video images, and the first hotspot video segment has 1000 frames, the second 2000 frames, and the third 3000 frames, then the hotspot video frame rate corresponding to the original video is (1000+2000+3000)/10000 = 60%, meaning that 60% of the original video images in the original video belong to hotspot video segments, which objectively reflects the appeal of the original video to users.
S902: Sort the original videos based on their corresponding hotspot video frame rates, and display them on the client according to the sorting result.
The server sorts the display positions of the original videos on the client in descending order of hotspot video frame rate, so that users see original videos with higher hotspot video frame rates first and can choose what to watch according to the hotspot video frame rate, thereby increasing the playback volume of the original videos displayed by the video playback system.
In the hotspot video annotation processing method provided in this embodiment, after the hotspot video frame rate of each original video is obtained, the original videos are sorted and displayed on the user's client, so that the user can selectively watch more attractive original videos, increasing the playback volume of the original videos displayed by the video playback system.
In an embodiment, as shown in FIG. 10, step S901, that is, counting the hotspot video frame rate corresponding to the original video based on the hotspot video segments, includes:
S1001: Count the number of original video images in each hotspot video segment, and determine it as the segment total frame count of that hotspot video segment.
The segment total frame counts of the hotspot video segments together cover all hotspot video segments in the same original video. For example, if an original video has 6 hotspot video segments, the server counts the frame count of each of these 6 hotspot video segments, and their sum gives the total frames of the hotspot video segments.
S1002: Count the number of original video images in the original video, and determine it as the total video frame count of the original video.
Specifically, the server counts the number of original video images in the original video and determines it as the total video frame count of the original video; that is, the total video frame count is the number of all original video images in the original video. Specifically, when the playback frame rate is known, the total video frame count of the original video may be determined as the product of the playback frame rate and the playback duration of the original video, so as to quickly determine the total frame count of the original video.
S1003: Calculate the hotspot video frame rate corresponding to the original video from the segment total frame counts of the hotspot video segments and the total video frame count of the original video using the hotspot video frame rate formula
Z = \frac{\sum_{j=1}^{m} w_j}{K}
where Z is the hotspot video frame rate, w_j is the segment total frame count of the j-th hotspot video segment, m is the number of hotspot video segments, and K is the total video frame count of the original video.
After determining the segment total frame counts of the hotspot video segments and the total video frame count of the original video, the server can quickly calculate the hotspot video frame rate using the hotspot video frame rate formula, and sort the original videos displayed on the client based on it, so that users can selectively watch original videos with higher hotspot video frame rates, increasing the playback volume of the original videos.
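A minimal sketch of steps S1001 to S1003 and the sorting of step S902 follows, describing each original video by its total frame count and the frame counts of its hotspot video segments (the data layout is illustrative):

```python
def hotspot_frame_rate(segment_frame_counts, total_frames):
    """S1003: Z = (sum of w_j over the m hotspot segments) / K."""
    return sum(segment_frame_counts) / total_frames

# Worked example from the embodiment: three hotspot segments in a
# 10000-frame original video -> 60%.
assert hotspot_frame_rate([1000, 2000, 3000], 10000) == 0.6

def sort_videos_by_hotspot_rate(videos):
    """S902: order original videos for display, highest hotspot video frame
    rate first. videos: list of (video_id, segment_frame_counts, total_frames)."""
    return sorted(
        videos,
        key=lambda v: hotspot_frame_rate(v[1], v[2]),
        reverse=True,
    )
```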
In the hotspot video annotation processing method provided in this embodiment, the segment total frame counts of the hotspot video segments and the total video frame count of the original video are obtained, and the hotspot video frame rate corresponding to the original video is calculated using the hotspot video frame rate formula, so that the hotspot video frame rate reflects the appeal of the original video to users and the original videos can be sorted accordingly to increase their playback volume.
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
In an embodiment, a hotspot video annotation processing apparatus is provided, which corresponds one-to-one to the hotspot video annotation processing method in the above embodiments. As shown in FIG. 11, the hotspot video annotation processing apparatus includes a recorded video acquisition module 1101, an instantaneous emotion value acquisition module 1102, an intense emotion probability determination module 1103, a hotspot video image determination module 1104 and a hotspot video segment acquisition module 1105. The functional modules are described in detail as follows:
The recorded video acquisition module 1101 is configured to acquire the recorded video of a user captured while the client plays an original video, where the original video includes at least one frame of original video image, the recorded video includes at least one frame of image to be recognized, and the recording timestamp of each image to be recognized is associated with the playback timestamp of an original video image.
The instantaneous emotion value acquisition module 1102 is configured to recognize each image to be recognized using the micro-expression recognition model to obtain the instantaneous emotion value corresponding to the image to be recognized.
The intense emotion probability determination module 1103 is configured to determine the intense emotion probability of the original video image corresponding to a playback timestamp according to the instantaneous emotion values of the images to be recognized corresponding to all recording timestamps associated with that playback timestamp.
The hotspot video image determination module 1104 is configured to determine the original video image as a hotspot video image if the intense emotion probability is greater than the first probability threshold.
The hotspot video segment acquisition module 1105 is configured to perform hotspot annotation on the original video based on the hotspot video images to obtain hotspot video segments.
Preferably, the instantaneous emotion value acquisition module 1102 includes an instantaneous probability acquisition unit, a micro-expression type determination unit and an instantaneous emotion value acquisition unit.
The instantaneous probability acquisition unit is configured to recognize each image to be recognized using the micro-expression recognition model to obtain the instantaneous probability corresponding to at least one recognized expression type.
The micro-expression type determination unit is configured to determine the recognized expression type with the largest instantaneous probability as the micro-expression type of the image to be recognized.
The instantaneous emotion value acquisition unit is configured to query the emotion value comparison table based on the micro-expression type to obtain the instantaneous emotion value of the image to be recognized.
Preferably, the intense emotion probability determination module 1103 includes a total image count unit, an intense emotion judgment unit, an intense emotion count unit and an intense emotion probability determination unit.
The total image count unit is configured to count the total number of images to be recognized corresponding to all recording timestamps associated with the same playback timestamp.
The intense emotion judgment unit is configured to determine that the emotion attribute of an image to be recognized is intense emotion if the absolute value of the instantaneous emotion value corresponding to that image is greater than the preset emotion threshold.
The intense emotion count unit is configured to count, among the images to be recognized corresponding to all recording timestamps associated with the same playback timestamp, the number of images whose emotion attribute is intense emotion as the intense emotion quantity.
The intense emotion probability determination unit is configured to calculate the intense emotion probability of the original video image corresponding to the playback timestamp from the total number of images and the intense emotion quantity using the intense emotion probability formula L = A/B, where L is the intense emotion probability, A is the intense emotion quantity, and B is the total number of images.
Preferably, the hotspot video segment acquisition module 1105 includes a video segment frame count unit, a first hotspot video segment determination unit, a fluctuating emotion probability acquisition unit and a second hotspot video segment determination unit.
The video segment frame count unit is configured to count the number of frames of the original video segment formed between any two hotspot video images and determine it as the video segment frame count.
The first hotspot video segment determination unit is configured to determine the original video segment as a hotspot video segment if the video segment frame count is less than or equal to the first frame count threshold.
The fluctuating emotion probability acquisition unit is configured to obtain the fluctuating emotion probability corresponding to the original video segment based on the playback timestamps corresponding to the original video segment if the video segment frame count is greater than the first frame count threshold and less than or equal to the second frame count threshold.
The second hotspot video segment determination unit is configured to determine the original video segment as a hotspot video segment if the fluctuating emotion probability is greater than the second probability threshold.
Preferably, the fluctuating emotion probability acquisition unit includes a recorded video segment interception subunit, an instantaneous emotion value acquisition subunit, an emotion value standard deviation acquisition subunit, an emotional fluctuation video segment determination subunit and a fluctuating emotion probability calculation subunit.
The recorded video segment interception subunit is configured to intercept, based on the playback timestamps corresponding to the original video segment, the recorded video segment corresponding to those playback timestamps from the recorded video corresponding to the original video.
The instantaneous emotion value acquisition subunit is configured to obtain the instantaneous emotion value corresponding to each image to be recognized in the recorded video segment.
The emotion value standard deviation acquisition subunit is configured to calculate the instantaneous emotion values corresponding to all images to be recognized in the recorded video segment using the standard deviation formula to obtain the emotion value standard deviation, the standard deviation formula being
S_N = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2}
where S_N is the emotion value standard deviation of the recorded video segment, N is the number of images to be recognized in the recorded video segment, x_i is the instantaneous emotion value of each image to be recognized, and \bar{x} is the mean of all instantaneous emotion values x_i in the recorded video segment.
The emotion-fluctuation video clip determination subunit is configured to determine the recorded video clip as an emotion-fluctuation video clip if the emotion value standard deviation is greater than a standard deviation threshold.
The fluctuating emotion probability calculation subunit is configured to compute the fluctuating emotion probability of the original video clip from the number of emotion-fluctuation video clips and the number of recorded video clips, using the fluctuating emotion probability formula P = C/D, where P is the fluctuating emotion probability, C is the number of emotion-fluctuation video clips, and D is the number of recorded video clips.
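A minimal sketch of the two calculations above, assuming each viewer's recorded clip is represented as a plain sequence of instantaneous emotion values; the population form of the standard deviation and all identifiers are assumptions for illustration.

```python
import math
from typing import Sequence

def emotion_std(values: Sequence[float]) -> float:
    """Population standard deviation S_N of one recorded clip's
    instantaneous emotion values."""
    n = len(values)
    if n == 0:
        return 0.0
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)

def fluctuating_emotion_probability(recorded_clips: Sequence[Sequence[float]],
                                    std_threshold: float) -> float:
    """P = C / D over the D recorded clips intercepted for one
    original clip; C counts clips whose S_N exceeds the threshold."""
    d = len(recorded_clips)  # D: number of recorded video clips
    if d == 0:
        return 0.0
    c = sum(1 for clip in recorded_clips
            if emotion_std(clip) > std_threshold)  # C
    return c / d
```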
Preferably, each recorded video is associated with a user ID. After the hotspot video clip acquisition module 1105, the hotspot video annotation processing apparatus further includes a target video clip interception module, a target emotion value acquisition module, a single-frame emotion label acquisition module, a clip emotion label acquisition module, a first video clip pushing module, and a second video clip pushing module.
The target video clip interception module is configured to intercept, based on the playback timestamps corresponding to the hotspot video clip, the target video clip corresponding to those playback timestamps from the recorded video corresponding to the user ID.
The target emotion value acquisition module is configured to obtain the instantaneous emotion value corresponding to each image to be recognized in the target video clip.
The single-frame emotion label acquisition module is configured to query an emotion label lookup table based on the instantaneous emotion value to obtain the single-frame emotion label corresponding to the image to be recognized.
The clip emotion label acquisition module is configured to obtain the clip emotion label corresponding to the target video clip based on the single-frame emotion labels corresponding to the images to be recognized.
The first video clip pushing module is configured to, if the clip emotion label is a preset emotion label, query a user profile database based on the user ID to obtain the user label corresponding to the user ID, determine target users based on the user label, and push the hotspot video clip to the clients corresponding to the target users.
The second video clip pushing module is configured to, if the clip emotion label is a preset emotion label, query a video database based on the playback timestamps corresponding to the hotspot video clip to obtain the content label corresponding to the hotspot video clip, determine the hotspot video clips corresponding to the content label as recommended video clips, and push the recommended video clips to the client corresponding to the user ID.
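The disclosure does not fix the structure of the emotion label lookup table or how single-frame labels are aggregated into one clip label. The sketch below assumes an interval-based table and a majority vote; both choices, and all names, are hypothetical readings rather than the disclosed method.

```python
from collections import Counter
from typing import Sequence, Tuple

# Hypothetical interval-based emotion label lookup table:
# (lower bound inclusive, upper bound exclusive, label).
LabelTable = Sequence[Tuple[float, float, str]]

def single_frame_label(value: float, table: LabelTable) -> str:
    """Return the label whose interval contains the emotion value."""
    for low, high, label in table:
        if low <= value < high:
            return label
    return "neutral"  # fallback; the disclosure does not specify one

def clip_emotion_label(values: Sequence[float], table: LabelTable) -> str:
    """Aggregate single-frame labels into one clip label (majority vote)."""
    labels = [single_frame_label(v, table) for v in values]
    if not labels:
        return "neutral"
    return Counter(labels).most_common(1)[0][0]
```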
Preferably, after the hotspot video clip acquisition module 1105, the hotspot video annotation processing apparatus further includes a hotspot video frame rate statistics module and an original video sorting module.
The hotspot video frame rate statistics module is configured to compute, based on the hotspot video clips, the hotspot video frame rate corresponding to the original video.
The original video sorting module is configured to sort the original videos based on their corresponding hotspot video frame rates, and to display them on the client according to the sorting result.
Preferably, the hotspot video frame rate statistics module includes a clip total frame count determination unit, a video total frame count determination unit, and a hotspot video frame rate acquisition unit.
The clip total frame count determination unit is configured to count the number of original video images in each hotspot video clip as the total frame count of that hotspot video clip.
The video total frame count determination unit is configured to count the number of original video images in the original video as the total frame count of the original video.
The hotspot video frame rate acquisition unit is configured to compute the hotspot video frame rate corresponding to the original video from the total frame counts of the hotspot video clips and the total frame count of the original video, using the hotspot video frame rate formula

$Z = \frac{1}{K}\sum_{j=1}^{m} w_j$

where $Z$ is the hotspot video frame rate, $w_j$ is the total frame count of the $j$-th hotspot video clip, $m$ is the number of hotspot video clips, and $K$ is the total frame count of the original video.
For the specific limitations of the hotspot video annotation processing apparatus, reference may be made to the limitations of the hotspot video annotation processing method above, which are not repeated here. Each module in the above hotspot video annotation processing apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in hardware form in, or independent of, the processor in the computer device, or may be stored in software form in the memory of the computer device, so that the processor can invoke and execute the operations corresponding to each module.
In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 12. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer-readable instructions, and a database. The internal memory provides an environment for the operation of the operating system and the computer-readable instructions in the non-volatile storage medium. The database of the computer device is configured to store data used or generated during execution of the above hotspot video annotation processing method, such as the number of original video images. The network interface of the computer device is configured to communicate with external terminals through a network connection. The computer-readable instructions, when executed by the processor, implement a hotspot video annotation processing method.
In one embodiment, a computer device is provided, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor. When the processor executes the computer-readable instructions, it implements the hotspot video annotation processing method of the above embodiments, for example steps S201-S205 shown in FIG. 2 or the steps shown in FIGS. 3 to 10, which are not repeated here to avoid repetition. Alternatively, when the processor executes the computer-readable instructions, it implements the functions of each module/unit in the above embodiment of the hotspot video annotation processing apparatus, for example the recorded video acquisition module 1101, the instantaneous emotion value acquisition module 1102, the intense emotion probability determination module 1103, the hotspot video image determination module 1104, and the hotspot video clip acquisition module 1105 shown in FIG. 11, which are likewise not repeated here.
In one embodiment, a computer-readable storage medium is provided, on which computer-readable instructions are stored. When the computer-readable instructions are executed by a processor, they implement the hotspot video annotation processing method of the above embodiments, for example steps S201-S205 shown in FIG. 2 or the steps shown in FIGS. 3 to 10, which are not repeated here to avoid repetition. Alternatively, when the computer-readable instructions are executed by a processor, they implement the functions of each module/unit in the above embodiment of the hotspot video annotation processing apparatus, for example the recorded video acquisition module 1101, the instantaneous emotion value acquisition module 1102, the intense emotion probability determination module 1103, the hotspot video image determination module 1104, and the hotspot video clip acquisition module 1105 shown in FIG. 11, which are likewise not repeated here.
A person of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments may be completed by instructing the relevant hardware through computer-readable instructions. The computer-readable instructions may be stored in a non-volatile computer-readable storage medium, and when executed may include the processes of the above method embodiments. Any reference to memory, storage, a database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
Those skilled in the art will clearly understand that, for convenience and conciseness of description, the division into the above functional units and modules is only used as an example. In practical applications, the above functions may be allocated to different functional units or modules as needed; that is, the internal structure of the apparatus may be divided into different functional units or modules to complete all or part of the functions described above.
The above embodiments are only used to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application, and shall all be included within the protection scope of this application.

Claims (20)

  1. A hotspot video annotation processing method, characterized by comprising:
    obtaining a recorded video of a user collected while a client plays an original video, wherein the original video comprises at least one frame of original video image, the recorded video comprises at least one frame of image to be recognized, and the recording timestamp of each image to be recognized is associated with a playback timestamp of one original video image;
    recognizing each image to be recognized using a micro-expression recognition model to obtain the instantaneous emotion value corresponding to the image to be recognized;
    determining, according to the instantaneous emotion values of the images to be recognized corresponding to all recording timestamps associated with a same playback timestamp, the intense emotion probability of the original video image corresponding to the playback timestamp;
    if the intense emotion probability is greater than a first probability threshold, determining the original video image as a hotspot video image; and
    performing hotspot annotation on the original video based on the hotspot video images to obtain hotspot video clips.
  2. The hotspot video annotation processing method according to claim 1, wherein the recognizing each image to be recognized using a micro-expression recognition model to obtain the instantaneous emotion value corresponding to the image to be recognized comprises:
    recognizing each image to be recognized using the micro-expression recognition model to obtain the instantaneous probability corresponding to each of at least one recognized expression type;
    determining the recognized expression type with the largest instantaneous probability as the micro-expression type of the image to be recognized; and
    querying an emotion value lookup table based on the micro-expression type to obtain the instantaneous emotion value of the image to be recognized.
  3. The hotspot video annotation processing method according to claim 1, wherein the determining, according to the instantaneous emotion values of the images to be recognized corresponding to all recording timestamps associated with a same playback timestamp, the intense emotion probability of the original video image corresponding to the playback timestamp comprises:
    counting the total number of images to be recognized corresponding to all recording timestamps associated with the same playback timestamp;
    if the absolute value of the instantaneous emotion value corresponding to an image to be recognized is greater than a preset emotion threshold, determining the emotion attribute of the image to be recognized as intense emotion;
    counting, among the images to be recognized corresponding to all recording timestamps associated with the same playback timestamp, the number of images to be recognized whose emotion attribute is intense emotion as the intense emotion count; and
    computing the intense emotion probability of the original video image corresponding to the playback timestamp from the total number of images and the intense emotion count using the intense emotion probability formula L = A/B, wherein L is the intense emotion probability, A is the intense emotion count, and B is the total number of images.
  4. The hotspot video annotation processing method according to claim 1, wherein the performing hotspot annotation on the original video based on the hotspot video images to obtain hotspot video clips comprises:
    counting the number of frames of the original video clip formed between any two hotspot video images as the video clip frame count;
    if the video clip frame count is less than or equal to a first frame count threshold, determining the original video clip as a hotspot video clip;
    if the video clip frame count is greater than the first frame count threshold and less than or equal to a second frame count threshold, obtaining the fluctuating emotion probability corresponding to the original video clip based on the playback timestamps corresponding to the original video clip; and
    if the fluctuating emotion probability is greater than a second probability threshold, determining the original video clip as a hotspot video clip.
  5. The hotspot video annotation processing method according to claim 4, wherein the obtaining the fluctuating emotion probability corresponding to the original video clip based on the playback timestamps corresponding to the original video clip comprises:
    intercepting, based on the playback timestamps corresponding to the original video clip, the recorded video clip corresponding to those playback timestamps from the recorded video corresponding to the original video;
    obtaining the instantaneous emotion value corresponding to each image to be recognized in the recorded video clip;
    computing, using the standard deviation formula, the standard deviation of the instantaneous emotion values corresponding to all images to be recognized in the recorded video clip to obtain the emotion value standard deviation, the standard deviation formula being
    $S_N = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^{2}}$,
    wherein $S_N$ is the emotion value standard deviation of the recorded video clip, $N$ is the number of images to be recognized in the recorded video clip, $x_i$ is the instantaneous emotion value of each image to be recognized, and $\bar{x}$ is the mean of all instantaneous emotion values $x_i$ in the recorded video clip;
    if the emotion value standard deviation is greater than a standard deviation threshold, determining the recorded video clip as an emotion-fluctuation video clip; and
    computing the fluctuating emotion probability of the original video clip from the number of emotion-fluctuation video clips and the number of recorded video clips using the fluctuating emotion probability formula P = C/D, wherein P is the fluctuating emotion probability, C is the number of emotion-fluctuation video clips, and D is the number of recorded video clips.
  6. The hotspot video annotation processing method according to claim 1, wherein each recorded video is associated with a user ID;
    after the obtaining hotspot video clips, the hotspot video annotation processing method further comprises:
    intercepting, based on the playback timestamps corresponding to the hotspot video clip, the target video clip corresponding to those playback timestamps from the recorded video corresponding to the user ID;
    obtaining the instantaneous emotion value corresponding to each image to be recognized in the target video clip;
    querying an emotion label lookup table based on the instantaneous emotion value to obtain the single-frame emotion label corresponding to the image to be recognized;
    obtaining the clip emotion label corresponding to the target video clip based on the single-frame emotion labels corresponding to the images to be recognized;
    if the clip emotion label is a preset emotion label, querying a user profile database based on the user ID to obtain the user label corresponding to the user ID, determining target users based on the user label, and pushing the hotspot video clip to the clients corresponding to the target users;
    or, if the clip emotion label is a preset emotion label, querying a video database based on the playback timestamps corresponding to the hotspot video clip to obtain the content label corresponding to the hotspot video clip, determining the hotspot video clips corresponding to the content label as recommended video clips, and pushing the recommended video clips to the client corresponding to the user ID.
  7. The hotspot video annotation processing method according to claim 1, wherein after the obtaining hotspot video clips, the hotspot video annotation processing method further comprises:
    computing, based on the hotspot video clips, the hotspot video frame rate corresponding to the original video; and
    sorting the original videos based on their corresponding hotspot video frame rates, and displaying them on the client according to the sorting result.
  8. A hotspot video annotation processing apparatus, characterized by comprising:
    a recorded video acquisition module, configured to obtain a recorded video of a user collected while a client plays an original video, wherein the original video comprises at least one frame of original video image, the recorded video comprises at least one frame of image to be recognized, and the recording timestamp of each image to be recognized is associated with a playback timestamp of one original video image;
    an instantaneous emotion value acquisition module, configured to recognize each image to be recognized using a micro-expression recognition model to obtain the instantaneous emotion value corresponding to the image to be recognized;
    an intense emotion probability determination module, configured to determine, according to the instantaneous emotion values of the images to be recognized corresponding to all recording timestamps associated with a same playback timestamp, the intense emotion probability of the original video image corresponding to the playback timestamp;
    a hotspot video image determination module, configured to determine the original video image as a hotspot video image if the intense emotion probability is greater than a first probability threshold; and
    a hotspot video clip acquisition module, configured to perform hotspot annotation on the original video based on the hotspot video images to obtain hotspot video clips.
  9. A computer device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor, when executing the computer-readable instructions, implements the following steps:
    obtaining a recorded video of a user collected while a client plays an original video, wherein the original video comprises at least one frame of original video image, the recorded video comprises at least one frame of image to be recognized, and the recording timestamp of each image to be recognized is associated with a playback timestamp of one original video image;
    recognizing each image to be recognized using a micro-expression recognition model to obtain the instantaneous emotion value corresponding to the image to be recognized;
    determining, according to the instantaneous emotion values of the images to be recognized corresponding to all recording timestamps associated with a same playback timestamp, the intense emotion probability of the original video image corresponding to the playback timestamp;
    if the intense emotion probability is greater than a first probability threshold, determining the original video image as a hotspot video image; and
    performing hotspot annotation on the original video based on the hotspot video images to obtain hotspot video clips.
  10. The computer device according to claim 9, wherein the recognizing each image to be recognized using a micro-expression recognition model to obtain the instantaneous emotion value corresponding to the image to be recognized comprises:
    recognizing each image to be recognized using the micro-expression recognition model to obtain the instantaneous probability corresponding to each of at least one recognized expression type;
    determining the recognized expression type with the largest instantaneous probability as the micro-expression type of the image to be recognized; and
    querying an emotion value lookup table based on the micro-expression type to obtain the instantaneous emotion value of the image to be recognized.
  11. The computer device according to claim 9, wherein the determining, according to the instantaneous emotion values of the images to be recognized corresponding to all recording timestamps associated with a same playback timestamp, the intense emotion probability of the original video image corresponding to the playback timestamp comprises:
    counting the total number of images to be recognized corresponding to all recording timestamps associated with the same playback timestamp;
    if the absolute value of the instantaneous emotion value corresponding to an image to be recognized is greater than a preset emotion threshold, determining the emotion attribute of the image to be recognized as intense emotion;
    counting, among the images to be recognized corresponding to all recording timestamps associated with the same playback timestamp, the number of images to be recognized whose emotion attribute is intense emotion as the intense emotion count; and
    computing the intense emotion probability of the original video image corresponding to the playback timestamp from the total number of images and the intense emotion count using the intense emotion probability formula L = A/B, wherein L is the intense emotion probability, A is the intense emotion count, and B is the total number of images.
  12. The computer device according to claim 9, wherein the performing hotspot annotation on the original video based on the hotspot video images to obtain hotspot video clips comprises:
    counting the number of frames of the original video clip formed between any two hotspot video images as the video clip frame count;
    if the video clip frame count is less than or equal to a first frame count threshold, determining the original video clip as a hotspot video clip;
    if the video clip frame count is greater than the first frame count threshold and less than or equal to a second frame count threshold, obtaining the fluctuating emotion probability corresponding to the original video clip based on the playback timestamps corresponding to the original video clip; and
    if the fluctuating emotion probability is greater than a second probability threshold, determining the original video clip as a hotspot video clip.
  13. The computer device according to claim 12, wherein the obtaining the fluctuating emotion probability corresponding to the original video clip based on the playback timestamps corresponding to the original video clip comprises:
    intercepting, based on the playback timestamps corresponding to the original video clip, the recorded video clip corresponding to those playback timestamps from the recorded video corresponding to the original video;
    obtaining the instantaneous emotion value corresponding to each image to be recognized in the recorded video clip;
    computing, using the standard deviation formula, the standard deviation of the instantaneous emotion values corresponding to all images to be recognized in the recorded video clip to obtain the emotion value standard deviation, the standard deviation formula being
    $S_N = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^{2}}$,
    wherein $S_N$ is the emotion value standard deviation of the recorded video clip, $N$ is the number of images to be recognized in the recorded video clip, $x_i$ is the instantaneous emotion value of each image to be recognized, and $\bar{x}$ is the mean of all instantaneous emotion values $x_i$ in the recorded video clip;
    if the emotion value standard deviation is greater than a standard deviation threshold, determining the recorded video clip as an emotion-fluctuation video clip; and
    computing the fluctuating emotion probability of the original video clip from the number of emotion-fluctuation video clips and the number of recorded video clips using the fluctuating emotion probability formula P = C/D, wherein P is the fluctuating emotion probability, C is the number of emotion-fluctuation video clips, and D is the number of recorded video clips.
  14. The computer device according to claim 9, wherein each recorded video is associated with a user ID;
    after the obtaining hotspot video clips, the processor, when executing the computer-readable instructions, further implements the following steps:
    intercepting, based on the playback timestamps corresponding to the hotspot video clip, the target video clip corresponding to those playback timestamps from the recorded video corresponding to the user ID;
    obtaining the instantaneous emotion value corresponding to each image to be recognized in the target video clip;
    querying an emotion label lookup table based on the instantaneous emotion value to obtain the single-frame emotion label corresponding to the image to be recognized;
    obtaining the clip emotion label corresponding to the target video clip based on the single-frame emotion labels corresponding to the images to be recognized;
    if the clip emotion label is a preset emotion label, querying a user profile database based on the user ID to obtain the user label corresponding to the user ID, determining target users based on the user label, and pushing the hotspot video clip to the clients corresponding to the target users;
    or, if the clip emotion label is a preset emotion label, querying a video database based on the playback timestamps corresponding to the hotspot video clip to obtain the content label corresponding to the hotspot video clip, determining the hotspot video clips corresponding to the content label as recommended video clips, and pushing the recommended video clips to the client corresponding to the user ID.
  15. One or more non-volatile readable storage media storing computer-readable instructions, wherein the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
    obtaining a recorded video of a user collected while a client plays an original video, wherein the original video comprises at least one frame of original video image, the recorded video comprises at least one frame of image to be recognized, and the recording timestamp of each image to be recognized is associated with a playback timestamp of one original video image;
    recognizing each image to be recognized using a micro-expression recognition model to obtain the instantaneous emotion value corresponding to the image to be recognized;
    determining, according to the instantaneous emotion values of the images to be recognized corresponding to all recording timestamps associated with a same playback timestamp, the intense emotion probability of the original video image corresponding to the playback timestamp;
    if the intense emotion probability is greater than a first probability threshold, determining the original video image as a hotspot video image; and
    performing hotspot annotation on the original video based on the hotspot video images to obtain hotspot video clips.
  16. The non-volatile readable storage medium according to claim 15, wherein the recognizing each image to be recognized using a micro-expression recognition model to obtain the instantaneous emotion value corresponding to the image to be recognized comprises:
    recognizing each image to be recognized using the micro-expression recognition model to obtain the instantaneous probability corresponding to each of at least one recognized expression type;
    determining the recognized expression type with the largest instantaneous probability as the micro-expression type of the image to be recognized; and
    querying an emotion value lookup table based on the micro-expression type to obtain the instantaneous emotion value of the image to be recognized.
  17. The non-volatile readable storage medium according to claim 15, wherein the determining, according to the instantaneous emotion values of the images to be recognized corresponding to all recording timestamps associated with a same playback timestamp, the intense emotion probability of the original video image corresponding to the playback timestamp comprises:
    counting the total number of images to be recognized corresponding to all recording timestamps associated with the same playback timestamp;
    if the absolute value of the instantaneous emotion value corresponding to an image to be recognized is greater than a preset emotion threshold, determining the emotion attribute of the image to be recognized as intense emotion;
    counting, among the images to be recognized corresponding to all recording timestamps associated with the same playback timestamp, the number of images to be recognized whose emotion attribute is intense emotion as the intense emotion count; and
    computing the intense emotion probability of the original video image corresponding to the playback timestamp from the total number of images and the intense emotion count using the intense emotion probability formula L = A/B, wherein L is the intense emotion probability, A is the intense emotion count, and B is the total number of images.
  18. The non-volatile readable storage medium according to claim 15, wherein the performing hotspot annotation on the original video based on the hotspot video images to obtain hotspot video clips comprises:
    counting the number of frames of the original video clip formed between any two hotspot video images as the video clip frame count;
    if the video clip frame count is less than or equal to a first frame count threshold, determining the original video clip as a hotspot video clip;
    if the video clip frame count is greater than the first frame count threshold and less than or equal to a second frame count threshold, obtaining the fluctuating emotion probability corresponding to the original video clip based on the playback timestamps corresponding to the original video clip; and
    if the fluctuating emotion probability is greater than a second probability threshold, determining the original video clip as a hotspot video clip.
  19. The non-volatile readable storage medium according to claim 18, wherein the obtaining the fluctuating emotion probability corresponding to the original video clip based on the playback timestamps corresponding to the original video clip comprises:
    intercepting, based on the playback timestamps corresponding to the original video clip, the recorded video clip corresponding to those playback timestamps from the recorded video corresponding to the original video;
    obtaining the instantaneous emotion value corresponding to each image to be recognized in the recorded video clip;
    computing, using the standard deviation formula, the standard deviation of the instantaneous emotion values corresponding to all images to be recognized in the recorded video clip to obtain the emotion value standard deviation, the standard deviation formula being
    $S_N = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^{2}}$,
    wherein $S_N$ is the emotion value standard deviation of the recorded video clip, $N$ is the number of images to be recognized in the recorded video clip, $x_i$ is the instantaneous emotion value of each image to be recognized, and $\bar{x}$ is the mean of all instantaneous emotion values $x_i$ in the recorded video clip;
    if the emotion value standard deviation is greater than a standard deviation threshold, determining the recorded video clip as an emotion-fluctuation video clip; and
    computing the fluctuating emotion probability of the original video clip from the number of emotion-fluctuation video clips and the number of recorded video clips using the fluctuating emotion probability formula P = C/D, wherein P is the fluctuating emotion probability, C is the number of emotion-fluctuation video clips, and D is the number of recorded video clips.
  20. The non-volatile readable storage medium according to claim 15, wherein each recorded video is associated with a user ID;
    after the obtaining hotspot video clips, the computer-readable instructions, when executed by the one or more processors, further cause the one or more processors to perform the following steps:
    intercepting, based on the playback timestamps corresponding to the hotspot video clip, the target video clip corresponding to those playback timestamps from the recorded video corresponding to the user ID;
    obtaining the instantaneous emotion value corresponding to each image to be recognized in the target video clip;
    querying an emotion label lookup table based on the instantaneous emotion value to obtain the single-frame emotion label corresponding to the image to be recognized;
    obtaining the clip emotion label corresponding to the target video clip based on the single-frame emotion labels corresponding to the images to be recognized;
    if the clip emotion label is a preset emotion label, querying a user profile database based on the user ID to obtain the user label corresponding to the user ID, determining target users based on the user label, and pushing the hotspot video clip to the clients corresponding to the target users;
    or, if the clip emotion label is a preset emotion label, querying a video database based on the playback timestamps corresponding to the hotspot video clip to obtain the content label corresponding to the hotspot video clip, determining the hotspot video clips corresponding to the content label as recommended video clips, and pushing the recommended video clips to the client corresponding to the user ID.
PCT/CN2019/088957 2019-01-11 2019-05-29 Hotspot video annotation processing method and apparatus, computer device and storage medium WO2020143156A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910025355.9 2019-01-11
CN201910025355.9A CN109819325B (en) 2019-01-11 2019-01-11 Hotspot video annotation processing method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2020143156A1

Family

ID=66604271

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/088957 WO2020143156A1 (en) 2019-01-11 2019-05-29 Hotspot video annotation processing method and apparatus, computer device and storage medium

Country Status (2)

Country Link
CN (1) CN109819325B (en)
WO (1) WO2020143156A1 (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109819325B (en) * 2019-01-11 2021-08-20 平安科技(深圳)有限公司 Hotspot video annotation processing method and device, computer equipment and storage medium
CN110401847B (en) * 2019-07-17 2021-08-06 咪咕文化科技有限公司 Compression storage method, electronic equipment and system for cloud DVR video
CN110519617B (en) * 2019-07-18 2023-04-07 平安科技(深圳)有限公司 Video comment processing method and device, computer equipment and storage medium
CN110418204B (en) * 2019-07-18 2022-11-04 平安科技(深圳)有限公司 Video recommendation method, device, equipment and storage medium based on micro expression
CN110353705B (en) * 2019-08-01 2022-10-25 秒针信息技术有限公司 Method and device for recognizing emotion
CN110647812B (en) * 2019-08-19 2023-09-19 平安科技(深圳)有限公司 Tumble behavior detection processing method and device, computer equipment and storage medium
CN110826471B (en) * 2019-11-01 2023-07-14 腾讯科技(深圳)有限公司 Video tag labeling method, device, equipment and computer readable storage medium
CN111343483B (en) * 2020-02-18 2022-07-19 北京奇艺世纪科技有限公司 Method and device for prompting media content segment, storage medium and electronic device
CN111447505B (en) * 2020-03-09 2022-05-31 咪咕文化科技有限公司 Video clipping method, network device, and computer-readable storage medium
CN111629222B (en) * 2020-05-29 2022-12-20 腾讯科技(深圳)有限公司 Video processing method, device and storage medium
CN111860302B (en) * 2020-07-17 2024-03-01 北京百度网讯科技有限公司 Image labeling method and device, electronic equipment and storage medium

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130139256A1 (en) * 2011-11-30 2013-05-30 Elwha LLC, a limited liability corporation of the State of Delaware Deceptive indicia profile generation from communications interactions
CN104681048A (en) * 2013-11-28 2015-06-03 索尼公司 Multimedia read control device, curve acquiring device, electronic equipment and curve providing device and method
CN106341712A (en) * 2016-09-30 2017-01-18 北京小米移动软件有限公司 Processing method and apparatus of multimedia data
CN106792170A (en) * 2016-12-14 2017-05-31 合网络技术(北京)有限公司 Method for processing video frequency and device
CN107968961B (en) * 2017-12-05 2020-06-02 吕庆祥 Video editing method and device based on emotional curve
CN108093297A (en) * 2017-12-29 2018-05-29 厦门大学 A kind of method and system of filmstrip automatic collection
CN109151576A (en) * 2018-06-20 2019-01-04 新华网股份有限公司 Multimedia messages clipping method and system

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100161409A1 (en) * 2008-12-23 2010-06-24 Samsung Electronics Co., Ltd. Apparatus for providing content according to user's interest in content and method for providing content according to user's interest in content
CN102693739A (en) * 2011-03-24 2012-09-26 腾讯科技(深圳)有限公司 Method and system for video clip generation
CN103873492A (en) * 2012-12-07 2014-06-18 联想(北京)有限公司 Electronic device and data transmission method
CN105615902A (en) * 2014-11-06 2016-06-01 北京三星通信技术研究有限公司 Emotion monitoring method and device
CN105022801A (en) * 2015-06-30 2015-11-04 北京奇艺世纪科技有限公司 Hot video mining method and hot video mining device
CN107809673A (en) * 2016-09-09 2018-03-16 索尼公司 According to the system and method for emotional state detection process video content
CN107888947A (en) * 2016-09-29 2018-04-06 法乐第(北京)网络科技有限公司 A kind of video broadcasting method and device
CN107257509A (en) * 2017-07-13 2017-10-17 上海斐讯数据通信技术有限公司 The filter method and device of a kind of video content
CN109819325A (en) * 2019-01-11 2019-05-28 平安科技(深圳)有限公司 Hot video marks processing method, device, computer equipment and storage medium

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112291589A (en) * 2020-10-29 2021-01-29 腾讯科技(深圳)有限公司 Video file structure detection method and device
CN112291589B (en) * 2020-10-29 2023-09-22 腾讯科技(深圳)有限公司 Method and device for detecting structure of video file
CN112699774A (en) * 2020-12-28 2021-04-23 深延科技(北京)有限公司 Method and device for recognizing emotion of person in video, computer equipment and medium
CN113127576A (en) * 2021-04-15 2021-07-16 微梦创科网络科技(中国)有限公司 Hotspot discovery method and system based on user content consumption analysis
CN114445896A (en) * 2022-01-28 2022-05-06 北京百度网讯科技有限公司 Method and device for evaluating confidence degree of human statement content in video
CN114445896B (en) * 2022-01-28 2024-04-05 北京百度网讯科技有限公司 Method and device for evaluating confidence of content of person statement in video
CN116386060A (en) * 2023-03-23 2023-07-04 浪潮智慧科技有限公司 Automatic water gauge data labeling method, device, equipment and medium
CN116386060B (en) * 2023-03-23 2023-11-14 浪潮智慧科技有限公司 Automatic water gauge data labeling method, device, equipment and medium

Also Published As

Publication number Publication date
CN109819325A (en) 2019-05-28
CN109819325B (en) 2021-08-20

Similar Documents

Publication Publication Date Title
WO2020143156A1 (en) Hotspot video annotation processing method and apparatus, computer device and storage medium
US11290775B2 (en) Computerized system and method for automatically detecting and rendering highlights from streaming videos
US10832738B2 (en) Computerized system and method for automatically generating high-quality digital content thumbnails from digital video
Segalin et al. What your Facebook profile picture reveals about your personality
WO2021088510A1 (en) Video classification method and apparatus, computer, and readable storage medium
US11064257B2 (en) System and method for segment relevance detection for digital content
CN109547814B (en) Video recommendation method and device, server and storage medium
US9589205B2 (en) Systems and methods for identifying a user's demographic characteristics based on the user's social media photographs
JP5795580B2 (en) Estimating and displaying social interests in time-based media
CN110519617B (en) Video comment processing method and device, computer equipment and storage medium
US8819728B2 (en) Topic to social media identity correlation
US20150296228A1 (en) Systems and Methods for Performing Multi-Modal Video Datastream Segmentation
US20160014482A1 (en) Systems and Methods for Generating Video Summary Sequences From One or More Video Segments
JP2018530847A (en) Video information processing for advertisement distribution
US10104429B2 (en) Methods and systems of dynamic content analysis
WO2020253360A1 (en) Content display method and apparatus for application, storage medium, and computer device
Narassiguin et al. Data Science for Influencer Marketing: feature processing and quantitative analysis
Yang et al. Zapping index: using smile to measure advertisement zapping likelihood
WO2022247666A1 (en) Content processing method and apparatus, and computer device and storage medium
US20160012078A1 (en) Intelligent media management system
CN111163366B (en) Video processing method and terminal
Barbieri et al. Content selection criteria for news multi-video summarization based on human strategies
CN112685596B (en) Video recommendation method and device, terminal and storage medium
US11010935B2 (en) Context aware dynamic image augmentation
US20220261580A1 (en) Detecting synthetic media

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19908319

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 18.11.2021)

122 Ep: pct application non-entry in european phase

Ref document number: 19908319

Country of ref document: EP

Kind code of ref document: A1