WO2020143156A1 - Hotspot video annotation processing method and apparatus, computer device and storage medium


Info

Publication number: WO2020143156A1
Authority: WO (WIPO/PCT)
Prior art keywords: video, emotion, image, recognized, probability
Application number: PCT/CN2019/088957
Other languages: French (fr), Chinese (zh)
Inventors: 刘建华, 徐小方
Original Assignee: 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2020143156A1

Classifications

    • G06F 18/00: Pattern recognition
    • H04N 21/232: Content retrieval operation locally within server, e.g. reading video streams from disk arrays
    • H04N 21/258: Client or end-user data management, e.g. managing client capabilities, user preferences or demographics, processing of multiple end-users preferences to derive collaborative data
    • H04N 21/433: Content storage operation, e.g. storage operation in response to a pause request, caching operations
    • H04N 21/442: Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
    • H04N 21/466: Learning process for intelligent management, e.g. learning user preferences for recommending movies
    • H04N 21/84: Generation or processing of descriptive data, e.g. content descriptors
    • H04N 21/8547: Content authoring involving timestamps for synchronizing content
    • H04N 5/76: Television signal recording

Definitions

  • The present application relates to the technical field of micro-expression recognition, and in particular to a hotspot video annotation processing method and apparatus, computer device, and storage medium.
  • Video (especially online video) is the largest and fastest-growing category of mobile data traffic.
  • So-called online video refers to audio-visual files provided by an online video service provider (for example, Baidu iQiyi), played as streaming media and available for live broadcast or on demand.
  • Online video generally requires a dedicated player; the dominant file format is FLV (Flash Video, a streaming media format), which is distributed using P2P (Peer to Peer) technology and occupies few client resources.
  • Embodiments of the present application provide a hotspot video annotation processing method, apparatus, computer device, and storage medium to address the low efficiency of the current manual annotation of original video segment attributes.
  • A hotspot video annotation processing method, including:
  • acquiring the user's recorded video collected while a client plays the original video, where the original video includes at least one frame of original video image, the recorded video includes at least one frame of image to be recognized, and the recording timestamp of each image to be recognized is associated with a playback timestamp of an original video image;
  • recognizing each image to be recognized using a micro-expression recognition model to obtain the instantaneous emotion value corresponding to the image to be recognized;
  • determining the intense emotion probability of the original video image corresponding to a playback timestamp according to the instantaneous emotion values of the images to be recognized corresponding to all recording timestamps associated with that playback timestamp;
  • if the intense emotion probability is greater than a first probability threshold, determining the original video image to be a hotspot video image;
  • annotating hotspots in the original video based on the hotspot video images to obtain hotspot video clips.
  • A hotspot video annotation processing device, including:
  • a recorded video acquisition module, used to acquire the user's recorded video collected by the client while the client plays the original video, where the original video includes at least one frame of original video image, the recorded video includes at least one frame of image to be recognized, and the recording timestamp of each image to be recognized is associated with a playback timestamp of an original video image;
  • an instantaneous emotion value acquisition module, used to recognize each image to be recognized using a micro-expression recognition model to obtain the instantaneous emotion value corresponding to the image to be recognized;
  • an intense emotion probability determination module, used to determine the intense emotion probability of the original video image corresponding to a playback timestamp according to the instantaneous emotion values of the images to be recognized corresponding to all recording timestamps associated with that playback timestamp;
  • a hotspot video image determination module, used to determine the original video image to be a hotspot video image if the intense emotion probability is greater than a first probability threshold;
  • a hotspot video clip acquisition module, used to annotate hotspots in the original video based on the hotspot video images to obtain hotspot video clips.
  • A computer device includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, where the processor implements the following steps when executing the computer-readable instructions:
  • acquiring the user's recorded video collected while the client plays the original video, where the original video includes at least one frame of original video image, the recorded video includes at least one frame of image to be recognized, and the recording timestamp of each image to be recognized is associated with a playback timestamp of an original video image;
  • recognizing each image to be recognized using a micro-expression recognition model to obtain the corresponding instantaneous emotion value;
  • determining the intense emotion probability of the original video image corresponding to a playback timestamp according to the instantaneous emotion values of the images to be recognized corresponding to all recording timestamps associated with that playback timestamp;
  • if the intense emotion probability is greater than a first probability threshold, determining the original video image to be a hotspot video image;
  • annotating hotspots in the original video based on the hotspot video images to obtain hotspot video clips.
  • One or more non-volatile readable storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
  • acquiring the user's recorded video collected while the client plays the original video, where the original video includes at least one frame of original video image, the recorded video includes at least one frame of image to be recognized, and the recording timestamp of each image to be recognized is associated with a playback timestamp of an original video image;
  • recognizing each image to be recognized using a micro-expression recognition model to obtain the corresponding instantaneous emotion value;
  • determining the intense emotion probability of the original video image corresponding to a playback timestamp according to the instantaneous emotion values of the images to be recognized corresponding to all recording timestamps associated with that playback timestamp;
  • if the intense emotion probability is greater than a first probability threshold, determining the original video image to be a hotspot video image;
  • annotating hotspots in the original video based on the hotspot video images to obtain hotspot video clips.
  • FIG. 1 is a schematic diagram of an application environment of a hotspot video annotation processing method in an embodiment of the present application.
  • FIG. 2 is a flowchart of a hotspot video annotation processing method in an embodiment of the present application.
  • FIG. 3 is another flowchart of a hotspot video annotation processing method in an embodiment of the present application.
  • FIG. 4 is another flowchart of a hotspot video annotation processing method in an embodiment of the present application.
  • FIG. 5 is another flowchart of a hotspot video annotation processing method in an embodiment of the present application.
  • FIG. 6 is another flowchart of a hotspot video annotation processing method in an embodiment of the present application.
  • FIG. 7 is another flowchart of a hotspot video annotation processing method in an embodiment of the present application.
  • FIG. 8 is another flowchart of a hotspot video annotation processing method in an embodiment of the present application.
  • FIG. 9 is another flowchart of a hotspot video annotation processing method in an embodiment of the present application.
  • FIG. 10 is another flowchart of a hotspot video annotation processing method in an embodiment of the present application.
  • FIG. 11 is a schematic diagram of a hotspot video annotation processing device in an embodiment of the present application.
  • FIG. 12 is a schematic diagram of a computer device in an embodiment of the present application.
  • the hotspot video annotation processing method provided by the embodiment of the present application can be applied in the application environment shown in FIG. 1.
  • the hotspot video annotation processing method is applied in a video playback system.
  • The video playback system includes a client and a server, as shown in FIG. 1; the client and the server communicate through a network to implement automatic annotation of the hotspot video clips of the original video, improve the efficiency of hotspot video clip annotation, and implement personalized recommendation and sorted display of hotspot video clips.
  • The client, also called the user end, refers to the program that corresponds to the server and provides local services for the user.
  • the client can be installed on but not limited to various personal computers, notebook computers, smart phones, tablets and portable wearable devices.
  • the server can be implemented by an independent server or a server cluster composed of multiple servers.
  • a hotspot video annotation processing method is provided.
  • The method is described by taking its application to the server in FIG. 1 as an example, and includes the following steps:
  • S201 Obtain the user's recorded video collected while the client plays the original video.
  • the original video includes at least one original video image
  • the recorded video includes at least one image to be recognized.
  • The original video refers to a video played for the user by a video playback program (that is, a client) installed on a terminal device such as the user's mobile phone or computer.
  • The recorded video refers to video of the user's facial expression changes while watching the original video, shot in real time through the shooting module (such as a built-in camera) of the terminal device on which the video playback program is installed.
  • the original video includes at least one frame of original video image, and the original video image is a single frame image forming the original video, that is, a single image frame of the smallest unit in the original video.
  • Each original video image carries a playback timestamp, which is the timestamp of that original video image within the original video; for example, the original video image at the 100th second of a 10-minute original video carries a playback timestamp of 100 s.
  • the recorded video includes at least one frame of image to be recognized, and the image to be recognized is a single frame image that forms the recorded video, that is, a single image screen of the smallest unit in the recorded video.
  • Each image to be recognized corresponds to a recording timestamp, which is the timestamp of that image within the recorded video; for example, the image to be recognized at the 100th second of a 10-minute recorded video has a recording timestamp of 100 s.
  • the recording timestamp is associated with the playback timestamp carried by the original video image, so that the image to be recognized corresponds one-to-one with the original video image, which is convenient for accurately determining the user's emotion when watching the original video.
  • Each original video carries a unique video identifier used to uniquely identify it; for example, the original video corresponding to episode XX of "XX" carries the unique video identifier XX0001, so that the server can retrieve that original video by the video ID XX0001.
  • the playback timestamp carried by each original video image is the timestamp of the original video image in the original video.
  • While the client plays the same original video, the server acquires, through the shooting module (such as a built-in camera) of the terminal device on which the client is installed, the recorded videos capturing the expression changes of all users watching the original video.
  • The recorded video includes at least one frame of image to be recognized, and each image to be recognized corresponds to a recording timestamp that is associated with the playback timestamp carried by an original video image. Understandably, collecting the recorded videos of different users watching the original video makes it easier to determine whether the original video attracts the audience, thereby helping to automatically annotate the hotspot video clips in the original video and improving the efficiency of hotspot video clip annotation.
  • In an embodiment, obtaining the user's recorded video collected while the client plays the original video includes: (1) controlling the client to play the original video so that the playback timestamp of each original video image in the original video is associated with the current system time; (2) obtaining the user's recorded video collected while the client plays the original video so that the recording timestamp of each image to be recognized in the recorded video is associated with the current system time; (3) based on the current system time, associating the recording timestamp of each image to be recognized with the playback timestamp of an original video image.
  • The current system time is the system's current time at any given moment; for example, it can be obtained with the currentTimeMillis method of the Java System class.
  • The playback timestamps of the original video correspond to the recording timestamps of the recorded video, that is, the first frame of original video image corresponds to the first frame of image to be recognized, so that each image to be recognized reflects the user's micro-expression when viewing the corresponding original video image.
  • Because the playback time of the original video is not synchronized with the recording time of the recorded video, the playback timestamps of the original video must be correlated with the recording timestamps of the recorded video through the current system time, so that each associated image to be recognized reflects the user's micro-expression when viewing the corresponding original video image.
  • The playback of the original video and the recording of the recorded video are both related to the current system time: for example, if the 1000th frame of original video image is played at 10:05:10 and the 10th frame of image to be recognized is recorded at 10:05:10, then the playback timestamp of the 1000th frame of original video image is associated with the recording timestamp of the 10th frame of image to be recognized.
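  • The following is a minimal sketch of this system-time association, assuming the client reports each played frame and each recorded frame together with a system-clock capture time in milliseconds; all names here are illustrative rather than taken from the patent.

```python
from bisect import bisect_left

def associate_frames(playback_events, recording_events):
    """Associate each recorded frame with the original video frame shown
    at (nearly) the same system time.

    playback_events  -- list of (system_time_ms, playback_timestamp),
                        sorted by system time, assumed non-empty
    recording_events -- list of (system_time_ms, recording_timestamp)
    Returns a dict mapping recording_timestamp -> playback_timestamp.
    """
    times = [t for t, _ in playback_events]
    mapping = {}
    for sys_t, rec_ts in recording_events:
        # Locate the playback frame whose system time is closest to sys_t.
        i = bisect_left(times, sys_t)
        candidates = [j for j in (i - 1, i) if 0 <= j < len(times)]
        nearest = min(candidates, key=lambda j: abs(times[j] - sys_t))
        mapping[rec_ts] = playback_events[nearest][1]
    return mapping
```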
  • S202 Recognize each image to be recognized using a micro-expression recognition model to obtain the instantaneous emotion value corresponding to the image to be recognized.
  • the micro-expression recognition model is a model for recognizing the micro-expression of the human face in the image to be recognized.
  • The micro-expression recognition model captures local features of the user's face in the image to be recognized, determines each target facial action unit of the face according to those local features, and then determines the micro-expression from the recognized target facial action units.
  • The instantaneous emotion value corresponding to an image to be recognized is the emotion value corresponding to the micro-expression type recognized for the face in that image by the micro-expression recognition model.
  • Specifically, the server first uses the micro-expression recognition model to perform micro-expression recognition on each image to be recognized to determine its corresponding micro-expression type, and then queries the emotion value comparison table according to the micro-expression type to obtain the corresponding instantaneous emotion value.
  • The micro-expression types include, but are not limited to, love, interest, surprise, expectation... aggressiveness, conflict, insult, suspicion, and fear.
  • In this way, the instantaneous emotion value of the face in each image to be recognized is obtained.
  • The micro-expression recognition model can quickly obtain the instantaneous emotion values of different users watching each original video image in the same original video, so that hotspot video clips can be analyzed from the instantaneous emotion values and automatically annotated.
  • the micro-expression recognition model may be a neural network recognition model based on deep learning, a local recognition model based on classification, or a local emotion recognition model based on a local binary pattern (LBP).
  • the micro-expression recognition model is a local recognition model based on classification.
  • The training image data includes positive samples and negative samples of each facial action unit, and the training image data is trained through classification algorithms to obtain the micro-expression recognition model.
  • a large amount of training image data may be trained through an SVM classification algorithm to obtain SVM classifiers corresponding to multiple facial action units.
  • For example, there may be 39 SVM classifiers corresponding to 39 facial action units, or 54 SVM classifiers corresponding to 54 facial action units; the more facial action units for which the training image data contains positive and negative samples, the more SVM classifiers are obtained. Understandably, among micro-expression recognition models formed from multiple SVM classifiers, the more SVM classifiers a model contains, the more accurately the formed micro-expression recognition model recognizes micro-expression types. Take the micro-expression recognition model formed by the SVM classifiers corresponding to 54 facial action units as an example: this model can identify 54 types of micro-expressions, including love, interest, surprise, expectation... aggressiveness, conflict, insult, doubt, and fear.
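  • As a rough illustration of the two-stage recognition described above, the sketch below uses one binary scikit-learn SVM per facial action unit (AU) and then scores pre-configured AU combinations; the 0.5 threshold matches the example later in this document, while the template-overlap scoring is an assumed stand-in for the patent's "comprehensive evaluation" step.

```python
# Assumes each value in au_classifiers is a fitted
# sklearn.svm.SVC(probability=True) with classes [0, 1].

def detect_action_units(au_classifiers, features, threshold=0.5):
    """Return the set of AU ids whose predicted probability exceeds the
    preset threshold for one image's local-feature vector."""
    active = set()
    for au_id, clf in au_classifiers.items():
        p = clf.predict_proba([features])[0][1]  # P(AU present)
        if p > threshold:
            active.add(au_id)
    return active

def expression_probabilities(active_aus, expression_templates):
    """expression_templates: dict micro-expression type -> set of AU ids
    (assumed pre-configured). Scores each type by the fraction of its
    template AUs that were detected."""
    return {name: len(active_aus & aus) / len(aus)
            for name, aus in expression_templates.items()}
```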
  • S203 Determine the intense emotion probability of the original video image corresponding to the playback timestamp according to the instantaneous emotion values of the images to be identified corresponding to all the recording timestamps associated with the same playback timestamp.
  • The intense emotion probability is a probability used to evaluate how intense the emotions are across the images to be recognized of different users watching the same original video image. Understandably, a high intense emotion probability means that users' moods fluctuate greatly while watching the original video and that the original video is strongly attractive to users.
  • Specifically, according to the playback timestamp corresponding to each original video image, the server first obtains the instantaneous emotion values of the images to be recognized corresponding to all recording timestamps associated with that playback timestamp, determines from each user's instantaneous emotion value whether the emotion is intense, and thereby computes the probability of intense emotion among all users watching that original video image, so that the intense emotion probability objectively reflects the degree to which users watching the same original video like it or resonate with it.
  • The first probability threshold is a preset probability threshold for evaluating whether an original video image is a hotspot video image.
  • For example, the preset probability threshold may be set to 60%. If the intense emotion probability is greater than the first probability threshold, a large proportion of all users who viewed the original video image (that is, a proportion greater than the first probability threshold) experienced strong emotional fluctuations while watching it (that is, the emotion corresponding to their instantaneous emotion value is intense emotion), so the original video image is highly attractive to users and can be determined to be a hotspot video image.
  • S205 Annotate hotspots in the original video based on the hotspot video images to obtain hotspot video clips.
  • Specifically, the server may form an original video clip from any two hotspot video images, and then compare the total number of frames of all original video images in that clip against preset frame number thresholds to determine whether the original video clip is a hotspot video clip, automatically marking the original video images corresponding to hotspot video images and annotating the hotspot video clips in the original video. This automatic annotation of hotspot video clips in the original video improves the efficiency of hotspot video clip annotation.
  • The hotspot video annotation processing method provided in this embodiment collects the user's recorded video while the original video is playing, so that the recording timestamp of each image to be recognized is associated with the playback timestamp of an original video image, ensuring the objectivity of the micro-expression analysis of the original video. The micro-expression recognition model is then used to recognize the images to be recognized; it can quickly identify the user's micro-expression when viewing each original video image and obtain the intense emotion value of the user watching the original video, so that hotspot video annotation is realized on the basis of intense emotion values and the objectivity of hotspot video clip annotation is ensured.
  • Hotspot annotation of the video is subdivided into hotspot analysis of individual original video images to ensure the objectivity and accuracy of the analysis.
  • The original video is then hotspot-annotated and hotspot video clips are obtained by calculating the probability of intense emotion while users watch the original video, so that the server obtains the hotspot video clips and they are annotated automatically. This improves the efficiency and accuracy of hotspot video clip annotation and provides users with a better viewing experience.
  • In an embodiment, step S202, in which a micro-expression recognition model is used to recognize each image to be recognized and the instantaneous emotion value corresponding to each image is obtained, includes:
  • S301 Recognize each image to be recognized by using a micro-expression recognition model to obtain the instantaneous probability corresponding to at least one type of recognized expression.
  • The recognized expression type refers to a pre-configured micro-expression type to which the image to be recognized is found to belong when it is recognized using the micro-expression recognition model.
  • the micro-expression recognition model pre-trained by the server includes multiple SVM classifiers, and each SVM classifier is used to identify a facial action unit.
  • For example, when the micro-expression recognition model includes 54 SVM classifiers, a facial action unit number mapping table is established in which each facial action unit is represented by a predetermined number: for example, AU1 is the inner eyebrow lift, AU2 is the outer eyebrow lift, AU5 is the upper eyelid lift, and AU26 is the lower jaw opening.
  • A corresponding SVM classifier is trained for each facial action unit.
  • For example, the SVM classifier corresponding to the inner eyebrow lift can output the probability that a local feature belongs to the inner eyebrow lift, the SVM classifier corresponding to the outer eyebrow lift can output the probability that a local feature belongs to the outer eyebrow lift, and so on.
  • Specifically, when the server uses the pre-trained micro-expression recognition model to recognize an image to be recognized, it may first perform key point detection and feature extraction on the image to obtain its local features.
  • The face key point algorithm can be, but is not limited to, the Ensemble of Regression Trees (ERT) algorithm, the SIFT (Scale-Invariant Feature Transform) algorithm, the SURF (Speeded Up Robust Features) algorithm, the LBP (Local Binary Patterns) algorithm, or the HOG (Histogram of Oriented Gradients) algorithm.
  • the feature extraction algorithm may be a CNN (Convolutional Neural Network) algorithm.
  • the target facial action unit refers to the facial action unit (Action Unit, AU) obtained by recognizing the image to be recognized according to the micro-expression recognition model.
  • The probability value may specifically be a value between 0 and 1. For example, if the output probability value is 0.6 and the preset threshold is 0.5, then because the probability value 0.6 is greater than the preset threshold 0.5, the facial action unit corresponding to 0.6 is taken as a target facial action unit of the image to be recognized.
  • All acquired target facial action units are then comprehensively evaluated to obtain the probability corresponding to each micro-expression type pre-configured in the micro-expression recognition model, that is, the instantaneous probability of belonging to each recognized expression type.
  • The comprehensive evaluation of all acquired target facial action units specifically means obtaining, from the combination of all target facial action units, the probability that this combination belongs to each pre-configured micro-expression type, so as to determine the instantaneous probability of each recognized expression type.
  • S302 Determine the recognized expression type with the largest instantaneous probability as the micro expression type of the image to be recognized.
  • Specifically, the recognized expression type with the largest instantaneous probability is determined to be the micro-expression type corresponding to the image to be recognized. For example, if the image to be recognized is recognized as belonging to the recognized expression type "love" with an instantaneous probability of 0.9, while the instantaneous probabilities of the two recognized expression types "doubt" and "quiet" are each 0.05, then the recognized expression type corresponding to the instantaneous probability of 0.9 is determined to be the micro-expression type of the image, ensuring the accuracy of the identified micro-expression type.
  • S303 Query the emotion value comparison table based on the micro-expression type to obtain the instantaneous emotion value of the image to be recognized.
  • the emotion value comparison table is a preset data table for recording the emotion attribute corresponding to each micro-expression type.
  • In the emotion value comparison table, the association relationship between micro-expression types and emotion values is stored.
  • the server queries the emotion value comparison table based on the micro-expression type to obtain the corresponding instantaneous emotion value.
  • In this embodiment, the instantaneous emotion value is a value in [-1, 1]: the larger the value, the more the user likes the original video image corresponding to the recording timestamp associated with the image to be recognized; the smaller the value, the more the user dislikes that original video image.
  • For example, the instantaneous emotion value corresponding to each of the 54 micro-expression types identified by the micro-expression recognition model can be set to any one of 1, 0.8, 0.5, 0.3, 0, -0.3, -0.5, -0.8, and -1.
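  • A minimal sketch of steps S302 and S303, using a small illustrative slice of the emotion value comparison table (the types and values shown are examples from this document, not the full 54-type table):

```python
# Illustrative emotion value comparison table: micro-expression type -> value.
EMOTION_VALUE_TABLE = {
    "love": 1.0, "interest": 0.8, "surprise": 0.5, "expectation": 0.3,
    "quiet": 0.0, "doubt": -0.3, "insult": -0.8, "fear": -1.0,
}

def instantaneous_emotion_value(instant_probs):
    """instant_probs: dict recognized expression type -> instantaneous
    probability. Picks the type with the largest probability (S302) and
    looks up its instantaneous emotion value (S303)."""
    micro_expression = max(instant_probs, key=instant_probs.get)
    return EMOTION_VALUE_TABLE[micro_expression]

# e.g. instantaneous_emotion_value({"love": 0.9, "doubt": 0.05,
#                                   "quiet": 0.05}) -> 1.0
```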
  • The hotspot video annotation processing method provided in this embodiment first uses the micro-expression recognition model to recognize the image to be recognized, quickly obtaining the instantaneous probability corresponding to at least one recognized expression type, and selects the recognized expression type with the largest instantaneous probability as the micro-expression type of the image, ensuring the accuracy of the identified micro-expression type. It then queries the emotion value comparison table based on the micro-expression type to obtain the instantaneous emotion value of the image to be recognized, ensuring the efficiency of acquiring instantaneous emotion values.
  • In an embodiment, the server may query the database based on the instantaneous emotion value to obtain the standard volume or standard color tone corresponding to that value, obtain the current volume or current color tone of the client currently playing the image to be recognized, and automatically adjust the current volume and current color tone based on the standard volume and standard color tone. When the current volume and color tone match the user's current mood, the video's volume or tone is more likely to evoke empathy, thereby increasing the appeal of the original video to the user.
  • In an embodiment, step S203, in which the intense emotion probability of the original video image corresponding to the playback timestamp is determined according to the instantaneous emotion values of the images to be recognized corresponding to all recording timestamps associated with the same playback timestamp, includes:
  • S401 Count the total number of images to be identified corresponding to all recording timestamps associated with the same playback timestamp.
  • The total number of images is the number of images to be recognized, collected by the server, that correspond to all users who watched the original video image. Specifically, when annotating hotspot video clips of any original video, all recorded videos corresponding to viewings of that original video must be obtained, and the number of images to be recognized corresponding to all recording timestamps associated with the playback timestamp of the same original video image is counted and determined as the total number of images. For example, for an original video with video ID XX0001, consider the original video image whose playback timestamp is the 10th second: the number of all images to be recognized associated with that original video image is the total number of images.
  • the preset emotion threshold is a preset threshold for evaluating whether the instantaneous emotion value is intense emotion.
  • the preset emotion threshold may be set to 0.6 or other values.
  • Specifically, the server compares the absolute value of the instantaneous emotion value corresponding to each image to be recognized with the preset emotion threshold. If the absolute value is greater than the preset emotion threshold, the emotion attribute of the image to be recognized is intense emotion; otherwise, the emotion attribute is plain emotion. That is, the micro-expression recognition model recognizes, for each image to be recognized, an instantaneous emotion value in [-1, 1].
  • When the absolute value of the instantaneous emotion value is close to 1, the micro-expression emotion can be considered intense; such intense emotions easily resonate with users and have strong appeal.
  • When the absolute value of the instantaneous emotion value is close to 0, the user's liking or dislike of the original video image being watched is weaker, indicating that the original video image does not resonate with the user, and the micro-expression emotion can be regarded as a plain emotion.
  • S403 Among the images to be recognized corresponding to all recording timestamps associated with the same playback timestamp, count the number whose emotion attribute is intense emotion as the intense emotion count.
  • Specifically, from the images to be recognized corresponding to all recording timestamps associated with the same playback timestamp, the server determines the number whose emotion attribute is intense emotion and takes that number as the intense emotion count. For example, if 100 users simultaneously watch the original video image corresponding to a certain playback timestamp in the same original video, the server obtains the 100 images to be recognized corresponding to all recording timestamps associated with that playback timestamp, uses the micro-expression recognition model to obtain the instantaneous emotion values of all 100 images, determines from each instantaneous emotion value whether the emotion is intense, and counts the images whose emotion attribute is intense emotion; in this case the intense emotion count is a value between 0 and 100.
  • S404 Calculate the intense emotion probability of the original video image corresponding to the playback timestamp from the total number of images and the intense emotion count using the intense emotion probability formula.
  • the server may quickly calculate the intense emotion probability using the intense emotion probability formula.
  • The intense emotion probability reflects the probability that the original video image caused strong emotional fluctuations among all users who viewed it, and thus well reflects the attractiveness of the original video image to users or the degree of resonance it causes.
  • The hotspot video annotation processing method provided in this embodiment first obtains the total number of images to be recognized corresponding to the same playback timestamp, determines from those images the number whose emotion attribute is intense emotion, and then calculates the intense emotion probability with the intense emotion probability formula, which makes the acquisition of the intense emotion probability more objective and intuitively shows the attractiveness of the original video image to users.
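  • A minimal sketch of steps S401-S404, assuming the intense emotion probability formula is the intense emotion count divided by the total number of images (the formula is named but not spelled out above); the thresholds follow the examples in this document.

```python
def intense_emotion_probability(instant_values, emotion_threshold=0.6):
    """instant_values: instantaneous emotion values of all images to be
    recognized sharing one playback timestamp (one per viewer).
    An image counts as intense emotion when |value| > emotion_threshold
    (S402); the probability is intense count / total count (S404)."""
    total = len(instant_values)
    intense = sum(1 for v in instant_values if abs(v) > emotion_threshold)
    return intense / total if total else 0.0

# The original video image is a hotspot video image when the probability
# exceeds the first probability threshold (60% in the example above):
is_hotspot = intense_emotion_probability([0.8, -0.9, 0.3, 0.7]) > 0.6  # True
```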
  • In an embodiment, step S205, in which the original video is hotspot-annotated based on the hotspot video images to obtain hotspot video clips, includes:
  • S501 Count the number of frames of the original video clip formed between any two hotspot video images and determine it as the video clip frame number.
  • The video clip frame number refers to the total number of frames of the original video clip formed between two hotspot video images.
  • Specifically, the number of frames of the original video clip formed between any two hotspot video images is counted and determined as the video clip frame number. Because the original video clip contains the two hotspot video images themselves, the video clip frame number is at least two. For example, if the 20th and 40th original video images in the original video are hotspot video images, the video clip frame number of the original video clip formed between them is 21 frames.
  • The first frame number threshold is a preset threshold corresponding to the minimum time interval used to determine whether an original video clip is a hotspot video clip.
  • the first frame number threshold is set independently, and its value is generally relatively small.
  • For example, if the first frame number threshold is set to 120 frames and the playback frame rate of the original video is typically 24 frames/second, the threshold corresponds to an original video clip of 5 seconds. If the video clip frame number is less than the first frame number threshold, the interval between the two adjacent hotspot video images is short, meaning the original video clip provoked intense emotion values in users within a short time and attracted their attention, so the original video clip is determined to be a hotspot video clip.
  • The second frame number threshold is a preset threshold corresponding to the maximum time interval used to determine whether an original video clip is a hotspot video clip.
  • The second frame number threshold is set larger; for example, when the first frame number threshold is set to 120 frames, the second frame number threshold may be set to 1200 frames. At a playback frame rate of 24 frames/second, the original video clip formed between the two hotspot video images is then a 50-second clip.
  • For the playback timestamps corresponding to the original video images in that 50-second original video clip, the emotion fluctuation probability of the images to be recognized associated with those playback timestamps is obtained. If the emotion fluctuation probability is large, the original video clip is more likely to have provoked intense user emotions; conversely, if it is small, the original video clip is less likely to have done so.
  • The emotion fluctuation probability refers to the probability of a large emotional fluctuation occurring while the user watches the original video clip, where a large emotional fluctuation may understandably be a change from great joy to great sorrow or another process of emotional change.
  • The second probability threshold is a probability threshold set for evaluating hotspot video clips based on the fluctuating emotion probability. Understandably, if the fluctuating emotion probability of an original video clip is greater than the second probability threshold, the original video clip caused strong emotional fluctuations in users and attracted their attention, and it can be determined to be a hotspot video clip.
  • In this embodiment, the video clip frame number of the original video clip formed between two hotspot video images is obtained first. If the video clip frame number is less than or equal to the first frame number threshold, the clip is directly determined to be a hotspot video clip. If the video clip frame number is greater than the first frame number threshold and less than or equal to the second frame number threshold, the fluctuating emotion probability of the original video clip must be obtained and compared with the second probability threshold to determine whether the original video clip is a hotspot video clip.
  • In this way, whether the original video clip formed between two hotspot video images is a hotspot video clip is determined from its video clip frame number and fluctuating emotion probability, so that the hotspot video clips in the original video are annotated automatically and the objectivity of the annotated clips is ensured.
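  • A minimal sketch of this decision rule, under the example thresholds above (120 and 1200 frames); the 0.6 value for the second probability threshold is an assumption for illustration:

```python
def is_hotspot_clip(clip_frames, fluctuation_probability=None,
                    first_threshold=120, second_threshold=1200,
                    second_probability_threshold=0.6):
    """clip_frames: video clip frame number of the original video clip
    formed between two hotspot video images."""
    if clip_frames <= first_threshold:
        # Short interval between hotspot images: mark directly.
        return True
    if clip_frames <= second_threshold:
        # Mid-length clip: decide by the fluctuating emotion probability.
        return (fluctuation_probability is not None and
                fluctuation_probability > second_probability_threshold)
    return False
```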
  • In an embodiment, step S503, that is, acquiring the fluctuating emotion probability corresponding to the original video clip based on the playback timestamps corresponding to the original video clip, includes:
  • Specifically, according to the playback timestamps of the original video clip, the server intercepts from each recorded video corresponding to the original video the recorded video clip associated with those playback timestamps, so as to recognize the images to be recognized of the recorded video clip. For example, if the playback timestamps of an original video clip span seconds 10-50 of an original video, then from each recorded video corresponding to that original video, the recorded video clip whose recording timestamps correspond to playback timestamps of 10-50 seconds is intercepted, so that each image to be recognized in the recorded video clip reflects the user's facial expression changes while watching the original video clip.
  • S602 Obtain the instantaneous emotion value corresponding to each image to be recognized in the recorded video segment.
  • Since in step S202 the micro-expression recognition model has already been used to recognize each image to be recognized in all recorded videos and obtain the corresponding instantaneous emotion values, this step can directly obtain the instantaneous emotion value corresponding to each image to be recognized in the recorded video clip without re-recognition, improving the efficiency of acquiring instantaneous emotion values.
  • S603 Calculate the instantaneous emotion values corresponding to all images to be recognized in the recorded video clip using the standard deviation formula to obtain the emotion value standard deviation. The standard deviation formula is $S_N = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2}$, where $S_N$ is the emotion value standard deviation of the recorded video clip, $N$ is the number of images to be recognized in the recorded video clip, $x_i$ is the instantaneous emotion value of each image to be recognized, and $\bar{x}$ is the average of all instantaneous emotion values $x_i$ in the recorded video clip.
  • The emotion value standard deviation is the standard deviation of the instantaneous emotion values recorded while the user watches all images to be recognized for an original video clip, and it objectively reflects the user's mood fluctuation while watching that clip. Understandably, if each individual user's instantaneous emotion values are used to calculate the emotion value standard deviation, the hotspot video clip determined from a standard deviation greater than the preset standard deviation is a hotspot video clip of concern to that user; if the average emotion values of all users who watched the original video clip are used, the hotspot video clip so determined is one of concern to all users.
  • the standard deviation threshold is a value preset by the server, and the standard deviation threshold can be set independently by the user according to requirements.
  • If the emotion value standard deviation of a recorded video clip is greater than the standard deviation threshold, the user's emotional fluctuation while watching the corresponding original video clip was large, possibly changing from great joy to great sorrow or from great sorrow to great joy, and the recorded video clip is a mood swing video clip. Reflecting this emotional fluctuation through the emotion value standard deviation objectively captures the user's emotional changes while watching the original video clip.
  • S605 Calculate the fluctuating emotion probability of the original video clip from the number of mood swing video clips and the number of recorded video clips using the fluctuating emotion probability formula, that is, P = C/D.
  • The fluctuating emotion probability intuitively expresses users' mood fluctuations while watching the original video clip: the greater the number of mood swing video clips among the users who watched the original video, the greater the fluctuating emotion probability, meaning the original video clip resonates with users' emotions.
  • The number of recorded video clips D is the number of recorded video clips, intercepted from the recorded videos of all users, in which the same original video clip is watched; it can be understood as the number of users who watched the original video clip and whose facial expression changes were recorded.
  • The number of mood swing video clips C is the number, among the D recorded video clips, whose emotion value standard deviation is greater than the standard deviation threshold.
  • In the hotspot video annotation processing method provided in this embodiment, the instantaneous emotion value corresponding to each image to be recognized in the recorded video clip is obtained, and the emotion value standard deviation is calculated with the standard deviation formula to determine whether each recorded video clip is a mood swing video clip, that is, a clip reflecting strong emotional fluctuation; the number of mood swing video clips and the number of recorded video clips are then used to calculate the fluctuating emotion probability of the original video clip, so that this probability reflects the emotional fluctuation of all users watching the original video clip.
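  • The computation sketch below implements the standard deviation formula of step S603 and the P = C/D formula of step S605; the 0.5 standard deviation threshold is an assumed value, since the document leaves it user-configurable.

```python
import math

def emotion_std(values):
    """S_N for one recorded video clip's instantaneous emotion values
    (assumed non-empty), per the formula in step S603."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)

def fluctuating_emotion_probability(clips, std_threshold=0.5):
    """clips: one list of instantaneous emotion values per recorded video
    clip (i.e. per viewer of the same original video clip). A clip is a
    mood swing clip when S_N > std_threshold; the probability is C / D."""
    d = len(clips)
    c = sum(1 for values in clips if emotion_std(values) > std_threshold)
    return c / d if d else 0.0
```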
  • each recorded video is associated with a user ID, which is an identifier used to uniquely identify the user's identity in the video playback system.
  • the hotspot video annotation processing method further includes:
  • S701 Based on the playback timestamp corresponding to the hot video segment, intercept the target video segment corresponding to the playback timestamp from the recorded video corresponding to the user ID.
  • The target video clip is the recorded video clip, in the recorded video corresponding to the user ID, that corresponds to the playback timestamps of the hotspot video clip.
  • Specifically, according to the playback timestamps corresponding to the hotspot video clip, the server obtains from the recorded video corresponding to the user ID the recorded video clip matching those playback timestamps, and determines the acquired recorded video clip to be the target video clip.
  • S702 Acquire the instantaneous emotion value corresponding to each image to be recognized in the target video segment.
  • Since in step S202 the micro-expression recognition model has already been used to recognize each image to be recognized in all recorded videos and obtain the corresponding instantaneous emotion values, this step can directly obtain the instantaneous emotion value corresponding to each image to be recognized in the target video clip without re-recognition, improving the efficiency of acquiring instantaneous emotion values.
  • S703 Query the emotion tag comparison table based on the instantaneous emotion value to obtain a single frame of emotion tags corresponding to the image to be recognized.
  • The emotion tag comparison table is a preset comparison table recording the emotion tag corresponding to each instantaneous emotion value. Since the instantaneous emotion value is set to one of 1, 0.8, 0.5, 0.3, 0, -0.3, -0.5, -0.8, and -1, and each instantaneous emotion value may correspond to at least one micro-expression type, an emotion tag may be determined for each instantaneous emotion value, or determined from the magnitude of the instantaneous emotion value according to a preset rule for dividing emotion tags, so that each instantaneous emotion value corresponds to one emotion tag. For example, the emotion tags can be divided into tags such as joy, anger, ...
  • each level of emotion corresponds to a range of emotion values.
  • The single-frame emotion tag is the emotion tag corresponding, in the emotion tag comparison table, to the instantaneous emotion value of an image to be recognized. That is, according to the user's instantaneous emotion value in each image to be recognized, the corresponding single-frame emotion tag is looked up in the emotion tag comparison table, so as to determine from it the user's degree of preference for the corresponding original video image.
  • S704 Based on the single-frame emotion tag corresponding to the image to be recognized, obtain the segment emotion tag corresponding to the target video segment.
  • Specifically, the single-frame emotion tag corresponding to each image to be recognized can reflect the user's reaction to each original video image in the target video clip, so that the clip emotion tag describing the user's viewing of the target video clip can be obtained.
  • For example, the single-frame emotion tag occurring most frequently among the single-frame emotion tags of all images to be recognized may be selected as the clip emotion tag.
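  • A minimal sketch of that majority-vote selection (names illustrative):

```python
from collections import Counter

def segment_emotion_tag(single_frame_tags):
    """Return the most frequent single-frame emotion tag in the target
    video clip as the clip emotion tag (step S704)."""
    return Counter(single_frame_tags).most_common(1)[0][0]

# e.g. segment_emotion_tag(["joy", "joy", "surprise"]) -> "joy"
```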
  • S705 If the clip emotion tag is a preset emotion tag, query the user portrait database based on the user ID to obtain the user tag corresponding to the user ID, determine target users based on the user tag, and push the hotspot video clip to the clients corresponding to the target users.
  • The user tag is the gender, age, occupation, interest, or other preset tag acquired by querying the user portrait database with the user ID.
  • A target user is a user who, as determined by the server, has the same preferences as the user corresponding to the user ID.
  • Specifically, the user portrait database can be queried with the user ID to obtain the corresponding user tag, and target users can then be quickly obtained from the user tag, facilitating the push of hotspot video clips the target users are likely to enjoy.
  • The preset emotion tags are preconfigured tags usable for video pushing. For example, if the preset emotion tag is the "joy" tag or a level-1 tag, and the server recognizes that the clip emotion tag of a target video clip is a level-1 tag, the corresponding hotspot video clip is deemed highly attractive to the user corresponding to the user ID, and the hotspot video clip can be pushed to target users who share the same user tags (that is, the same preferences) as that user, ensuring its attractiveness to the target users.
  • In the hotspot video annotation processing method provided in this embodiment, the target video clip corresponding to the playback timestamps of the hotspot video clip is intercepted from the recorded video, and the single-frame emotion tag corresponding to each image to be recognized is obtained to determine the clip emotion tag corresponding to the target video clip, which can reflect the preferences of the user corresponding to the user ID while watching the hotspot video clip. The user portrait database is then queried with the user ID to obtain the user's tag, so that target users carrying the same user tag, and therefore the same preferences as the user corresponding to the user ID, can be determined. If the clip emotion tag is a preset emotion tag, the hotspot video clip is pushed to the clients corresponding to the target users.
  • each recorded video is associated with a user ID, which is an identifier used to uniquely identify the user's identity in the video playback system.
  • the hotspot video annotation processing method further includes:
  • The specific implementation process of step S801 is the same as that of step S701; to avoid redundancy, it is not described again here.
  • S802 Obtain the instantaneous emotion value corresponding to each image to be recognized in the target video segment.
  • The specific implementation process of step S802 is the same as that of step S702; to avoid redundancy, it is not described again here.
  • S803 Query the emotion label comparison table based on the instantaneous emotion value to obtain a single frame of emotion labels corresponding to the image to be recognized.
  • The specific implementation process of step S803 is the same as that of step S703; to avoid redundancy, it is not described again here.
  • The specific implementation process of step S804 is the same as that of step S704; to avoid redundancy, it is not described again here.
  • S805 If the clip emotion tag is a preset emotion tag, query the video database based on the playback timestamps corresponding to the hotspot video clip to obtain the content tag corresponding to the hotspot video clip, determine the hotspot video clips corresponding to that content tag as recommended video clips, and push the recommended video clips to the client corresponding to the user ID.
  • the content tag refers to the tag of the content played in the original video.
  • For example, the content may carry category tags such as funny, food, fashion, travel, entertainment, life, information, parent-child, knowledge, games, cars, finance, cute pets, sports, music, anime, technology, and health, or other tags that subdivide specific descriptions of the video content.
  • Specifically, if the server determines that the clip emotion tag of the target video clip is a preset emotion tag, it determines that the hotspot video clip belongs to a video type of particular concern to the user corresponding to the user ID.
  • the recommended video clip is a hotspot video clip, determined based on the content tag, that can be recommended to the user corresponding to the user ID.
  • the server queries the video database according to the content tag, obtains other hotspot video clips corresponding to that content tag, determines them as recommended video clips, and recommends them to the client corresponding to the user ID, thereby automatically recommending hotspot video clips with the same content tag to the user corresponding to the user ID.
  • the target video segment corresponding to the playback timestamp of the hotspot video clip is intercepted from the recorded video, and the single-frame emotion tag corresponding to each image to be recognized is obtained, so as to determine the segment emotion tag corresponding to the target video segment, which reflects the preference of the user corresponding to the user ID while watching the hotspot video clip.
  • the video database is queried based on the playback timestamp of the hotspot video clip to determine the pre-configured content tag of the hotspot video clip, so that other hotspot video clips stored by the server under that content tag are determined as recommended video clips and recommended to the client corresponding to the user ID; in this way, the recommended video clips more easily cater to the preferences of the user corresponding to the user ID, improving their attractiveness to that user.
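  A minimal sketch of this content-tag based recommendation, assuming a simple dictionary layout for the video database (the names video_db, content_tag, and the clip IDs are illustrative, not part of the application):

```python
PRESET_EMOTION_TAGS = {"happy", "level-1"}   # assumed preset tags used for push

def recommend_clips(clip_id, clip_emotion_tag, video_db):
    """Look up the watched clip's content tag, then return other hotspot
    clips that carry the same content tag (step S805 style)."""
    if clip_emotion_tag not in PRESET_EMOTION_TAGS:
        return []
    content_tag = video_db[clip_id]["content_tag"]
    return [cid for cid, meta in video_db.items()
            if cid != clip_id and meta["content_tag"] == content_tag]

video_db = {
    "clip-001": {"playback_span": (120, 180), "content_tag": "funny"},
    "clip-002": {"playback_span": (300, 360), "content_tag": "funny"},
    "clip-003": {"playback_span": (500, 560), "content_tag": "sports"},
}
print(recommend_clips("clip-001", "happy", video_db))  # ['clip-002']
```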
  • the hotspot video annotation processing method further includes:
  • the hotspot video frame rate refers to the proportion of the frames of the entire original video that belong to hotspot video clips, that is, the number of frames in all hotspot video clips of an original video divided by the number of frames in the entire original video.
  • the server obtains the number of frames of an original video, counts the number of frames in all hotspot video clips of that video, and divides the latter by the former to obtain the corresponding hotspot video frame rate.
  • for example, if the number of frames of the original video is 10000 (that is, the original video contains 10000 original video images), the frame number of the first hotspot video clip is 1000, and the frame number of the second hotspot video clip is 2000, then the hotspot video frame rate is (1000 + 2000) / 10000 = 30%.
  • the hotspot video frame rate can therefore objectively reflect the attractiveness of the original video to users.
  • S902 Sort the original videos based on their corresponding hotspot video frame rates, and display them on the client according to the sorting result.
  • the server sorts the display positions of the original videos on the client in descending order of hotspot video frame rate, so that users see the original videos with higher hotspot video frame rates first and can choose what to watch accordingly, thereby increasing the playback volume of the original videos displayed by the video playback system.
  • in this hotspot video annotation processing method, after the hotspot video frame rate of each original video is obtained, the original videos are sorted and displayed on the user's client, so that the user can selectively watch the more attractive original videos, increasing the playback volume of the original videos displayed by the video playback system.
  • in step S901, counting the hotspot video frame rate corresponding to the original video based on the hotspot video clips includes:
  • S1001 Count the number of original video images in each hot video segment, and determine the total number of frames of the hot video segment.
  • the total frame number of the hot video clip refers to the total frame number of all the hot video clips in the same original video.
  • for example, if an original video has 6 hotspot video clips, the server counts the sum of the frame numbers of these 6 hotspot video clips as the total frame number of the hotspot video clips.
  • S1002 Count the number of original video images in the original video and determine the total number of video frames of the original video.
  • the server counts the number of original video images in the original video and determines the total number of video frames of the original video, that is, the total number of video frames is the number of all original video images in the original video.
  • the total number of video frames of the original video may also be determined as the product of the playback frame rate and the playback duration of the original video (for example, 25 frames per second × 400 seconds = 10000 frames), so as to quickly determine the total frame number of the original video.
  • S1003 Use the hotspot video frame rate formula to calculate, from the total frame number of the hotspot video clips and the total video frame number of the original video, the hotspot video frame rate corresponding to the original video.
  • the hotspot video frame rate formula is $Z = \frac{\sum_{j=1}^{m} w_j}{K}$, where Z is the hotspot video frame rate, $w_j$ is the total frame number of the j-th hotspot video clip, m is the number of hotspot video clips, and K is the total video frame number of the original video.
  • the server can thus determine the total frame number of the hotspot video clips and the total video frame number of the original video, and quickly calculate the hotspot video frame rate using the hotspot video frame rate formula; since the hotspot video frame rate reflects the attractiveness of the original video to users, sorting the original videos by it lets users selectively watch the original videos with higher hotspot video frame rates, improving the playback volume of the original videos.
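  The calculation and the sorting of step S902 are small enough to state directly; the sketch below follows the formula above (the video list and its field names are illustrative):

```python
def hotspot_frame_rate(clip_frame_counts, total_frames):
    """Z = (sum of w_j over all m hotspot clips) / K, per the formula above."""
    return sum(clip_frame_counts) / total_frames

# Worked example from the text: clips of 1000 and 2000 frames in a
# 10000-frame original video give Z = 0.3, i.e. 30%.
assert hotspot_frame_rate([1000, 2000], 10000) == 0.3

# Step S902: display original videos in descending order of Z.
videos = [
    {"id": "video-A", "clips": [1000, 2000], "total": 10000},
    {"id": "video-B", "clips": [4000], "total": 8000},
]
videos.sort(key=lambda v: hotspot_frame_rate(v["clips"], v["total"]), reverse=True)
print([v["id"] for v in videos])  # ['video-B', 'video-A']
```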
  • a hotspot video annotation processing device is provided, and the hotspot video annotation processing device corresponds one-to-one to the hotspot video annotation processing method in the foregoing embodiment.
  • the hotspot video annotation processing device includes a recorded video acquisition module 1101, an instant emotion value acquisition module 1102, an intense emotion probability determination module 1103, a hotspot video image determination module 1104, and a hotspot video segment acquisition module 1105.
  • the detailed description of each functional module is as follows:
  • the recorded video obtaining module 1101 is used to obtain the user's recorded video collected while the client plays the original video.
  • the original video includes at least one frame of original video image, and the recorded video includes at least one frame of image to be recognized.
  • the recording timestamp of each image to be recognized is associated with the playback timestamp of an original video image.
  • the instantaneous emotion value acquisition module 1102 is used to identify each image to be recognized by using a micro-expression recognition model to obtain the instantaneous emotion value corresponding to the image to be recognized.
  • the intense emotion probability determination module 1103 is used to determine the intense emotion probability of the original video image corresponding to the playback timestamp according to the instantaneous emotion values of the images to be identified corresponding to all the recording timestamps associated with the same playback timestamp.
  • the hotspot video image determination module 1104 is configured to determine the original video image as a hotspot video image if the intense emotion probability is greater than the first probability threshold.
  • the hotspot video clip acquisition module 1105 is configured to perform hotspot annotation on the original video based on the hotspot video image to obtain hotspot video clips.
  • the instantaneous emotion value acquisition module 1102 includes an instantaneous probability acquisition unit, a micro-expression type determination unit, and an instantaneous emotion value acquisition unit.
  • the instantaneous probability acquisition unit is used to identify each image to be recognized by using a micro-expression recognition model to acquire the instantaneous probability corresponding to at least one type of recognized expression.
  • the micro-expression type determination unit is used to determine the identified expression type with the largest instantaneous probability as the micro-expression type of the image to be recognized.
  • the instantaneous emotion value acquisition unit is used to query an emotion value comparison table based on the micro-expression type to acquire the instantaneous emotion value of the image to be recognized.
  • the intense emotion probability determination module 1103 includes an image total number statistics unit, an intense emotion judgment unit, an intense emotion quantity statistics unit, and an intense emotion probability determination unit.
  • the image total number statistics unit is used to count the total number of images to be recognized corresponding to all the recording timestamps associated with the same playback timestamp.
  • the intense emotion judgment unit is configured to determine that the emotion attribute of an image to be recognized is intense emotion if the absolute value of the instantaneous emotion value corresponding to that image is greater than a preset emotion threshold.
  • the intense emotion quantity statistics unit is used to count, among the images to be recognized corresponding to all the recording timestamps associated with the same playback timestamp, the number of images whose emotion attribute is intense emotion, as the intense emotion quantity.
  • the intense emotion probability determination unit is used to determine the intense emotion probability of the original video image corresponding to the playback timestamp based on the intense emotion quantity and the total number of images to be recognized.
  • the hotspot video clip acquisition module 1105 includes a video clip frame number counting unit, a first hotspot video clip determination unit, a fluctuation emotion probability acquisition unit, and a second hotspot video clip determination unit.
  • the video clip frame number counting unit is used to count the number of frames of the original video clip formed between any two hot-spot video images and determine the frame number of the video clip.
  • the first hotspot video segment determination unit is configured to determine the original video segment as a hotspot video segment if the video segment frame number is less than or equal to the first frame number threshold.
  • the fluctuation emotion probability acquisition unit is used to obtain the fluctuation emotion probability corresponding to the original video clip, based on the playback timestamp corresponding to the original video clip, if the frame number of the video clip is greater than the first frame number threshold and less than or equal to the second frame number threshold.
  • the second hotspot video segment determination unit is configured to determine the original video segment as a hotspot video segment if the fluctuation emotion probability is greater than the second probability threshold.
  • the fluctuation emotion probability acquisition unit includes a recorded video clip interception subunit, an instant emotion value acquisition subunit, an emotion value standard deviation acquisition subunit, an emotion fluctuation video clip determination subunit, and a fluctuation emotion probability calculation subunit.
  • the recorded video clip interception subunit is used to intercept the recorded video clip corresponding to the playback timestamp from the recorded video corresponding to the original video based on the playback timestamp corresponding to the original video clip.
  • the instantaneous emotion value acquisition subunit is used to acquire the instantaneous emotion value corresponding to each image to be identified in the recorded video segment.
  • the emotion value standard deviation acquisition subunit is used to calculate, using the standard deviation formula, the instantaneous emotion values corresponding to all the images to be recognized in the recorded video clip, to obtain the standard deviation of the emotion values.
  • the standard deviation formula is $S_N = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2}$, where $S_N$ is the standard deviation of the emotion values of the recorded video clip, N is the number of images to be recognized in the recorded video clip, $x_i$ is the instantaneous emotion value of each image to be recognized, and $\bar{x}$ is the average of all instantaneous emotion values $x_i$ in the recorded video clip.
  • the emotion fluctuation video clip determination subunit is used to determine the recorded video clip as an emotion fluctuation video clip if the standard deviation of the emotion values is greater than the standard deviation threshold.
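  A minimal sketch of this subunit's computation, following the population form of the standard deviation given above (the 0.4 threshold is an illustrative assumption, not a value fixed by the application):

```python
import math

STD_DEVIATION_THRESHOLD = 0.4   # assumed threshold for emotion fluctuation

def emotion_value_std(instant_values):
    """S_N = sqrt((1/N) * sum((x_i - mean)^2)), per the formula above."""
    n = len(instant_values)
    mean = sum(instant_values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in instant_values) / n)

def is_emotion_fluctuation_clip(instant_values):
    return emotion_value_std(instant_values) > STD_DEVIATION_THRESHOLD

print(is_emotion_fluctuation_clip([0.8, -0.5, 1.0, -0.8]))  # True: wide swings
print(is_emotion_fluctuation_clip([0.3, 0.3, 0.3, 0.3]))    # False: flat emotion
```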
  • each recorded video is associated with a user ID; after the hotspot video clip acquisition module 1105, the hotspot video annotation processing device further includes a target video clip interception module, a target emotion value acquisition module, a single-frame emotion tag acquisition module, a segment emotion tag acquisition module, a target user determination module, and a hotspot video clip pushing module.
  • the target video clip interception module is used to intercept the target video clip corresponding to the playback timestamp from the recorded video corresponding to the user ID based on the playback timestamp corresponding to the hot video clip.
  • the target emotion value acquisition module is used to acquire the instantaneous emotion value corresponding to each image to be identified in the target video segment.
  • the single-frame emotion label acquisition module is used to query the emotion label comparison table based on the instantaneous emotion value and obtain the single-frame emotion label corresponding to the image to be recognized.
  • the segment emotion tag acquisition module is used to acquire the segment emotion tag corresponding to the target video segment based on the single frame emotion tag corresponding to the image to be recognized.
  • the first video clip pushing module is used to, if the segment emotion tag is a preset emotion tag, query the user portrait database based on the user ID, obtain the user tag corresponding to the user ID, determine the target users based on the user tag, and push the hotspot video clip to the clients corresponding to the target users.
  • the second video clip pushing module is used to, if the segment emotion tag is a preset emotion tag, query the video database based on the playback timestamp corresponding to the hotspot video clip, obtain the content tag corresponding to the hotspot video clip, determine the hotspot video clips corresponding to that content tag as recommended video clips, and push the recommended video clips to the client corresponding to the user ID.
  • the hotspot video annotation processing device further includes a hotspot video frame rate statistics module and an original video sorting module.
  • the hotspot video frame rate statistics module is used to calculate the hotspot video frame rate corresponding to the original video based on the hotspot video clips.
  • the original video sorting module is used to sort the original video based on the hot video frame rate corresponding to the original video and display it on the client according to the sorting result.
  • the hotspot video frame rate statistics module includes a clip total frame number determination unit, a video total frame number determination unit, and a hotspot video frame rate acquisition unit.
  • the clip total frame number determination unit is used to count the number of original video images in each hotspot video clip and determine the total frame number of the hotspot video clips.
  • the video total frame number determination unit is used to count the number of original video images in the original video and determine the total number of video frames of the original video.
  • the hotspot video frame rate acquisition unit is used to calculate the hotspot video frame rate corresponding to the original video from the total frame number of the hotspot video clips and the total video frame number of the original video, using the hotspot video frame rate formula $Z = \frac{\sum_{j=1}^{m} w_j}{K}$, where Z is the hotspot video frame rate, $w_j$ is the total frame number of the j-th hotspot video clip, m is the number of hotspot video clips, and K is the total video frame number of the original video.
  • Each module in the above hotspot video annotation processing device may be implemented in whole or in part by software, hardware, and combinations thereof.
  • the above modules may be embedded, in hardware form, in or independent of the processor of the computer device, or may be stored, in software form, in the memory of the computer device, so that the processor can call and execute the operations corresponding to the above modules.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure may be as shown in FIG. 12.
  • the computer device includes a processor, a memory, a network interface, and a database connected by a system bus, where the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, computer-readable instructions, and a database.
  • the internal memory provides an environment for the operation of the operating system and computer-readable instructions in the non-volatile storage medium.
  • the database of the computer device is used to store data used or generated during the execution of the above hot-spot video annotation processing method, such as the number of original video images.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection; when executed by the processor, the computer-readable instructions implement a hotspot video annotation processing method.
  • a computer device is provided, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor.
  • when the processor executes the computer-readable instructions, the hotspot video annotation processing method in the above embodiments is implemented, for example, steps S201-S205 shown in FIG. 2 or the steps shown in FIGS. 3-10; details are not repeated here to avoid repetition.
  • alternatively, when executing the computer-readable instructions, the processor implements the functions of each module/unit in the embodiment of the hotspot video annotation processing device, for example, the functions of the recorded video acquisition module 1101, the instantaneous emotion value acquisition module 1102, the intense emotion probability determination module 1103, the hotspot video image determination module 1104, and the hotspot video clip acquisition module 1105 shown in FIG. 11; details are not repeated here to avoid repetition.
  • a computer-readable storage medium is provided, which stores computer-readable instructions; when executed by a processor, the computer-readable instructions implement the hotspot video annotation processing method in the foregoing embodiments, for example, steps S201-S205 shown in FIG. 2 or the steps shown in FIGS. 3-10; details are not repeated here to avoid repetition.
  • alternatively, when executed by the processor, the computer-readable instructions implement the functions of each module/unit in the embodiment of the above hotspot video annotation processing apparatus, for example, the recorded video acquisition module 1101 and the instantaneous emotion value acquisition module 1102 shown in FIG. 11; details are not repeated here to avoid repetition.
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link (Synchlink) DRAM (SLDRAM), direct RAM (RDRAM), direct memory bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM).

Abstract

Disclosed in the present application are a hotspot video annotation processing method and apparatus, a computer device, and a storage medium. The method comprises: obtaining a recorded video of a user collected while a client plays back an original video, the original video comprising at least one frame of an original video image, and the recorded video comprising at least one frame of an image to be identified; using a micro-expression recognition model to identify each image to be identified; and obtaining instantaneous emotion values corresponding to the images to be identified; according to the instantaneous emotion values, determining the intense emotion probability of an original video image corresponding to a playback timestamp; if the intense emotion probability is greater than a first probability threshold, determining the original video image to be a hotspot video image; and on the basis of the hotspot video image, performing hotspot annotation on the original video to obtain hotspot video clips. The described method may achieve the automatic annotation of hotspot video clips and improve the efficiency of annotating the hotspot video clips.

Description

Hotspot video annotation processing method, apparatus, computer device and storage medium
This application is based on, and claims priority to, Chinese invention application No. 201910025355.9, filed on January 11, 2019 and titled "Hotspot Video Annotation Processing Method, Apparatus, Computer Device and Storage Medium".
Technical Field
The present application relates to the technical field of micro-expression recognition, and in particular to a hotspot video annotation processing method and apparatus, a computer device, and a storage medium.
Background
In the mobile Internet, video (especially online video) is the largest and fastest-growing category of mobile data traffic. Online video refers to audio-visual files that are provided by online video service providers (for example, Baidu iQiyi), played in a streaming format, and available for live or on-demand viewing. Online video generally requires a standalone player, and the file format is mainly FLV (Flash Video), a streaming format based on P2P (peer-to-peer) technology that occupies few client resources.
Smartphone users can watch video streams, movies, TV shows, user-made clips, and video calls in both mobile network and Wi-Fi environments. To keep video users engaged, most video applications have added social elements, geographic information, and business forms based on personalized recommendation. In the prior art, users manually add hotspot annotations while watching a video, enabling real-time review and sharing of the video content; this manual annotation approach is inefficient. With the continuous development of terminal technology and video website design technology, people's requirements for video have grown, along with increasing demands for personalization and convenience while watching videos. Traditional online video service providers usually need dedicated editors to manually label different segments of film and television works with attribute tags, and to edit and push content based on those tags. This manual way of labeling and pushing the attribute tags of original video clips is inefficient and insufficiently precise, far from meeting the demands for personalization and convenience.
Summary of the Invention
Embodiments of the present application provide a hotspot video annotation processing method and apparatus, a computer device, and a storage medium, to solve the problem of low efficiency in the current manual annotation of original video clip attributes.
A hotspot video annotation processing method includes:
obtaining a recorded video of a user collected while a client plays an original video, where the original video includes at least one frame of original video image, the recorded video includes at least one frame of image to be recognized, and the recording timestamp of each image to be recognized is associated with the playback timestamp of an original video image;
recognizing each image to be recognized by using a micro-expression recognition model, to obtain the instantaneous emotion value corresponding to the image to be recognized;
determining the intense emotion probability of the original video image corresponding to a playback timestamp according to the instantaneous emotion values of the images to be recognized corresponding to all the recording timestamps associated with that playback timestamp;
if the intense emotion probability is greater than a first probability threshold, determining the original video image as a hotspot video image; and
performing hotspot annotation on the original video based on the hotspot video image, to obtain hotspot video clips.
A hotspot video annotation processing apparatus includes:
a recorded video acquisition module, configured to obtain a recorded video of a user collected while a client plays an original video, where the original video includes at least one frame of original video image, the recorded video includes at least one frame of image to be recognized, and the recording timestamp of each image to be recognized is associated with the playback timestamp of an original video image;
an instantaneous emotion value acquisition module, configured to recognize each image to be recognized by using a micro-expression recognition model, to obtain the instantaneous emotion value corresponding to the image to be recognized;
an intense emotion probability determination module, configured to determine the intense emotion probability of the original video image corresponding to a playback timestamp according to the instantaneous emotion values of the images to be recognized corresponding to all the recording timestamps associated with that playback timestamp;
a hotspot video image determination module, configured to determine the original video image as a hotspot video image if the intense emotion probability is greater than a first probability threshold; and
a hotspot video clip acquisition module, configured to perform hotspot annotation on the original video based on the hotspot video image, to obtain hotspot video clips.
A computer device includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, where the processor implements the following steps when executing the computer-readable instructions:
obtaining a recorded video of a user collected while a client plays an original video, where the original video includes at least one frame of original video image, the recorded video includes at least one frame of image to be recognized, and the recording timestamp of each image to be recognized is associated with the playback timestamp of an original video image;
recognizing each image to be recognized by using a micro-expression recognition model, to obtain the instantaneous emotion value corresponding to the image to be recognized;
determining the intense emotion probability of the original video image corresponding to a playback timestamp according to the instantaneous emotion values of the images to be recognized corresponding to all the recording timestamps associated with that playback timestamp;
if the intense emotion probability is greater than a first probability threshold, determining the original video image as a hotspot video image; and
performing hotspot annotation on the original video based on the hotspot video image, to obtain hotspot video clips.
One or more non-volatile readable storage media storing computer-readable instructions are provided, where the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
obtaining a recorded video of a user collected while a client plays an original video, where the original video includes at least one frame of original video image, the recorded video includes at least one frame of image to be recognized, and the recording timestamp of each image to be recognized is associated with the playback timestamp of an original video image;
recognizing each image to be recognized by using a micro-expression recognition model, to obtain the instantaneous emotion value corresponding to the image to be recognized;
determining the intense emotion probability of the original video image corresponding to a playback timestamp according to the instantaneous emotion values of the images to be recognized corresponding to all the recording timestamps associated with that playback timestamp;
if the intense emotion probability is greater than a first probability threshold, determining the original video image as a hotspot video image; and
performing hotspot annotation on the original video based on the hotspot video image, to obtain hotspot video clips.
The details of one or more embodiments of the present application are set forth in the drawings and description below. Other features and advantages of the present application will become apparent from the description, the drawings, and the claims.
Brief Description of the Drawings
In order to explain the technical solutions in the embodiments of the present application more clearly, the drawings required for describing the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a schematic diagram of an application environment of a hotspot video annotation processing method in an embodiment of the present application;
FIG. 2 is a flowchart of a hotspot video annotation processing method in an embodiment of the present application;
FIG. 3 is another flowchart of a hotspot video annotation processing method in an embodiment of the present application;
FIG. 4 is another flowchart of a hotspot video annotation processing method in an embodiment of the present application;
FIG. 5 is another flowchart of a hotspot video annotation processing method in an embodiment of the present application;
FIG. 6 is another flowchart of a hotspot video annotation processing method in an embodiment of the present application;
FIG. 7 is another flowchart of a hotspot video annotation processing method in an embodiment of the present application;
FIG. 8 is another flowchart of a hotspot video annotation processing method in an embodiment of the present application;
FIG. 9 is another flowchart of a hotspot video annotation processing method in an embodiment of the present application;
FIG. 10 is another flowchart of a hotspot video annotation processing method in an embodiment of the present application;
FIG. 11 is a schematic diagram of a hotspot video annotation processing apparatus in an embodiment of the present application;
FIG. 12 is a schematic diagram of a computer device in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
The hotspot video annotation processing method provided by the embodiments of the present application can be applied in the application environment shown in FIG. 1. Specifically, the method is applied in a video playback system that includes a client and a server as shown in FIG. 1, where the client and the server communicate through a network, so that hotspot video clips in an original video are automatically annotated, the annotation efficiency of hotspot video clips is improved, and personalized recommendation and sorted display of hotspot video clips are realized. The client, also called the user end, refers to a program that corresponds to the server and provides local services to the user. The client can be installed on, but is not limited to, various personal computers, notebook computers, smartphones, tablets, and portable wearable devices. The server can be implemented by an independent server or by a server cluster composed of multiple servers.
In an embodiment, as shown in FIG. 2, a hotspot video annotation processing method is provided. The method is described by taking its application to the server in FIG. 1 as an example, and includes the following steps:
S201: Obtain the user's recorded video collected while the client plays the original video, where the original video includes at least one frame of original video image, the recorded video includes at least one frame of image to be recognized, and the recording timestamp of each image to be recognized is associated with the playback timestamp of an original video image.
The original video refers to the video played by a video playback program (that is, the client) installed on a terminal device such as the user's mobile phone or computer, for the user to watch. The recorded video refers to the video of the user's changing facial expressions, shot in real time by the shooting module (such as a built-in camera) of the terminal device while the user watches the original video. The original video includes at least one frame of original video image; an original video image is a single frame that forms the original video, that is, the smallest-unit single picture in the original video. Each original video image carries a playback timestamp, which is the timestamp of that image within the original video; for example, in a 10-minute original video, the original video image at the 100th second has a playback timestamp of 100 s. The recorded video includes at least one frame of image to be recognized; an image to be recognized is a single frame that forms the recorded video, that is, the smallest-unit single picture in the recorded video. Each image to be recognized corresponds to a recording timestamp, which is the timestamp of that image within the recorded video; for example, in a 10-minute recorded video, the image to be recognized at the 100th second has a recording timestamp of 100 s. The recording timestamp is associated with the playback timestamp carried by an original video image, so that the images to be recognized correspond one-to-one to the original video images, making it easy to accurately determine the user's emotions while watching the original video.
Specifically, each original video carries a unique video identifier used to uniquely identify the corresponding original video. For example, the original video corresponding to episode XX of "XX" carries the unique video identifier XX0001, so that the server can retrieve the corresponding original video according to the identifier XX0001. The playback timestamp carried by each original video image is the timestamp of that image within the original video. In this embodiment, while the client plays an original video, the server obtains the recorded videos, shot in real time by the shooting modules (such as built-in cameras) of the terminal devices on which the clients are installed, of the expression changes of all users watching that original video. Each recorded video includes at least one frame of image to be recognized, each image to be recognized corresponds to a recording timestamp, and the recording timestamp is associated with the playback timestamp carried by an original video image. Understandably, by collecting the recorded videos of different users watching the same original video, it can be better determined whether the original video attracts the audience, which helps to automatically annotate the hotspot video clips in the original video and improves the annotation efficiency of hotspot video clips.
In a specific implementation, obtaining the user's recorded video collected while the client plays the original video includes: (1) controlling the client to play the original video, so that the playback timestamp of each original video image in the original video is associated with the current system time; (2) obtaining the user's recorded video collected while the client plays the original video, so that the recording timestamp of each image to be recognized in the recorded video is associated with the current system time; and (3) based on the current system time, associating the recording timestamp of each image to be recognized with the playback timestamp of an original video image. The current system time is the system's current time at any moment; for example, it can be obtained through the currentTimeMillis method of the System class. Generally, if the playback of the original video and the recording of the recorded video are synchronized, the playback timestamps of the original video correspond directly to the recording timestamps of the recorded video, that is, the first frame of the original video corresponds to the first frame of the image to be recognized, so that each image to be recognized reflects the user's micro-expression while watching the corresponding original video image. Correspondingly, if the playback and the recording are not synchronized, the playback timestamps of the original video and the recording timestamps of the recorded video need to be associated through the current system time, so that each associated image to be recognized still reflects the user's micro-expression while watching the corresponding original video image. For example, if the user agrees to be recorded and recording starts one minute after the original video starts playing, the playback and recording times are associated through the current system time: if the 1000th original video image is played at 10:05:10 and the 10th image to be recognized is recorded at 10:05:10, then the playback timestamp of the 1000th original video image is associated with the recording timestamp of the 10th image to be recognized.
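A minimal sketch of this association step, assuming both streams are anchored to wall-clock (system) time and constant frame rates (the function and parameter names are illustrative, not defined by this application):

```python
def associate_timestamps(play_start, record_start, play_fps, record_fps, n_record_frames):
    """For each image to be recognized, compute the playback timestamp of the
    original video image shown at the same system time.
    play_start / record_start: system times (seconds) at which playback and
    recording began; returns (recording_timestamp, playback_timestamp) pairs."""
    pairs = []
    for i in range(n_record_frames):
        recording_ts = i / record_fps                # position in recorded video
        system_time = record_start + recording_ts    # wall-clock time of frame i
        playback_ts = system_time - play_start       # position in original video
        # snap to the nearest original video image
        playback_ts = round(playback_ts * play_fps) / play_fps
        pairs.append((recording_ts, playback_ts))
    return pairs

# Example echoing the text: recording starts 60 s after playback begins,
# so recording timestamp 0 s maps to playback timestamp 60 s.
print(associate_timestamps(0.0, 60.0, 25, 25, 2))  # [(0.0, 60.0), (0.04, 60.04)]
```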
S202: Recognize each image to be recognized by using a micro-expression recognition model, to obtain the instantaneous emotion value corresponding to the image to be recognized.
The micro-expression recognition model is a model for recognizing the micro-expression of a human face in an image to be recognized. In this embodiment, the micro-expression recognition model captures local features of the user's face in the image to be recognized, determines each target facial action unit of the face according to the local features, and then determines the micro-expression according to the recognized target facial action units. The instantaneous emotion value corresponding to an image to be recognized is the emotion value corresponding to the micro-expression type of the face recognized in that image. Specifically, the server first uses the micro-expression recognition model to perform micro-expression recognition on each image to be recognized to determine its micro-expression type, and then queries the emotion value comparison table according to that micro-expression type to obtain the instantaneous emotion value corresponding to the image. The micro-expression types include, but are not limited to, those mentioned in this embodiment: love, interest, surprise, expectation, ..., aggressiveness, conflict, insult, doubt, and fear. Based on the micro-expression type, the instantaneous emotion value of the face in the image to be recognized is obtained. Using the micro-expression recognition model, the instantaneous emotion values of different users watching each original video image of the same original video can be obtained quickly, so that hotspot video clip analysis can be performed based on these values, achieving automatic annotation of hotspot video clips.
Specifically, the micro-expression recognition model may be a neural network recognition model based on deep learning, a local recognition model based on classification, or a local emotion recognition model based on Local Binary Patterns (LBP). In this embodiment, the micro-expression recognition model is a local recognition model based on classification. When the model is trained in advance, a large amount of training image data is collected, containing positive and negative samples of each facial action unit, and the training image data is trained through a classification algorithm to obtain the micro-expression recognition model. In this embodiment, a large amount of training image data may be trained through the SVM classification algorithm to obtain SVM classifiers corresponding to multiple facial action units: for example, 39 SVM classifiers corresponding to 39 facial action units, or 54 SVM classifiers corresponding to 54 facial action units; the more positive and negative samples of different facial action units the training image data contains, the more SVM classifiers are obtained. Understandably, when a micro-expression recognition model is formed from multiple SVM classifiers, the more SVM classifiers it contains, the more accurate the micro-expression types recognized by the model. Taking a micro-expression recognition model formed by SVM classifiers corresponding to 54 facial action units as an example, this model can recognize 54 micro-expression types, including love, interest, surprise, expectation, ..., aggressiveness, conflict, insult, doubt, and fear.
S203: Determine the intense emotion probability of the original video image corresponding to a playback timestamp according to the instantaneous emotion values of the images to be recognized corresponding to all the recording timestamps associated with that playback timestamp.
The intense emotion probability is used to evaluate how frequently intense emotions occur across the images to be recognized of different users watching the same original video. Understandably, a high intense emotion probability indicates that users' emotions fluctuate strongly while watching the original video, and that the original video is highly attractive to users. Specifically, for each original video image, the server obtains, according to its playback timestamp, the instantaneous emotion values of the images to be recognized corresponding to all the recording timestamps associated with that playback timestamp, that is, the instantaneous emotion values of all users who watched that original video image. Based on each image's instantaneous emotion value, the server determines whether it expresses intense emotion, and thus computes the intense emotion probability of all users watching that original video image, so that the probability objectively reflects how much the users watching the same original video like it or resonate with it.
S204: If the intense emotion probability is greater than a first probability threshold, determine the original video image as a hotspot video image.
The first probability threshold is a preset probability threshold for evaluating whether an original video image is a hotspot video image. In this embodiment, this preset probability threshold may be set to 60%. If the intense emotion probability is greater than the first probability threshold, a relatively large proportion (that is, greater than the first probability threshold) of all users who watched the original video image experienced strong emotional fluctuation while watching it (that is, the emotion corresponding to their instantaneous emotion value is intense emotion), so the image is highly attractive to users and can be determined as a hotspot video image.
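The computation in S203-S204 reduces to a counting rule; the sketch below assumes the |value| > threshold criterion for intense emotion described for the intense emotion judgment unit, with illustrative values (0.5 is an assumption; 60% is the example value given above):

```python
PRESET_EMOTION_THRESHOLD = 0.5     # assumed |instantaneous emotion value| cutoff
FIRST_PROBABILITY_THRESHOLD = 0.6  # the 60% example value from this embodiment

def intense_emotion_probability(instant_values):
    """Fraction of viewers of one original video image (one playback timestamp)
    whose instantaneous emotion value indicates intense emotion."""
    intense = sum(1 for v in instant_values if abs(v) > PRESET_EMOTION_THRESHOLD)
    return intense / len(instant_values)

def is_hotspot_image(instant_values):
    return intense_emotion_probability(instant_values) > FIRST_PROBABILITY_THRESHOLD

# Example: 4 of 5 viewers react intensely -> probability 0.8 > 0.6 -> hotspot image.
print(is_hotspot_image([1.0, -0.8, 0.8, 0.3, -1.0]))  # True
```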
S205: Perform hotspot annotation on the original video based on the hotspot video images, to obtain hotspot video clips.
Specifically, after obtaining all the hotspot video images in the original video, the server can form an original video clip from any two hotspot video images, and then compare the total number of original video images in that clip against preset frame number thresholds to determine whether the clip is a hotspot video clip. The server automatically marks the original video images corresponding to the hotspot video images and annotates the hotspot video clips in the original video, thereby automatically annotating the hotspot video clips in the original video and improving their annotation efficiency.
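A sketch of this annotation step under the frame-number rules described for the hotspot video clip acquisition module (both thresholds and the fluctuation check are illustrative assumptions; a real implementation would compute the fluctuation emotion probability from the recorded videos):

```python
FIRST_FRAME_THRESHOLD = 100          # assumed first frame number threshold
SECOND_FRAME_THRESHOLD = 500         # assumed second frame number threshold
SECOND_PROBABILITY_THRESHOLD = 0.6   # assumed second probability threshold

def annotate_hotspot_clips(hotspot_frames, fluctuation_probability):
    """Form candidate clips between consecutive hotspot video images and keep
    those satisfying the frame-number rules; `fluctuation_probability(a, b)`
    is a callback returning the fluctuation emotion probability of frames a..b."""
    clips = []
    frames = sorted(hotspot_frames)
    for start, end in zip(frames, frames[1:]):
        n_frames = end - start + 1
        if n_frames <= FIRST_FRAME_THRESHOLD:
            clips.append((start, end))
        elif (n_frames <= SECOND_FRAME_THRESHOLD
              and fluctuation_probability(start, end) > SECOND_PROBABILITY_THRESHOLD):
            clips.append((start, end))
    return clips

# Example: frames 10 and 50 merge directly; 50 -> 400 needs the fluctuation check.
print(annotate_hotspot_clips([10, 50, 400], lambda a, b: 0.7))  # [(10, 50), (50, 400)]
```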
In the hotspot video annotation processing method provided in this embodiment, the user's recorded video is collected while the original video is playing, so that the recording timestamp of each image to be recognized is associated with the playback timestamp of an original video image, ensuring the objectivity of the micro-expression analysis of the original video. The micro-expression recognition model is then used to recognize the images to be recognized; it can quickly identify the user's micro-expression while watching a given original video image, so as to obtain the intense emotion values of the user watching the original video and perform hotspot video annotation based on them, ensuring the objectivity of the hotspot video clip annotation. Next, based on the instantaneous emotion values of the images to be recognized corresponding to all the recording timestamps associated with the same playback timestamp, the intense emotion probability of the original video image corresponding to that playback timestamp is determined, so as to decide whether it is a hotspot video image; hotspot annotation of the original video is thus refined to hotspot analysis of individual original video images, ensuring the objectivity and accuracy of the hotspot analysis. Finally, the original video is hotspot annotated based on the hotspot video images to obtain hotspot video clips, so that the server can automatically annotate hotspot video clips, improving the efficiency and accuracy of hotspot video clip annotation and providing users with a better viewing experience.
In an embodiment, as shown in FIG. 3, in step S202, recognizing each image to be recognized by using a micro-expression recognition model to obtain the instantaneous emotion value corresponding to the image to be recognized includes:
S301: Recognize each image to be recognized by using the micro-expression recognition model, to obtain the instantaneous probability corresponding to at least one recognized expression type.
The recognized expression type refers to one of the pre-configured micro-expression types that the micro-expression recognition model can recognize an image to be recognized as belonging to.
Specifically, the micro-expression recognition model pre-trained by the server includes multiple SVM classifiers, each used to recognize one facial action unit. In this embodiment, the micro-expression recognition model contains 54 SVM classifiers, and a facial action unit number mapping table is established in which each facial action unit is represented by a predefined number: for example, AU1 is inner brow raiser, AU2 is outer brow raiser, AU5 is upper lid raiser, and AU26 is jaw drop. Each facial action unit has a trained corresponding SVM classifier; for example, the SVM classifier for inner brow raiser can output the probability that a local feature belongs to inner brow raiser, and the SVM classifier for outer brow raiser can output the probability that a local feature belongs to outer brow raiser.
In this embodiment, when the server uses the pre-trained micro-expression recognition model to recognize the images to be recognized, it may first perform face keypoint detection and feature extraction on each image to obtain its local features. The face keypoint algorithm may be, but is not limited to, the Ensemble of Regression Trees (ERT) algorithm, the SIFT (scale-invariant feature transform) algorithm, the SURF (Speeded Up Robust Features) algorithm, the LBP (Local Binary Patterns) algorithm, or the HOG (Histogram of Oriented Gradients) algorithm. The feature extraction algorithm may be a CNN (Convolutional Neural Network) algorithm. The local features are then input into the multiple SVM classifiers, which recognize all the input local features; the probability values output by the classifiers for their facial action units are obtained, and the facial action units whose SVM classifiers output probability values greater than a preset threshold are determined as target facial action units. A target facial action unit is a facial action unit (Action Unit, AU) obtained by recognizing the image to be recognized with the micro-expression recognition model. The probability value may be a value between 0 and 1; if an output probability value is 0.6 and the preset threshold is 0.5, then since 0.6 is greater than 0.5, the facial action unit corresponding to 0.6 is taken as a target facial action unit of the image to be recognized. Finally, all the obtained target facial action units are comprehensively evaluated to obtain the probability that their combination belongs to each micro-expression type pre-configured in the micro-expression recognition model, that is, the instantaneous probability of each recognized expression type. Comprehensively evaluating all the target facial action units specifically means obtaining, based on the combination of all target facial action units, the probability that this combination belongs to each pre-configured micro-expression type, so as to determine the instantaneous probability of each recognized expression type.
S302: Determine the recognized expression type with the largest instantaneous probability as the micro-expression type of the image to be recognized.
Specifically, after the instantaneous probability that each image to be recognized belongs to at least one recognized expression type is obtained, the recognized expression type with the largest instantaneous probability is determined as the micro-expression type corresponding to the image to be recognized. For example, if the instantaneous probability that an image to be recognized belongs to the recognized expression type "love" is 0.9, while the instantaneous probabilities that it belongs to the recognized expression types "doubt" and "serenity" are each 0.05, then the recognized expression type with the instantaneous probability of 0.9 is determined as the micro-expression type of the image to be recognized, ensuring the accuracy of the identified micro-expression type.
S303: Query an emotion value comparison table based on the micro-expression type to obtain the instantaneous emotion value of the image to be recognized.
The emotion value comparison table is a preset data table for recording the emotion attribute corresponding to each micro-expression type; it stores the association between micro-expression types and emotion values. After acquiring the micro-expression type of the image to be recognized, the server queries the emotion value comparison table based on that micro-expression type to obtain the corresponding instantaneous emotion value. The instantaneous emotion value is a value in [-1, 1]: the larger the value, the more the user likes the original video image corresponding to the recording timestamp associated with the image to be recognized; the smaller the value, the more the user dislikes that original video image. For example, to facilitate subsequent calculation, the instantaneous emotion values corresponding to the 54 micro-expression types identified by the micro-expression recognition model may each be set to one of 1, 0.8, 0.5, 0.3, 0, -0.3, -0.5, -0.8 and -1.
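A minimal sketch of steps S302 and S303 follows, assuming the model outputs a dict of per-type instantaneous probabilities and the emotion value comparison table is held as an in-memory dict; both data layouts and the specific table entries are illustrative, not specified by the disclosure:

```python
# Illustrative emotion value comparison table: micro-expression type -> value in [-1, 1]
EMOTION_VALUE_TABLE = {
    "love": 1.0,
    "serenity": 0.3,
    "doubt": -0.3,
    # ... entries for the remaining micro-expression types
}

def instantaneous_emotion_value(type_probabilities):
    """type_probabilities: dict mapping recognized expression type -> instantaneous probability."""
    # S302: pick the recognized expression type with the largest instantaneous probability
    micro_expression_type = max(type_probabilities, key=type_probabilities.get)
    # S303: look up its instantaneous emotion value in the comparison table
    return EMOTION_VALUE_TABLE[micro_expression_type]

# e.g. instantaneous_emotion_value({"love": 0.9, "doubt": 0.05, "serenity": 0.05}) -> 1.0
```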
In the hotspot video annotation processing method provided in this embodiment, the micro-expression recognition model is first used to recognize the image to be recognized, so as to quickly obtain the instantaneous probability corresponding to at least one recognized expression type, and the recognized expression type with the largest instantaneous probability is selected as the micro-expression type of the image to be recognized, ensuring the accuracy of the identified micro-expression type. The emotion value comparison table is then queried based on the micro-expression type to obtain the instantaneous emotion value of the image to be recognized, ensuring the efficiency of acquiring that value.
Further, after acquiring the instantaneous emotion value corresponding to each image to be recognized, the server may query a database based on that value to obtain the standard volume or standard color tone corresponding to the instantaneous emotion value, obtain the current volume or current color tone at which the client is playing when that image to be recognized was captured, and automatically adjust the current volume and current color tone based on the standard volume or standard color tone, so that the volume and tone at which the video is played match the user's current mood. A video whose volume or tone matches the user's mood at the time is more likely to evoke empathy, thereby increasing the appeal of the original video to the user.
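As a minimal illustration of this optional adjustment, the following sketch assumes a hypothetical database query standard_playback_settings and a hypothetical client control method set_playback; neither name appears in the disclosure:

```python
def adjust_playback_to_mood(client, emotion_value, standard_playback_settings):
    """Match the client's playback volume and color tone to the user's mood.

    standard_playback_settings: hypothetical database lookup returning the
    (standard_volume, standard_tone) pair associated with an instantaneous
    emotion value.
    """
    standard_volume, standard_tone = standard_playback_settings(emotion_value)
    # Replace the current playback parameters with the standard ones so that
    # playback matches the user's current emotion.
    client.set_playback(volume=standard_volume, tone=standard_tone)
```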
In an embodiment, as shown in FIG. 4, step S203, that is, determining the intense emotion probability of the original video image corresponding to a playback timestamp according to the instantaneous emotion values of the images to be recognized corresponding to all recording timestamps associated with that playback timestamp, includes:
S401: Count the total number of images to be recognized corresponding to all recording timestamps associated with the same playback timestamp.
The total number of images is the sum of the images to be recognized, collected by the server, of all users who have watched the original video image. Specifically, when annotating hotspot video segments for any original video, all recorded videos corresponding to viewings of that original video must be obtained, and the number of images to be recognized corresponding to all recording timestamps associated with the playback timestamp of the same original video image is counted and determined as the total number of images. For example, for an original video with the video identifier XX0001, if a certain original video image has a playback timestamp at the 10th second of the original video, then the number of all images to be recognized associated with that 10th-second original video image is the total number of images.
S402: If the absolute value of the instantaneous emotion value corresponding to an image to be recognized is greater than a preset emotion threshold, the emotion attribute of that image to be recognized is intense emotion.
The preset emotion threshold is a preset threshold for evaluating whether an instantaneous emotion value represents intense emotion; it may be set to 0.6 or another value. Specifically, the server compares the absolute value of the instantaneous emotion value corresponding to the image to be recognized with the preset emotion threshold. If the absolute value is greater than the preset emotion threshold, the emotion attribute of the image to be recognized is intense emotion; otherwise, the emotion attribute is plain emotion. That is, the instantaneous emotion value that the micro-expression recognition model outputs for each image to be recognized is a value in [-1, 1]; the closer its absolute value is to 1, the stronger the user's liking or disliking of the original video image being watched, and the micro-expression can be regarded as an intense emotion. Such intense emotion easily resonates with users and indicates strong appeal. Correspondingly, if the absolute value of the instantaneous emotion value is close to 0, the user's liking or disliking of the original video image is weak, indicating that the original video image did not resonate with the user and has low appeal, and the micro-expression can be regarded as a plain emotion.
S403: Among the images to be recognized corresponding to all recording timestamps associated with the same playback timestamp, count the number of images whose emotion attribute is intense emotion as the intense emotion quantity.
Specifically, from the counted images to be recognized corresponding to all recording timestamps associated with the same playback timestamp, the server determines the number of images whose emotion attribute is intense emotion as the intense emotion quantity. For example, if 100 users watch the original video image corresponding to a certain playback timestamp in the same original video, the 100 images to be recognized corresponding to all recording timestamps associated with that playback timestamp are obtained, the micro-expression recognition model is used to obtain the instantaneous emotion values of all 100 images, whether each image represents intense emotion is determined based on its instantaneous emotion value, and the number of images whose emotion attribute is intense emotion is determined as the intense emotion quantity; in this case the intense emotion quantity is a value between 0 and 100.
S404: Calculate the intense emotion probability of the original video image corresponding to the playback timestamp from the total number of images and the intense emotion quantity using the intense emotion probability formula L = A/B, where L is the intense emotion probability, A is the intense emotion quantity, and B is the total number of images.
Specifically, after acquiring the total number of images and the intense emotion quantity for any original video image, the server can quickly calculate its intense emotion probability using the intense emotion probability formula. The intense emotion probability reflects, among all users who watched the original video image, the probability that the image caused strong emotional fluctuation, and thus well reflects the appeal of the original video image to users or the degree to which it resonates with them.
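A minimal sketch of steps S401 to S404 follows, assuming the instantaneous emotion values of all images sharing one playback timestamp have already been collected into a list (the data layout is illustrative):

```python
EMOTION_THRESHOLD = 0.6  # preset emotion threshold from the embodiment

def intense_emotion_probability(emotion_values):
    """emotion_values: instantaneous emotion values of all images to be
    recognized that share the same playback timestamp, one per viewer.

    Returns L = A / B, where A is the intense emotion quantity and B is the
    total number of images.
    """
    total_images = len(emotion_values)                        # S401: B
    intense_count = sum(                                      # S402/S403: A
        1 for v in emotion_values if abs(v) > EMOTION_THRESHOLD)
    return intense_count / total_images                       # S404: L = A/B

# e.g. intense_emotion_probability([0.9, -0.8, 0.3, 0.0]) -> 0.5
```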
In the hotspot video annotation processing method provided in this embodiment, the total number of all images to be recognized corresponding to the same playback timestamp is first obtained, the intense emotion quantity is determined from those images whose emotion attribute is intense emotion, and the intense emotion probability is calculated using the intense emotion probability formula. This makes the acquisition of the intense emotion probability more objective and intuitively shows the appeal of the original video image to users.
In an embodiment, as shown in FIG. 5, step S205, that is, performing hotspot annotation on the original video based on the hotspot video images to obtain hotspot video segments, includes:
S501: Count the number of frames of the original video segment formed between any two hotspot video images, and determine it as the video segment frame count.
The video segment frame count is the total number of frames of the original video segment formed between two hotspot video images. In this embodiment, after the hotspot video images are obtained, the number of frames of the original video segment formed between any two hotspot video images is counted and determined as the video segment frame count. Since the original video segment contains the two hotspot video images, the video segment frame count is at least two. For example, if the 20th and 40th original video images in the original video are hotspot video images, the video segment frame count of the original video segment formed between these two hotspot video images is 21 frames.
S502: If the video segment frame count is less than or equal to a first frame count threshold, determine the original video segment as a hotspot video segment.
The first frame count threshold is a preset threshold defining the minimum time interval for judging whether an original video segment is a hotspot video segment. This threshold is set autonomously, and its value is generally small. For example, if the first frame count threshold is set to 120 frames and the original video is played at a typical frame rate of 24 frames per second, the corresponding original video segment is a 5-second original video segment. If the video segment frame count is less than or equal to the first frame count threshold, the interval between the two adjacent hotspot video images is short, meaning the original video segment arouses intense emotion in users within a short time and attracts their attention, so the original video segment is determined as a hotspot video segment.
S503: If the video segment frame count is greater than the first frame count threshold and less than or equal to a second frame count threshold, obtain the fluctuating emotion probability corresponding to the original video segment based on the playback timestamps corresponding to the original video segment.
The second frame count threshold is a preset threshold defining the maximum time interval for judging whether a video segment is a hotspot video segment. The second frame count threshold is generally set larger; for example, when the first frame count threshold is set to 120 frames, the second frame count threshold may be set to 1200 frames. At a playback frame rate of 24 frames per second, the original video segment formed between the two hotspot video images is then a 50-second original video segment. According to the playback timestamp corresponding to each original video image in the 50-second original video segment, the emotion fluctuation probability of the images to be recognized associated with those playback timestamps is obtained. If the emotion fluctuation probability is large, the original video segment is likely to arouse intense emotion in users; conversely, if it is small, the original video segment is unlikely to do so. The emotion fluctuation probability is the probability that watching the original video segment causes a large emotional fluctuation in users, where a large emotional fluctuation can be understood as a change from great joy to great sorrow or another emotion change process.
S504: If the fluctuating emotion probability is greater than a second probability threshold, determine the original video segment as a hotspot video segment.
The second probability threshold is a probability-related threshold set for evaluating hotspot video segments based on the fluctuating emotion probability. Understandably, if the fluctuating emotion probability of an original video segment is greater than the second probability threshold, the original video segment causes strong emotional fluctuation in users and attracts their attention, so it can be determined as a hotspot video segment.
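A minimal sketch of steps S501 to S504 follows, with the frame count thresholds taken from the examples above; the fluctuating_emotion_probability callable is assumed to implement steps S601 to S605 (a matching sketch appears after step S605 below), and the second probability threshold value is illustrative:

```python
FIRST_FRAME_THRESHOLD = 120    # e.g. 5 s at 24 fps
SECOND_FRAME_THRESHOLD = 1200  # e.g. 50 s at 24 fps
SECOND_PROB_THRESHOLD = 0.5    # illustrative; the disclosure fixes no value

def is_hotspot_segment(start_frame, end_frame, recorded_segments,
                       fluctuating_emotion_probability):
    """Decide whether the original video segment between two hotspot video
    images (both inclusive) is a hotspot video segment.

    recorded_segments: per-user lists of instantaneous emotion values for the
    recorded video segments matching this original segment, consumed only
    when the fluctuating emotion probability of step S503 is needed.
    """
    frame_count = end_frame - start_frame + 1       # S501: includes both ends
    if frame_count <= FIRST_FRAME_THRESHOLD:        # S502: short segment
        return True
    if frame_count <= SECOND_FRAME_THRESHOLD:       # S503: medium segment
        p = fluctuating_emotion_probability(recorded_segments)
        return p > SECOND_PROB_THRESHOLD            # S504
    return False                                    # beyond the maximum interval
```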
In the hotspot video annotation processing method provided in this embodiment, the video segment frame count of the original video segment formed between two hotspot video images is first obtained. If the video segment frame count is less than or equal to the first frame count threshold, the original video segment is directly determined as a hotspot video segment. If the video segment frame count is greater than the first frame count threshold and less than or equal to the second frame count threshold, the fluctuating emotion probability of the original video segment must be obtained, and whether the original video segment is a hotspot video segment is determined by comparing that probability with the second probability threshold. In this embodiment, whether an original video segment formed between two hotspot video images is a hotspot video segment is determined by its video segment frame count and fluctuating emotion probability, thereby automatically annotating hotspot video segments in the original video and ensuring the objectivity of the annotated hotspot video segments.
In an embodiment, as shown in FIG. 6, step S503, that is, obtaining the fluctuating emotion probability corresponding to the original video segment based on the playback timestamps corresponding to the original video segment, includes:
S601: Based on the playback timestamps corresponding to the original video segment, intercept the recorded video segment corresponding to those playback timestamps from the recorded video corresponding to the original video.
Specifically, according to the playback timestamps of the original video segment, the server intercepts, from the recorded video corresponding to the original video, the recorded video segment associated with those playback timestamps, so as to recognize the images to be recognized in that recorded video segment. For example, if the playback timestamps of an original video segment span seconds 10-50 of an original video, then from the recorded video corresponding to that original video, the recorded video segment whose recording timestamps correspond to the playback timestamps of seconds 10-50 is intercepted, so that each image to be recognized in the recorded video segment reflects the user's facial expression changes while watching the original video segment.
S602: Obtain the instantaneous emotion value corresponding to each image to be recognized in the recorded video segment.
Since in step S202 the micro-expression recognition model has already been used to recognize every image to be recognized in all recorded videos and obtain its corresponding instantaneous emotion value, this step can directly obtain the instantaneous emotion value corresponding to each image to be recognized in the recorded video segment without re-recognition, improving the efficiency of acquiring the instantaneous emotion values.
S603: Calculate the instantaneous emotion values corresponding to all images to be recognized in the recorded video segment using the standard deviation formula to obtain the emotion value standard deviation; the standard deviation formula is
S_N = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2}
where S_N is the emotion value standard deviation of the recorded video segment, N is the number of images to be recognized in the recorded video segment, x_i is the instantaneous emotion value of each image to be recognized, and \bar{x} is the mean of all instantaneous emotion values x_i in the recorded video segment.
The emotion value standard deviation is the standard deviation of the user's instantaneous emotion values across all images to be recognized while watching the original video segment, and it objectively reflects the user's emotional fluctuation while watching that segment. Understandably, if the emotion value standard deviation is calculated from each individual user's instantaneous emotion values, a hotspot video segment determined from that standard deviation being greater than the preset standard deviation is a hotspot video segment of interest to that user. If the standard deviation is calculated from the average emotion values of all users who have watched the original video segment, a hotspot video segment determined from that standard deviation being greater than the preset standard deviation is a hotspot video segment of common interest to all those users.
S604: If the emotion value standard deviation is greater than a standard deviation threshold, the recorded video segment is an emotional fluctuation video segment.
The standard deviation threshold is a value preset on the server and may be set autonomously by the user as needed. In this embodiment, if the emotion value standard deviation of a recorded video segment is greater than the standard deviation threshold, the user's emotion fluctuated greatly while watching the corresponding original video segment, possibly changing from great joy to great sorrow or from great sorrow to great joy, so the recorded video segment is an emotional fluctuation video segment. This emotional fluctuation is embodied by the emotion value standard deviation, which objectively reflects the user's emotional changes while watching the original video segment.
S605: Calculate the fluctuating emotion probability of the original video segment from the number of emotional fluctuation video segments and the number of recorded video segments using the fluctuating emotion probability formula P = C/D, where P is the fluctuating emotion probability, C is the number of emotional fluctuation video segments, and D is the number of recorded video segments.
Specifically, the fluctuating emotion probability intuitively shows users' emotional fluctuation while watching the original video segment: the more emotional fluctuation video segments there are, the greater the fluctuating emotion probability, and the more the original video segment resonates with users' emotions. In the fluctuating emotion probability above, the number of recorded video segments D is the number of recorded video segments intercepted, from the recorded videos of all users, for viewings of the same original video segment; it can be understood as the number of users who watched the original video segment and whose facial expression changes were recorded. The number of emotional fluctuation video segments C is the number of those D recorded video segments whose emotion value standard deviation is greater than the standard deviation threshold.
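A minimal sketch of steps S603 to S605 follows, representing each recorded video segment simply as the list of its per-image instantaneous emotion values; the data layout and threshold value are illustrative:

```python
import statistics

STD_DEV_THRESHOLD = 0.4  # illustrative; the disclosure leaves it configurable

def emotion_std_dev(emotion_values):
    """S603: population standard deviation S_N of one recorded video segment."""
    return statistics.pstdev(emotion_values)

def fluctuating_emotion_probability(recorded_segments):
    """S604/S605: P = C / D over all recorded segments of one original segment.

    recorded_segments: one list of instantaneous emotion values per user who
    watched the original video segment (D segments in total).
    """
    d = len(recorded_segments)
    c = sum(1 for seg in recorded_segments          # emotional fluctuation segments
            if emotion_std_dev(seg) > STD_DEV_THRESHOLD)
    return c / d
```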
In the hotspot video annotation processing method provided in this embodiment, the instantaneous emotion value corresponding to each image to be recognized in the recorded video segment is obtained, and the emotion value standard deviation is calculated using the standard deviation formula, so as to determine whether each recorded video segment is an emotional fluctuation video segment, that is, one that evokes strong emotional fluctuation. The number of emotional fluctuation video segments and the number of recorded video segments are then used to calculate the fluctuating emotion probability of the original video segment, so that this probability reflects the emotional fluctuation of all users watching the original video segment.
In an embodiment, as shown in FIG. 7, each recorded video is associated with a user ID, which is an identifier used to uniquely identify the user's identity in the video playback system. After step S205, the hotspot video annotation processing method further includes:
S701: Based on the playback timestamps corresponding to the hotspot video segment, intercept the target video segment corresponding to those playback timestamps from the recorded video corresponding to the user ID.
The target video segment is the recorded video segment, within the recorded video corresponding to the user ID, that corresponds to the playback timestamps of the hotspot video segment. Specifically, according to the playback timestamps corresponding to the hotspot video segment, the server obtains, from the recorded video corresponding to the user ID, the recorded video segment whose recording timestamps correspond to those playback timestamps, and determines the obtained recorded video segment as the target video segment.
S702: Obtain the instantaneous emotion value corresponding to each image to be recognized in the target video segment.
Since in step S202 the micro-expression recognition model has already been used to recognize every image to be recognized in all recorded videos and obtain its corresponding instantaneous emotion value, this step can directly obtain the instantaneous emotion value corresponding to each image to be recognized in the target video segment without re-recognition, improving the efficiency of acquiring the instantaneous emotion values.
S703: Query an emotion label comparison table based on the instantaneous emotion values to obtain the single-frame emotion label corresponding to each image to be recognized.
The emotion label comparison table is a preset comparison table for recording the emotion label corresponding to each instantaneous emotion value. Since the instantaneous emotion values are each set to one of 1, 0.8, 0.5, 0.3, 0, -0.3, -0.5, -0.8 and -1, and each instantaneous emotion value may correspond to at least one micro-expression type, an emotion label may be determined for each individual instantaneous emotion value, or emotion label division rules may be preset according to the magnitude of the instantaneous emotion value so that each instantaneous emotion value corresponds to one emotion label. For example, the emotion labels may be divided into joy, anger, ..., sorrow, happiness and the like, or, according to emotion label division rules (such as emotion value from large to small), into level-1 emotion, level-2 emotion, ..., level-M emotion, with each level corresponding to a range of emotion values. A single-frame emotion label is the emotion label, in the emotion label comparison table, corresponding to the instantaneous emotion value of one frame of image to be recognized. That is, according to the user's instantaneous emotion value in each image to be recognized, the single-frame emotion label corresponding to that value is queried in the emotion label comparison table, so as to determine, from that label, the user's degree of liking of the corresponding original video image.
S704: Based on the single-frame emotion labels corresponding to the images to be recognized, obtain the segment emotion label corresponding to the target video segment.
Since the images to be recognized in the target video segment are real-time images captured while the user watches the hotspot video segment, the single-frame emotion label corresponding to each image to be recognized reflects the user's emotion for each original video image in the target video segment. After the single-frame emotion labels corresponding to all images to be recognized in the target video segment are obtained, the segment emotion label for the user's viewing of the target video segment can be obtained. Specifically, the most frequent single-frame emotion label among those of all images to be recognized may be selected as the segment emotion label.
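A minimal sketch of steps S703 and S704 follows, assuming a level-style emotion label comparison table keyed by the nine preset emotion values and a most-frequent-label vote over frames; the table entries are illustrative:

```python
from collections import Counter

# Illustrative emotion label comparison table: emotion value -> emotion label
EMOTION_LABEL_TABLE = {
    1.0: "level-1", 0.8: "level-2", 0.5: "level-3", 0.3: "level-4",
    0.0: "level-5", -0.3: "level-6", -0.5: "level-7", -0.8: "level-8",
    -1.0: "level-9",
}

def segment_emotion_label(emotion_values):
    """S703/S704: map each frame's instantaneous emotion value to its
    single-frame emotion label, then take the most frequent label as the
    segment emotion label of the target video segment."""
    single_frame_labels = [EMOTION_LABEL_TABLE[v] for v in emotion_values]
    return Counter(single_frame_labels).most_common(1)[0][0]
```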
S705: If the segment emotion label is a preset emotion label, query a user portrait database based on the user ID to obtain the user labels corresponding to the user ID, determine target users based on the user labels, and push the hotspot video segment to the clients corresponding to the target users.
The user labels are labels such as gender, age, occupation, interests or other labels preset in the user portrait database, obtained by querying the user portrait database based on the user ID. A target user is a user, obtained by the server, who shares the same preferences for the original video as the user corresponding to the user ID. Specifically, the user portrait database may be queried based on the user ID to obtain the user labels corresponding to the user ID, and target users may then be quickly obtained based on those user labels, so that hotspot video segments the target users are likely to enjoy can be pushed to them.
The preset emotion label is a preset label under which video pushing may be performed. For example, if the preset emotion label is the joy label or the level-1 label and the server recognizes that the segment emotion label of a certain target video segment is a level-1 label, the corresponding hotspot video segment is deemed highly attractive to the user corresponding to the user ID, and the hotspot video segment may be pushed to target users who share the same user labels (that is, the same preferences) as that user, ensuring the appeal of the hotspot video segment to the target users.
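A minimal sketch of step S705 follows, with hypothetical helpers query_user_labels, find_users_with_labels and push_to_client standing in for the user portrait database query and the push channel; none of these names are defined by the disclosure:

```python
PRESET_EMOTION_LABELS = {"level-1"}  # labels that trigger pushing

def push_hotspot_segment(user_id, hotspot_segment, segment_label,
                         query_user_labels, find_users_with_labels,
                         push_to_client):
    """S705: push the hotspot video segment to users whose portrait labels
    match those of the user who reacted strongly to it."""
    if segment_label not in PRESET_EMOTION_LABELS:
        return
    user_labels = query_user_labels(user_id)          # user portrait database
    for target_user in find_users_with_labels(user_labels):
        push_to_client(target_user, hotspot_segment)  # target users' clients
```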
In the hotspot video annotation processing method provided in this embodiment, the target video segment corresponding to the playback timestamps of the hotspot video segment is first intercepted from the recorded video, and the single-frame emotion label corresponding to each image to be recognized in it is obtained, so as to determine the segment emotion label corresponding to the target video segment; this segment emotion label reflects the preference of the user corresponding to the user ID while watching the hotspot video segment. Then, the user portrait database is queried based on the user ID to obtain that user's user labels, so as to determine target users who share the same user labels, and thus the same preferences, as the user corresponding to the user ID. When the segment emotion label is a preset emotion label, the hotspot video segment is pushed to the target users to increase its appeal to them, thereby increasing the playback volume of the hotspot video segment and even of the original video containing it.
In an embodiment, each recorded video is associated with a user ID, which is an identifier used to uniquely identify the user's identity in the video playback system. As shown in FIG. 8, after step S205, the hotspot video annotation processing method further includes:
S801: Based on the playback timestamps corresponding to the hotspot video segment, intercept the target video segment corresponding to those playback timestamps from the recorded video corresponding to the user ID.
The specific implementation process of step S801 is the same as that of step S701 and, to avoid repetition, is not detailed here.
S802: Obtain the instantaneous emotion value corresponding to each image to be recognized in the target video segment.
The specific implementation process of step S802 is the same as that of step S702 and, to avoid repetition, is not detailed here.
S803: Query the emotion label comparison table based on the instantaneous emotion values to obtain the single-frame emotion label corresponding to each image to be recognized.
The specific implementation process of step S803 is the same as that of step S703 and, to avoid repetition, is not detailed here.
S804: Based on the single-frame emotion labels corresponding to the images to be recognized, obtain the segment emotion label corresponding to the target video segment.
The specific implementation process of step S804 is the same as that of step S704 and, to avoid repetition, is not detailed here.
S805: If the segment emotion label is a preset emotion label, query a video database based on the playback timestamps corresponding to the hotspot video segment to obtain the content label corresponding to the hotspot video segment, determine the hotspot video segments corresponding to that content label as recommended video segments, and push the recommended video segments to the client corresponding to the user ID.
A content label is a label for the content played in the original video; it may be a category label such as comedy, food, fashion, travel, entertainment, lifestyle, news, parenting, knowledge, games, automobiles, finance, pets, sports, music, animation, technology or health, or another label that further subdivides the specific description of the video content. Specifically, when the server determines that the segment emotion label of the target video segment is a preset emotion label, it determines that the hotspot video segment belongs to a video type that the user corresponding to the user ID pays close attention to. At this time, the video database is queried based on the playback timestamps corresponding to the hotspot video segment to obtain the content label corresponding to that hotspot video segment. Since the user corresponding to the user ID pays close attention to the hotspot video segment, it is inferred by analogy that the user will pay attention to all hotspot video segments corresponding to the same content label.
A recommended video segment is a hotspot video segment, determined based on the content label, that can be recommended to the user corresponding to the user ID. Specifically, the server queries the video database according to the content label, obtains other hotspot video segments corresponding to that content label, determines them as recommended video segments, and recommends them to the client of the user ID, thereby automatically recommending hotspot video segments with the same content label to the user corresponding to the user ID.
In the hotspot video annotation processing method provided in this embodiment, the target video segment corresponding to the playback timestamps of the hotspot video segment is first intercepted from the recorded video, and the single-frame emotion label corresponding to each image to be recognized in it is obtained, so as to determine the segment emotion label corresponding to the target video segment; this segment emotion label reflects the preference of the user corresponding to the user ID while watching the hotspot video segment. Then, based on the playback timestamps of the hotspot video segment, the video database is queried to determine the content label pre-configured for that segment, so that other hotspot video segments stored on the server and corresponding to the same content label can be determined as recommended video segments and recommended to the client corresponding to the user ID. This makes the recommended video segments more likely to match the preferences of the user corresponding to the user ID and increases their appeal to that user.
In an embodiment, as shown in FIG. 9, after step S205, the hotspot video annotation processing method further includes:
S901: Based on the hotspot video segments, count the hotspot video frame rate corresponding to the original video.
The hotspot video frame rate is the proportion of the frames of all hotspot video segments in an original video to the total frames of the entire original video. Specifically, the server obtains the number of frames of an original video, counts the number of frames of all hotspot video segments in that original video, and divides the latter by the former to obtain the hotspot video frame rate corresponding to the original video. For example, if the original video has 10000 frames, that is, it contains 10000 original video images, and the first hotspot video segment has 1000 frames, the second 2000 frames, and the third 3000 frames, then the hotspot video frame rate corresponding to the original video is (1000+2000+3000)/10000 = 60%, meaning that 60% of the original video images in the original video belong to hotspot video segments, which objectively reflects the appeal of the original video to users.
S902: Sort the original videos based on their corresponding hotspot video frame rates, and display them on the client according to the sorting result.
The server sorts the display positions of the original videos on the client in descending order of hotspot video frame rate, so that users see original videos with higher hotspot video frame rates first and can choose what to watch according to the hotspot video frame rate, thereby increasing the playback volume of the original videos displayed by the video playback system.
In the hotspot video annotation processing method provided in this embodiment, after the hotspot video frame rate of each original video is obtained, the original videos are sorted and displayed on the user's client, so that the user can selectively watch more attractive original videos, increasing the playback volume of the original videos displayed by the video playback system.
In an embodiment, as shown in FIG. 10, step S901, that is, counting the hotspot video frame rate corresponding to the original video based on the hotspot video segments, includes:
S1001: Count the number of original video images in each hotspot video segment, and determine it as the segment total frame count of that hotspot video segment.
The segment total frame counts of the hotspot video segments together cover all hotspot video segments in the same original video. For example, if an original video has 6 hotspot video segments, the server counts the frame count of each of these 6 hotspot video segments, and their sum gives the total frames of the hotspot video segments.
S1002: Count the number of original video images in the original video, and determine it as the total video frame count of the original video.
Specifically, the server counts the number of original video images in the original video and determines it as the total video frame count of the original video; that is, the total video frame count is the number of all original video images in the original video. Specifically, when the playback frame rate is known, the total video frame count of the original video may be determined as the product of the playback frame rate and the playback duration of the original video, so as to quickly determine the total frame count of the original video.
S1003: Calculate the hotspot video frame rate corresponding to the original video from the segment total frame counts of the hotspot video segments and the total video frame count of the original video using the hotspot video frame rate formula
Z = \frac{\sum_{j=1}^{m} w_j}{K}
where Z is the hotspot video frame rate, w_j is the segment total frame count of the j-th hotspot video segment, m is the number of hotspot video segments, and K is the total video frame count of the original video.
After determining the segment total frame counts of the hotspot video segments and the total video frame count of the original video, the server can quickly calculate the hotspot video frame rate using the hotspot video frame rate formula, and sort the original videos displayed on the client based on it, so that users can selectively watch original videos with higher hotspot video frame rates, increasing the playback volume of the original videos.
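A minimal sketch of steps S1001 to S1003 and the sorting of step S902 follows, describing each original video by its total frame count and the frame counts of its hotspot video segments (the data layout is illustrative):

```python
def hotspot_frame_rate(segment_frame_counts, total_frames):
    """S1003: Z = (sum of w_j over the m hotspot segments) / K."""
    return sum(segment_frame_counts) / total_frames

# Worked example from the embodiment: three hotspot segments in a
# 10000-frame original video -> 60%.
assert hotspot_frame_rate([1000, 2000, 3000], 10000) == 0.6

def sort_videos_by_hotspot_rate(videos):
    """S902: order original videos for display, highest hotspot video frame
    rate first. videos: list of (video_id, segment_frame_counts, total_frames)."""
    return sorted(
        videos,
        key=lambda v: hotspot_frame_rate(v[1], v[2]),
        reverse=True,
    )
```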
In the hotspot video annotation processing method provided in this embodiment, the segment total frame counts of the hotspot video segments and the total video frame count of the original video are obtained, and the hotspot video frame rate corresponding to the original video is calculated using the hotspot video frame rate formula, so that the hotspot video frame rate reflects the appeal of the original video to users and the original videos can be sorted accordingly to increase their playback volume.
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
In an embodiment, a hotspot video annotation processing apparatus is provided, which corresponds one-to-one to the hotspot video annotation processing method in the above embodiments. As shown in FIG. 11, the hotspot video annotation processing apparatus includes a recorded video acquisition module 1101, an instantaneous emotion value acquisition module 1102, an intense emotion probability determination module 1103, a hotspot video image determination module 1104 and a hotspot video segment acquisition module 1105. The functional modules are described in detail as follows:
The recorded video acquisition module 1101 is configured to acquire the recorded video of a user captured while the client plays an original video, where the original video includes at least one frame of original video image, the recorded video includes at least one frame of image to be recognized, and the recording timestamp of each image to be recognized is associated with the playback timestamp of an original video image.
The instantaneous emotion value acquisition module 1102 is configured to recognize each image to be recognized using the micro-expression recognition model to obtain the instantaneous emotion value corresponding to the image to be recognized.
The intense emotion probability determination module 1103 is configured to determine the intense emotion probability of the original video image corresponding to a playback timestamp according to the instantaneous emotion values of the images to be recognized corresponding to all recording timestamps associated with that playback timestamp.
The hotspot video image determination module 1104 is configured to determine the original video image as a hotspot video image if the intense emotion probability is greater than the first probability threshold.
The hotspot video segment acquisition module 1105 is configured to perform hotspot annotation on the original video based on the hotspot video images to obtain hotspot video segments.
Preferably, the instantaneous emotion value acquisition module 1102 includes an instantaneous probability acquisition unit, a micro-expression type determination unit and an instantaneous emotion value acquisition unit.
The instantaneous probability acquisition unit is configured to recognize each image to be recognized using the micro-expression recognition model to obtain the instantaneous probability corresponding to at least one recognized expression type.
The micro-expression type determination unit is configured to determine the recognized expression type with the largest instantaneous probability as the micro-expression type of the image to be recognized.
The instantaneous emotion value acquisition unit is configured to query the emotion value comparison table based on the micro-expression type to obtain the instantaneous emotion value of the image to be recognized.
Preferably, the intense emotion probability determination module 1103 includes a total image count unit, an intense emotion judgment unit, an intense emotion count unit and an intense emotion probability determination unit.
The total image count unit is configured to count the total number of images to be recognized corresponding to all recording timestamps associated with the same playback timestamp.
The intense emotion judgment unit is configured to determine that the emotion attribute of an image to be recognized is intense emotion if the absolute value of the instantaneous emotion value corresponding to that image is greater than the preset emotion threshold.
The intense emotion count unit is configured to count, among the images to be recognized corresponding to all recording timestamps associated with the same playback timestamp, the number of images whose emotion attribute is intense emotion as the intense emotion quantity.
The intense emotion probability determination unit is configured to calculate the intense emotion probability of the original video image corresponding to the playback timestamp from the total number of images and the intense emotion quantity using the intense emotion probability formula L = A/B, where L is the intense emotion probability, A is the intense emotion quantity, and B is the total number of images.
Preferably, the hotspot video segment acquisition module 1105 includes a video segment frame count unit, a first hotspot video segment determination unit, a fluctuating emotion probability acquisition unit and a second hotspot video segment determination unit.
The video segment frame count unit is configured to count the number of frames of the original video segment formed between any two hotspot video images and determine it as the video segment frame count.
The first hotspot video segment determination unit is configured to determine the original video segment as a hotspot video segment if the video segment frame count is less than or equal to the first frame count threshold.
The fluctuating emotion probability acquisition unit is configured to obtain the fluctuating emotion probability corresponding to the original video segment based on the playback timestamps corresponding to the original video segment if the video segment frame count is greater than the first frame count threshold and less than or equal to the second frame count threshold.
The second hotspot video segment determination unit is configured to determine the original video segment as a hotspot video segment if the fluctuating emotion probability is greater than the second probability threshold.
Preferably, the fluctuating emotion probability acquisition unit includes a recorded video segment interception subunit, an instantaneous emotion value acquisition subunit, an emotion value standard deviation acquisition subunit, an emotional fluctuation video segment determination subunit and a fluctuating emotion probability calculation subunit.
The recorded video segment interception subunit is configured to intercept, based on the playback timestamps corresponding to the original video segment, the recorded video segment corresponding to those playback timestamps from the recorded video corresponding to the original video.
The instantaneous emotion value acquisition subunit is configured to obtain the instantaneous emotion value corresponding to each image to be recognized in the recorded video segment.
The emotion value standard deviation acquisition subunit is configured to calculate the instantaneous emotion values corresponding to all images to be recognized in the recorded video segment using the standard deviation formula to obtain the emotion value standard deviation, the standard deviation formula being
S_N = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2}
where S_N is the emotion value standard deviation of the recorded video segment, N is the number of images to be recognized in the recorded video segment, x_i is the instantaneous emotion value of each image to be recognized, and \bar{x} is the mean of all instantaneous emotion values x_i in the recorded video segment.
The emotion-fluctuation video clip determination subunit is configured to determine the recorded video clip as an emotion-fluctuation video clip if the emotion value standard deviation is greater than a standard deviation threshold.
The fluctuating emotion probability calculation subunit is configured to compute the fluctuating emotion probability of the original video clip from the number of emotion-fluctuation video clips and the number of recorded video clips, using the fluctuating emotion probability formula P = C/D, where P is the fluctuating emotion probability, C is the number of emotion-fluctuation video clips, and D is the number of recorded video clips.
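A minimal sketch of the two calculations above, assuming each viewer's recorded clip is represented as a plain sequence of instantaneous emotion values; the population form of the standard deviation and all identifiers are assumptions for illustration.

```python
import math
from typing import Sequence

def emotion_std(values: Sequence[float]) -> float:
    """Population standard deviation S_N of one recorded clip's
    instantaneous emotion values."""
    n = len(values)
    if n == 0:
        return 0.0
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)

def fluctuating_emotion_probability(recorded_clips: Sequence[Sequence[float]],
                                    std_threshold: float) -> float:
    """P = C / D over the D recorded clips intercepted for one
    original clip; C counts clips whose S_N exceeds the threshold."""
    d = len(recorded_clips)  # D: number of recorded video clips
    if d == 0:
        return 0.0
    c = sum(1 for clip in recorded_clips
            if emotion_std(clip) > std_threshold)  # C
    return c / d
```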
Preferably, each recorded video is associated with a user ID. After the hotspot video clip acquisition module 1105, the hotspot video annotation processing apparatus further includes a target video clip interception module, a target emotion value acquisition module, a single-frame emotion label acquisition module, a clip emotion label acquisition module, a first video clip pushing module, and a second video clip pushing module.
The target video clip interception module is configured to intercept, based on the playback timestamps corresponding to the hotspot video clip, the target video clip corresponding to those playback timestamps from the recorded video corresponding to the user ID.
The target emotion value acquisition module is configured to obtain the instantaneous emotion value corresponding to each image to be recognized in the target video clip.
The single-frame emotion label acquisition module is configured to query an emotion label lookup table based on the instantaneous emotion value to obtain the single-frame emotion label corresponding to the image to be recognized.
The clip emotion label acquisition module is configured to obtain the clip emotion label corresponding to the target video clip based on the single-frame emotion labels corresponding to the images to be recognized.
The first video clip pushing module is configured to, if the clip emotion label is a preset emotion label, query a user profile database based on the user ID to obtain the user label corresponding to the user ID, determine target users based on the user label, and push the hotspot video clip to the clients corresponding to the target users.
The second video clip pushing module is configured to, if the clip emotion label is a preset emotion label, query a video database based on the playback timestamps corresponding to the hotspot video clip to obtain the content label corresponding to the hotspot video clip, determine the hotspot video clips corresponding to the content label as recommended video clips, and push the recommended video clips to the client corresponding to the user ID.
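The disclosure does not fix the structure of the emotion label lookup table or how single-frame labels are aggregated into one clip label. The sketch below assumes an interval-based table and a majority vote; both choices, and all names, are hypothetical readings rather than the disclosed method.

```python
from collections import Counter
from typing import Sequence, Tuple

# Hypothetical interval-based emotion label lookup table:
# (lower bound inclusive, upper bound exclusive, label).
LabelTable = Sequence[Tuple[float, float, str]]

def single_frame_label(value: float, table: LabelTable) -> str:
    """Return the label whose interval contains the emotion value."""
    for low, high, label in table:
        if low <= value < high:
            return label
    return "neutral"  # fallback; the disclosure does not specify one

def clip_emotion_label(values: Sequence[float], table: LabelTable) -> str:
    """Aggregate single-frame labels into one clip label (majority vote)."""
    labels = [single_frame_label(v, table) for v in values]
    if not labels:
        return "neutral"
    return Counter(labels).most_common(1)[0][0]
```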
Preferably, after the hotspot video clip acquisition module 1105, the hotspot video annotation processing apparatus further includes a hotspot video frame rate statistics module and an original video sorting module.
The hotspot video frame rate statistics module is configured to compute, based on the hotspot video clips, the hotspot video frame rate corresponding to the original video.
The original video sorting module is configured to sort the original videos based on their corresponding hotspot video frame rates, and to display them on the client according to the sorting result.
Preferably, the hotspot video frame rate statistics module includes a clip total frame count determination unit, a video total frame count determination unit, and a hotspot video frame rate acquisition unit.
The clip total frame count determination unit is configured to count the number of original video images in each hotspot video clip as the total frame count of that hotspot video clip.
The video total frame count determination unit is configured to count the number of original video images in the original video as the total frame count of the original video.
The hotspot video frame rate acquisition unit is configured to compute the hotspot video frame rate corresponding to the original video from the total frame counts of the hotspot video clips and the total frame count of the original video, using the hotspot video frame rate formula

$Z = \frac{1}{K}\sum_{j=1}^{m} w_j$

where $Z$ is the hotspot video frame rate, $w_j$ is the total frame count of the $j$-th hotspot video clip, $m$ is the number of hotspot video clips, and $K$ is the total frame count of the original video.
For the specific limitations of the hotspot video annotation processing apparatus, reference may be made to the limitations of the hotspot video annotation processing method above, which are not repeated here. Each module in the above hotspot video annotation processing apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in hardware form in, or independent of, the processor in the computer device, or may be stored in software form in the memory of the computer device, so that the processor can invoke and execute the operations corresponding to each module.
In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 12. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer-readable instructions, and a database. The internal memory provides an environment for the operation of the operating system and the computer-readable instructions in the non-volatile storage medium. The database of the computer device is configured to store data used or generated during execution of the above hotspot video annotation processing method, such as the number of original video images. The network interface of the computer device is configured to communicate with external terminals through a network connection. The computer-readable instructions, when executed by the processor, implement a hotspot video annotation processing method.
In one embodiment, a computer device is provided, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor. When the processor executes the computer-readable instructions, it implements the hotspot video annotation processing method of the above embodiments, for example steps S201-S205 shown in FIG. 2 or the steps shown in FIGS. 3 to 10, which are not repeated here to avoid repetition. Alternatively, when the processor executes the computer-readable instructions, it implements the functions of each module/unit in the above embodiment of the hotspot video annotation processing apparatus, for example the recorded video acquisition module 1101, the instantaneous emotion value acquisition module 1102, the intense emotion probability determination module 1103, the hotspot video image determination module 1104, and the hotspot video clip acquisition module 1105 shown in FIG. 11, which are likewise not repeated here.
In one embodiment, a computer-readable storage medium is provided, on which computer-readable instructions are stored. When the computer-readable instructions are executed by a processor, they implement the hotspot video annotation processing method of the above embodiments, for example steps S201-S205 shown in FIG. 2 or the steps shown in FIGS. 3 to 10, which are not repeated here to avoid repetition. Alternatively, when the computer-readable instructions are executed by a processor, they implement the functions of each module/unit in the above embodiment of the hotspot video annotation processing apparatus, for example the recorded video acquisition module 1101, the instantaneous emotion value acquisition module 1102, the intense emotion probability determination module 1103, the hotspot video image determination module 1104, and the hotspot video clip acquisition module 1105 shown in FIG. 11, which are likewise not repeated here.
A person of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments may be completed by instructing the relevant hardware through computer-readable instructions. The computer-readable instructions may be stored in a non-volatile computer-readable storage medium, and when executed may include the processes of the above method embodiments. Any reference to memory, storage, a database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
Those skilled in the art will clearly understand that, for convenience and conciseness of description, the division into the above functional units and modules is only used as an example. In practical applications, the above functions may be allocated to different functional units or modules as needed; that is, the internal structure of the apparatus may be divided into different functional units or modules to complete all or part of the functions described above.
The above embodiments are only used to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application, and shall all be included within the protection scope of this application.

Claims (20)

  1. A hotspot video annotation processing method, characterized by comprising:
    obtaining a recorded video of a user collected while a client plays an original video, wherein the original video comprises at least one frame of original video image, the recorded video comprises at least one frame of image to be recognized, and the recording timestamp of each image to be recognized is associated with a playback timestamp of one original video image;
    recognizing each image to be recognized using a micro-expression recognition model to obtain the instantaneous emotion value corresponding to the image to be recognized;
    determining, according to the instantaneous emotion values of the images to be recognized corresponding to all recording timestamps associated with a same playback timestamp, the intense emotion probability of the original video image corresponding to the playback timestamp;
    if the intense emotion probability is greater than a first probability threshold, determining the original video image as a hotspot video image; and
    performing hotspot annotation on the original video based on the hotspot video images to obtain hotspot video clips.
  2. The hotspot video annotation processing method according to claim 1, wherein the recognizing each image to be recognized using a micro-expression recognition model to obtain the instantaneous emotion value corresponding to the image to be recognized comprises:
    recognizing each image to be recognized using the micro-expression recognition model to obtain the instantaneous probability corresponding to each of at least one recognized expression type;
    determining the recognized expression type with the largest instantaneous probability as the micro-expression type of the image to be recognized; and
    querying an emotion value lookup table based on the micro-expression type to obtain the instantaneous emotion value of the image to be recognized.
  3. The hotspot video annotation processing method according to claim 1, wherein the determining, according to the instantaneous emotion values of the images to be recognized corresponding to all recording timestamps associated with a same playback timestamp, the intense emotion probability of the original video image corresponding to the playback timestamp comprises:
    counting the total number of images to be recognized corresponding to all recording timestamps associated with the same playback timestamp;
    if the absolute value of the instantaneous emotion value corresponding to an image to be recognized is greater than a preset emotion threshold, determining the emotion attribute of the image to be recognized as intense emotion;
    counting, among the images to be recognized corresponding to all recording timestamps associated with the same playback timestamp, the number of images to be recognized whose emotion attribute is intense emotion as the intense emotion count; and
    computing the intense emotion probability of the original video image corresponding to the playback timestamp from the total number of images and the intense emotion count using the intense emotion probability formula L = A/B, wherein L is the intense emotion probability, A is the intense emotion count, and B is the total number of images.
  4. The hotspot video annotation processing method according to claim 1, wherein the performing hotspot annotation on the original video based on the hotspot video images to obtain hotspot video clips comprises:
    counting the number of frames of the original video clip formed between any two hotspot video images as the video clip frame count;
    if the video clip frame count is less than or equal to a first frame count threshold, determining the original video clip as a hotspot video clip;
    if the video clip frame count is greater than the first frame count threshold and less than or equal to a second frame count threshold, obtaining the fluctuating emotion probability corresponding to the original video clip based on the playback timestamps corresponding to the original video clip; and
    if the fluctuating emotion probability is greater than a second probability threshold, determining the original video clip as a hotspot video clip.
  5. The hotspot video annotation processing method according to claim 4, wherein the obtaining the fluctuating emotion probability corresponding to the original video clip based on the playback timestamps corresponding to the original video clip comprises:
    intercepting, based on the playback timestamps corresponding to the original video clip, the recorded video clip corresponding to those playback timestamps from the recorded video corresponding to the original video;
    obtaining the instantaneous emotion value corresponding to each image to be recognized in the recorded video clip;
    computing, using the standard deviation formula, the standard deviation of the instantaneous emotion values corresponding to all images to be recognized in the recorded video clip to obtain the emotion value standard deviation, the standard deviation formula being
    $S_N = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^{2}}$,
    wherein $S_N$ is the emotion value standard deviation of the recorded video clip, $N$ is the number of images to be recognized in the recorded video clip, $x_i$ is the instantaneous emotion value of each image to be recognized, and $\bar{x}$ is the mean of all instantaneous emotion values $x_i$ in the recorded video clip;
    if the emotion value standard deviation is greater than a standard deviation threshold, determining the recorded video clip as an emotion-fluctuation video clip; and
    computing the fluctuating emotion probability of the original video clip from the number of emotion-fluctuation video clips and the number of recorded video clips using the fluctuating emotion probability formula P = C/D, wherein P is the fluctuating emotion probability, C is the number of emotion-fluctuation video clips, and D is the number of recorded video clips.
  6. The hotspot video annotation processing method according to claim 1, wherein each recorded video is associated with a user ID;
    after the obtaining hotspot video clips, the hotspot video annotation processing method further comprises:
    intercepting, based on the playback timestamps corresponding to the hotspot video clip, the target video clip corresponding to those playback timestamps from the recorded video corresponding to the user ID;
    obtaining the instantaneous emotion value corresponding to each image to be recognized in the target video clip;
    querying an emotion label lookup table based on the instantaneous emotion value to obtain the single-frame emotion label corresponding to the image to be recognized;
    obtaining the clip emotion label corresponding to the target video clip based on the single-frame emotion labels corresponding to the images to be recognized;
    if the clip emotion label is a preset emotion label, querying a user profile database based on the user ID to obtain the user label corresponding to the user ID, determining target users based on the user label, and pushing the hotspot video clip to the clients corresponding to the target users;
    or, if the clip emotion label is a preset emotion label, querying a video database based on the playback timestamps corresponding to the hotspot video clip to obtain the content label corresponding to the hotspot video clip, determining the hotspot video clips corresponding to the content label as recommended video clips, and pushing the recommended video clips to the client corresponding to the user ID.
  7. The hotspot video annotation processing method according to claim 1, wherein after the obtaining hotspot video clips, the hotspot video annotation processing method further comprises:
    computing, based on the hotspot video clips, the hotspot video frame rate corresponding to the original video; and
    sorting the original videos based on their corresponding hotspot video frame rates, and displaying them on the client according to the sorting result.
  8. A hotspot video annotation processing apparatus, characterized by comprising:
    a recorded video acquisition module, configured to obtain a recorded video of a user collected while a client plays an original video, wherein the original video comprises at least one frame of original video image, the recorded video comprises at least one frame of image to be recognized, and the recording timestamp of each image to be recognized is associated with a playback timestamp of one original video image;
    an instantaneous emotion value acquisition module, configured to recognize each image to be recognized using a micro-expression recognition model to obtain the instantaneous emotion value corresponding to the image to be recognized;
    an intense emotion probability determination module, configured to determine, according to the instantaneous emotion values of the images to be recognized corresponding to all recording timestamps associated with a same playback timestamp, the intense emotion probability of the original video image corresponding to the playback timestamp;
    a hotspot video image determination module, configured to determine the original video image as a hotspot video image if the intense emotion probability is greater than a first probability threshold; and
    a hotspot video clip acquisition module, configured to perform hotspot annotation on the original video based on the hotspot video images to obtain hotspot video clips.
  9. A computer device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor, when executing the computer-readable instructions, implements the following steps:
    obtaining a recorded video of a user collected while a client plays an original video, wherein the original video comprises at least one frame of original video image, the recorded video comprises at least one frame of image to be recognized, and the recording timestamp of each image to be recognized is associated with a playback timestamp of one original video image;
    recognizing each image to be recognized using a micro-expression recognition model to obtain the instantaneous emotion value corresponding to the image to be recognized;
    determining, according to the instantaneous emotion values of the images to be recognized corresponding to all recording timestamps associated with a same playback timestamp, the intense emotion probability of the original video image corresponding to the playback timestamp;
    if the intense emotion probability is greater than a first probability threshold, determining the original video image as a hotspot video image; and
    performing hotspot annotation on the original video based on the hotspot video images to obtain hotspot video clips.
  10. The computer device according to claim 9, wherein the recognizing each image to be recognized using a micro-expression recognition model to obtain the instantaneous emotion value corresponding to the image to be recognized comprises:
    recognizing each image to be recognized using the micro-expression recognition model to obtain the instantaneous probability corresponding to each of at least one recognized expression type;
    determining the recognized expression type with the largest instantaneous probability as the micro-expression type of the image to be recognized; and
    querying an emotion value lookup table based on the micro-expression type to obtain the instantaneous emotion value of the image to be recognized.
  11. The computer device according to claim 9, wherein the determining, according to the instantaneous emotion values of the images to be recognized corresponding to all recording timestamps associated with a same playback timestamp, the intense emotion probability of the original video image corresponding to the playback timestamp comprises:
    counting the total number of images to be recognized corresponding to all recording timestamps associated with the same playback timestamp;
    if the absolute value of the instantaneous emotion value corresponding to an image to be recognized is greater than a preset emotion threshold, determining the emotion attribute of the image to be recognized as intense emotion;
    counting, among the images to be recognized corresponding to all recording timestamps associated with the same playback timestamp, the number of images to be recognized whose emotion attribute is intense emotion as the intense emotion count; and
    computing the intense emotion probability of the original video image corresponding to the playback timestamp from the total number of images and the intense emotion count using the intense emotion probability formula L = A/B, wherein L is the intense emotion probability, A is the intense emotion count, and B is the total number of images.
  12. The computer device according to claim 9, wherein the performing hotspot annotation on the original video based on the hotspot video images to obtain hotspot video clips comprises:
    counting the number of frames of the original video clip formed between any two hotspot video images as the video clip frame count;
    if the video clip frame count is less than or equal to a first frame count threshold, determining the original video clip as a hotspot video clip;
    if the video clip frame count is greater than the first frame count threshold and less than or equal to a second frame count threshold, obtaining the fluctuating emotion probability corresponding to the original video clip based on the playback timestamps corresponding to the original video clip; and
    if the fluctuating emotion probability is greater than a second probability threshold, determining the original video clip as a hotspot video clip.
  13. The computer device according to claim 12, wherein the obtaining the fluctuating emotion probability corresponding to the original video clip based on the playback timestamps corresponding to the original video clip comprises:
    intercepting, based on the playback timestamps corresponding to the original video clip, the recorded video clip corresponding to those playback timestamps from the recorded video corresponding to the original video;
    obtaining the instantaneous emotion value corresponding to each image to be recognized in the recorded video clip;
    computing, using the standard deviation formula, the standard deviation of the instantaneous emotion values corresponding to all images to be recognized in the recorded video clip to obtain the emotion value standard deviation, the standard deviation formula being
    $S_N = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^{2}}$,
    wherein $S_N$ is the emotion value standard deviation of the recorded video clip, $N$ is the number of images to be recognized in the recorded video clip, $x_i$ is the instantaneous emotion value of each image to be recognized, and $\bar{x}$ is the mean of all instantaneous emotion values $x_i$ in the recorded video clip;
    if the emotion value standard deviation is greater than a standard deviation threshold, determining the recorded video clip as an emotion-fluctuation video clip; and
    computing the fluctuating emotion probability of the original video clip from the number of emotion-fluctuation video clips and the number of recorded video clips using the fluctuating emotion probability formula P = C/D, wherein P is the fluctuating emotion probability, C is the number of emotion-fluctuation video clips, and D is the number of recorded video clips.
  14. The computer device according to claim 9, wherein each recorded video is associated with a user ID;
    after the obtaining hotspot video clips, the processor, when executing the computer-readable instructions, further implements the following steps:
    intercepting, based on the playback timestamps corresponding to the hotspot video clip, the target video clip corresponding to those playback timestamps from the recorded video corresponding to the user ID;
    obtaining the instantaneous emotion value corresponding to each image to be recognized in the target video clip;
    querying an emotion label lookup table based on the instantaneous emotion value to obtain the single-frame emotion label corresponding to the image to be recognized;
    obtaining the clip emotion label corresponding to the target video clip based on the single-frame emotion labels corresponding to the images to be recognized;
    if the clip emotion label is a preset emotion label, querying a user profile database based on the user ID to obtain the user label corresponding to the user ID, determining target users based on the user label, and pushing the hotspot video clip to the clients corresponding to the target users;
    or, if the clip emotion label is a preset emotion label, querying a video database based on the playback timestamps corresponding to the hotspot video clip to obtain the content label corresponding to the hotspot video clip, determining the hotspot video clips corresponding to the content label as recommended video clips, and pushing the recommended video clips to the client corresponding to the user ID.
  15. One or more non-volatile readable storage media storing computer-readable instructions, wherein the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
    obtaining a recorded video of a user collected while a client plays an original video, wherein the original video comprises at least one frame of original video image, the recorded video comprises at least one frame of image to be recognized, and the recording timestamp of each image to be recognized is associated with a playback timestamp of one original video image;
    recognizing each image to be recognized using a micro-expression recognition model to obtain the instantaneous emotion value corresponding to the image to be recognized;
    determining, according to the instantaneous emotion values of the images to be recognized corresponding to all recording timestamps associated with a same playback timestamp, the intense emotion probability of the original video image corresponding to the playback timestamp;
    if the intense emotion probability is greater than a first probability threshold, determining the original video image as a hotspot video image; and
    performing hotspot annotation on the original video based on the hotspot video images to obtain hotspot video clips.
  16. The non-volatile readable storage medium according to claim 15, wherein the recognizing each image to be recognized using a micro-expression recognition model to obtain the instantaneous emotion value corresponding to the image to be recognized comprises:
    recognizing each image to be recognized using the micro-expression recognition model to obtain the instantaneous probability corresponding to each of at least one recognized expression type;
    determining the recognized expression type with the largest instantaneous probability as the micro-expression type of the image to be recognized; and
    querying an emotion value lookup table based on the micro-expression type to obtain the instantaneous emotion value of the image to be recognized.
  17. The non-volatile readable storage medium according to claim 15, wherein the determining, according to the instantaneous emotion values of the images to be recognized corresponding to all recording timestamps associated with a same playback timestamp, the intense emotion probability of the original video image corresponding to the playback timestamp comprises:
    counting the total number of images to be recognized corresponding to all recording timestamps associated with the same playback timestamp;
    if the absolute value of the instantaneous emotion value corresponding to an image to be recognized is greater than a preset emotion threshold, determining the emotion attribute of the image to be recognized as intense emotion;
    counting, among the images to be recognized corresponding to all recording timestamps associated with the same playback timestamp, the number of images to be recognized whose emotion attribute is intense emotion as the intense emotion count; and
    computing the intense emotion probability of the original video image corresponding to the playback timestamp from the total number of images and the intense emotion count using the intense emotion probability formula L = A/B, wherein L is the intense emotion probability, A is the intense emotion count, and B is the total number of images.
  18. The non-volatile readable storage medium according to claim 15, wherein the performing hotspot annotation on the original video based on the hotspot video images to obtain hotspot video clips comprises:
    counting the number of frames of the original video clip formed between any two hotspot video images as the video clip frame count;
    if the video clip frame count is less than or equal to a first frame count threshold, determining the original video clip as a hotspot video clip;
    if the video clip frame count is greater than the first frame count threshold and less than or equal to a second frame count threshold, obtaining the fluctuating emotion probability corresponding to the original video clip based on the playback timestamps corresponding to the original video clip; and
    if the fluctuating emotion probability is greater than a second probability threshold, determining the original video clip as a hotspot video clip.
  19. The non-volatile readable storage medium according to claim 18, wherein the obtaining the fluctuating emotion probability corresponding to the original video clip based on the playback timestamps corresponding to the original video clip comprises:
    intercepting, based on the playback timestamps corresponding to the original video clip, the recorded video clip corresponding to those playback timestamps from the recorded video corresponding to the original video;
    obtaining the instantaneous emotion value corresponding to each image to be recognized in the recorded video clip;
    computing, using the standard deviation formula, the standard deviation of the instantaneous emotion values corresponding to all images to be recognized in the recorded video clip to obtain the emotion value standard deviation, the standard deviation formula being
    $S_N = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^{2}}$,
    wherein $S_N$ is the emotion value standard deviation of the recorded video clip, $N$ is the number of images to be recognized in the recorded video clip, $x_i$ is the instantaneous emotion value of each image to be recognized, and $\bar{x}$ is the mean of all instantaneous emotion values $x_i$ in the recorded video clip;
    if the emotion value standard deviation is greater than a standard deviation threshold, determining the recorded video clip as an emotion-fluctuation video clip; and
    computing the fluctuating emotion probability of the original video clip from the number of emotion-fluctuation video clips and the number of recorded video clips using the fluctuating emotion probability formula P = C/D, wherein P is the fluctuating emotion probability, C is the number of emotion-fluctuation video clips, and D is the number of recorded video clips.
  20. The non-volatile readable storage medium according to claim 15, wherein each recorded video is associated with a user ID;
    after the obtaining hotspot video clips, the computer-readable instructions, when executed by the one or more processors, further cause the one or more processors to perform the following steps:
    intercepting, based on the playback timestamps corresponding to the hotspot video clip, the target video clip corresponding to those playback timestamps from the recorded video corresponding to the user ID;
    obtaining the instantaneous emotion value corresponding to each image to be recognized in the target video clip;
    querying an emotion label lookup table based on the instantaneous emotion value to obtain the single-frame emotion label corresponding to the image to be recognized;
    obtaining the clip emotion label corresponding to the target video clip based on the single-frame emotion labels corresponding to the images to be recognized;
    if the clip emotion label is a preset emotion label, querying a user profile database based on the user ID to obtain the user label corresponding to the user ID, determining target users based on the user label, and pushing the hotspot video clip to the clients corresponding to the target users;
    or, if the clip emotion label is a preset emotion label, querying a video database based on the playback timestamps corresponding to the hotspot video clip to obtain the content label corresponding to the hotspot video clip, determining the hotspot video clips corresponding to the content label as recommended video clips, and pushing the recommended video clips to the client corresponding to the user ID.
PCT/CN2019/088957 2019-01-11 2019-05-29 Hotspot video annotation processing method and apparatus, computer device and storage medium WO2020143156A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910025355.9 2019-01-11
CN201910025355.9A CN109819325B (en) 2019-01-11 2019-01-11 Hotspot video annotation processing method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2020143156A1

Family

ID=66604271

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/088957 WO2020143156A1 (en) 2019-01-11 2019-05-29 Hotspot video annotation processing method and apparatus, computer device and storage medium

Country Status (2)

Country Link
CN (1) CN109819325B (en)
WO (1) WO2020143156A1 (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109819325B (en) * 2019-01-11 2021-08-20 平安科技(深圳)有限公司 Hotspot video annotation processing method and device, computer equipment and storage medium
CN110401847B (en) * 2019-07-17 2021-08-06 咪咕文化科技有限公司 Compression storage method, electronic equipment and system for cloud DVR video
CN110519617B (en) * 2019-07-18 2023-04-07 平安科技(深圳)有限公司 Video comment processing method and device, computer equipment and storage medium
CN110418204B (en) * 2019-07-18 2022-11-04 平安科技(深圳)有限公司 Video recommendation method, device, equipment and storage medium based on micro expression
CN110353705B (en) * 2019-08-01 2022-10-25 秒针信息技术有限公司 Method and device for recognizing emotion
CN110647812B (en) * 2019-08-19 2023-09-19 平安科技(深圳)有限公司 Tumble behavior detection processing method and device, computer equipment and storage medium
CN110826471B (en) * 2019-11-01 2023-07-14 腾讯科技(深圳)有限公司 Video tag labeling method, device, equipment and computer readable storage medium
CN111343483B (en) * 2020-02-18 2022-07-19 北京奇艺世纪科技有限公司 Method and device for prompting media content segment, storage medium and electronic device
CN111447505B (en) * 2020-03-09 2022-05-31 咪咕文化科技有限公司 Video clipping method, network device, and computer-readable storage medium
CN111629222B (en) * 2020-05-29 2022-12-20 腾讯科技(深圳)有限公司 Video processing method, device and storage medium
CN111860302B (en) * 2020-07-17 2024-03-01 北京百度网讯科技有限公司 Image labeling method and device, electronic equipment and storage medium

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130139256A1 (en) * 2011-11-30 2013-05-30 Elwha LLC, a limited liability corporation of the State of Delaware Deceptive indicia profile generation from communications interactions
CN104681048A (en) * 2013-11-28 2015-06-03 索尼公司 Multimedia read control device, curve acquiring device, electronic equipment and curve providing device and method
CN106341712A (en) * 2016-09-30 2017-01-18 北京小米移动软件有限公司 Processing method and apparatus of multimedia data
CN106792170A (en) * 2016-12-14 2017-05-31 合网络技术(北京)有限公司 Method for processing video frequency and device
CN107968961B (en) * 2017-12-05 2020-06-02 吕庆祥 Video editing method and device based on emotional curve
CN108093297A (en) * 2017-12-29 2018-05-29 厦门大学 A kind of method and system of filmstrip automatic collection
CN109151576A (en) * 2018-06-20 2019-01-04 新华网股份有限公司 Multimedia messages clipping method and system

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100161409A1 (en) * 2008-12-23 2010-06-24 Samsung Electronics Co., Ltd. Apparatus for providing content according to user's interest in content and method for providing content according to user's interest in content
CN102693739A (en) * 2011-03-24 2012-09-26 腾讯科技(深圳)有限公司 Method and system for video clip generation
CN103873492A (en) * 2012-12-07 2014-06-18 联想(北京)有限公司 Electronic device and data transmission method
CN105615902A (en) * 2014-11-06 2016-06-01 北京三星通信技术研究有限公司 Emotion monitoring method and device
CN105022801A (en) * 2015-06-30 2015-11-04 北京奇艺世纪科技有限公司 Hot video mining method and hot video mining device
CN107809673A (en) * 2016-09-09 2018-03-16 索尼公司 According to the system and method for emotional state detection process video content
CN107888947A (en) * 2016-09-29 2018-04-06 法乐第(北京)网络科技有限公司 A kind of video broadcasting method and device
CN107257509A (en) * 2017-07-13 2017-10-17 上海斐讯数据通信技术有限公司 The filter method and device of a kind of video content
CN109819325A (en) * 2019-01-11 2019-05-28 平安科技(深圳)有限公司 Hot video marks processing method, device, computer equipment and storage medium

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112291589A (en) * 2020-10-29 2021-01-29 腾讯科技(深圳)有限公司 Video file structure detection method and device
CN112291589B (en) * 2020-10-29 2023-09-22 腾讯科技(深圳)有限公司 Method and device for detecting structure of video file
CN112699774A (en) * 2020-12-28 2021-04-23 深延科技(北京)有限公司 Method and device for recognizing emotion of person in video, computer equipment and medium
CN113127576A (en) * 2021-04-15 2021-07-16 微梦创科网络科技(中国)有限公司 Hotspot discovery method and system based on user content consumption analysis
CN114445896A (en) * 2022-01-28 2022-05-06 北京百度网讯科技有限公司 Method and device for evaluating confidence degree of human statement content in video
CN114445896B (en) * 2022-01-28 2024-04-05 北京百度网讯科技有限公司 Method and device for evaluating confidence of content of person statement in video
CN116386060A (en) * 2023-03-23 2023-07-04 浪潮智慧科技有限公司 Automatic water gauge data labeling method, device, equipment and medium
CN116386060B (en) * 2023-03-23 2023-11-14 浪潮智慧科技有限公司 Automatic water gauge data labeling method, device, equipment and medium

Also Published As

Publication number Publication date
CN109819325A (en) 2019-05-28
CN109819325B (en) 2021-08-20

Similar Documents

Publication Publication Date Title
WO2020143156A1 (en) Hotspot video annotation processing method and apparatus, computer device and storage medium
US11290775B2 (en) Computerized system and method for automatically detecting and rendering highlights from streaming videos
US10832738B2 (en) Computerized system and method for automatically generating high-quality digital content thumbnails from digital video
Segalin et al. What your Facebook profile picture reveals about your personality
WO2021088510A1 (en) Video classification method and apparatus, computer, and readable storage medium
US11064257B2 (en) System and method for segment relevance detection for digital content
CN109547814B (en) Video recommendation method and device, server and storage medium
US9589205B2 (en) Systems and methods for identifying a user's demographic characteristics based on the user's social media photographs
JP5795580B2 (en) Estimating and displaying social interests in time-based media
CN110519617B (en) Video comment processing method and device, computer equipment and storage medium
US8819728B2 (en) Topic to social media identity correlation
US20150296228A1 (en) Systems and Methods for Performing Multi-Modal Video Datastream Segmentation
US20160014482A1 (en) Systems and Methods for Generating Video Summary Sequences From One or More Video Segments
JP2018530847A (en) Video information processing for advertisement distribution
US10104429B2 (en) Methods and systems of dynamic content analysis
WO2020253360A1 (en) Content display method and apparatus for application, storage medium, and computer device
Narassiguin et al. Data Science for Influencer Marketing: feature processing and quantitative analysis
Yang et al. Zapping index: using smile to measure advertisement zapping likelihood
WO2022247666A1 (en) Content processing method and apparatus, and computer device and storage medium
US20160012078A1 (en) Intelligent media management system
CN111163366B (en) Video processing method and terminal
Barbieri et al. Content selection criteria for news multi-video summarization based on human strategies
CN112685596B (en) Video recommendation method and device, terminal and storage medium
US11010935B2 (en) Context aware dynamic image augmentation
US20220261580A1 (en) Detecting synthetic media

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19908319

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 18.11.2021)

122 Ep: pct application non-entry in european phase

Ref document number: 19908319

Country of ref document: EP

Kind code of ref document: A1