WO2020143156A1 - Hotspot video annotation processing method and apparatus, computer device and storage medium


Info

Publication number
WO2020143156A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
emotion
image
recognized
probability
Prior art date
Application number
PCT/CN2019/088957
Other languages
English (en)
Chinese (zh)
Inventor
刘建华
徐小方
Original Assignee
平安科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2020143156A1


Classifications

    • H: ELECTRICITY; H04: ELECTRIC COMMUNICATION TECHNIQUE; H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION; H04N 21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/232 — Content retrieval operation locally within server, e.g. reading video streams from disk arrays
    • H04N 21/258 — Client or end-user data management, e.g. managing client capabilities, user preferences or demographics, processing of multiple end-users' preferences to derive collaborative data
    • H04N 21/433 — Content storage operation, e.g. storage operation in response to a pause request, caching operations
    • H04N 21/442 — Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
    • H04N 21/466 — Learning process for intelligent management, e.g. learning user preferences for recommending movies
    • H04N 21/84 — Generation or processing of descriptive data, e.g. content descriptors
    • H04N 21/8547 — Content authoring involving timestamps for synchronizing content
    • H04N 5/76 — Television signal recording

Definitions

  • the present application relates to the technical field of micro-expression recognition, in particular to a hotspot video annotation processing method, device, computer equipment, and storage medium.
  • video (especially online video) is the largest and fastest growing type of mobile data traffic.
  • Online video refers to an audio-visual file provided by an online video service provider (for example, Baidu iQiyi) that uses streaming media as its playback format and can be streamed live or played on demand.
  • Online video generally requires a dedicated player, and the file format is mainly the FLV (Flash Video) streaming format delivered over P2P (Peer to Peer) technology, which consumes fewer client resources.
  • Embodiments of the present application provide a hotspot video annotation processing method, device, computer equipment, and storage medium, to solve the problem of low efficiency in the current manual annotation of original video segment attributes.
  • a hotspot video annotation processing method including:
  • the original video includes at least one frame of original video image
  • the recorded video includes at least one frame of image to be recognized
  • the recording timestamp of each image to be recognized is associated with a playback timestamp of an original video image
  • the original video image is determined to be a hot video image
  • Hotspot-annotate the original video based on the hotspot video image to obtain hotspot video clips.
  • a hotspot video annotation processing device including:
  • Recorded video acquisition module used to acquire the user's recorded video collected by the client while playing the original video.
  • the original video includes at least one frame of original video image.
  • the recorded video includes at least one frame of image to be recognized.
  • the recording timestamp of the image to be identified is associated with a playback timestamp of the original video image;
  • Instantaneous emotion value acquisition module used to identify each of the images to be recognized by using a micro-expression recognition model to obtain the instantaneous emotion values corresponding to the images to be recognized;
  • Intense emotion probability determination module used to determine the intense emotion probability of the original video image corresponding to the playback timestamp according to the instantaneous emotion values of the images to be identified corresponding to all the recording timestamps associated with the same playback timestamp;
  • Hotspot video image determination module used to determine the original video image as a hotspot video image if the intense emotion probability is greater than a first probability threshold;
  • Hotspot video clip acquisition module, used to hotspot-annotate the original video based on the hotspot video image to obtain hotspot video clips.
  • a computer device includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor.
  • When the processor executes the computer-readable instructions, the following steps are implemented:
  • the original video includes at least one frame of original video image
  • the recorded video includes at least one frame of image to be recognized
  • the recording timestamp of each image to be recognized is associated with a playback timestamp of an original video image
  • the original video image is determined to be a hot video image
  • Hotspot-annotate the original video based on the hotspot video image to obtain hotspot video clips.
  • One or more non-volatile readable storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
  • the original video includes at least one frame of original video image
  • the recorded video includes at least one frame of image to be recognized
  • the recording timestamp of each image to be recognized is associated with a playback timestamp of an original video image
  • the original video image is determined to be a hot video image
  • Hotspot-annotate the original video based on the hotspot video image to obtain hotspot video clips.
  • FIG. 1 is a schematic diagram of an application environment of a hotspot video annotation processing method in an embodiment of the present application
  • FIG. 2 is a flowchart of a hotspot video annotation processing method in an embodiment of the present application
  • FIG. 3 is another flowchart of a hotspot video annotation processing method in an embodiment of the present application.
  • FIG. 4 is another flowchart of a hotspot video annotation processing method in an embodiment of the present application.
  • FIG. 5 is another flowchart of a hotspot video annotation processing method in an embodiment of the present application.
  • FIG. 6 is another flowchart of a hotspot video annotation processing method in an embodiment of the present application.
  • FIG. 7 is another flowchart of a hotspot video annotation processing method in an embodiment of the present application.
  • FIG. 8 is another flowchart of a hotspot video annotation processing method in an embodiment of the present application.
  • FIG. 9 is another flowchart of a hotspot video annotation processing method in an embodiment of the present application.
  • FIG. 10 is another flowchart of a hotspot video annotation processing method in an embodiment of the present application.
  • FIG. 11 is a schematic diagram of a hotspot video annotation processing device in an embodiment of the present application.
  • FIG. 12 is a schematic diagram of a computer device in an embodiment of the present application.
  • the hotspot video annotation processing method provided by the embodiment of the present application can be applied in the application environment shown in FIG. 1.
  • the hotspot video annotation processing method is applied in a video playback system.
  • the video playback system includes a client and a server as shown in FIG. 1, and the client and the server communicate through a network to implement automatic annotation of hotspot video clips in the original video, which improves the efficiency of hotspot video clip annotation and enables personalized recommendation and sorted display of hotspot video clips.
  • The client, also called the user end, refers to the program that corresponds to the server and provides local services for the user.
  • the client can be installed on but not limited to various personal computers, notebook computers, smart phones, tablets and portable wearable devices.
  • the server can be implemented by an independent server or a server cluster composed of multiple servers.
  • a hotspot video annotation processing method is provided.
  • the method is applied to the server in FIG. 1 as an example for illustration, including the following steps:
  • S201 Obtain the user's recorded video collected while the client plays the original video.
  • the original video includes at least one original video image
  • the recorded video includes at least one image to be recognized.
  • the original video refers to a video played by a video playback program (that is, a client) installed on a terminal device such as a user's mobile phone and computer, for viewing by the user.
  • Recorded video refers to video, shot in real time through the shooting module (such as a built-in camera) of the terminal device on which the video playback program is installed, of the user's facial expression changes while watching the original video.
  • the original video includes at least one frame of original video image, and the original video image is a single frame image forming the original video, that is, a single image frame of the smallest unit in the original video.
  • Each original video image carries a playback timestamp, which is the timestamp of the original video image in the original video, for example, the playback timestamp of the 100s original video image in the 10min original video is 100s.
  • the recorded video includes at least one frame of image to be recognized, and the image to be recognized is a single frame image that forms the recorded video, that is, a single image screen of the smallest unit in the recorded video.
  • Each image to be recognized corresponds to a recording timestamp, which is the timestamp of the image to be recognized in the recorded video; for example, the recording timestamp of the image to be recognized at the 100th second of a 10-min recorded video is 100s.
  • the recording timestamp is associated with the playback timestamp carried by the original video image, so that the image to be recognized corresponds one-to-one with the original video image, which is convenient for accurately determining the user's emotion when watching the original video.
  • Each original video carries a unique video identifier, which is used to uniquely identify the corresponding original video. For example, the original video corresponding to episode XX of "XX" carries the unique video identifier XX0001, so that the server can obtain the original video corresponding to episode XX of "XX" from the video identifier XX0001.
  • the playback timestamp carried by each original video image is the timestamp of the original video image in the original video.
  • While the client plays the same original video, the server acquires, through the shooting module (such as a built-in camera) installed in the terminal device of the client, the recorded videos corresponding to the expression changes of all users watching the original video.
  • the recorded video includes at least one frame of image to be recognized, and each image to be recognized corresponds to a recording timestamp, which is associated with the playback timestamp carried by an original video image. Understandably, by collecting the recorded videos of different users watching the original video, it can be better determined whether the original video attracts the audience, thereby helping to automatically mark the hotspot video segments in the original video and improving the efficiency of hotspot video segment annotation.
  • Obtaining the user's recorded video collected while the client plays the original video includes: (1) Control the client to play the original video so that the playback timestamp of each original video image in the original video is associated with the current system time. (2) Obtain the user's recorded video collected while the client plays the original video, so that the recording timestamp of each image to be recognized in the recorded video is associated with the current system time. (3) Based on the current system time, associate the recording timestamp of each image to be recognized with the playback timestamp of an original video image.
  • the current system time is the current time of the system at any moment, for example, the current system time can be obtained by the currentTimeMillis method in the System class.
  • the playback timestamp of the original video corresponds to the recording timestamp of the recorded video, that is, the first frame of the original video image corresponds to the first frame of the image to be identified, So that the image to be recognized can reflect the micro expression of the user when viewing the corresponding original video image.
  • Since the playback time of the original video is not synchronized with the recording time of the recorded video, it is necessary to correlate the playback timestamps of the original video with the recording timestamps of the recorded video through the current system time, so that each associated image to be recognized can reflect the user's micro-expression when viewing the corresponding original video image.
  • Both the original video playback and the recorded video are tied to the current system time; that is, if the 1000th frame of the original video image is played at 10:05:10 and the 10th frame of the image to be recognized is recorded at 10:05:10, the playback timestamp of the 1000th frame of the original video image is associated with the recording timestamp of the 10th frame of the image to be recognized, as sketched below.
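  • The following minimal sketch illustrates this association step. It assumes each played frame and each captured frame has been logged together with the current system time (as in the currentTimeMillis example above); all function and variable names here are illustrative, not taken from the application.

```python
# Sketch: associate each recording timestamp with the playback timestamp whose
# logged system time is closest. Data layout is an assumption for illustration.
from bisect import bisect_left

def associate_frames(playback_events, recording_events):
    """Both arguments are lists of (system_time_ms, timestamp) pairs sorted by
    system_time_ms; returns {recording_timestamp: playback_timestamp}."""
    sys_times = [t for t, _ in playback_events]
    mapping = {}
    for sys_time, rec_ts in recording_events:
        i = bisect_left(sys_times, sys_time)
        nearby = [j for j in (i - 1, i) if 0 <= j < len(playback_events)]
        best = min(nearby, key=lambda j: abs(sys_times[j] - sys_time))
        mapping[rec_ts] = playback_events[best][1]
    return mapping
```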
  • S202 Recognize each image to be recognized using a micro-expression recognition model to obtain the instantaneous emotion value corresponding to the image to be recognized.
  • the micro-expression recognition model is a model for recognizing the micro-expression of the human face in the image to be recognized.
  • The micro-expression recognition model is a model that captures the local features of the user's face in the image to be recognized, determines each target facial action unit of the face in the image to be recognized according to the local features, and then determines the micro-expression according to the recognized target facial action units.
  • the instantaneous emotion value corresponding to the image to be recognized is the emotion value corresponding to the micro-expression type of the face in a certain image to be recognized by using the micro-expression recognition model.
  • Specifically, the server first uses the micro-expression recognition model to perform micro-expression recognition on each image to be recognized to determine its micro-expression type, and then queries the emotion value comparison table according to the micro-expression type to obtain the corresponding instantaneous emotion value.
  • the micro-expression types include, but are not limited to, love, interest, surprise, expectation... aggressiveness, conflict, insult, suspicion, and fear.
  • the instantaneous emotion value of the face in the image to be recognized is obtained.
  • the micro-expression recognition model can quickly obtain the instantaneous emotion value when different users watch each original video image in the same original video, so as to analyze the hot video segment based on the instantaneous emotion value, so as to achieve the purpose of automatically tagging the hot video segment.
  • the micro-expression recognition model may be a neural network recognition model based on deep learning, a local recognition model based on classification, or a local emotion recognition model based on a local binary pattern (LBP).
  • the micro-expression recognition model is a local recognition model based on classification.
  • The training image data includes positive samples and negative samples of each facial action unit, and the training image data is trained through classification algorithms to obtain a micro-expression recognition model.
  • a large amount of training image data may be trained through an SVM classification algorithm to obtain SVM classifiers corresponding to multiple facial action units.
  • For example, it may be 39 SVM classifiers corresponding to 39 facial action units, or 54 SVM classifiers corresponding to 54 facial action units; the more facial action units for which the training image data includes positive and negative samples, the more SVM classifiers are obtained. Understandably, among micro-expression recognition models formed from multiple SVM classifiers, the more SVM classifiers a model contains, the more accurately the formed micro-expression recognition model can recognize micro-expression types. Take the micro-expression recognition model formed by the SVM classifiers corresponding to 54 facial action units as an example: using this micro-expression recognition model, 54 types of micro-expressions can be identified, for example, love, interest, surprise, expectation ... aggression, conflict, insult, doubt and fear, as sketched below.
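  • As a rough illustration of the per-action-unit classifiers described above, the sketch below trains one binary SVM per facial action unit from positive and negative feature samples and then selects the target facial action units whose predicted probability exceeds a preset threshold. The use of scikit-learn and the data layout are assumptions made purely for illustration; the application does not name a specific library.

```python
# Illustrative sketch: one binary SVM classifier per facial action unit (AU).
import numpy as np
from sklearn.svm import SVC

def train_au_classifiers(samples):
    """samples: {au_id: (features, labels)}, where features is an (n, d) array of
    local face features and labels marks positive (1) / negative (0) samples."""
    classifiers = {}
    for au_id, (features, labels) in samples.items():
        clf = SVC(kernel="rbf", probability=True)  # per-AU probability output
        clf.fit(features, labels)
        classifiers[au_id] = clf
    return classifiers

def detect_target_aus(classifiers, face_features, threshold=0.5):
    """Return the AUs whose probability exceeds the preset threshold, i.e. the
    target facial action units of the image to be recognized."""
    x = np.asarray(face_features).reshape(1, -1)
    return [au for au, clf in classifiers.items()
            if clf.predict_proba(x)[0, 1] > threshold]
```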
  • S203 Determine the intense emotion probability of the original video image corresponding to the playback timestamp according to the instantaneous emotion values of the images to be identified corresponding to all the recording timestamps associated with the same playback timestamp.
  • The intense emotion probability is a probability for evaluating how intense the emotions were across the images to be recognized of different users watching the same original video. Understandably, if the intense emotion probability is high, it means that users' emotions fluctuated greatly while watching the original video, and the original video is strongly attractive to users.
  • Specifically, according to the playback timestamp corresponding to each original video image, the server first obtains the instantaneous emotion values of the images to be recognized corresponding to all the recording timestamps associated with that playback timestamp, and determines from the instantaneous emotion value of each user who watched the original video image whether it is an intense emotion, thereby analyzing the probability of intense emotion among all users watching that original video image, so that the intense emotion probability can objectively reflect the degree to which users watching the same original video like the original video or resonate with it.
  • the first probability threshold is a preset probability threshold for evaluating whether the original video is a hot video image.
  • For example, the preset probability threshold may be set to 60%. If the intense emotion probability is greater than the first probability threshold, it means that among all users who viewed the original video image, a large proportion (greater than the first probability threshold) experienced strong emotional fluctuations while watching it (that is, the emotion corresponding to their instantaneous emotion value is intense emotion), so the image is highly attractive to users and the original video image can be determined to be a hotspot video image.
  • S205 Hotspot-annotate the original video based on the hotspot video images to obtain hotspot video clips.
  • Specifically, the server may form an original video clip based on any two hotspot video images, and then compare the total number of frames of all the original video images in the original video clip with preset frame number thresholds to determine whether the original video clip is a hotspot video clip, automatically marking the original video images corresponding to the hotspot video images and marking the hotspot video clips in the original video. This realizes automatic annotation of the hotspot video clips in the original video and improves the annotation efficiency of hotspot video clips.
  • The hotspot video annotation processing method provided in this embodiment collects the user's recorded video while the original video is playing, so that the recording timestamp of each image to be recognized is associated with the playback timestamp of an original video image, ensuring the objectivity of the micro-expression analysis of the original video. Then the micro-expression recognition model is used to recognize the images to be recognized; the model can quickly identify the micro-expression of the user when viewing an original video image in the original video and obtain the intense emotion value of the user watching the original video, so that hotspot video annotation is realized based on the intense emotion value, thereby ensuring the objectivity of hotspot video clip annotation.
  • the hotspot annotation of the video is subdivided into hotspot analysis of the original video image to ensure the objectivity and accuracy of the hotspot analysis.
  • The original video is hotspot-annotated and hotspot video clips are obtained by calculating the probability of intense emotion when users watch the original video, so that the server can obtain hotspot video clips and the hotspot video clips are automatically marked, which improves the efficiency and accuracy of hotspot video clip annotation and provides users with a better viewing experience.
  • In step S202, using a micro-expression recognition model to identify each image to be recognized and obtain the instantaneous emotion value corresponding to the image to be recognized includes:
  • S301 Recognize each image to be recognized by using a micro-expression recognition model to obtain the instantaneous probability corresponding to at least one type of recognized expression.
  • The recognized expression type refers to a pre-configured micro-expression type to which the image to be recognized is determined to belong when the image is recognized using the micro-expression recognition model.
  • the micro-expression recognition model pre-trained by the server includes multiple SVM classifiers, and each SVM classifier is used to identify a facial action unit.
  • the micro-expression recognition model includes 54 SVM classifiers to establish a facial action unit number mapping table, and each facial action unit is represented by a predetermined number. For example, AU1 is the inner eyebrow lift, AU2 is the outer eyebrow lift, AU5 is the upper eyelid lift, and AU26 is the lower jaw opening.
  • Each facial action unit has a corresponding SVM classifier trained.
  • For example, the SVM classifier corresponding to the inner eyebrow lift can output the probability that a local feature belongs to the inner eyebrow lift, the SVM classifier corresponding to the outer eyebrow lift can output the probability that a local feature belongs to the outer eyebrow lift, and so on.
  • the server when it uses a pre-trained micro-expression recognition model to recognize the image to be recognized, it may first perform key point detection and feature extraction on each image to be recognized to obtain local features of the image to be recognized.
  • The face key point algorithm can be, but is not limited to, the Ensemble of Regression Trees (ERT) algorithm, the SIFT (Scale-Invariant Feature Transform) algorithm, the SURF (Speeded Up Robust Features) algorithm, the LBP (Local Binary Patterns) algorithm and the HOG (Histogram of Oriented Gradients) algorithm.
  • the feature extraction algorithm may be a CNN (Convolutional Neural Network) algorithm.
  • the target facial action unit refers to the facial action unit (Action Unit, AU) obtained by recognizing the image to be recognized according to the micro-expression recognition model.
  • The probability value may specifically be a value between 0 and 1. For example, if the output probability value is 0.6 and the preset threshold is 0.5, then since the probability value 0.6 is greater than the preset threshold 0.5, the facial action unit corresponding to 0.6 is taken as a target facial action unit of the image to be recognized.
  • all the acquired target facial action units are comprehensively evaluated to obtain the probability corresponding to the micro-expression type pre-configured in the micro-expression recognition model, that is, the instantaneous probability belonging to each type of recognized expression.
  • the comprehensive evaluation of all the acquired target facial action units specifically refers to obtaining the probability that this combination belongs to a pre-configured micro-expression type based on the combination of all target facial action units to determine the instantaneous probability of identifying the expression type.
  • S302 Determine the recognized expression type with the largest instantaneous probability as the micro expression type of the image to be recognized.
  • The recognized expression type with the largest instantaneous probability needs to be determined as the micro-expression type corresponding to the image to be recognized. For example, if the image to be recognized is recognized as belonging to the recognized expression type "love" with an instantaneous probability of 0.9, while the instantaneous probabilities of the two recognized expression types "doubt" and "quiet" are each 0.05, then the recognized expression type corresponding to the instantaneous probability of 0.9 is determined as the micro-expression type of the image to be recognized, so as to ensure the accuracy of the identified micro-expression type.
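  • A minimal sketch of this selection step, under the assumption that the model's per-type instantaneous probabilities are available as a mapping (names are illustrative):

```python
# Sketch: take the recognized expression type with the largest instantaneous
# probability as the micro-expression type of the image to be recognized.
def select_micro_expression(instant_probs):
    """instant_probs: {recognized_expression_type: instantaneous_probability}."""
    return max(instant_probs, key=instant_probs.get)

print(select_micro_expression({"love": 0.9, "doubt": 0.05, "quiet": 0.05}))  # love
```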
  • S303 Query the emotion value comparison table based on the micro-expression type to obtain the instantaneous emotion value of the image to be recognized.
  • the emotion value comparison table is a preset data table for recording the emotion attribute corresponding to each micro-expression type.
  • the emotion value comparison table the association relationship between the micro-expression type and the emotion value is stored.
  • the server queries the emotion value comparison table based on the micro-expression type to obtain the corresponding instantaneous emotion value.
  • In this embodiment, the instantaneous emotion value is a value between [-1, 1]; the larger the value, the more the user likes the original video image corresponding to the recording timestamp associated with the image to be recognized, and the smaller the value, the more the user dislikes the original video image corresponding to the recording timestamp associated with the image to be recognized.
  • For example, the instantaneous emotion value corresponding to each of the 54 micro-expression types identified by the micro-expression recognition model can be set to any one of 1, 0.8, 0.5, 0.3, 0, -0.3, -0.5, -0.8, and -1. A sketch of such a comparison table follows.
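  • The following sketch shows what such an emotion value comparison table might look like; the specific type-to-value assignments are illustrative assumptions, since the application only fixes the candidate set of values.

```python
# Sketch of an emotion value comparison table: micro-expression type -> value in [-1, 1].
EMOTION_VALUE_TABLE = {
    "love": 1.0, "interest": 0.8, "surprise": 0.5, "expectation": 0.3,
    "quiet": 0.0, "doubt": -0.3, "conflict": -0.5, "insult": -0.8, "fear": -1.0,
}

def instant_emotion_value(micro_expression_type):
    # Query the comparison table to obtain the instantaneous emotion value.
    return EMOTION_VALUE_TABLE[micro_expression_type]
```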
  • The hotspot video annotation processing method provided in this embodiment first uses the micro-expression recognition model to recognize the image to be recognized, quickly obtaining the instantaneous probability corresponding to at least one recognized expression type, and selects the recognized expression type with the largest instantaneous probability as the micro-expression type of the image to be recognized, ensuring the accuracy of the identified micro-expression type. The emotion value comparison table is then queried based on the micro-expression type to obtain the instantaneous emotion value of the image to be recognized, ensuring the efficiency of acquiring the instantaneous emotion value of the image to be recognized.
  • Further, the server may query a database based on the instantaneous emotion value to obtain the standard volume or standard tone corresponding to the instantaneous emotion value, obtain the current volume or current tone at which the client is currently playing, and, based on the standard volume or standard tone, automatically adjust the current volume and current tone respectively, so that the current volume and current tone when the image to be recognized is collected match the user's current mood. Matching the video's volume or tone with the user's mood at the time makes it easier to evoke empathy, thereby increasing the appeal of the original video to the user.
  • In step S203, determining the intense emotion probability of the original video image corresponding to the playback timestamp according to the instantaneous emotion values of the images to be recognized corresponding to all the recording timestamps associated with the same playback timestamp includes:
  • S401 Count the total number of images to be identified corresponding to all recording timestamps associated with the same playback timestamp.
  • The total number of images is the total count, collected by the server, of images to be recognized corresponding to all users who viewed the original video image. Specifically, when annotating the hotspot video segments of any original video, it is necessary to obtain all recorded videos corresponding to viewings of that original video, and to count the number of images to be recognized corresponding to all the recording timestamps associated with the playback timestamp of the same original video image; this number is the total number of images. For example, for an original video with the video identifier XX0001, if a certain original video image is the one whose playback timestamp is the 10th second of the original video, then the number of all images to be recognized associated with that original video image is the total number of images.
  • the preset emotion threshold is a preset threshold for evaluating whether the instantaneous emotion value is intense emotion.
  • the preset emotion threshold may be set to 0.6 or other values.
  • Specifically, the server compares the absolute value of the instantaneous emotion value corresponding to the image to be recognized with the preset emotion threshold; if the absolute value is greater than the preset emotion threshold, the emotion attribute of the image to be recognized is intense emotion, and otherwise, if the absolute value is not greater than the preset emotion threshold, the emotion attribute of the image to be recognized is plain emotion. The micro-expression recognition model recognizes the instantaneous emotion value corresponding to each image to be recognized as a value between [-1, 1].
  • If the absolute value of the instantaneous emotion value is close to 1, the micro-expression emotion can be considered an intense emotion. Such intense emotions easily resonate with users and have strong appeal.
  • If the absolute value of the instantaneous emotion value is close to 0, it means that the user's preference for or dislike of the original video image being watched is smaller, indicating that the original video image does not resonate with the user; in this case, the micro-expression emotion can be regarded as a plain emotion.
  • S403 Among the images to be recognized corresponding to all the recording timestamps associated with the same playback timestamp, count the number of images to be recognized whose emotion attribute is intense emotion; this is the number of intense emotions.
  • Specifically, the server determines, from the images to be recognized corresponding to all the recording timestamps associated with the same playback timestamp, the number of images whose emotion attribute is intense emotion, and takes this number as the number of intense emotions. For example, if 100 users watch the original video image corresponding to a certain playback timestamp in the same original video, the 100 images to be recognized corresponding to all the recording timestamps associated with that playback timestamp are obtained, the micro-expression recognition model is used to identify the instantaneous emotion values of all 100 images to be recognized, whether each is an intense emotion is determined based on its instantaneous emotion value, and the number of images to be recognized whose emotion attribute is intense emotion is determined as the number of intense emotions, which in this case is a value between 0 and 100.
  • S404 Calculate the total number of images and the number of intense emotions using the intense emotion probability formula to determine the intense emotion probability of the original video image corresponding to the playback timestamp.
  • the server may quickly calculate the intense emotion probability using the intense emotion probability formula.
  • the intense emotion probability reflects the probability of causing strong emotion fluctuations to the original video image among all users who viewed the original video image, which can well reflect the attractiveness of the original video image to the user or the degree of resonance caused by the user.
  • In the hotspot video annotation processing method, the total number of all the images to be recognized corresponding to the same playback timestamp is first obtained, and the number of images to be recognized whose emotion attribute is intense emotion is determined from among them; the intense emotion probability is then calculated using the intense emotion probability formula, which makes the acquisition of the intense emotion probability more objective and can intuitively show the attractiveness of the original video image to the user. A sketch of this calculation follows.
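  • The intense emotion probability formula itself is not reproduced in this text; a natural reading of steps S401-S404 is that it is the ratio of the number of intense emotions to the total number of images, which the sketch below assumes.

```python
# Sketch assuming: intense emotion probability = number of intense emotions / total images.
def intense_emotion_probability(instant_values, preset_emotion_threshold=0.6):
    """instant_values: instantaneous emotion values of all images to be recognized
    associated with one playback timestamp (one value per user)."""
    total = len(instant_values)
    intense = sum(1 for v in instant_values if abs(v) > preset_emotion_threshold)
    return intense / total if total else 0.0

def is_hotspot_image(instant_values, first_probability_threshold=0.6):
    # The original video image is a hotspot video image if the probability
    # exceeds the first probability threshold.
    return intense_emotion_probability(instant_values) > first_probability_threshold
```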
  • In step S205, hotspot-annotating the original video based on the hotspot video images to obtain hotspot video clips includes:
  • S501 Count the number of frames of the original video clip formed between any two hot-spot video images, and determine the number of frames of the video clip.
  • the frame number of the video clip refers to the total number of frames of the original video clip formed between the two hot video images.
  • Specifically, the number of frames of the original video clip formed between any two hotspot video images is counted and determined as the number of video clip frames; since the original video clip contains two hotspot video images, the number of video clip frames is at least two. For example, if the 20th original video image and the 40th original video image in the original video are hotspot video images, the number of video clip frames of the original video clip formed between the two hotspot video images is determined to be 21 frames.
  • The first frame number threshold is a preset threshold for the minimum length of a hotspot video clip, used to determine whether the original video clip is a hotspot video clip.
  • the first frame number threshold is set independently, and its value is generally relatively small.
  • For example, if the first frame number threshold is set to 120 frames and the playback frame rate of the original video is generally 24 frames/second, the corresponding original video clip is a 5-second original video clip. If the number of video clip frames is less than the first frame number threshold, it means that the interval between the two adjacent hotspot video images is short and the original video clip elicited intense emotion values from users in a short time and attracted their attention, so the original video clip is determined to be a hotspot video clip.
  • The second frame number threshold is a preset threshold for the maximum length of a hotspot video clip, used to determine whether the original video clip is a hotspot video clip.
  • Generally, the second frame number threshold is set larger. For example, when the first frame number threshold is set to 120 frames, the second frame number threshold can be set to 1200 frames; if the playback frame rate of the original video is 24 frames/second, the original video clip formed between the two hotspot video images is then a 50-second original video clip.
  • Based on the playback timestamp corresponding to each original video image in the 50-second original video clip, the emotion fluctuation probability of the images to be recognized associated with those playback timestamps is obtained; if the emotion fluctuation probability is large, it means that the original video clip is more likely to have caused intense emotions in users, and conversely, if the emotion fluctuation probability is small, the original video clip is less likely to have caused intense emotions in users.
  • The emotion fluctuation probability refers to the probability of causing a large emotion fluctuation while the user watches the original video clip, where a large emotion fluctuation can understandably be a change from great joy to great sorrow or another emotion change process.
  • the second probability threshold is a probability-related threshold set for evaluating hot-spot video clips based on the fluctuation emotion probability. Understandably, if the fluctuation emotion probability of an original video clip is greater than the second probability threshold, it means that the original video clip causes a strong emotional fluctuation of the user, attracts the user's attention, and can be determined as a hot video clip.
  • In this embodiment, the number of video clip frames of the original video clip formed between two hotspot video images is first obtained. If the number of video clip frames is less than or equal to the first frame number threshold, the original video clip is directly determined to be a hotspot video clip. If the number of video clip frames is greater than the first frame number threshold and less than or equal to the second frame number threshold, the fluctuation emotion probability of the original video clip must be obtained, and the fluctuation emotion probability of the clip is then compared with the second probability threshold to determine whether the original video clip is a hotspot video clip.
  • In this way, whether the original video clip formed between two hotspot video images is a hotspot video clip is determined from its number of video clip frames and its fluctuation emotion probability, thereby automatically annotating the hotspot video clips in the original video and ensuring the objectivity of the marked hotspot video clips, as sketched below.
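  • A compact sketch of this clip-level decision, with the thresholds from the examples above used as illustrative defaults (the value of the second probability threshold is an assumption):

```python
# Sketch: decide whether the clip between two hotspot images is a hotspot clip.
def is_hotspot_clip(clip_frame_count, fluctuation_probability,
                    first_frame_threshold=120, second_frame_threshold=1200,
                    second_probability_threshold=0.6):
    if clip_frame_count <= first_frame_threshold:
        # Two hotspot images close together: directly a hotspot clip.
        return True
    if clip_frame_count <= second_frame_threshold:
        # Longer clip: require a sufficiently large fluctuation emotion probability.
        return fluctuation_probability > second_probability_threshold
    return False
```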
  • In step S503, obtaining the fluctuation emotion probability corresponding to the original video clip based on the playback timestamps corresponding to the original video clip includes:
  • Specifically, the server intercepts, from the recorded video corresponding to the original video, the recorded video clip whose recording timestamps are associated with the playback timestamps of the original video clip, so as to identify the images to be recognized of that recorded video clip. For example, if the playback timestamps of the original video clip in an original video are the 10th to 50th seconds, then from each recorded video corresponding to the original video, the recorded video clip whose recording timestamps correspond to the 10-50 second playback timestamps is intercepted, so that each image to be recognized in the recorded video clip can reflect the user's facial expression changes when viewing the original video clip.
  • S602 Obtain the instantaneous emotion value corresponding to each image to be recognized in the recorded video segment.
  • Since in step S202 a micro-expression recognition model has already been used to identify each image to be recognized in all recorded videos and obtain the corresponding instantaneous emotion value, this step can directly obtain the instantaneous emotion value corresponding to each image to be recognized in the recorded video clip without re-identification, improving the efficiency of acquiring the instantaneous emotion value.
  • S603 Calculate the instantaneous emotion values corresponding to all the images to be recognized in the recorded video clip using the standard deviation formula to obtain the standard deviation of the emotion value; the standard deviation formula is $S_N = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2}$, where $S_N$ is the standard deviation of the emotion value of the recorded video clip, $N$ is the number of images to be recognized in the recorded video clip, $x_i$ is the instantaneous emotion value of each image to be recognized, and $\bar{x}$ is the average of all instantaneous emotion values $x_i$ in the recorded video clip.
  • the standard deviation of the emotion value refers to the standard deviation of the instantaneous emotion value when the user views all the images to be recognized in the original video clip, which can objectively reflect the mood fluctuation of the user when viewing the original video clip. Understandably, if the instantaneous emotion value of each user is used to calculate the standard deviation of the emotion value, the hot video segment determined by the standard deviation of the emotional value greater than the preset standard deviation is the hot video segment concerned by the user. If the average sentiment value of all users who have viewed this original video clip is used to calculate the standard deviation of the sentiment value, the hot video segment determined based on the sentiment value standard deviation being greater than the preset standard deviation is the hot video segment that all users are concerned about .
  • the standard deviation threshold is a value preset by the server, and the standard deviation threshold can be set independently by the user according to requirements.
  • If the standard deviation of the emotion value of a recorded video clip is greater than the standard deviation threshold, it means that the user's emotional fluctuations were large when viewing the corresponding original video clip, possibly changing from great joy to great sorrow or from great sorrow to great joy, and the recorded video clip is determined to be an emotional fluctuation video clip. This emotional fluctuation is reflected by the standard deviation of the emotion value, which can objectively reflect the user's emotional changes while watching the original video clip.
  • S605 Calculate the number of emotional fluctuation video clips and the number of recorded video clips using the fluctuation emotion probability formula to obtain the fluctuation emotion probability of the original video clip.
  • The fluctuation emotion probability can intuitively express the emotion fluctuation of users viewing the original video clip: the greater the number of emotional fluctuation video clips among the users viewing the original video, the greater the fluctuation emotion probability, which means the original video clip can resonate with users' emotions.
  • The number D of recorded video clips is the number of recorded video clips, taken from the recorded videos corresponding to all users, in which the same original video clip was viewed; it can be understood as the number of users who viewed the original video clip and whose facial expression changes were recorded.
  • The number C of emotional fluctuation video clips is the number, among the D recorded video clips, of recorded video clips whose standard deviation of the emotion value is greater than the standard deviation threshold.
  • In this embodiment, the instantaneous emotion value corresponding to each image to be recognized in the recorded video clip is obtained, and the standard deviation of the emotion value is calculated using the standard deviation formula to determine whether each recorded video clip is an emotional fluctuation video clip, so as to find the video clips that caused strong emotional fluctuations; the number of emotional fluctuation video clips and the number of recorded video clips are then used to calculate the fluctuation emotion probability of the original video clip, so that this probability reflects the emotional fluctuations of all users watching the original video clip.
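  • The fluctuation emotion probability formula is likewise not reproduced in this text; reading steps S603-S605, the ratio C / D is the natural candidate, which the sketch below assumes together with an illustrative standard deviation threshold.

```python
# Sketch: per-user emotion-value standard deviation over a recorded clip, then the
# fluctuation emotion probability as C / D (an assumption consistent with S603-S605).
from statistics import pstdev  # population standard deviation, matching the 1/N formula

def fluctuation_emotion_probability(per_user_instant_values, std_threshold=0.5):
    """per_user_instant_values: one list of instantaneous emotion values per
    recorded video clip, i.e. per user who watched the same original video clip."""
    D = len(per_user_instant_values)                 # number of recorded video clips
    C = sum(1 for values in per_user_instant_values
            if pstdev(values) > std_threshold)       # emotional fluctuation video clips
    return C / D if D else 0.0
```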
  • each recorded video is associated with a user ID, which is an identifier used to uniquely identify the user's identity in the video playback system.
  • the hotspot video annotation processing method further includes:
  • S701 Based on the playback timestamp corresponding to the hot video segment, intercept the target video segment corresponding to the playback timestamp from the recorded video corresponding to the user ID.
  • the target video segment is a recorded video segment corresponding to the playback timestamp of the recorded video corresponding to the user ID and the hot video segment.
  • Specifically, the server obtains, according to the playback timestamps corresponding to the hotspot video clip, the recorded video clip in the recorded video corresponding to the user ID that matches the playback timestamps of the hotspot video clip, and determines the acquired recorded video clip as the target video clip.
  • S702 Acquire the instantaneous emotion value corresponding to each image to be recognized in the target video segment.
  • Since in step S202 a micro-expression recognition model has already been used to identify each image to be recognized in all recorded videos and obtain the corresponding instantaneous emotion value, this step can directly obtain the instantaneous emotion value corresponding to each image to be recognized in the target video clip without re-identification, improving the efficiency of acquiring the instantaneous emotion value.
  • S703 Query the emotion tag comparison table based on the instantaneous emotion value to obtain a single frame of emotion tags corresponding to the image to be recognized.
  • the emotion tag comparison table is a preset comparison table for recording the emotion tags corresponding to each instantaneous emotion value. Since the instantaneous emotion value is set to any one of 1, 0.8, 0.5, 0.3, 0, -0.3, -0.5, -0.8, and -1, and each instantaneous emotion value can correspond to at least one micro-expression type, therefore, An emotion label may be determined according to each instantaneous emotion value, or according to the size of the instantaneous emotion value, and a preset rule for dividing the emotion label, so that each instantaneous emotion value corresponds to an emotion label. For example, the emotion label can be divided into emotion labels such as joy, anger, ...
  • each level of emotion corresponds to a range of emotion values.
  • The single-frame emotion label refers to the emotion label corresponding, in the emotion label comparison table, to the instantaneous emotion value of the image to be recognized. That is, according to the user's instantaneous emotion value in each image to be recognized, the single-frame emotion label corresponding to that instantaneous emotion value is queried in the emotion label comparison table, so as to determine, from the single-frame emotion label, the user's degree of preference for the corresponding original video image.
  • S704 Based on the single-frame emotion tag corresponding to the image to be recognized, obtain the segment emotion tag corresponding to the target video segment.
  • Since the single-frame emotion tag corresponding to each image to be recognized can reflect the user's emotion toward each original video image in the target video clip, the segment emotion tag of the user when viewing the target video clip can be obtained from these single-frame emotion tags.
  • Specifically, the single-frame emotion label that occurs most frequently among the single-frame emotion labels of all the images to be recognized may be selected as the segment emotion tag, as sketched below.
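  • A minimal sketch of this majority selection (names illustrative):

```python
# Sketch: the segment emotion tag is the most frequent single-frame emotion tag
# within the target video clip.
from collections import Counter

def segment_emotion_tag(single_frame_tags):
    return Counter(single_frame_tags).most_common(1)[0][0]

print(segment_emotion_tag(["joy", "joy", "surprise", "joy"]))  # joy
```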
  • S705 If the clip emotion tag is a preset emotion tag, query the user portrait database based on the user ID, obtain the user tag corresponding to the user ID, determine the target user based on the user tag, and push the hot video clip to the client corresponding to the target user .
  • The user tag is the gender, age, occupation, interest, or other preset tag corresponding to the user ID, acquired by querying the user portrait database based on the user ID.
  • The target user refers to a user determined by the server to have the same preferences as the user corresponding to the user ID.
  • the user profile database can be queried based on the user ID to obtain a user tag corresponding to the user ID, and then the target user can be quickly obtained based on the user tag, so as to facilitate the push of the target user's favorite hot video clip.
  • The preset emotion tags are preset tags that can be used for video pushing. For example, if the preset emotion tag is a joy tag or a level-1 tag, and the server recognizes that the segment emotion tag of a target video clip is a level-1 tag, then the corresponding hotspot video clip is deemed more attractive to the user corresponding to the user ID, and the hotspot video clip can be pushed to target users who have the same user tags (that is, the same preferences) as the user corresponding to the user ID, to ensure the attractiveness of the hotspot video clip to the target users.
  • In this embodiment, the target video clip corresponding to the playback timestamps of the hotspot video clip is intercepted from the recorded video, and the single-frame emotion tag corresponding to each image to be recognized is obtained in order to determine the segment emotion tag corresponding to the target video clip, which can reflect the preference of the user corresponding to the user ID while watching the hotspot video clip.
  • query the user portrait database based on the user ID to obtain the user tag of the user, so as to determine the target user with the same user tag that the user corresponding to the user ID has, so that the target user has the same preferences as the user corresponding to the user ID.
  • Finally, if the segment emotion tag is a preset emotion tag, the hotspot video clip is pushed to the client corresponding to the target user.
  • each recorded video is associated with a user ID, which is an identifier used to uniquely identify the user's identity in the video playback system.
  • the hotspot video annotation processing method further includes:
  • The specific implementation process of step S801 is the same as that of step S701; to avoid redundancy, details are not repeated here.
  • S802 Obtain the instantaneous emotion value corresponding to each image to be recognized in the target video segment.
  • The specific implementation process of step S802 is the same as that of step S702; to avoid redundancy, details are not repeated here.
  • S803 Query the emotion label comparison table based on the instantaneous emotion value to obtain a single frame of emotion labels corresponding to the image to be recognized.
  • The specific implementation process of step S803 is the same as that of step S703; to avoid redundancy, details are not repeated here.
  • The specific implementation process of step S804 is the same as that of step S704; to avoid redundancy, details are not repeated here.
  • S805 If the segment emotion tag is a preset emotion tag, query the video database based on the playback timestamps corresponding to the hotspot video clip, obtain the content tag corresponding to the hotspot video clip, determine the hotspot video clips corresponding to the content tag as recommended video clips, and push the recommended video clips to the client corresponding to the user ID.
  • the content tag refers to the tag of the content played in the original video.
  • For example, the content tag may be a category label such as funny, food, fashion, travel, entertainment, life, information, parent-child, knowledge, games, cars, finance, cute pets, sports, music, anime, technology, or health, or another label that further subdivides the specific description of the video content.
  • the server determines that the segment emotion tag of the target video segment is a preset emotion tag, and determines that the hotspot video segment is a video segment of the video type that the user corresponding to the user ID is more concerned about.
  • the recommended video clip is a hotspot video clip, determined based on the content tag, that can be recommended to the user corresponding to the user ID.
  • the server queries the video database according to the content tag, obtains other hotspot video clips corresponding to the content tag, determines them as recommended video clips, and recommends them to the client of the user ID, thereby automatically recommending hotspot video clips with the same content tag to the user corresponding to the user ID.
  • the target video segment corresponding to the playback timestamps of the hotspot video clip is intercepted from the recorded video, and the single-frame emotion tag corresponding to each image to be recognized is obtained to determine the segment emotion tag of the target video segment, which may reflect the preference of the user corresponding to the user ID while watching the hotspot video clip.
  • the video database is then queried based on the playback timestamps of the hotspot video clip to determine the pre-configured content tag of that clip, so that other hotspot video clips stored on the server under the same content tag are determined as recommended video clips and recommended to the client corresponding to the user ID; the recommended video clips therefore more easily match the preferences of the user corresponding to the user ID, improving their attractiveness to that user (see the sketch below).
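A sketch of this content-tag recommendation path (step S805) under the same kind of assumptions: the video database keyed by playback timestamps and the tag-to-clips index are hypothetical dict stand-ins for the storage described above.

```python
# Sketch of step S805 with dict stand-ins; keys, tags, and clip identifiers are illustrative.

CLIP_CONTENT_TAGS = {("00:10:00", "00:12:30"): "food"}   # playback timestamps -> content tag
CLIPS_BY_TAG = {"food": ["clip_017", "clip_042"]}        # content tag -> other hotspot clips

def recommend_by_content_tag(user_id, clip_timestamps, segment_emotion_tag,
                             preset_tags, send_to_client):
    """If the segment emotion tag is preset, look up the content tag of the hotspot
    clip and push other hotspot clips carrying the same content tag to the user."""
    if segment_emotion_tag not in preset_tags:
        return []
    content_tag = CLIP_CONTENT_TAGS.get(clip_timestamps)
    recommended = CLIPS_BY_TAG.get(content_tag, [])
    for clip in recommended:
        send_to_client(user_id, clip)
    return recommended
```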
  • the hotspot video annotation processing method further includes:
  • the hotspot video frame rate refers to the proportion of the frames of all hotspot video clips in an original video relative to the total number of frames of that original video.
  • the server obtains the number of frames of an original video, counts the number of frames of all hotspot video clips in that video, and divides the number of hotspot clip frames by the number of original video frames to obtain the corresponding hotspot video frame rate.
  • for example, if the original video contains 10000 frames (that is, 10000 original video images), the first hotspot video clip contains 1000 frames and the second hotspot video clip contains 2000 frames, the hotspot video frame rate is (1000 + 2000) / 10000 = 30%.
  • this hotspot video frame rate can objectively reflect the attractiveness of the original video to users.
  • S902 Sort the original videos based on their corresponding hotspot video frame rates and display them on the client according to the sorting result.
  • the server sorts the display positions of the original videos on the client from the highest hotspot video frame rate to the lowest, so that users can see the original videos with higher hotspot video frame rates first and choose what to watch accordingly, thereby increasing the playback volume of the original videos displayed by the video playback system.
  • in this hotspot video annotation processing method, after the hotspot video frame rate of each original video is obtained, the original videos are sorted and displayed on the user's client, so that the user can selectively watch the more attractive original videos, increasing the playback volume of the original videos displayed by the video playback system (a sorting sketch follows below).
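The sorting step itself reduces to ordering by the precomputed hotspot video frame rate; a minimal sketch, assuming each original video is represented as a dict with a `hotspot_frame_rate` field (an illustrative data shape, not the patent's storage format).

```python
# Minimal sketch: order original videos by hotspot video frame rate, highest first.

def sort_videos_by_hotspot_rate(videos):
    """Return video IDs ordered from highest to lowest hotspot video frame rate,
    which is the order in which they would be displayed on the client."""
    return [v["video_id"] for v in
            sorted(videos, key=lambda v: v["hotspot_frame_rate"], reverse=True)]

videos = [
    {"video_id": "v1", "hotspot_frame_rate": 0.30},
    {"video_id": "v2", "hotspot_frame_rate": 0.55},
    {"video_id": "v3", "hotspot_frame_rate": 0.10},
]
print(sort_videos_by_hotspot_rate(videos))  # ['v2', 'v1', 'v3']
```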
  • in step S901, counting the hotspot video frame rate corresponding to the original video based on the hotspot video clips includes:
  • S1001 Count the number of original video images in each hot video segment, and determine the total number of frames of the hot video segment.
  • the total frame number of the hot video clip refers to the total frame number of all the hot video clips in the same original video.
  • for example, if an original video has 6 hotspot video clips, the server counts the total number of frames across those 6 clips as the total frame number of the hotspot video clips.
  • S1002 Count the number of original video images in the original video and determine the total number of video frames of the original video.
  • the server counts the number of original video images in the original video and determines the total number of video frames of the original video, that is, the total number of video frames is the number of all original video images in the original video.
  • the total number of video frames of the original video may also be determined as the product of the playback frame rate and the playback duration of the original video, which quickly yields the total frame count; for example, a 25 fps video that plays for 400 seconds contains 25 × 400 = 10000 frames.
  • the hotspot video frame rate formula is applied to the total frame number of the hotspot video clips and the total video frame number of the original video to obtain the hotspot video frame rate corresponding to the original video.
  • the hotspot video frame rate formula is $Z = \frac{\sum_{j=1}^{m} w_j}{K}$, where Z is the hotspot video frame rate, $w_j$ is the total frame number of the j-th hotspot video clip, m is the number of hotspot video clips, and K is the total video frame number of the original video.
  • the server can determine the total frame number of the hotspot video clips and the total video frame number of the original video, quickly calculate the hotspot video frame rate using the hotspot video frame rate formula, and sort the original videos based on that rate, so that users can selectively watch the original videos with a higher hotspot video frame rate and the playback volume of the original videos is improved.
  • in this hotspot video annotation processing method, the hotspot video frame rate formula is used to calculate the hotspot video frame rate corresponding to each original video, so that this rate reflects the attractiveness of the original video to users and the videos can be sorted accordingly to improve their playback volume (see the sketch below).
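A short sketch of the frame-rate formula itself, with the per-clip frame counts and the total frame count as plain numbers; the example values reuse the 1000/2000/10000 figures from the text.

```python
# Sketch of the hotspot video frame rate formula Z = (w_1 + ... + w_m) / K.

def hotspot_frame_rate(clip_frame_counts, total_video_frames):
    """clip_frame_counts are the w_j values; total_video_frames is K."""
    return sum(clip_frame_counts) / total_video_frames

# Two hotspot clips of 1000 and 2000 frames in a 10000-frame original video -> 0.3 (30%).
print(hotspot_frame_rate([1000, 2000], 10000))
```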
  • a hotspot video annotation processing device is provided, and the hotspot video annotation processing device corresponds one-to-one to the hotspot video annotation processing method in the foregoing embodiment.
  • the hotspot video annotation processing device includes a recorded video acquisition module 1101, an instant emotion value acquisition module 1102, an intense emotion probability determination module 1103, a hotspot video image determination module 1104, and a hotspot video segment acquisition module 1105.
  • the detailed description of each functional module is as follows:
  • the recorded video obtaining module 1101 is used to obtain the user's recorded video collected while the client plays the original video.
  • the original video includes at least one frame of original video image, and the recorded video includes at least one frame of image to be recognized.
  • the recording timestamp is associated with the playback timestamp of an original video image.
  • the instantaneous emotion value acquisition module 1102 is used to identify each image to be recognized by using a micro-expression recognition model to obtain the instantaneous emotion value corresponding to the image to be recognized.
  • the intense emotion probability determination module 1103 is used to determine the intense emotion probability of the original video image corresponding to the playback timestamp according to the instantaneous emotion values of the images to be identified corresponding to all the recording timestamps associated with the same playback timestamp.
  • the hotspot video image determination module 1104 is configured to determine the original video image as a hotspot video image if the intense emotion probability is greater than the first probability threshold.
  • the hotspot video clip acquisition module 1105 is configured to perform hotspot annotation on the original video based on the hotspot video image to obtain hotspot video clips.
  • the instantaneous emotion value acquisition module 1102 includes an instantaneous probability acquisition unit, a micro-expression type determination unit, and an instantaneous emotion value acquisition unit.
  • the instantaneous probability acquisition unit is used to identify each image to be recognized by using a micro-expression recognition model to acquire the instantaneous probability corresponding to at least one type of recognized expression.
  • the micro-expression type determination unit is used to determine the identified expression type with the largest instantaneous probability as the micro-expression type of the image to be recognized.
  • the instantaneous emotion value acquisition unit is used to query an emotion value comparison table based on the micro-expression type to acquire the instantaneous emotion value of the image to be recognized.
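A hedged sketch of how these three units chain together: the micro-expression recognition model is stubbed as a callback returning per-expression probabilities, and the emotion value comparison table is a plain dict; both the expression names and the numeric values are assumptions, not values from the patent.

```python
# Sketch of module 1102: recognize -> pick most probable expression -> look up emotion value.

EMOTION_VALUE_TABLE = {            # hypothetical emotion value comparison table
    "happy": 0.9, "surprised": 0.6, "neutral": 0.0, "sad": -0.6, "angry": -0.9,
}

def instantaneous_emotion_value(image_to_recognize, recognize_micro_expression):
    """recognize_micro_expression stands in for the micro-expression recognition model
    and returns instantaneous probabilities per recognized expression type."""
    probabilities = recognize_micro_expression(image_to_recognize)   # e.g. {"happy": 0.7, "sad": 0.1, ...}
    micro_expression_type = max(probabilities, key=probabilities.get)  # largest instantaneous probability
    return EMOTION_VALUE_TABLE.get(micro_expression_type, 0.0)
```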
  • the intense emotion probability determination module 1103 includes a total number of image statistics unit, an intense emotion judgment unit, an intense emotion quantity statistical unit, and an intense emotion probability determination unit.
  • the total number of image counting unit is used to count the total number of images to be identified corresponding to all recording timestamps associated with the same playback timestamp.
  • the intense emotion judgment unit is configured to determine that the emotion attribute of an image to be recognized is intense emotion if the absolute value of the instantaneous emotion value corresponding to that image is greater than a preset emotion threshold.
  • the intense emotion quantity counting unit is used to count, among the images to be recognized corresponding to all recording timestamps associated with the same playback timestamp, the number of images whose emotion attribute is intense emotion, which is the intense emotion count (see the sketch below).
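Taken together, these units amount to a ratio; a minimal sketch, assuming the instantaneous emotion values of all images to be recognized that share recording timestamps associated with one playback timestamp are already collected in a list, and the default emotion threshold is illustrative.

```python
# Sketch of module 1103: proportion of "intense emotion" images at one playback timestamp.

def intense_emotion_probability(instant_emotion_values, emotion_threshold=0.5):
    """An image counts as intense emotion when |instantaneous emotion value| > threshold;
    the probability is the intense count divided by the total count of images."""
    total = len(instant_emotion_values)
    if total == 0:
        return 0.0
    intense = sum(1 for v in instant_emotion_values if abs(v) > emotion_threshold)
    return intense / total
```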
  • the hotspot video clip acquisition module 1105 includes a video clip frame number counting unit, a first hotspot video clip determination unit, a fluctuation emotion probability acquisition unit, and a second hotspot video clip determination unit.
  • the video clip frame number counting unit is used to count the number of frames of the original video clip formed between any two hot-spot video images and determine the frame number of the video clip.
  • the first hotspot video segment determination unit is configured to determine the original video segment as a hotspot video segment if the video segment frame number is less than or equal to the first frame number threshold.
  • the fluctuation emotion probability acquisition unit is used to obtain the fluctuation emotion probability corresponding to the original video clip, based on the playback timestamps corresponding to that clip, if the video clip frame number is greater than the first frame number threshold and less than or equal to the second frame number threshold.
  • the second hotspot video segment determination unit is configured to determine the original video segment as a hotspot video segment if the fluctuation emotion probability is greater than the second probability threshold.
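A sketch of the decision these units make for the original video clip lying between two hotspot video images; the threshold values and the callback computing the fluctuation emotion probability are assumptions standing in for the units described above.

```python
# Sketch of module 1105's gap decision: short gaps are labelled directly, medium gaps
# are labelled only if emotions fluctuate enough, long gaps are not labelled.

def gap_is_hotspot(gap_frame_count, first_frame_threshold, second_frame_threshold,
                   get_fluctuation_probability, second_probability_threshold):
    """Return True if the original video clip spanning the gap should be part of a hotspot clip."""
    if gap_frame_count <= first_frame_threshold:
        return True
    if gap_frame_count <= second_frame_threshold:
        # only now is the fluctuation emotion probability computed
        return get_fluctuation_probability() > second_probability_threshold
    return False
```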
  • the fluctuation emotion probability acquisition unit includes a recorded video clip interception subunit, an instant emotion value acquisition subunit, an emotion value standard deviation acquisition subunit, an emotion fluctuation video clip determination subunit, and a fluctuation emotion probability calculation subunit.
  • the recorded video clip interception subunit is used to intercept the recorded video clip corresponding to the playback timestamp from the recorded video corresponding to the original video based on the playback timestamp corresponding to the original video clip.
  • the instantaneous emotion value acquisition subunit is used to acquire the instantaneous emotion value corresponding to each image to be identified in the recorded video segment.
  • the emotion value standard deviation acquisition subunit is used to calculate, using the standard deviation formula, the standard deviation of the instantaneous emotion values corresponding to all images to be recognized in the recorded video clip.
  • the standard deviation formula is $S_N = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2}$, where $S_N$ is the standard deviation of the emotion values of the recorded video clip, N is the number of images to be recognized in the recorded video clip, $x_i$ is the instantaneous emotion value of each image to be recognized, and $\bar{x}$ is the average of all instantaneous emotion values $x_i$ in the recorded video clip.
  • the emotion fluctuation video clip determination subunit is used to determine the recorded video clip as an emotion fluctuation video clip if the standard deviation of its emotion values is greater than the standard deviation threshold (see the sketch below).
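A sketch of this standard-deviation test using only the standard library; the default threshold value is an assumption, and the list of instantaneous emotion values is assumed to have been produced by the preceding subunits.

```python
import math

# Sketch: compute S_N over the clip's instantaneous emotion values and compare it
# against the standard deviation threshold to flag an emotion fluctuation clip.

def emotion_value_std(instant_emotion_values):
    """S_N = sqrt((1/N) * sum_i (x_i - mean)^2) over the recorded video clip."""
    n = len(instant_emotion_values)
    if n == 0:
        return 0.0
    mean = sum(instant_emotion_values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in instant_emotion_values) / n)

def is_emotion_fluctuation_clip(instant_emotion_values, std_threshold=0.3):
    return emotion_value_std(instant_emotion_values) > std_threshold
```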
  • each recorded video is associated with a user ID; after the hotspot video clip acquisition module 1105, the hotspot video annotation processing device further includes a target video clip interception module, a target emotion value acquisition module, a single-frame emotion tag acquisition module, a segment emotion tag acquisition module, a target user determination module, and a hotspot video clip pushing module.
  • the target video clip interception module is used to intercept the target video clip corresponding to the playback timestamp from the recorded video corresponding to the user ID based on the playback timestamp corresponding to the hot video clip.
  • the target emotion value acquisition module is used to acquire the instantaneous emotion value corresponding to each image to be identified in the target video segment.
  • the single-frame emotion label acquisition module is used to query the emotion label comparison table based on the instantaneous emotion value and obtain the single-frame emotion label corresponding to the image to be recognized.
  • the segment emotion tag acquisition module is used to acquire the segment emotion tag corresponding to the target video segment based on the single frame emotion tag corresponding to the image to be recognized.
  • the first video clip pushing module is used to, if the segment emotion tag is a preset emotion tag, query the user portrait database based on the user ID, obtain the user tag corresponding to the user ID, determine target users based on the user tag, and push the hotspot video clip to the clients corresponding to the target users.
  • the second video clip pushing module is used to, if the segment emotion tag is a preset emotion tag, query the video database based on the playback timestamps corresponding to the hotspot video clip, obtain the content tag corresponding to the hotspot video clip, determine the hotspot video clips corresponding to that content tag as recommended video clips, and push the recommended video clips to the client corresponding to the user ID.
  • the hotspot video annotation processing device further includes a hotspot video frame rate statistics module and an original video sorting module.
  • the hotspot video frame rate statistics module is used to calculate the hotspot video frame rate corresponding to the original video based on the hotspot video clips.
  • the original video sorting module is used to sort the original video based on the hot video frame rate corresponding to the original video and display it on the client according to the sorting result.
  • the hotspot video frame rate statistics module includes a total frame number determination unit for the clip, a total video frame number determination unit, and a hotspot video frame rate acquisition unit.
  • the total frame number determining unit of the clip is used to count the number of original video images in each hot video segment and determine the total frame number of the hot video segment.
  • the total video frame number determining unit is used to count the number of original video images in the original video and determine the total number of video frames of the original video.
  • the hotspot video frame rate acquisition unit is used to apply the hotspot video frame rate formula to the total frame number of the hotspot video clips and the total video frame number of the original video to obtain the hotspot video frame rate corresponding to the original video, the formula being $Z = \frac{\sum_{j=1}^{m} w_j}{K}$, where Z is the hotspot video frame rate, $w_j$ is the total frame number of the j-th hotspot video clip, m is the number of hotspot video clips, and K is the total video frame number of the original video.
  • Each module in the above hotspot video annotation processing device may be implemented in whole or in part by software, hardware, and combinations thereof.
  • the above modules may be embedded in hardware form in, or independent of, the processor in the computer device, or may be stored in the memory of the computer device in software form so that the processor can call and execute the operations corresponding to the above modules.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure may be as shown in FIG. 12.
  • the computer device includes a processor, memory, network interface, and database connected by a system bus. Among them, the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, computer-readable instructions, and a database.
  • the internal memory provides an environment for the operation of the operating system and computer-readable instructions in the non-volatile storage medium.
  • the database of the computer device is used to store data used or generated during the execution of the above hot-spot video annotation processing method, such as the number of original video images.
  • the network interface of the computer device is used to communicate with external terminals through a network connection; when executed by the processor, the computer-readable instructions implement a hotspot video annotation processing method.
  • a computer device including a memory, a processor, and computer-readable instructions stored on the memory and executable on the processor.
  • when the processor executes the computer-readable instructions, the hotspot video annotation processing method in the above embodiments is implemented, for example steps S201-S205 shown in FIG. 2 or the steps shown in FIGS. 3-10, which are not repeated here to avoid repetition.
  • alternatively, when the processor executes the computer-readable instructions, the functions of each module/unit in the embodiment of the hotspot video annotation processing device are implemented, for example the functions of the recorded video acquisition module 1101, the instantaneous emotion value acquisition module 1102, the intense emotion probability determination module 1103, the hotspot video image determination module 1104, and the hotspot video segment acquisition module 1105 shown in FIG. 11, which are not repeated here to avoid repetition.
  • a computer-readable storage medium stores computer-readable instructions.
  • when the computer-readable instructions are executed by a processor, the hotspot video annotation processing method in the foregoing embodiments is implemented, for example steps S201-S205 shown in FIG. 2 or the steps shown in FIGS. 3-10, which are not repeated here to avoid repetition.
  • alternatively, when the computer-readable instructions are executed by a processor, the functions of each module/unit in the embodiment of the above hotspot video annotation processing apparatus are implemented, for example the recorded video acquisition module 1101 and the instantaneous emotion value acquisition module 1102 shown in FIG. 11, which are not repeated here to avoid repetition.
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory can include random access memory (RAM) or external cache memory.
  • RAM: random access memory
  • DRAM: dynamic RAM
  • SDRAM: synchronous DRAM
  • DDR SDRAM: double data rate SDRAM
  • ESDRAM: enhanced SDRAM
  • SLDRAM: Synchlink (synchronous link) DRAM
  • RDRAM: direct RAM
  • DRDRAM: direct memory bus dynamic RAM
  • RDRAM: memory bus dynamic RAM

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computer Graphics (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The present invention relates to a hotspot video annotation processing method and apparatus, a computer device, and a storage medium. The method comprises: obtaining a recorded video of a user collected while a client plays an original video, the original video containing at least one frame of original video image and the recorded video containing at least one frame of image to be recognized; using a micro-expression recognition model to recognize each image to be recognized and obtain the instantaneous emotion value corresponding to each image to be recognized; determining, according to the instantaneous emotion values, the intense emotion probability of the original video image corresponding to a playback timestamp; if the intense emotion probability is greater than a first probability threshold, determining the original video image as a hotspot video image; and performing hotspot annotation on the original video based on the hotspot video image to obtain hotspot video clips. The method can automatically annotate hotspot video clips and improve the annotation efficiency of hotspot video clips.
PCT/CN2019/088957 2019-01-11 2019-05-29 Procédé et appareil de traitement d'annotation vidéo d'un point d'accès public, dispositif informatique et support de stockage WO2020143156A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910025355.9A CN109819325B (zh) 2019-01-11 2019-01-11 热点视频标注处理方法、装置、计算机设备及存储介质
CN201910025355.9 2019-01-11

Publications (1)

Publication Number Publication Date
WO2020143156A1 true WO2020143156A1 (fr) 2020-07-16

Family

ID=66604271

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/088957 WO2020143156A1 (fr) 2019-01-11 2019-05-29 Procédé et appareil de traitement d'annotation vidéo d'un point d'accès public, dispositif informatique et support de stockage

Country Status (2)

Country Link
CN (1) CN109819325B (fr)
WO (1) WO2020143156A1 (fr)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112291589A (zh) * 2020-10-29 2021-01-29 腾讯科技(深圳)有限公司 视频文件的结构检测方法、装置
CN112699774A (zh) * 2020-12-28 2021-04-23 深延科技(北京)有限公司 视频中人物的情绪识别方法及装置、计算机设备及介质
CN113127576A (zh) * 2021-04-15 2021-07-16 微梦创科网络科技(中国)有限公司 一种基于用户内容消费分析的热点发现方法及系统
CN114445896A (zh) * 2022-01-28 2022-05-06 北京百度网讯科技有限公司 视频中人物陈述内容可置信度的评估方法及装置
CN116386060A (zh) * 2023-03-23 2023-07-04 浪潮智慧科技有限公司 一种水尺数据自动标注方法、装置、设备及介质

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109819325B (zh) * 2019-01-11 2021-08-20 平安科技(深圳)有限公司 热点视频标注处理方法、装置、计算机设备及存储介质
CN110401847B (zh) * 2019-07-17 2021-08-06 咪咕文化科技有限公司 云dvr视频的压缩存储方法、电子设备及系统
CN110519617B (zh) * 2019-07-18 2023-04-07 平安科技(深圳)有限公司 视频评论处理方法、装置、计算机设备及存储介质
CN110418204B (zh) * 2019-07-18 2022-11-04 平安科技(深圳)有限公司 基于微表情的视频推荐方法、装置、设备和存储介质
CN110353705B (zh) * 2019-08-01 2022-10-25 秒针信息技术有限公司 一种识别情绪的方法及装置
CN110647812B (zh) * 2019-08-19 2023-09-19 平安科技(深圳)有限公司 摔倒行为检测处理方法、装置、计算机设备及存储介质
CN110826471B (zh) * 2019-11-01 2023-07-14 腾讯科技(深圳)有限公司 视频标签的标注方法、装置、设备及计算机可读存储介质
CN111343483B (zh) * 2020-02-18 2022-07-19 北京奇艺世纪科技有限公司 媒体内容片段的提示方法和装置、存储介质、电子装置
CN111447505B (zh) * 2020-03-09 2022-05-31 咪咕文化科技有限公司 视频剪辑方法、网络设备及计算机可读存储介质
CN111629222B (zh) * 2020-05-29 2022-12-20 腾讯科技(深圳)有限公司 一种视频处理方法、设备及存储介质
CN111860302B (zh) * 2020-07-17 2024-03-01 北京百度网讯科技有限公司 一种图像标注方法、装置、电子设备及存储介质

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100161409A1 (en) * 2008-12-23 2010-06-24 Samsung Electronics Co., Ltd. Apparatus for providing content according to user's interest in content and method for providing content according to user's interest in content
CN102693739A (zh) * 2011-03-24 2012-09-26 腾讯科技(深圳)有限公司 视频片段生成方法及系统
CN103873492A (zh) * 2012-12-07 2014-06-18 联想(北京)有限公司 一种电子设备及数据传输方法
CN105022801A (zh) * 2015-06-30 2015-11-04 北京奇艺世纪科技有限公司 一种热门视频挖掘方法和装置
CN105615902A (zh) * 2014-11-06 2016-06-01 北京三星通信技术研究有限公司 情绪监控方法和装置
CN107257509A (zh) * 2017-07-13 2017-10-17 上海斐讯数据通信技术有限公司 一种视频内容的过滤方法及装置
CN107809673A (zh) * 2016-09-09 2018-03-16 索尼公司 根据情绪状态检测处理视频内容的系统和方法
CN107888947A (zh) * 2016-09-29 2018-04-06 法乐第(北京)网络科技有限公司 一种视频播放方法与装置
CN109819325A (zh) * 2019-01-11 2019-05-28 平安科技(深圳)有限公司 热点视频标注处理方法、装置、计算机设备及存储介质

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9026678B2 (en) * 2011-11-30 2015-05-05 Elwha Llc Detection of deceptive indicia masking in a communications interaction
CN104681048A (zh) * 2013-11-28 2015-06-03 索尼公司 多媒体读取控制装置、曲线获取装置、电子设备、曲线提供装置及方法
CN106341712A (zh) * 2016-09-30 2017-01-18 北京小米移动软件有限公司 多媒体数据的处理方法及装置
CN106792170A (zh) * 2016-12-14 2017-05-31 合网络技术(北京)有限公司 视频处理方法及装置
CN107968961B (zh) * 2017-12-05 2020-06-02 吕庆祥 基于情感曲线剪辑视频的方法及装置
CN108093297A (zh) * 2017-12-29 2018-05-29 厦门大学 一种影片片段自动采集的方法及系统
CN109151576A (zh) * 2018-06-20 2019-01-04 新华网股份有限公司 多媒体信息剪辑方法和系统

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100161409A1 (en) * 2008-12-23 2010-06-24 Samsung Electronics Co., Ltd. Apparatus for providing content according to user's interest in content and method for providing content according to user's interest in content
CN102693739A (zh) * 2011-03-24 2012-09-26 腾讯科技(深圳)有限公司 视频片段生成方法及系统
CN103873492A (zh) * 2012-12-07 2014-06-18 联想(北京)有限公司 一种电子设备及数据传输方法
CN105615902A (zh) * 2014-11-06 2016-06-01 北京三星通信技术研究有限公司 情绪监控方法和装置
CN105022801A (zh) * 2015-06-30 2015-11-04 北京奇艺世纪科技有限公司 一种热门视频挖掘方法和装置
CN107809673A (zh) * 2016-09-09 2018-03-16 索尼公司 根据情绪状态检测处理视频内容的系统和方法
CN107888947A (zh) * 2016-09-29 2018-04-06 法乐第(北京)网络科技有限公司 一种视频播放方法与装置
CN107257509A (zh) * 2017-07-13 2017-10-17 上海斐讯数据通信技术有限公司 一种视频内容的过滤方法及装置
CN109819325A (zh) * 2019-01-11 2019-05-28 平安科技(深圳)有限公司 热点视频标注处理方法、装置、计算机设备及存储介质

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112291589A (zh) * 2020-10-29 2021-01-29 腾讯科技(深圳)有限公司 视频文件的结构检测方法、装置
CN112291589B (zh) * 2020-10-29 2023-09-22 腾讯科技(深圳)有限公司 视频文件的结构检测方法、装置
CN112699774A (zh) * 2020-12-28 2021-04-23 深延科技(北京)有限公司 视频中人物的情绪识别方法及装置、计算机设备及介质
CN112699774B (zh) * 2020-12-28 2024-05-24 深延科技(北京)有限公司 视频中人物的情绪识别方法及装置、计算机设备及介质
CN113127576A (zh) * 2021-04-15 2021-07-16 微梦创科网络科技(中国)有限公司 一种基于用户内容消费分析的热点发现方法及系统
CN113127576B (zh) * 2021-04-15 2024-05-24 微梦创科网络科技(中国)有限公司 一种基于用户内容消费分析的热点发现方法及系统
CN114445896A (zh) * 2022-01-28 2022-05-06 北京百度网讯科技有限公司 视频中人物陈述内容可置信度的评估方法及装置
CN114445896B (zh) * 2022-01-28 2024-04-05 北京百度网讯科技有限公司 视频中人物陈述内容可置信度的评估方法及装置
CN116386060A (zh) * 2023-03-23 2023-07-04 浪潮智慧科技有限公司 一种水尺数据自动标注方法、装置、设备及介质
CN116386060B (zh) * 2023-03-23 2023-11-14 浪潮智慧科技有限公司 一种水尺数据自动标注方法、装置、设备及介质

Also Published As

Publication number Publication date
CN109819325B (zh) 2021-08-20
CN109819325A (zh) 2019-05-28

Similar Documents

Publication Publication Date Title
WO2020143156A1 (fr) Procédé et appareil de traitement d'annotation vidéo d'un point d'accès public, dispositif informatique et support de stockage
US11290775B2 (en) Computerized system and method for automatically detecting and rendering highlights from streaming videos
US10832738B2 (en) Computerized system and method for automatically generating high-quality digital content thumbnails from digital video
Segalin et al. What your Facebook profile picture reveals about your personality
WO2021088510A1 (fr) Procédé et appareil de classification de vidéos, ordinateur, et support de stockage lisible
US11064257B2 (en) System and method for segment relevance detection for digital content
US10885380B2 (en) Automatic suggestion to share images
US20210201349A1 (en) Media and marketing optimization with cross platform consumer and content intelligence
US9589205B2 (en) Systems and methods for identifying a user's demographic characteristics based on the user's social media photographs
JP5795580B2 (ja) タイムベースメディアにおけるソーシャルインタレストの推定および表示
CN110519617B (zh) 视频评论处理方法、装置、计算机设备及存储介质
US9154853B1 (en) Web identity to social media identity correlation
JP2023036898A (ja) 視聴者エンゲージメントを評価するためのシステムおよび方法
US9253511B2 (en) Systems and methods for performing multi-modal video datastream segmentation
WO2020253360A1 (fr) Procédé et appareil d'affichage de contenu pour application, support d'enregistrement et dispositif informatique
US20170245011A1 (en) Methods and systems of dynamic content analysis
CN112685596B (zh) 视频推荐方法及装置、终端、存储介质
Narassiguin et al. Data Science for Influencer Marketing: feature processing and quantitative analysis
Yang et al. Zapping index: using smile to measure advertisement zapping likelihood
US11010935B2 (en) Context aware dynamic image augmentation
US12073064B2 (en) Abstract generation method and apparatus
US11983925B2 (en) Detecting synthetic media
TWM551710U (zh) 用戶資料蒐集系統
TAO Analyzing image tweets in Microblogs
  • García TRABAJO DE FIN DE GRADO

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19908319

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 18.11.2021)

122 Ep: pct application non-entry in european phase

Ref document number: 19908319

Country of ref document: EP

Kind code of ref document: A1