WO2021000644A1 - Video processing method and apparatus, computer device and storage medium - Google Patents

Video processing method and apparatus, computer device and storage medium

Info

Publication number
WO2021000644A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
service
video
target
images
Prior art date
Application number
PCT/CN2020/087694
Other languages
French (fr)
Chinese (zh)
Inventor
刘丽珍
吕小立
Original Assignee
深圳壹账通智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳壹账通智能科技有限公司
Publication of WO2021000644A1 publication Critical patent/WO2021000644A1/en

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
            • G06F 16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
              • G06F 16/53 Querying
                • G06F 16/535 Filtering based on additional data, e.g. user or group profiles
              • G06F 16/55 Clustering; Classification
        • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
          • G06V 20/00 Scenes; Scene-specific elements
            • G06V 20/40 Scenes; Scene-specific elements in video content
              • G06V 20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
                • G06V 20/47 Detecting features for summarising video content
          • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
            • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
              • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
                • G06V 40/174 Facial expression recognition

Definitions

  • This application relates to the field of artificial intelligence technology, in particular to a video processing method, device, computer equipment and storage medium.
  • a video processing method includes: obtaining a surveillance video, and intercepting a service sub-video of a target monitoring object from the surveillance video; extracting a first video image set containing the target monitoring object from the service sub-video, and extracting a target object image from the first video image set; extracting a second video image containing a service object from the service sub-video, and obtaining a service image set according to the target object image and the second video image; performing micro-expression analysis on each set image in the service image set to obtain a preset micro-expression matching each set image; and generating a service information file of the target monitoring object according to the service image set and the preset micro-expression.
  • a video processing device comprising:
  • the video interception module is used to obtain surveillance video, and intercept the service sub-video of the target surveillance object from the surveillance video;
  • a target image extraction module configured to extract a first video image set containing the target monitoring object from the service sub-video, and extract a target object image from the first video image set;
  • An image collection generating module configured to extract a second video image containing a service object from the service sub-video, and obtain a service image collection according to the target object image and the second video image;
  • An expression analysis module configured to perform micro-expression analysis on each set of images in the service image set to obtain preset micro-expressions that match the set of images;
  • the file generating module is used to generate the service information file of the target monitored object according to the service image collection and the preset micro-expression.
  • a computer device includes a memory and a processor, the memory stores a computer program, and the processor implements the steps of the foregoing method when the computer program is executed.
  • a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the steps of the above method are realized.
  • the above-mentioned video processing method, apparatus, computer device and storage medium can intercept, from the surveillance video, the service sub-video of the target monitoring object at the service location, and detect from the service sub-video the images containing the service object and the target object, thereby automatically filtering out useless redundant image information; they can also perform micro-expression analysis on the filtered images to analyze the micro-expressions of the target object and the service object in the images, further processing the image information into effective information that helps evaluate the target monitoring object, and thereby greatly improving the efficiency of obtaining effective video information.
  • Figure 1 is an application scenario diagram of a video processing method in an embodiment
  • Figure 2 is a schematic flowchart of a video processing method in an embodiment
  • FIG. 3 is a schematic flowchart of a method for generating an expression comparison graph in an embodiment
  • Figure 4 is a structural block diagram of a video processing device in an embodiment
  • Fig. 5 is an internal structure diagram of a computer device in an embodiment.
  • the video processing method provided in this application can be applied to the application environment shown in FIG. 1.
  • the terminal 102 communicates with the server 104 through the network.
  • the terminal 102 sends the surveillance video to the server 104.
  • after the server 104 receives the surveillance video, it intercepts the service sub-video of the target monitoring object from the surveillance video; extracts a first video image set containing the target monitoring object from the service sub-video, and extracts a target object image from the first video image set; extracts a second video image containing a service object from the service sub-video, and obtains a service image set based on the target object image and the second video image; performs micro-expression analysis on each set image in the service image set to obtain a preset micro-expression matching each set image; and generates the service information file of the target monitoring object according to the service image set and the preset micro-expressions.
  • the server 104 returns the generated service information file to the terminal 102.
  • the terminal 102 may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices.
  • the server 104 may be implemented as an independent server or a server cluster composed of multiple servers.
  • a video processing method is provided. Taking the method applied to the server 104 in FIG. 1 as an example for description, the method includes the following steps:
  • Step 210 Obtain surveillance video, and intercept the service sub-video of the target surveillance object from the surveillance video.
  • the target monitoring objects are the persons whose service quality needs to be monitored and evaluated, such as customer service personnel.
  • the surveillance video is video captured at the service location of the target monitoring object.
  • the people captured in the surveillance video may also include service objects such as customers, or other service personnel.
  • the shooting duration of the surveillance video is generally a fixed time period, such as one day or one week.
  • the terminal used for monitoring can periodically send the captured surveillance video to the server; after the server receives the recorded surveillance video, it can process it immediately or at scheduled times.
  • the server stores in advance the information of the target monitoring objects that need to be monitored, such as each target monitoring object's face image, service time and other information.
  • the server intercepts, from the acquired surveillance video and according to the information of the target monitoring object, the service sub-video corresponding to the target monitoring object; the service sub-video is the video in which the target monitoring object appears and performs activities at the service location.
  • Step 220 Extract a first video image set containing the target monitoring object from the service sub-video, and extract the target object image from the first video image set.
  • the service sub-video is only a preliminary, rough screening of the target monitoring object at the service location, and may still contain information about other persons or redundant information.
  • the service sub-video may also include images of the service object, images of other service personnel, or images of multiple people and objects caused by rotation and changes of the shooting angle during recording of the surveillance video.
  • the server performs face detection on the service sub-video according to the face information of the target monitoring object, and recognizes the first video image set containing the target monitoring object, that is, the set of video images in which the face of the target monitoring object can be detected.
  • the server may first extract video image frames from the service sub-video at a fixed time interval, thereby reducing the image processing volume; however, the extraction interval must also take the amount of video information into account, and cannot be set so large that too much effective information is lost.
  • after the server extracts the video image set containing the target monitoring object, it further filters the images in the set to determine whether the target monitoring object in each image is in a service behavior state, and extracts from the first video image set the target object images in which the target monitoring object is in the service behavior state.
  • the service behavior state can be the state of a customer service person serving a customer.
  • the target monitoring object may also be in other states, such as communicating with other service personnel, or idle with no service under way.
  • the server can judge whether the target monitoring object is in the service behavior state based on information such as the number of people in the image and the presence of people other than the target monitoring object.
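  • the sampling-and-filtering step described above can be sketched as follows. This is a minimal illustration assuming OpenCV and the open-source face_recognition library; the names (extract_target_frames, target_encoding, the 1-second interval) are hypothetical and not taken from the patent.

```python
# Minimal sketch: sample frames at a fixed interval and keep those in
# which the target monitoring object's face is detected.
import cv2
import face_recognition

def extract_target_frames(video_path, target_encoding, interval_s=1.0):
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25      # fall back if FPS unknown
    step = max(1, int(fps * interval_s))       # frames between samples
    kept, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            encodings = face_recognition.face_encodings(rgb)
            # keep the frame if any detected face matches the target
            if any(face_recognition.compare_faces([target_encoding], e)[0]
                   for e in encodings):
                kept.append((idx / fps, frame))  # (timestamp, image)
        idx += 1
    cap.release()
    return kept
```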
  • Step 230 Extract a second video image containing a service object from the service sub-video, and obtain a service image set according to the target object image and the second video image.
  • the service object is the person being served by the target monitoring object, such as a customer.
  • the server extracts the second video images containing the service object from the service sub-video.
  • the server may store the face information of all service personnel in advance and perform face detection on each image; if a face that does not match any service person is detected, it can be determined that a service object is present in the image. Further, whether the service object is in the serviced state can be determined according to the number of people in the image and similar cues, and the images that contain a service object in the serviced state are extracted as the second video images.
  • the server obtains the service image set according to the extracted target object images and the second video images.
  • the server can mark the object category of each image in the service image set, such as service object or target monitoring object, and can sort and organize the service image set according to the video time corresponding to each image.
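  • as an illustration of assembling the service image set, the following sketch tags each image with an object category and sorts by video time; the ServiceImage structure and field names are assumptions for this example only, reusing the (timestamp, frame) pairs from the sketch above.

```python
# Illustrative sketch: merge target-object images and service-object
# images into one collection, ordered by shooting time.
from dataclasses import dataclass

@dataclass
class ServiceImage:
    timestamp: float   # seconds into the service sub-video
    category: str      # "target" or "service_object"
    frame: object      # the image data, e.g. a numpy array

def build_service_image_set(target_images, second_images):
    merged = (
        [ServiceImage(t, "target", f) for t, f in target_images] +
        [ServiceImage(t, "service_object", f) for t, f in second_images]
    )
    return sorted(merged, key=lambda s: s.timestamp)
```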
  • Step 240 Perform micro-expression analysis on each set of images in the service image set to obtain preset micro-expressions matching each set of images.
  • after obtaining the service image set, the server performs micro-expression analysis on each set image in the service image set. Specifically, the server can extract the facial image of each object from a set image, extract facial features from the facial image, and then find the preset micro-expression matching those facial features.
  • a variety of preset micro-expressions are stored in the database of the server in advance.
  • the preset micro-expressions can be set according to the facial parts.
  • the preset micro-expressions of the eyes can include squinting, staring, and the like, so that a matching preset micro-expression can be obtained for each set image part by part.
  • the preset micro-expression can also be a micro-expression synthesized from the features of multiple facial parts; the server matches a comprehensive facial micro-expression, such as smiling or laughing, according to the facial features of multiple parts.
  • Step 250 Generate a service information file of the target monitored object according to the service image collection and the preset micro-expression.
  • the server can classify and sort the set images according to their object categories, shooting times and matched preset micro-expressions, and can compare and analyze the preset micro-expressions matched to the target monitoring object and to the service object in the same time period.
  • a service score of the target monitoring object is obtained from the preset micro-expressions of the two parties, and the service scores of the individual time periods can be combined to calculate the overall service evaluation score of the target monitoring object over the whole period.
  • the server can also compare the service score of each time period with a service warning threshold, to determine whether the score for that period is qualified and whether service quality warning prompt information needs to be issued.
  • the server can generate the service information file from one or a combination of the above: the service image set, the set images with their preset micro-expressions, the service scores of each time period, the warning prompt information, and so on.
  • the server can also apply other processing methods to the set images and the preset micro-expressions to obtain other analysis information for the service information file.
  • when the server calculates the service score of each time period, it can preset a service score for each preset micro-expression; the preset micro-expressions matched to the service object and to the target monitoring object can be given different service scores and different weights, and the service score of each time period is calculated from the scores and weights of both parties.
  • the server may also use other methods to calculate the service score.
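  • one possible reading of the weighted scoring described above is sketched below; the score table, weights and category names are invented placeholders, since the patent leaves the concrete values open.

```python
# Hedged sketch of a per-period weighted service score.
EXPRESSION_SCORES = {          # per-expression base scores (assumed values)
    "smile": 1.0, "laugh": 1.0, "neutral": 0.5,
    "frown": 0.0, "glare": -0.5,
}
WEIGHTS = {"target": 0.6, "service_object": 0.4}  # assumed party weights

def period_service_score(matched):
    """matched: list of (category, expression) pairs for one time period."""
    total, norm = 0.0, 0.0
    for category, expression in matched:
        w = WEIGHTS[category]
        total += w * EXPRESSION_SCORES.get(expression, 0.5)
        norm += w
    return total / norm if norm else 0.0
```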
  • in this way, the server can intercept from the surveillance video the service sub-video of the target monitoring object at the service location, and detect from the service sub-video the images containing the service object and the target object, thereby automatically filtering out useless redundant image information; it can also perform micro-expression analysis on the filtered images to analyze the micro-expressions of the target object and the service object, further processing the image information into effective information that helps evaluate the target monitoring object, thereby greatly improving the efficiency of obtaining effective video information.
  • the step of intercepting the service sub-video of the target monitoring object from the surveillance video may include: obtaining the service identifier of the target monitoring object, and searching for the service time and target face image corresponding to the service identifier; extracting from the surveillance video the surveillance video clips whose shooting time matches the service time; performing face detection on the surveillance video clips according to the target face image, and extracting from the surveillance video clips the video sub-segments in which no face matching the target face image is detected; obtaining the segment duration of each video sub-segment, and comparing the segment duration with a preset missing threshold; and deleting from the video clips the first video sub-segments whose segment duration is greater than the preset missing threshold, to obtain the service sub-video.
  • the service identifier is used to uniquely identify the service personnel.
  • the service identifier can be an employee code, name, job number, etc.
  • the mapping relationship between the service identifier of each service person and that person's basic information is stored in the server in advance; the basic information may include service hours, such as a customer service person's shift hours, and personal information such as gender, age and facial image information.
  • the server obtains the service identifier of the target monitoring object and finds the service time and target face image corresponding to the service identifier; it then compares the service time with the shooting time of the surveillance video, and extracts from the surveillance video the surveillance video clips whose shooting time matches the service time. For example, it can intercept the corresponding clips from the surveillance video according to the start of the service time, obtain fixed rest times such as meal breaks, and remove the video clips corresponding to the fixed rest times to obtain the surveillance video clips.
  • the server performs face detection on the surveillance video clips based on the found target face image; it can extract image frames from the surveillance video clips at regular intervals, detect whether an image frame contains a face matching the target face image, extract the image frames in which no matching face is detected, and assemble the consecutive unmatched frames, in extraction order, into video sub-segments.
  • multiple video sub-segments may be obtained.
  • the server obtains the start time and end time of each video sub-segment, and calculates the segment duration of each video sub-segment from the end time and start time.
  • the server obtains the preset missing threshold, a time threshold used to judge whether the service person has left the post: if the time for which the service person is missing from the surveillance video exceeds the preset missing threshold, it is determined that the service person has left the post.
  • the server compares the segment duration of each video sub-segment with the preset missing threshold, and deletes from the video clips the first video sub-segments whose segment duration is greater than the preset missing threshold, to obtain the service sub-video.
  • the algorithm for face detection and recognition can be a template-matching method, principal component analysis, a method based on singular value features, subspace analysis, locality preserving projections, or other algorithms.
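  • the absence-filtering logic can be illustrated as follows, under the assumption that frames have already been sampled and tested for the target face; timestamps are in seconds and the data shapes are hypothetical.

```python
# Simplified sketch: consecutive sampled frames with no matching face
# form sub-segments, and sub-segments longer than the missing threshold
# (the target has left the post) are dropped.

def filter_absences(samples, missing_threshold_s=60.0):
    """samples: list of (timestamp, target_present: bool), in time order.
    Returns the timestamps to keep in the service sub-video."""
    kept, gap = [], []
    for ts, present in samples:
        if present:
            # a short gap (target briefly off-camera) is kept
            if gap and (gap[-1] - gap[0]) <= missing_threshold_s:
                kept.extend(gap)
            gap = []
            kept.append(ts)
        else:
            gap.append(ts)
    if gap and (gap[-1] - gap[0]) <= missing_threshold_s:
        kept.extend(gap)   # trailing short gap
    return kept
```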
  • the step of extracting the target object image from the first video image set may include: extracting first image frames from the first video image set, and detecting the number of portraits in each first image frame; extracting from the first image frames the multi-person image frames whose number of portraits is greater than one; performing face detection on the multi-person image frames according to the service face images pre-stored in a service face database, and detecting whether a multi-person image frame contains a face image that does not match the service face images; and if a face image that does not match the service face images is detected, extracting from the corresponding multi-person image frame the face image matching the target face image as the target object image.
  • the first video image set is an image set containing face images of the target monitoring object; the server may extract the first image frames from the first video image set at a fixed time interval to reduce the amount of image data to process.
  • the server detects the number of portraits in each extracted first image frame. Portrait detection differs from face detection: it only needs to count the people present in each first image frame, without accurately recognizing faces, for example by detecting human body outlines.
  • the server extracts from the first image frames the multi-person image frames whose number of portraits is greater than one, that is, it excludes the non-service-state image frames in which the target monitoring object appears alone in the video.
  • the service face database is a face information database in the server.
  • the face images of all service personnel, including the target monitored object, are stored in the service face database.
  • the server performs face detection on each multi-person image frame, compares each detected facial feature with the service face images of all service personnel pre-stored in the service face database, and determines whether the detected facial feature matches a service face image in the database.
  • when it is detected that a multi-person image frame contains a face image that does not match any service face image, it is determined that a service object exists in the multi-person image frame, and the face image in that frame which matches the target face image is extracted as the target object image.
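  • a sketch of this multi-person screening, again using the face_recognition library, is shown below. The patent counts portraits by body outline; this sketch approximates the count with detected faces, and service_encodings/target_encoding are assumed to be precomputed from the service face database.

```python
# Illustrative sketch: keep the target's face crop only when the frame
# also contains a face matching no service person (a service object).
import face_recognition

def extract_target_object_image(rgb_frame, service_encodings, target_encoding):
    locations = face_recognition.face_locations(rgb_frame)
    if len(locations) <= 1:          # not a multi-person frame
        return None
    encodings = face_recognition.face_encodings(rgb_frame, locations)
    has_outsider = any(
        not any(face_recognition.compare_faces(service_encodings, e))
        for e in encodings)
    if not has_outsider:
        return None
    for loc, e in zip(locations, encodings):
        if face_recognition.compare_faces([target_encoding], e)[0]:
            top, right, bottom, left = loc
            return rgb_frame[top:bottom, left:right]   # target face crop
    return None
```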
  • the step of extracting the second video image containing the service object from the service sub-video may include: obtaining the second video sub-segments whose segment duration does not exceed the preset missing threshold, and extracting from the second video sub-segments the first facial images that do not match the service face images; extracting from the multi-person image frames the second facial images that do not match the service face images; and obtaining the second video images according to the first facial images and the second facial images.
  • the server obtains the second video sub-segment whose segment duration does not exceed the preset missing threshold from the service sub-videos.
  • the second video sub-segments are the videos in which the target monitoring object does not appear but whose missing duration does not exceed the preset missing threshold.
  • the server first identifies, from the second video sub-segment, a face image matching the service face image in the service face library, and then extracts other face images as the first facial image of the service object.
  • the multi-person image frame is a multi-person image containing the target monitoring object.
  • the server first screens the multi-person image frames to identify the face images that match service face images in the service face database, and then extracts the other face images as the second facial images of the service object.
  • the server generates the second video images jointly from the first facial images and the second facial images. Further, the server may mark the second video images with the service object category, and may mark the shooting time of each second video image.
  • the server extracts all images of the service object, whether or not they also include the target monitoring object, which avoids losing the facial expression information of the service object.
  • the step of performing micro-expression analysis on each set image in the service image set to obtain the preset micro-expression matching each set image may include: extracting facial feature points from each set image, and calculating facial action features according to the facial feature points; inputting the facial action features into a micro-expression analysis model to obtain the matching probability value of each preset micro-expression; and selecting the preset micro-expression matching the set image according to the matching probability values.
  • each set image is a facial image of the target monitoring object or of the service object.
  • the server extracts facial feature points from the set of images.
  • the facial feature points are feature points of facial features and facial contours, such as feature coordinates of eyes, mouth, nose, eyebrows, etc.
  • the server may perform facial feature point extraction on the current facial image through a pre-trained 3D face model or a deep learning neural network.
  • the server can extract facial action features from a set image through a pre-trained 3D face model or deep-learning neural network model, or it can classify the extracted facial feature points by facial part and input them into the corresponding facial action feature calculation model to obtain the corresponding facial action features; for example, feature points belonging to the eyes are input into an eye movement model to obtain eye action features such as blinking, squinting and staring.
  • the 3D face model, deep learning neural network model, and facial motion feature calculation model are all obtained by deep learning training on multiple face images in advance.
  • the server can calculate the value of each facial action feature according to the 3D face model, the deep-learning neural network model or the facial action feature calculation model, and input the facial action features and their values into a pre-trained micro-expression classification model to obtain the probability value of each preset micro-expression.
  • the micro-expression classification model can use SVM classifiers, deep neural network learning models, decision tree classification models and other models for classification.
  • the micro-expression classification model is obtained by pre-training the facial motion features of multiple facial images.
  • the server can select the preset micro-expression with the largest probability value according to the output result of the model.
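  • as a hedged sketch of this matching step, a scikit-learn SVM classifier (one of the classifier types named above) can stand in for the unspecified micro-expression classification model; the feature layout and labels are illustrative.

```python
# Sketch: action features go into a pre-trained probabilistic classifier,
# and the preset micro-expression with the largest probability wins.
import numpy as np
from sklearn.svm import SVC

def match_expression(clf: SVC, action_features: np.ndarray) -> str:
    """clf must be trained with probability=True on labeled expression
    data; action_features is one sample's feature vector."""
    probs = clf.predict_proba(action_features.reshape(1, -1))[0]
    return clf.classes_[int(np.argmax(probs))]   # highest-probability label
```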
  • the step of generating the service information file of the target monitoring object according to the service image set and the preset micro-expressions may include: associating each preset micro-expression with the corresponding set image in the service image set; obtaining the object category corresponding to each set image; searching for the expression tag corresponding to each preset micro-expression, and determining the emotion category corresponding to the preset micro-expression according to the tag; and dividing the set images in the service image set into multiple image subsets according to the object categories and emotion categories, and generating the service information file based on the image subsets.
  • the server associates each collection image with the corresponding preset micro-expression, for example, can mark each collection image with the preset micro-expression, or record the mapping relationship between the collection image and the corresponding preset micro-expression, etc.
  • the server obtains the object category corresponding to each set of images.
  • the object category is divided according to the face objects in the image.
  • the object category can include two categories, namely the target monitoring object category and the service object category.
  • the expression tag is the emotion-mode tag corresponding to a preset micro-expression.
  • an expression tag can be happy, excited, contemptuous, angry, peaceful, and so on.
  • one expression tag can correspond to multiple preset micro-expressions; for example, the preset micro-expressions corresponding to the happy tag can include squinting eyes, raised mouth corners, and the like.
  • the mapping relationship between expression tags and preset micro-expressions is stored in the server in advance; the server searches for the expression tag corresponding to each preset micro-expression.
  • expression tags can be divided into multiple emotion categories, and one emotion category can correspond to multiple expression tags.
  • the emotion categories of the expression tags can be divided into three classes: positive, neutral and negative.
  • expression tags such as happy and excited belong to the positive emotion category, tags such as contemptuous and angry belong to the negative emotion category, and the peaceful tag belongs to the neutral emotion category.
  • emotion categories can also be divided in other ways.
  • the mapping relationship between emotion categories and expression tags can be stored in the server in advance, and the server obtains the emotion categories corresponding to the preset micro-expressions of each collection image, and associates the found emotion categories with the corresponding collection images.
  • the server can classify the set images according to the object category and emotion category corresponding to each set image: the set images are first divided by object category into service image sets of multiple object categories, and the service image set of each object category is then divided into multiple smaller image subsets according to the emotion category of each image.
  • the mapping relationships between the shooting time, preset micro-expression, object category and emotion category of each set image are organized into an image information table for each service image set, and the service information file is generated based on the divided image subsets and the corresponding image information tables.
  • the server can push the service information file to the terminal, so that the terminal can reasonably evaluate the service quality of the target monitoring object based on the service information file.
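  • the subset division can be pictured with the following sketch; the expression-to-tag and tag-to-emotion tables are invented placeholders standing in for the mappings the server stores in advance.

```python
# Compact sketch: group set images by (object category, emotion category).
from collections import defaultdict

EXPRESSION_TO_TAG = {"smile": "happy", "laugh": "happy",
                     "glare": "angry", "neutral": "peaceful"}   # assumed
TAG_TO_EMOTION = {"happy": "positive", "angry": "negative",
                  "peaceful": "neutral"}                         # assumed

def divide_subsets(images):
    """images: list of dicts with 'category', 'expression', 'timestamp'.
    Returns {(object_category, emotion_category): [images...]}."""
    subsets = defaultdict(list)
    for img in images:
        tag = EXPRESSION_TO_TAG.get(img["expression"], "peaceful")
        emotion = TAG_TO_EMOTION[tag]
        subsets[(img["category"], emotion)].append(img)
    return dict(subsets)
```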
  • FIG. 3 is a flowchart of a method for generating an expression comparison graph, which may specifically include the following steps:
  • Step 310 Associate the first set of images of the target object category whose shooting time matches with the second set of images of the service object category in the service image set.
  • the set images in the service image set are divided into different image subsets according to the object category, and each image subset has a corresponding image information table.
  • the server obtains the shooting time of each set image from the image information tables of the image subsets of the target object category, namely the target monitoring object category, and of the service object category; it finds the first set images of the target object category and the second set images of the service object category whose shooting times match, and associates the two matching classes of images with each other.
  • matching shooting times need not be exactly identical; images can also be determined to match when they fall within the same time range, whose length can be set to 10 seconds, 20 seconds, 30 seconds, and so on.
  • Step 320 Determine whether the preset micro-expressions associated with the first set of images and the preset micro-expressions associated with the second set of images correspond to the same emotion category.
  • the server separately obtains the matched preset micro-expressions of the first set image and the second set image, searches for the emotion category corresponding to each preset micro-expression, and determines whether the two emotion categories are the same, that is, whether the emotional states expressed by the target monitoring object and the customer at that moment match. The emotional changes of the target monitoring object and the customer during communication are generally fairly synchronized, in which case the service attitude of the target monitoring object is easy to evaluate; but when the emotional states of the two are inconsistent or in conflict, the service status of the target monitoring object cannot be evaluated from the customer side alone, and it is often necessary to determine the true service situation at that time manually.
  • step 330 when corresponding to different emotion categories, stitching the associated first set of images and second set of images to obtain an expression comparison map.
  • the server stitches the associated first set of images and second set of images to obtain an expression comparison map.
  • the stitching position and form of the two images can be set according to the needs of the monitoring personnel.
  • the server may further record and mark the shooting time corresponding to the expression comparison graph.
  • the server may also sort the expression comparison images of multiple shooting times according to the shooting time, and then generate an expression comparison animation.
  • the server may send the generated expression comparison image or expression comparison animation to the terminal to provide an early warning reminder of conflict expressions to the terminal.
  • in this way, the images of the target monitoring object and the service object at the same shooting time are associated, the emotional states of the persons in the two images are automatically matched and checked, and the images with conflicting emotional states can be spliced together, which facilitates comparative analysis by the monitoring personnel.
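  • a compact sketch of this comparison-map flow is given below: images of the two parties whose timestamps fall within the same window are paired, and a pair is stitched side by side when its emotion categories differ. OpenCV and NumPy do the splicing; the 10-second window is one of the example range lengths mentioned above, and the dictionary layout is assumed.

```python
# Sketch: pair images by time window, stitch pairs with conflicting
# emotion categories into an expression comparison map.
import numpy as np
import cv2

def expression_comparison_maps(target_imgs, service_imgs, window_s=10.0):
    """Each item: dict with 'timestamp', 'emotion', 'frame' (HxWx3)."""
    maps = []
    for t_img in target_imgs:
        for s_img in service_imgs:
            in_window = abs(t_img["timestamp"] - s_img["timestamp"]) <= window_s
            if in_window and t_img["emotion"] != s_img["emotion"]:
                # resize both crops to a common height before stitching
                h = min(t_img["frame"].shape[0], s_img["frame"].shape[0])
                left = cv2.resize(t_img["frame"], (t_img["frame"].shape[1], h))
                right = cv2.resize(s_img["frame"], (s_img["frame"].shape[1], h))
                maps.append((t_img["timestamp"], np.hstack([left, right])))
    return maps
```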
  • a video processing device including: a video interception module 410, a target image extraction module 420, an image collection generation module 430, an expression analysis module 440, and an archive generation module 450, wherein :
  • the video interception module 410 is configured to obtain surveillance videos, and intercept service sub-videos of the target surveillance object from the surveillance videos.
  • the target image extraction module 420 is configured to extract a first video image set containing the target monitoring object from the service sub-video, and extract a target object image from the first video image set.
  • the image set generating module 430 is configured to extract a second video image containing a service object from the service sub-video, and obtain a service image set according to the target object image and the second video image.
  • the expression analysis module 440 is configured to perform micro-expression analysis on each set of images in the service image set to obtain preset micro-expressions that match the set of images.
  • the file generating module 450 is configured to generate a service information file of the target monitored object according to the service image set and the preset micro-expression.
  • the video interception module 410 may include:
  • the information searching unit is used to obtain the service identification of the target monitoring object, and search for the service time and the target face image corresponding to the service identification.
  • the segment extraction unit is configured to extract, from the surveillance video, a surveillance video segment whose shooting time matches the service time.
  • the re-screening unit is configured to perform face detection on the surveillance video clip according to the target face image, and extract from the surveillance video clip a video whose face matching the target face image is not detected Sub-fragment.
  • the duration comparison unit is used to obtain the segment duration of each of the video sub-segments, and compare the segment duration with a preset missing threshold.
  • the video deletion unit is configured to delete the first video sub-segment whose segment duration is greater than the preset missing threshold from the video segment to obtain a service sub-video.
  • the target image extraction module 420 may include:
  • the portrait detection unit is configured to extract first image frames from the first video image set, and detect the number of portraits in each of the first image frames.
  • the multi-person detection unit is configured to extract a multi-person image frame with more than one portrait from the first image frame.
  • the face matching unit is configured to perform face detection on the multi-person image frame according to the service face images pre-stored in the service face database, and detect whether the multi-person image frame contains a face image that does not match the service face images.
  • the target object extraction unit is configured to, if a face image that does not match the service face image is detected, extract the face image matching the target face image in the corresponding multi-person image frame as The target object image.
  • the image collection generating module 430 may include:
  • the first extraction unit is configured to obtain a second video sub-segment whose segment duration does not exceed the preset missing threshold, and extract from the second video sub-segment a first facial image that does not match the service face images.
  • a second extraction unit, configured to extract from the multi-person image frame a second facial image that does not match the service face images.
  • the image summary unit is configured to obtain a second video image according to the first facial image and the second facial image.
  • the expression analysis module 440 may include:
  • the feature extraction unit is configured to extract facial feature points from each of the collective images, and calculate facial action features based on the facial feature points.
  • the probability calculation unit is used to input the facial motion features into the micro-expression analysis model to obtain the matching probability value of each preset micro-expression.
  • the expression selecting unit is configured to select a preset micro expression matching the set image according to the matching probability value.
  • the archive generation module 450 may include:
  • the associating unit is configured to associate the preset micro-expression with the corresponding collection image in the service image collection.
  • the category obtaining unit is configured to obtain the object category corresponding to each of the set images.
  • the emotion determination unit is configured to search for an expression tag corresponding to the preset micro-expression, and determine the emotion category corresponding to the preset micro-expression according to the tag.
  • the subset dividing unit is configured to divide the set image in the service image set into multiple image subsets according to the object category and the emotion category, and generate a service information file based on the image subset.
  • the apparatus may further include:
  • the image associating module is used to associate the first set of images of the target object category whose shooting time matches with the second set of images of the service object category in the service image set.
  • the category matching module is used to determine whether the preset micro-expressions associated with the first set of images and the preset micro-expressions associated with the second set of images correspond to the same emotion category.
  • the image splicing module is used for splicing the associated first set of images and the second set of images to obtain an expression comparison map when corresponding to different emotion categories.
  • Each module in the above-mentioned video processing device may be implemented in whole or in part by software, hardware, and a combination thereof.
  • the foregoing modules may be embedded in the form of hardware or independent of the processor in the computer device, or may be stored in the memory of the computer device in the form of software, so that the processor can call and execute the operations corresponding to the foregoing modules.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure diagram may be as shown in FIG. 5.
  • the computer equipment includes a processor, a memory, a network interface and a database connected through a system bus. Among them, the processor of the computer device is used to provide calculation and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, a computer program, and a database.
  • the internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium.
  • the database of the computer device is used to store video processing data.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer program is executed by the processor to realize a video processing method.
  • FIG. 5 is only a block diagram of a part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution of the present application is applied.
  • a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
  • a computer device is provided, including a memory and a processor, the memory storing a computer program; when the processor executes the computer program, the following steps are implemented: obtaining a surveillance video, and intercepting the service sub-video of the target monitoring object from the surveillance video; extracting from the service sub-video a first video image set containing the target monitoring object, and extracting a target object image from the first video image set; extracting from the service sub-video a second video image containing a service object, and obtaining a service image set according to the target object image and the second video image; performing micro-expression analysis on each set image in the service image set to obtain the preset micro-expression matching each set image; and generating the service information file of the target monitoring object according to the service image set and the preset micro-expressions.
  • when the processor executes the computer program to implement the step of intercepting the service sub-video of the target monitoring object from the surveillance video, it is further configured to: obtain the service identifier of the target monitoring object, and search for the service time and target face image corresponding to the service identifier; extract from the surveillance video the surveillance video clips whose shooting time matches the service time; perform face detection on the surveillance video clips according to the target face image, and extract from the surveillance video clips the video sub-segments in which no face matching the target face image is detected; obtain the segment duration of each video sub-segment, and compare the segment duration with the preset missing threshold; and delete from the video clips the first video sub-segments whose segment duration is greater than the preset missing threshold, to obtain the service sub-video.
  • when the processor executes the computer program to implement the step of extracting the target object image from the first video image set, it is further configured to: extract first image frames from the first video image set, and detect the number of portraits in each first image frame; extract from the first image frames the multi-person image frames whose number of portraits is greater than one; perform face detection on the multi-person image frames according to the service face images pre-stored in the service face database, and detect whether a multi-person image frame contains a face image that does not match the service face images; and if such a face image is detected, extract from the corresponding multi-person image frame the face image matching the target face image as the target object image.
  • when the processor executes the computer program to implement the step of extracting the second video image containing the service object from the service sub-video, it is further configured to: obtain the second video sub-segments whose segment duration does not exceed the preset missing threshold, and extract from the second video sub-segments the first facial images that do not match the service face images; extract from the multi-person image frames the second facial images that do not match the service face images; and obtain the second video images according to the first facial images and the second facial images.
  • when the processor executes the computer program to implement the step of performing micro-expression analysis on each set image in the service image set to obtain the preset micro-expression matching each set image, it is further configured to: extract facial feature points from each set image, and calculate facial action features based on the facial feature points; input the facial action features into the micro-expression analysis model to obtain the matching probability value of each preset micro-expression; and select the preset micro-expression matching the set image according to the matching probability values.
  • when the processor executes the computer program to implement the step of generating the service information file of the target monitoring object according to the service image set and the preset micro-expressions, it is further configured to: associate the preset micro-expressions with the corresponding set images in the service image set; obtain the object category corresponding to each set image; search for the expression tag corresponding to each preset micro-expression, and determine the emotion category corresponding to the preset micro-expression according to the tag; and divide the set images in the service image set into multiple image subsets according to the object categories and emotion categories, and generate the service information file according to the image subsets.
  • the processor further implements the following steps when executing the computer program: associating, in the service image set, the first set images of the target object category with the second set images of the service object category whose shooting times match; determining whether the preset micro-expression associated with a first set image and the preset micro-expression associated with the corresponding second set image correspond to the same emotion category; and, when they correspond to different emotion categories, splicing the associated first set image and second set image to obtain an expression comparison map.
  • a computer-readable storage medium is provided.
  • the computer-readable storage medium may be a volatile or non-volatile storage medium on which a computer program is stored; when the computer program is executed by a processor, the following steps are implemented: acquiring a surveillance video, and intercepting the service sub-video of the target monitoring object from the surveillance video; extracting from the service sub-video a first video image set containing the target monitoring object, and extracting a target object image from the first video image set; extracting from the service sub-video a second video image containing a service object, and obtaining a service image set according to the target object image and the second video image; performing micro-expression analysis on each set image in the service image set to obtain the preset micro-expression matching each set image; and generating the service information file of the target monitoring object according to the service image set and the preset micro-expressions.
  • when the computer program is executed by the processor to implement the step of intercepting the service sub-video of the target monitoring object from the surveillance video, it is further used to: obtain the service identifier of the target monitoring object, and search for the service time and target face image corresponding to the service identifier; extract from the surveillance video the surveillance video clips whose shooting time matches the service time; perform face detection on the surveillance video clips according to the target face image, and extract from the surveillance video clips the video sub-segments in which no face matching the target face image is detected; obtain the segment duration of each video sub-segment, and compare the segment duration with the preset missing threshold; and delete from the video clips the first video sub-segments whose segment duration is greater than the preset missing threshold, to obtain the service sub-video.
  • when the computer program is executed by the processor to implement the step of extracting the target object image from the first video image set, it is further used to: extract first image frames from the first video image set, and detect the number of portraits in each first image frame; extract from the first image frames the multi-person image frames whose number of portraits is greater than one; perform face detection on the multi-person image frames according to the service face images pre-stored in the service face database, and detect whether a multi-person image frame contains a face image that does not match the service face images; and if such a face image is detected, extract from the corresponding multi-person image frame the face image matching the target face image as the target object image.
  • when the computer program is executed by the processor to implement the step of extracting the second video image containing the service object from the service sub-video, it is further used to: obtain the second video sub-segments whose segment duration does not exceed the preset missing threshold, and extract from the second video sub-segments the first facial images that do not match the service face images; extract from the multi-person image frames the second facial images that do not match the service face images; and obtain the second video images according to the first facial images and the second facial images.
  • when the computer program is executed by the processor to implement the step of performing micro-expression analysis on each set image in the service image set to obtain the preset micro-expression matching each set image, it is further used to: extract facial feature points from each set image, and calculate facial action features based on the facial feature points; input the facial action features into a micro-expression analysis model to obtain the matching probability value of each preset micro-expression; and select the preset micro-expression matching the set image according to the matching probability values.
  • when the computer program is executed by the processor to implement the step of generating the service information file of the target monitoring object according to the service image set and the preset micro-expressions, it is further used to: associate the preset micro-expressions with the corresponding set images in the service image set; obtain the object category corresponding to each set image; search for the expression tag corresponding to each preset micro-expression, and determine the emotion category corresponding to the preset micro-expression based on the tag; and divide the set images in the service image set into multiple image subsets according to the object categories and emotion categories, and generate the service information file according to the image subsets.
  • when the computer program is executed by the processor, the following steps are further implemented: associating, in the service image set, the first set images of the target object category with the second set images of the service object category whose shooting times match; determining whether the preset micro-expression associated with a first set image and the preset micro-expression associated with the corresponding second set image correspond to the same emotion category; and, when they correspond to different emotion categories, splicing the associated first set image and second set image to obtain an expression comparison map.
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present application relates to the field of image processing, and in particular to a video processing method and apparatus, a computer device and a storage medium. The method comprises: acquiring a monitoring video, and capturing a service sub-video of a target monitoring object from the monitoring video; extracting, from the service sub-video, a set of first video images containing the target monitoring object, and extracting a target object image from the set of first video images; extracting, from the service sub-video, a second video image containing a service object, and obtaining a set of service images according to the target object image and the second video image; respectively carrying out subtle expression analysis on each set image in the set of service images to obtain a preset subtle expression matching each set image; and generating a service information profile of the target monitoring object according to the set of service images and the preset subtle expression. By means of the method, the efficiency of acquiring effective video information can be improved.

Description

视频处理方法、装置、计算机设备和存储介质Video processing method, device, computer equipment and storage medium
本申请要求于2019年7月4日提交中国专利局、申请号为201910599356.4,发明名称为“视频处理方法、装置、计算机设备和存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。This application claims the priority of a Chinese patent application filed with the Chinese Patent Office on July 4, 2019, the application number is 201910599356.4, and the invention title is "video processing methods, devices, computer equipment and storage media", the entire contents of which are incorporated by reference In this application.
Technical Field
This application relates to the field of artificial intelligence technology, and in particular to a video processing method and apparatus, a computer device, and a storage medium.
Background
At present, society pays increasing attention to the service attitude of the service industry, and consumers expect to receive high-quality service everywhere. It is therefore necessary to monitor the service quality of service personnel, and to issue timely reminders and corrections when problems with their service quality arise.
However, the inventors have realized that the amount of information contained in surveillance video is often very large. If quality monitors want to evaluate the service quality of service personnel from surveillance video, they have to replay the video, which contains a huge number of image frames and much redundant information. Extracting information that can actually be used to evaluate service quality therefore takes a great deal of time, and the efficiency of acquiring effective video information is very low.
Summary
In view of this, it is necessary to provide, in response to the above technical problems, a video processing method, apparatus, computer device and storage medium capable of improving the efficiency of acquiring effective video information.
A video processing method, the method including:
acquiring a surveillance video, and intercepting a service sub-video of a target monitored object from the surveillance video;
extracting a first video image set containing the target monitored object from the service sub-video, and extracting target object images from the first video image set;
extracting second video images containing a service object from the service sub-video, and obtaining a service image set according to the target object images and the second video images;
performing micro-expression analysis on each collection image in the service image set to obtain a preset micro-expression matching each collection image; and
generating a service information file of the target monitored object according to the service image set and the preset micro-expressions.
A video processing apparatus, the apparatus including:
a video interception module, configured to acquire a surveillance video and intercept a service sub-video of a target monitored object from the surveillance video;
a target image extraction module, configured to extract a first video image set containing the target monitored object from the service sub-video, and to extract target object images from the first video image set;
an image set generation module, configured to extract second video images containing a service object from the service sub-video, and to obtain a service image set according to the target object images and the second video images;
an expression analysis module, configured to perform micro-expression analysis on each collection image in the service image set to obtain a preset micro-expression matching each collection image; and
a file generation module, configured to generate a service information file of the target monitored object according to the service image set and the preset micro-expressions.
A computer device, including a memory and a processor, where the memory stores a computer program, and the processor implements the steps of the above method when executing the computer program.
A computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the above method.
The above video processing method, apparatus, computer device and storage medium can intercept, from the surveillance video, the service sub-video in which the target monitored object is at its service position, and can detect in the service sub-video the images containing the service object and the images containing the target object, thereby automatically filtering out useless redundant image information. They can also perform micro-expression analysis on the filtered images to identify the micro-expressions of the target object and the service object in the images, further processing the image information into effective information that helps evaluate the target monitored object, and thus greatly improving the efficiency of acquiring effective video information.
Brief Description of the Drawings
Fig. 1 is a diagram of an application scenario of a video processing method in one embodiment;
Fig. 2 is a schematic flowchart of a video processing method in one embodiment;
Fig. 3 is a schematic flowchart of a method for generating an expression comparison image in one embodiment;
Fig. 4 is a structural block diagram of a video processing apparatus in one embodiment;
Fig. 5 is a diagram of the internal structure of a computer device in one embodiment.
Detailed Description of the Embodiments
The video processing method provided in this application can be applied in the application environment shown in Fig. 1, in which a terminal 102 communicates with a server 104 through a network. The terminal 102 sends a surveillance video to the server 104. After receiving the surveillance video, the server 104 intercepts a service sub-video of a target monitored object from the surveillance video; extracts a first video image set containing the target monitored object from the service sub-video, and extracts target object images from the first video image set; extracts second video images containing a service object from the service sub-video, and obtains a service image set according to the target object images and the second video images; performs micro-expression analysis on each collection image in the service image set to obtain a preset micro-expression matching each image; and generates a service information file of the target monitored object according to the service image set and the preset micro-expressions. The server 104 then returns the generated service information file to the terminal 102.
The terminal 102 may be, but is not limited to, a personal computer, a notebook computer, a smartphone, a tablet computer, or a portable wearable device; the server 104 may be implemented as an independent server or as a server cluster composed of multiple servers.
In one embodiment, as shown in Fig. 2, a video processing method is provided. Taking the method applied to the server 104 in Fig. 1 as an example, the method includes the following steps:
Step 210: acquire a surveillance video, and intercept a service sub-video of the target monitored object from the surveillance video.
The target monitored object is a person whose service quality needs to be monitored and evaluated, such as a customer service agent. The surveillance video is a video shot at the service location of the target monitored object; besides the target monitored object, the people captured in it may include service objects such as customers, or other service personnel. The shooting duration of the surveillance video is generally a fixed period, such as one day or one week. The monitoring terminal can periodically send captured surveillance video to the server, and after receiving it the server can process it immediately or at scheduled times. The server stores in advance the information of the target monitored objects to be monitored, such as their face images and service times. Based on this information, the server intercepts from the acquired surveillance video the service sub-video corresponding to the target monitored object, i.e. the video in which the target monitored object appears and acts at its service location.
Step 220: extract a first video image set containing the target monitored object from the service sub-video, and extract target object images from the first video image set.
The service sub-video is only a preliminary, rough screening of the target monitored object at its service position, and may still contain information about other persons or redundant information. Besides images of the target monitored object, and because the shooting angle of the surveillance camera may rotate and change, the service sub-video may also contain images of service objects, images of other service personnel, or images in which multiple persons appear together. The server performs face detection on the service sub-video according to the face information of the target monitored object, and identifies the first video image set containing the target monitored object, that is, the set of video images in which the face of the target monitored object can be detected.
Further, since the service sub-video contains a large number of video image frames, before processing it the server may first extract video image frames from the service sub-video at a fixed time interval, which reduces the image processing load. The extraction interval, however, must also take the amount of video information into account: it cannot be set so large that too much effective information is lost.
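Purely as an illustration of this sampling step (not part of the claimed method), the fixed-interval frame extraction can be sketched with OpenCV; the one-second interval and the 25 fps fallback below are assumptions, not values fixed by the method:

```python
import cv2

def sample_frames(video_path, interval_sec=1.0):
    """Yield (timestamp_sec, frame) pairs, one frame per interval_sec."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0  # fall back if FPS metadata is missing
    step = max(int(fps * interval_sec), 1)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            yield idx / fps, frame  # BGR image at its video timestamp
        idx += 1
    cap.release()
```

A larger `interval_sec` shrinks the processing load but, as noted above, risks dropping frames that carry useful expression information.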
After the server extracts the video image set containing the target monitored object, it further filters the images in the set, judging whether the target monitored object in each image is in a service-behavior state, and extracts from the first video image set the images in which it is, as the target object images. For example, the service-behavior state may be the state in which a customer service agent is serving a customer. The target monitored object may also be in other states, such as communicating with other service personnel, or idle without serving anyone; the server can judge the state from information such as the number of persons in the image and the presence of persons other than the target monitored object.
Step 230: extract second video images containing a service object from the service sub-video, and obtain a service image set according to the target object images and the second video images.
A service object is a person served by the target monitored object, such as a customer. The server extracts the second video images containing service objects from the service sub-video. The server may store the face information of all service personnel in advance and perform face detection on each image; if a face matching no service person is detected, it can be determined that a service object is present in the image. Further, whether the service object is actually being served can be judged from information such as the number of persons in the image, and the images that contain a service object in the being-served state are extracted as the second video images.
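A minimal sketch of this matching logic is given below, assuming the open-source `face_recognition` package as the detection backend (the method itself does not mandate any particular library); `staff_encodings` stands for the pre-stored face encodings of all service personnel:

```python
import face_recognition  # one possible backend; an assumption, not prescribed by the method

def classify_faces(frame_rgb, staff_encodings, tolerance=0.5):
    """Split detected faces in one RGB frame into staff and non-staff (service objects)."""
    locations = face_recognition.face_locations(frame_rgb)
    encodings = face_recognition.face_encodings(frame_rgb, locations)
    staff, visitors = [], []
    for loc, enc in zip(locations, encodings):
        matches = face_recognition.compare_faces(staff_encodings, enc, tolerance)
        (staff if any(matches) else visitors).append(loc)
    return staff, visitors  # any non-empty `visitors` list indicates a service object
```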
The server obtains the service image set from the extracted target object images and the second video images. The server can label the object category of each image in the set, i.e. whether it shows a service object or the target monitored object, and can sort and organize the service image set according to the video time corresponding to each image.
Step 240: perform micro-expression analysis on each collection image in the service image set to obtain a preset micro-expression matching each collection image.
After obtaining the service image set, the server performs micro-expression analysis on each image in it. Specifically, the server can extract the facial image of each person from a collection image, extract facial features from the facial image, and then look up the preset micro-expressions matching those features. A variety of preset micro-expressions are stored in the server's database in advance. Preset micro-expressions can be defined per facial part, for example eye micro-expressions such as squinting or glaring; a single collection image may therefore match several preset micro-expressions, one per part. A preset micro-expression may also be a whole-face expression synthesized from the features of several parts, with the server matching one composite facial expression, such as a smile or a laugh, from the features of multiple parts.
Step 250: generate a service information file of the target monitored object according to the service image set and the preset micro-expressions.
The server can classify and organize the images according to the object category, shooting time and matched preset micro-expression of each collection image, and can compare the preset micro-expressions matched for the target monitored object and for the service object within the same time period. From the preset micro-expressions of the two parties the server obtains a service score for the target monitored object; the scores of the individual time periods can then be combined to obtain an overall service evaluation score for the whole monitoring period. The server can also compare each period's service score with a service warning threshold, to judge whether the score for that period is acceptable and whether a service quality warning should be issued. The server can generate the service information file from one or a combination of the above: the service image set, the preset micro-expressions of the collection images, the service scores of each time period, and the warning information; the server may also process the collection images and preset micro-expressions in other ways to obtain further analysis information for the file.
Specifically, when calculating the service score of each time period, the server can use preset score values for each preset micro-expression; the preset micro-expressions matched for the service object and for the target monitored object can be given different score values, and different weights can be set for the two parties' preset micro-expressions, with the period's service score computed from both parties' score values and weights. In other embodiments, the server may calculate the service score in other ways.
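One possible reading of this weighted scoring is sketched below; the expression scores and party weights are hypothetical placeholders, since the method leaves the concrete values open:

```python
# Hypothetical per-expression scores and per-party weights (illustrative only)
EXPRESSION_SCORE = {"smile": 5, "neutral": 3, "frown": 1}
WEIGHTS = {"staff": 0.6, "customer": 0.4}

def period_score(records):
    """records: iterable of (party, expression) pairs observed in one time period.
    Returns the weighted service score, or None if the period is empty."""
    total, weight_sum = 0.0, 0.0
    for party, expression in records:
        w = WEIGHTS[party]
        total += w * EXPRESSION_SCORE.get(expression, 3)  # unknown expressions score neutral
        weight_sum += w
    return total / weight_sum if weight_sum else None
```

For example, `period_score([("staff", "smile"), ("customer", "frown")])` yields a score pulled down by the customer's negative expression, which could then be compared against the warning threshold.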
In this embodiment, the server can intercept from the surveillance video the service sub-video of the target monitored object at its service position, and detect in it the images containing the service object and the images containing the target object, automatically filtering out useless redundant image information. It can also perform micro-expression analysis on the filtered images to identify the micro-expressions of the target object and the service object in the images, further processing the image information into effective information that helps evaluate the target monitored object, and thus greatly improving the efficiency of acquiring effective video information.
In one embodiment, intercepting the service sub-video of the target monitored object from the surveillance video may include: acquiring the service identifier of the target monitored object, and looking up the service time and target face image corresponding to the service identifier; extracting from the surveillance video the surveillance video segment whose shooting time matches the service time; performing face detection on the surveillance video segment according to the target face image, and extracting from it the video sub-segments in which no face matching the target face image is detected; obtaining the segment duration of each video sub-segment and comparing it with a preset missing threshold; and deleting from the video segment the first video sub-segments whose duration exceeds the preset missing threshold, to obtain the service sub-video.
The service identifier uniquely identifies a service person and may be an employee code, a name, a job number, etc. The mapping between each service person's identifier and their basic information is stored in the server in advance; the basic information may include service times, such as on-duty customer service hours, and personal information such as gender, age, and face image. The server obtains the service identifier of the target monitored object and looks up the corresponding service time and target face image, then compares the service time with the shooting time of the surveillance video and extracts from the surveillance video the segment whose shooting time matches the service time; for example, it can cut the corresponding segment from the surveillance video according to the start of the service time, obtain the fixed break times such as meal times, and remove the video corresponding to those breaks to obtain the surveillance video segment.
The server performs face detection on the surveillance video segment according to the target face image found. It can extract image frames from the segment at fixed time intervals and detect whether a face matching the target face image is present in each frame; the frames in which no matching face is detected are extracted, and consecutive such frames (in extraction order) form video sub-segments, of which there may be several. The server obtains the start and end time of each video sub-segment and computes its duration from them. The server then obtains the preset missing threshold, the time threshold for judging whether the service person has left their post: if the person's absence from the surveillance video exceeds the threshold, they are judged to have left. The server compares each sub-segment's duration with the preset missing threshold and deletes from the video segment the first video sub-segments whose duration exceeds it, obtaining the service sub-video. The face recognition here may use algorithms such as template-matching-based recognition, principal component analysis, singular-value-feature-based methods, subspace analysis, or locality preserving projections.
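A minimal sketch of the threshold comparison is given below, assuming the absences have already been reduced to `(start, end)` second offsets within the segment; the 120-second threshold is a placeholder, not a value defined by the method:

```python
def drop_long_absences(absences, missing_threshold_sec=120):
    """absences: list of (start_sec, end_sec) spans where the target face was NOT detected.
    Short absences are kept (they may still show the service object's expressions);
    long ones are treated as off-duty and marked for deletion from the segment."""
    keep, drop = [], []
    for start, end in absences:
        (drop if end - start > missing_threshold_sec else keep).append((start, end))
    return keep, drop
```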
In this embodiment, video matching and face detection allow a preliminary screening of the video segments containing the target monitored object, effectively reducing redundant video unrelated to the target. Setting the preset missing threshold also makes it possible to distinguish a long absence, such as going to the toilet or going out, from a short one, such as consulting other service personnel or fetching materials; the video of short absences, which may contain the service object's expression information, is retained, reducing the loss of effective information.
In one embodiment, extracting the target object images from the first video image set may include: extracting first image frames from the first video image set and detecting the number of persons in each; extracting from the first image frames the multi-person image frames containing more than one person; performing face detection on the multi-person image frames according to the service face images pre-stored in a service face library, to detect whether a face matching none of the service face images is present; and, if such a face is detected, extracting the facial image matching the target face image in the corresponding multi-person image frame as a target object image.
The first video image set is the set of images containing the face of the target monitored object. The server can extract first image frames from it at a fixed time interval to reduce the amount of image data to process. The server then detects the number of persons in each extracted frame. Person detection differs from face detection: it only needs to count how many people are present in each frame, not to precisely recognize faces, and can for example be done by detecting human body contours. The server extracts from the first image frames the multi-person frames containing more than one person, excluding the non-service frames in which the target monitored object appears alone.
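For the contour-based person counting mentioned above, one possible implementation is OpenCV's default HOG people detector; this is an illustrative choice, not the detector prescribed by the method:

```python
import cv2

# Pre-trained pedestrian detector shipped with OpenCV
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

def count_people(frame_bgr):
    """Return the number of person-shaped regions detected in one frame."""
    rects, _weights = hog.detectMultiScale(frame_bgr, winStride=(8, 8))
    return len(rects)

# Frames with count_people(frame) > 1 would be kept as multi-person image frames.
```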
The service face library is a face database in the server; the face images of all service personnel, including the target monitored object, are stored in it. The server performs face detection on each multi-person image frame, compares the detected facial features against the service face images of all personnel pre-stored in the library, and judges whether each detected face matches some service face image. When a face matching none of the service face images is detected in a multi-person frame, it is determined that a service object is present in that frame, and the facial image matching the target face image in that frame is extracted as a target object image.
In this embodiment, multi-person detection excludes the frames in which the target monitored object appears alone, and matching against the service face images of the personnel excludes the non-service frames containing only the target monitored object and other service personnel, further narrowing the range of video images and effectively reducing redundant image information.
In one embodiment, extracting the second video images containing the service object from the service sub-video may include: obtaining the second video sub-segments whose duration does not exceed the preset missing threshold, and extracting from them the first facial images that match no service face image; extracting from the multi-person image frames the second facial images that match no service face image; and obtaining the second video images from the first facial images and the second facial images.
The server obtains from the service sub-video the second video sub-segments whose duration does not exceed the preset missing threshold, i.e. the video in which the target monitored object does not appear but whose absence does not exceed the threshold. From these sub-segments, the server first recognizes the faces matching the service face images in the service face library, then extracts the remaining face images as the first facial images of service objects.
The multi-person image frames are the multi-person images containing the target monitored object. Likewise, the server first recognizes in them the faces matching the service face images in the library, then extracts the remaining face images as the second facial images of service objects. The server generates the second video images from the first and second facial images together. Further, the server can label the second video images with the service object category, and can record the shooting time of each second video image.
In this embodiment, the server extracts the images of service objects whether or not the target monitored object is present, avoiding the loss of the service objects' facial expression information.
In one embodiment, performing micro-expression analysis on each collection image in the service image set to obtain the matching preset micro-expressions may include: extracting facial feature points from each collection image and computing facial action features from them; inputting the facial action features into a micro-expression analysis model to obtain the matching probability value of each preset micro-expression; and selecting the preset micro-expression matching the collection image according to the matching probability values.
Each collection image is a facial image of the target monitored object or of a service object. The server extracts facial feature points from the collection image; facial feature points are the feature points of the facial organs and facial contour, such as the feature coordinates of the eyes, mouth, nose, and eyebrows. Specifically, the server can extract them from the current facial image with a pre-trained 3D face model or a deep learning neural network.
Based on the extracted feature points, the server can extract facial action features from the collection image with a pre-trained 3D face model or deep neural network model; alternatively, it can classify the feature points by part and input each group into the corresponding facial-action-feature computation model to obtain the corresponding action features. For example, inputting the eye feature points into an eye-movement model yields eye action features such as blinking, squinting, or glaring. The 3D face model, the deep neural network model, and the facial-action-feature computation models are all obtained in advance by deep learning training on many face images.
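As one concrete example of an eye action feature of this kind, the widely used eye aspect ratio (EAR) can be computed from six eye landmarks; this specific formula is an illustration, not a feature defined by the method:

```python
import numpy as np

def eye_aspect_ratio(eye):
    """eye: six (x, y) landmarks ordered around one eye.
    A low EAR suggests a closed or squinting eye; one possible action feature."""
    eye = np.asarray(eye, dtype=float)
    v1 = np.linalg.norm(eye[1] - eye[5])  # first vertical distance
    v2 = np.linalg.norm(eye[2] - eye[4])  # second vertical distance
    h = np.linalg.norm(eye[0] - eye[3])   # horizontal eye width
    return (v1 + v2) / (2.0 * h)
```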
The server can compute the value of each facial action feature with the 3D face model, the deep neural network model, or the facial-action-feature computation model, and input the action features and their values into a pre-trained micro-expression classification model to obtain the probability value of each preset micro-expression. The micro-expression classification model may be any of several classifiers, such as an SVM classifier, a deep neural network learning model, or a decision tree classification model, trained in advance on the facial action features of many face images. The server can select the preset micro-expression with the largest probability value from the model's output.
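A minimal sketch of selecting the highest-probability preset micro-expression is given below, using a scikit-learn SVM as one of the classifier options named above; the action-feature encoding and training data are assumed:

```python
import numpy as np
from sklearn.svm import SVC

def train_expression_model(X, y):
    """X: action-feature vectors; y: preset micro-expression labels.
    probability=True enables per-class probability estimates."""
    return SVC(probability=True).fit(X, y)

def match_expression(clf, action_features):
    """Return (preset micro-expression, probability) with the largest probability value."""
    features = np.asarray(action_features, dtype=float).reshape(1, -1)
    probs = clf.predict_proba(features)[0]
    best = int(np.argmax(probs))
    return clf.classes_[best], float(probs[best])
```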
In this embodiment, extracting facial features and training a feature classifier yields more accurate preset micro-expressions, which provide an important data reference for evaluating the service attitude and quality of the service personnel and the satisfaction of the service objects.
In one embodiment, generating the service information file of the target monitored object according to the service image set and the preset micro-expressions may include: associating each preset micro-expression with the corresponding collection image in the service image set; obtaining the object category corresponding to each collection image; looking up the expression tag corresponding to each preset micro-expression, and determining from the tag the emotion category corresponding to the preset micro-expression; and dividing the collection images in the service image set into multiple image subsets according to the object category and the emotion category, and generating the service information file from the image subsets.
The server associates each collection image with its corresponding preset micro-expression, for example by labeling the image with the preset micro-expression or by recording the mapping between the image and the micro-expression. The server obtains the object category corresponding to each collection image. In this embodiment, object categories are divided by the face shown in the image and include two classes, the target monitored object category and the service object category; when the collection images undergo face detection and matching, each image is labeled with the category of the person whose face was detected.
The expression tag is the emotional-mode label corresponding to a preset micro-expression; tags may include happy, excited, contemptuous, angry, calm, etc. One expression tag can correspond to several preset micro-expressions; for example, the happy tag may correspond to micro-expressions such as squinting eyes and raised mouth corners. The mapping between expression tags and preset micro-expressions is stored in the server in advance, and the server looks up the expression tag corresponding to each preset micro-expression.
Expression tags can be divided into several emotion categories, and one emotion category can correspond to several expression tags. In one embodiment, three emotion categories are used: positive, neutral, and negative; tags such as happy and excited belong to the positive category, tags such as contemptuous and angry belong to the negative category, and the calm tag belongs to the neutral category. In other embodiments, the emotion categories may be divided differently. The mapping between emotion categories and expression tags can be stored in the server in advance; the server obtains the emotion category corresponding to each collection image's preset micro-expression and associates it with the image.
The server can classify the collection images by their object category and emotion category, for example first dividing the service image set into per-category service image sets by object category, then dividing each of those into smaller image subsets by the emotion category of its images, while also organizing the mappings between each image's shooting time, preset micro-expression, object category and emotion category into an image information table for each service image set. The service information file is generated from the divided image subsets and the corresponding image information tables. The server can push the file to the terminal, so that the terminal can reasonably evaluate the service quality of the target monitored object from it.
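A sketch of this two-level grouping is given below, with hypothetical tag and category tables standing in for the mappings the server stores:

```python
from collections import defaultdict

# Hypothetical mapping tables; in the method these are stored server-side in advance
TAG_OF_EXPRESSION = {"smile": "happy", "glare": "angry", "neutral": "calm"}
CATEGORY_OF_TAG = {"happy": "positive", "angry": "negative", "calm": "neutral"}

def build_subsets(images):
    """images: iterable of dicts with 'object_category' and 'expression' keys.
    Groups collection images by (object category, emotion category)."""
    subsets = defaultdict(list)
    for img in images:
        tag = TAG_OF_EXPRESSION.get(img["expression"], "calm")
        emotion = CATEGORY_OF_TAG.get(tag, "neutral")
        subsets[(img["object_category"], emotion)].append(img)
    return subsets
```

Each `(object category, emotion category)` key would correspond to one image subset, with the per-image metadata forming the image information table.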
In this embodiment, judging the emotion category of the preset micro-expressions and sorting the collection images by object category and emotion category make it easy to look up the collection images and to obtain the expression information of the persons in them.
In one embodiment, as shown in Fig. 3, which is a flowchart of a method for generating an expression comparison image, the method may include the following steps:
Step 310: in the service image set, associate the first collection images of the target object category with the second collection images of the service object category whose shooting times match.
In the above embodiment, the collection images in the service image set are divided into different image subsets by object category, and each image subset has a corresponding image information table. The server obtains the shooting time of each collection image from the image information tables of the target object category (i.e. the target monitored object category) and of the service object category, finds the first collection images of the target object category and the second collection images of the service object category whose shooting times match, and associates the matched pairs. Matching shooting times need not be identical: images can also be judged to match if they fall within the same time range, whose length can be set to, for example, 10, 20, or 30 seconds.
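A minimal sketch of this time-window matching, assuming each image record carries a `timestamp` field in seconds; the 10-second window is one of the example lengths above:

```python
def match_by_time(staff_images, customer_images, window_sec=10):
    """Pair each target-object image with the service-object images
    shot within the same time window."""
    pairs = []
    for s in staff_images:
        for c in customer_images:
            if abs(s["timestamp"] - c["timestamp"]) <= window_sec:
                pairs.append((s, c))
    return pairs
```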
Step 320: judge whether the preset micro-expression associated with a first collection image and the preset micro-expression associated with the matched second collection image correspond to the same emotion category.
The server obtains the preset micro-expressions of each matched pair of first and second collection images, looks up the emotion category corresponding to each preset micro-expression, and judges whether the two images' emotion categories are the same, i.e. whether the expressed emotional states of the target monitored object and the service object at that moment agree. The emotional changes of the two parties during an interaction are generally fairly synchronized, in which case the service attitude of the target monitored object is easy to evaluate; but when the two emotional states are inconsistent and conflict, the service state of the target monitored object cannot be analyzed and evaluated objectively, and the actual service situation at that time often has to be determined manually.
Step 330: when the two correspond to different emotion categories, stitch the associated first and second collection images together to obtain an expression comparison image.
When the server determines that the two images correspond to different emotion categories, it stitches the associated first and second collection images together into an expression comparison image; the position and form of the stitching can be set according to the monitoring personnel's needs. Further, the server can record and mark the shooting time corresponding to each comparison image, and can also sort the comparison images of multiple shooting times by time to generate an animated expression comparison. The server can send the generated comparison image or animation to the terminal as an early warning of conflicting expressions.
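A sketch of one possible side-by-side stitching, assuming the two associated images are available as NumPy/BGR arrays; padding to a common height is an illustrative layout choice, since the method leaves the stitching form open:

```python
import cv2
import numpy as np

def stitch_pair(img_a, img_b):
    """Place two face crops side by side, padding the shorter one to a common height."""
    h = max(img_a.shape[0], img_b.shape[0])

    def pad(img):
        bottom = h - img.shape[0]  # black padding below the shorter image
        return cv2.copyMakeBorder(img, 0, bottom, 0, 0, cv2.BORDER_CONSTANT, value=0)

    return np.hstack([pad(img_a), pad(img_b)])
```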
In this embodiment, associating the images of the target monitored object and the service object shot at the same time, and automatically checking whether the emotional states of the persons in the two images match, make it possible to stitch together the images whose emotional states conflict, which facilitates comparison and analysis by the monitoring personnel.
It should be understood that, although the steps in the flowcharts of Figs. 2-3 are shown in the order indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, the execution of these steps is not strictly ordered, and they may be executed in other orders. Moreover, at least some of the steps in Figs. 2-3 may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different times; their execution order is not necessarily sequential, and they may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
In one embodiment, as shown in Fig. 4, a video processing apparatus is provided, including a video interception module 410, a target image extraction module 420, an image set generation module 430, an expression analysis module 440, and a file generation module 450, in which:
the video interception module 410 is configured to acquire a surveillance video and intercept the service sub-video of the target monitored object from the surveillance video;
the target image extraction module 420 is configured to extract the first video image set containing the target monitored object from the service sub-video, and to extract the target object images from the first video image set;
the image set generation module 430 is configured to extract the second video images containing the service object from the service sub-video, and to obtain the service image set according to the target object images and the second video images;
the expression analysis module 440 is configured to perform micro-expression analysis on each collection image in the service image set to obtain the preset micro-expression matching each collection image;
the file generation module 450 is configured to generate the service information file of the target monitored object according to the service image set and the preset micro-expressions.
In one embodiment, the video interception module 410 may include:
an information lookup unit, configured to acquire the service identifier of the target monitored object and look up the service time and target face image corresponding to the service identifier;
a segment extraction unit, configured to extract from the surveillance video the surveillance video segment whose shooting time matches the service time;
a re-screening unit, configured to perform face detection on the surveillance video segment according to the target face image, and to extract from it the video sub-segments in which no face matching the target face image is detected;
a duration comparison unit, configured to obtain the segment duration of each video sub-segment and compare it with the preset missing threshold;
a video deletion unit, configured to delete from the video segment the first video sub-segments whose duration exceeds the preset missing threshold, to obtain the service sub-video.
In one embodiment, the target image extraction module 420 may include:
a person detection unit, configured to extract the first image frames from the first video image set and detect the number of persons in each first image frame;
a multi-person detection unit, configured to extract from the first image frames the multi-person image frames containing more than one person;
a face matching unit, configured to perform face detection on the multi-person image frames according to the service face images pre-stored in the service face library, and to detect whether a face matching none of the service face images is present in the multi-person image frames;
a target object extraction unit, configured to, if a face matching none of the service face images is detected, extract the facial image matching the target face image in the corresponding multi-person image frame as a target object image.
In one embodiment, the image set generation module 430 may include:
a first extraction unit, configured to obtain the second video sub-segments whose duration does not exceed the preset missing threshold, and to extract from them the first facial images that match no service face image;
a second extraction unit, configured to extract from the multi-person image frames the second facial images that match no service face image;
an image aggregation unit, configured to obtain the second video images from the first facial images and the second facial images.
In one embodiment, the expression analysis module 440 may include:
a feature extraction unit, configured to extract facial feature points from each collection image and compute facial action features from the facial feature points;
a probability computation unit, configured to input the facial action features into the micro-expression analysis model to obtain the matching probability value of each preset micro-expression;
an expression selection unit, configured to select the preset micro-expression matching the collection image according to the matching probability values.
In one embodiment, the file generation module 450 may include:
an association unit, configured to associate each preset micro-expression with the corresponding collection image in the service image set;
a category acquisition unit, configured to obtain the object category corresponding to each collection image;
an emotion determination unit, configured to look up the expression tag corresponding to each preset micro-expression and determine from the tag the emotion category corresponding to the preset micro-expression;
a subset division unit, configured to divide the collection images in the service image set into multiple image subsets according to the object category and the emotion category, and to generate the service information file from the image subsets.
In one embodiment, the apparatus may further include:
an image association module, configured to associate, in the service image set, the first collection images of the target object category with the second collection images of the service object category whose shooting times match;
a category matching module, configured to judge whether the preset micro-expression associated with a first collection image and the preset micro-expression associated with the matched second collection image correspond to the same emotion category;
an image stitching module, configured to, when they correspond to different emotion categories, stitch the associated first and second collection images together to obtain an expression comparison image.
For the specific limitations of the video processing apparatus, reference may be made to the limitations of the video processing method above, which are not repeated here. Each module in the above video processing apparatus may be implemented in whole or in part by software, hardware, or a combination of the two. The modules may be embedded in, or independent of, the processor of a computer device in hardware form, or stored in the memory of the computer device in software form, so that the processor can invoke and execute the operations corresponding to each module.
In one embodiment, a computer device is provided, which may be a server whose internal structure may be as shown in Fig. 5. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device provides computing and control capability. The memory includes a non-volatile storage medium and an internal memory; the non-volatile storage medium stores an operating system, a computer program, and a database, and the internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The database of the computer device stores the video processing data. The network interface communicates with external terminals through a network connection. When executed by the processor, the computer program implements a video processing method.
Those skilled in the art can understand that the structure shown in Fig. 5 is only a block diagram of part of the structure related to the solution of this application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
在一个实施例中,提供了一种计算机设备,包括存储器和处理器,该存储器存储有计算机程序,该处理器执行计算机程序时实现以下步骤:获取监控视频,从所述监控视频中截取目标监控对象的服务子视频;从所述服务子视频中抽取包含所述目标监控对象的第一视频图像集合,从所述第一视频图像集合中提取目标对象图像;从所述服务子视频中提取包含服务对象的第二视频图像,根据所述目标对象图像和所述第二视频图像得到服务图像集合;分别对所述服务图像集合中的各集合图像进行微表情分析,得到与所述各集合图像匹配的预设微表情;根据所述服务图像集合和所述预设微表情生成所述目标监控对象的服务信息档案。In one embodiment, a computer device is provided, including a memory and a processor, the memory stores a computer program, and when the processor executes the computer program, the following steps are implemented: Obtain surveillance video, and intercept target surveillance from the surveillance video The service sub-video of the object; extract from the service sub-video a first video image set containing the target monitoring object, extract the target object image from the first video image set; extract from the service sub-video containing The second video image of the service object obtains a service image set according to the target object image and the second video image; the micro-expression analysis is performed on each set image in the service image set to obtain the same Matching preset micro-expressions; generating the service information file of the target monitoring object according to the service image collection and the preset micro-expressions.
In one embodiment, when executing the computer program to implement the step of intercepting the service sub-video of the target monitored object from the surveillance video, the processor is further configured to: acquire a service identifier of the target monitored object, and look up a service time and a target face image corresponding to the service identifier; extract, from the surveillance video, a surveillance video clip whose shooting time matches the service time; perform face detection on the surveillance video clip according to the target face image, and extract, from the surveillance video clip, video sub-segments in which no face matching the target face image is detected; acquire the segment duration of each video sub-segment, and compare the segment duration with a preset missing threshold; and delete, from the video clip, first video sub-segments whose segment duration is greater than the preset missing threshold, to obtain the service sub-video.
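A minimal sketch of the sub-segment trimming logic described above, assuming an upstream face detector has already produced the time ranges in which the target face was absent; the helper name and the five-second threshold are illustrative assumptions, not values from the disclosure.

```python
from typing import List, Tuple

Range = Tuple[float, float]  # (start_s, end_s)

def trim_missing_segments(clip: Range, absent: List[Range],
                          missing_threshold_s: float = 5.0) -> List[Range]:
    """Delete absences longer than the threshold; the remaining ranges,
    in order, form the service sub-video."""
    kept: List[Range] = []
    cursor = clip[0]
    for start, end in sorted(absent):
        if end - start > missing_threshold_s:  # only long absences are removed
            if start > cursor:
                kept.append((cursor, start))
            cursor = max(cursor, end)
    if cursor < clip[1]:
        kept.append((cursor, clip[1]))
    return kept

# A 60 s clip with a 2 s and a 10 s absence keeps everything except the 10 s gap:
# [(0.0, 20.0), (30.0, 60.0)]
print(trim_missing_segments((0.0, 60.0), [(5.0, 7.0), (20.0, 30.0)]))
```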
In one embodiment, when executing the computer program to implement the step of extracting a target object image from the first video image set, the processor is further configured to: extract first image frames from the first video image set, and detect the number of persons in each first image frame; extract, from the first image frames, multi-person image frames in which the number of persons is greater than one; perform face detection on the multi-person image frames according to service face images pre-stored in a service face database, and detect whether the multi-person image frames contain a face image matching none of the service face images; and if such a face image is detected, extract, from the corresponding multi-person image frame, the facial image matching the target face image as the target object image.
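The multi-person filtering step might look like the sketch below, where face detection is assumed to have produced an embedding per face and matching is reduced to a cosine-similarity threshold; the Face type, the 0.8 threshold, and the embedding representation are all assumptions, not details from the disclosure.

```python
from dataclasses import dataclass
from typing import List, Sequence

@dataclass
class Face:
    embedding: List[float]  # feature vector from an assumed upstream face detector
    crop: object            # the cropped facial image region

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embeddings."""
    num = sum(x * y for x, y in zip(a, b))
    den = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return num / den if den else 0.0

def extract_target_images(frames: List[List[Face]],
                          staff_bank: List[List[float]],
                          target: List[float], thr: float = 0.8) -> List[object]:
    """Keep the target's face crop from multi-person frames that also contain
    at least one face matching none of the stored service-staff images."""
    crops: List[object] = []
    for faces in frames:
        if len(faces) < 2:  # only multi-person frames qualify
            continue
        has_non_staff = any(all(cosine(f.embedding, s) < thr for s in staff_bank)
                            for f in faces)
        if has_non_staff:
            crops += [f.crop for f in faces if cosine(f.embedding, target) >= thr]
    return crops
```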
In one embodiment, when executing the computer program to implement the step of extracting, from the service sub-video, a second video image containing a service object, the processor is further configured to: acquire second video sub-segments whose segment duration does not exceed the preset missing threshold, and extract, from the second video sub-segments, first facial images that do not match the service face images; extract, from the multi-person image frames, second facial images that do not match the service face images; and obtain the second video image according to the first facial images and the second facial images.
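Under the same assumptions (reusing the Face type and the cosine helper from the previous sketch), the service-object images are simply the faces that match none of the stored staff images, gathered first from the short sub-segments and then from the multi-person frames:

```python
from typing import List

def extract_service_images(short_segment_faces: List["Face"],
                           multi_person_faces: List["Face"],
                           staff_bank: List[List[float]],
                           thr: float = 0.8) -> List[object]:
    """Second video images: faces matching no service-staff embedding, taken
    first from sub-segments under the missing threshold, then from the
    multi-person frames."""
    def non_staff(faces: List["Face"]) -> List[object]:
        return [f.crop for f in faces
                if all(cosine(f.embedding, s) < thr for s in staff_bank)]
    return non_staff(short_segment_faces) + non_staff(multi_person_faces)
```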
In one embodiment, when executing the computer program to implement the step of performing micro-expression analysis on each set image in the service image set to obtain a preset micro-expression matching each set image, the processor is further configured to: extract facial feature points from each set image, and calculate facial action features according to the facial feature points; input the facial action features into a micro-expression analysis model to obtain a matching probability value for each preset micro-expression; and select, according to the matching probability values, the preset micro-expression matching the set image.
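The disclosure does not specify the feature computation or the model, so the sketch below uses deliberately simple stand-ins: landmark distances as the "facial action features" and any callable returning a probability per preset micro-expression as the "micro-expression analysis model".

```python
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, float]

def action_features(landmarks: Sequence[Point]) -> List[float]:
    """Toy facial-action features: the distance of each landmark from the
    first one. A production system would track AU-style displacements."""
    x0, y0 = landmarks[0]
    return [((x - x0) ** 2 + (y - y0) ** 2) ** 0.5 for x, y in landmarks[1:]]

def match_expression(landmarks: Sequence[Point],
                     model: Callable[[List[float]], Dict[str, float]]) -> str:
    """Feed the features to the model and select the preset micro-expression
    with the highest matching probability value."""
    probs = model(action_features(landmarks))
    return max(probs, key=probs.get)
```

Any classifier exposing this interface would do; for instance, a stub model returning {"smile": 0.7, "neutral": 0.3} makes match_expression return "smile".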
In one embodiment, when executing the computer program to implement the step of generating the service information file of the target monitored object according to the service image set and the preset micro-expressions, the processor is further configured to: associate the preset micro-expressions with the corresponding set images in the service image set; acquire an object category corresponding to each set image; look up an expression tag corresponding to each preset micro-expression, and determine, according to the tag, an emotion category corresponding to the preset micro-expression; and divide, according to the object categories and the emotion categories, the set images in the service image set into a plurality of image subsets, and generate the service information file according to the image subsets.
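Partitioning into image subsets can be expressed as grouping by a composite key; the tag-to-emotion mapping below is an invented example, since the disclosure only states that emotion categories are looked up from expression tags.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple

# Assumed mapping from expression tags to emotion categories (illustrative).
EMOTION_OF = {"smile": "positive", "neutral": "neutral",
              "frown": "negative", "lip_press": "negative"}

def build_profile(images: Iterable[Tuple[Any, str, str]]) -> Dict[Tuple[str, str], List[Any]]:
    """images: (image, object_category, expression_tag) triples. Subsets are
    keyed by (object category, emotion category); the resulting dict is the
    backbone of the service information file."""
    profile: Dict[Tuple[str, str], List[Any]] = defaultdict(list)
    for img, category, tag in images:
        profile[(category, EMOTION_OF[tag])].append(img)
    return dict(profile)
```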
In one embodiment, the processor, when executing the computer program, further implements the following steps: associating, in the service image set, first set images of the target object category with second set images of the service object category whose shooting times match; determining whether the preset micro-expression associated with a first set image and the preset micro-expression associated with the corresponding second set image correspond to the same emotion category; and, when they correspond to different emotion categories, splicing the associated first set image and second set image to obtain an expression comparison map.
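The comparison-map step reduces to pairing staff and customer images by shooting time and keeping the pairs whose emotion categories disagree; the one-second tolerance is an assumption, and the actual bitmap stitching is left out.

```python
from typing import Any, List, Tuple

Record = Tuple[float, Any, str]  # (shooting_time_s, image, emotion_category)

def expression_pairs(staff: List[Record], customers: List[Record],
                     tolerance_s: float = 1.0) -> List[Tuple[Any, Any]]:
    """Pairs whose shooting times match within the tolerance but whose
    emotion categories differ; each pair would then be spliced into a
    single expression comparison image."""
    return [(s_img, c_img)
            for s_t, s_img, s_emo in staff
            for c_t, c_img, c_emo in customers
            if abs(s_t - c_t) <= tolerance_s and s_emo != c_emo]
```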
In one embodiment, a computer-readable storage medium is provided. The computer-readable storage medium is a volatile or non-volatile storage medium on which a computer program is stored, and the computer program, when executed by a processor, implements the following steps: acquiring a surveillance video, and intercepting a service sub-video of a target monitored object from the surveillance video; extracting, from the service sub-video, a first video image set containing the target monitored object, and extracting a target object image from the first video image set; extracting, from the service sub-video, a second video image containing a service object, and obtaining a service image set according to the target object image and the second video image; performing micro-expression analysis on each set image in the service image set, to obtain a preset micro-expression matching each set image; and generating a service information file of the target monitored object according to the service image set and the preset micro-expressions.
In one embodiment, when the computer program is executed by the processor to implement the step of intercepting the service sub-video of the target monitored object from the surveillance video, the computer program further causes the processor to: acquire a service identifier of the target monitored object, and look up a service time and a target face image corresponding to the service identifier; extract, from the surveillance video, a surveillance video clip whose shooting time matches the service time; perform face detection on the surveillance video clip according to the target face image, and extract, from the surveillance video clip, video sub-segments in which no face matching the target face image is detected; acquire the segment duration of each video sub-segment, and compare the segment duration with a preset missing threshold; and delete, from the video clip, first video sub-segments whose segment duration is greater than the preset missing threshold, to obtain the service sub-video.
In one embodiment, when the computer program is executed by the processor to implement the step of extracting a target object image from the first video image set, the computer program further causes the processor to: extract first image frames from the first video image set, and detect the number of persons in each first image frame; extract, from the first image frames, multi-person image frames in which the number of persons is greater than one; perform face detection on the multi-person image frames according to service face images pre-stored in a service face database, and detect whether the multi-person image frames contain a face image matching none of the service face images; and if such a face image is detected, extract, from the corresponding multi-person image frame, the facial image matching the target face image as the target object image.
In one embodiment, when the computer program is executed by the processor to implement the step of extracting, from the service sub-video, a second video image containing a service object, the computer program further causes the processor to: acquire second video sub-segments whose segment duration does not exceed the preset missing threshold, and extract, from the second video sub-segments, first facial images that do not match the service face images; extract, from the multi-person image frames, second facial images that do not match the service face images; and obtain the second video image according to the first facial images and the second facial images.
In one embodiment, when the computer program is executed by the processor to implement the step of performing micro-expression analysis on each set image in the service image set to obtain a preset micro-expression matching each set image, the computer program further causes the processor to: extract facial feature points from each set image, and calculate facial action features according to the facial feature points; input the facial action features into a micro-expression analysis model to obtain a matching probability value for each preset micro-expression; and select, according to the matching probability values, the preset micro-expression matching the set image.
In one embodiment, when the computer program is executed by the processor to implement the step of generating the service information file of the target monitored object according to the service image set and the preset micro-expressions, the computer program further causes the processor to: associate the preset micro-expressions with the corresponding set images in the service image set; acquire an object category corresponding to each set image; look up an expression tag corresponding to each preset micro-expression, and determine, according to the tag, an emotion category corresponding to the preset micro-expression; and divide, according to the object categories and the emotion categories, the set images in the service image set into a plurality of image subsets, and generate the service information file according to the image subsets.
In one embodiment, when executed by the processor, the computer program further implements the following steps: associating, in the service image set, first set images of the target object category with second set images of the service object category whose shooting times match; determining whether the preset micro-expression associated with a first set image and the preset micro-expression associated with the corresponding second set image correspond to the same emotion category; and, when they correspond to different emotion categories, splicing the associated first set image and second set image to obtain an expression comparison map.
A person of ordinary skill in the art can understand that all or part of the processes in the above method embodiments can be implemented by a computer program instructing relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, a database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Claims (20)

1. A video processing method, wherein the method comprises:
    acquiring a surveillance video, and intercepting a service sub-video of a target monitored object from the surveillance video;
    extracting, from the service sub-video, a first video image set containing the target monitored object, and extracting a target object image from the first video image set;
    extracting, from the service sub-video, a second video image containing a service object, and obtaining a service image set according to the target object image and the second video image;
    performing micro-expression analysis on each set image in the service image set, to obtain a preset micro-expression matching each set image; and
    generating a service information file of the target monitored object according to the service image set and the preset micro-expressions.
2. The method according to claim 1, wherein the intercepting a service sub-video of a target monitored object from the surveillance video comprises:
    acquiring a service identifier of the target monitored object, and looking up a service time and a target face image corresponding to the service identifier;
    extracting, from the surveillance video, a surveillance video clip whose shooting time matches the service time;
    performing face detection on the surveillance video clip according to the target face image, and extracting, from the surveillance video clip, video sub-segments in which no face matching the target face image is detected;
    acquiring a segment duration of each video sub-segment, and comparing the segment duration with a preset missing threshold; and
    deleting, from the video clip, first video sub-segments whose segment duration is greater than the preset missing threshold, to obtain the service sub-video.
3. The method according to claim 2, wherein the extracting a target object image from the first video image set comprises:
    extracting first image frames from the first video image set, and detecting the number of persons in each first image frame;
    extracting, from the first image frames, multi-person image frames in which the number of persons is greater than one;
    performing face detection on the multi-person image frames according to service face images pre-stored in a service face database, and detecting whether the multi-person image frames contain a face image matching none of the service face images; and
    if a face image matching none of the service face images is detected, extracting, from the corresponding multi-person image frame, the facial image matching the target face image as the target object image.
4. The method according to claim 3, wherein the extracting, from the service sub-video, a second video image containing a service object comprises:
    acquiring second video sub-segments whose segment duration does not exceed the preset missing threshold, and extracting, from the second video sub-segments, first facial images that do not match the service face images;
    extracting, from the multi-person image frames, second facial images that do not match the service face images; and
    obtaining the second video image according to the first facial images and the second facial images.
5. The method according to claim 4, wherein the performing micro-expression analysis on each set image in the service image set to obtain a preset micro-expression matching each set image comprises:
    extracting facial feature points from each set image, and calculating facial action features according to the facial feature points;
    inputting the facial action features into a micro-expression analysis model to obtain a matching probability value for each preset micro-expression; and
    selecting, according to the matching probability values, the preset micro-expression matching the set image.
6. The method according to claim 1, wherein the generating a service information file of the target monitored object according to the service image set and the preset micro-expressions comprises:
    associating the preset micro-expressions with the corresponding set images in the service image set;
    acquiring an object category corresponding to each set image;
    looking up an expression tag corresponding to each preset micro-expression, and determining, according to the tag, an emotion category corresponding to the preset micro-expression; and
    dividing, according to the object categories and the emotion categories, the set images in the service image set into a plurality of image subsets, and generating the service information file according to the image subsets.
7. The method according to claim 6, wherein the method further comprises:
    associating, in the service image set, first set images of the target object category with second set images of the service object category whose shooting times match;
    determining whether the preset micro-expression associated with a first set image and the preset micro-expression associated with the corresponding second set image correspond to the same emotion category; and
    when they correspond to different emotion categories, splicing the associated first set image and second set image to obtain an expression comparison map.
8. A video processing apparatus, wherein the apparatus comprises:
    a video interception module, configured to acquire a surveillance video and intercept a service sub-video of a target monitored object from the surveillance video;
    a target image extraction module, configured to extract, from the service sub-video, a first video image set containing the target monitored object, and extract a target object image from the first video image set;
    an image set generation module, configured to extract, from the service sub-video, a second video image containing a service object, and obtain a service image set according to the target object image and the second video image;
    an expression analysis module, configured to perform micro-expression analysis on each set image in the service image set to obtain a preset micro-expression matching each set image; and
    a file generation module, configured to generate a service information file of the target monitored object according to the service image set and the preset micro-expressions.
9. A computer device, comprising:
    one or more processors;
    a memory; and
    one or more computer programs, wherein the one or more computer programs are stored in the memory and configured to be executed by the one or more processors, and the one or more computer programs are configured to perform a video processing method, wherein the video processing method comprises:
    acquiring a surveillance video, and intercepting a service sub-video of a target monitored object from the surveillance video;
    extracting, from the service sub-video, a first video image set containing the target monitored object, and extracting a target object image from the first video image set;
    extracting, from the service sub-video, a second video image containing a service object, and obtaining a service image set according to the target object image and the second video image;
    performing micro-expression analysis on each set image in the service image set, to obtain a preset micro-expression matching each set image; and
    generating a service information file of the target monitored object according to the service image set and the preset micro-expressions.
10. The computer device according to claim 9, wherein the intercepting a service sub-video of a target monitored object from the surveillance video comprises:
    acquiring a service identifier of the target monitored object, and looking up a service time and a target face image corresponding to the service identifier;
    extracting, from the surveillance video, a surveillance video clip whose shooting time matches the service time;
    performing face detection on the surveillance video clip according to the target face image, and extracting, from the surveillance video clip, video sub-segments in which no face matching the target face image is detected;
    acquiring a segment duration of each video sub-segment, and comparing the segment duration with a preset missing threshold; and
    deleting, from the video clip, first video sub-segments whose segment duration is greater than the preset missing threshold, to obtain the service sub-video.
11. The computer device according to claim 10, wherein the extracting a target object image from the first video image set comprises:
    extracting first image frames from the first video image set, and detecting the number of persons in each first image frame;
    extracting, from the first image frames, multi-person image frames in which the number of persons is greater than one;
    performing face detection on the multi-person image frames according to service face images pre-stored in a service face database, and detecting whether the multi-person image frames contain a face image matching none of the service face images; and
    if a face image matching none of the service face images is detected, extracting, from the corresponding multi-person image frame, the facial image matching the target face image as the target object image.
12. The computer device according to claim 11, wherein the extracting, from the service sub-video, a second video image containing a service object comprises:
    acquiring second video sub-segments whose segment duration does not exceed the preset missing threshold, and extracting, from the second video sub-segments, first facial images that do not match the service face images;
    extracting, from the multi-person image frames, second facial images that do not match the service face images; and
    obtaining the second video image according to the first facial images and the second facial images.
13. The computer device according to claim 12, wherein the performing micro-expression analysis on each set image in the service image set to obtain a preset micro-expression matching each set image comprises:
    extracting facial feature points from each set image, and calculating facial action features according to the facial feature points;
    inputting the facial action features into a micro-expression analysis model to obtain a matching probability value for each preset micro-expression; and
    selecting, according to the matching probability values, the preset micro-expression matching the set image.
14. The computer device according to claim 9, wherein the generating a service information file of the target monitored object according to the service image set and the preset micro-expressions comprises:
    associating the preset micro-expressions with the corresponding set images in the service image set;
    acquiring an object category corresponding to each set image;
    looking up an expression tag corresponding to each preset micro-expression, and determining, according to the tag, an emotion category corresponding to the preset micro-expression; and
    dividing, according to the object categories and the emotion categories, the set images in the service image set into a plurality of image subsets, and generating the service information file according to the image subsets.
15. The computer device according to claim 14, wherein the method further comprises:
    associating, in the service image set, first set images of the target object category with second set images of the service object category whose shooting times match;
    determining whether the preset micro-expression associated with a first set image and the preset micro-expression associated with the corresponding second set image correspond to the same emotion category; and
    when they correspond to different emotion categories, splicing the associated first set image and second set image to obtain an expression comparison map.
16. A computer-readable storage medium on which a computer program is stored, wherein when the computer program is executed by a processor, a video processing method is implemented, and the video processing method comprises the following steps:
    acquiring a surveillance video, and intercepting a service sub-video of a target monitored object from the surveillance video;
    extracting, from the service sub-video, a first video image set containing the target monitored object, and extracting a target object image from the first video image set;
    extracting, from the service sub-video, a second video image containing a service object, and obtaining a service image set according to the target object image and the second video image;
    performing micro-expression analysis on each set image in the service image set, to obtain a preset micro-expression matching each set image; and
    generating a service information file of the target monitored object according to the service image set and the preset micro-expressions.
17. The computer-readable storage medium according to claim 16, wherein the intercepting a service sub-video of a target monitored object from the surveillance video comprises:
    acquiring a service identifier of the target monitored object, and looking up a service time and a target face image corresponding to the service identifier;
    extracting, from the surveillance video, a surveillance video clip whose shooting time matches the service time;
    performing face detection on the surveillance video clip according to the target face image, and extracting, from the surveillance video clip, video sub-segments in which no face matching the target face image is detected;
    acquiring a segment duration of each video sub-segment, and comparing the segment duration with a preset missing threshold; and
    deleting, from the video clip, first video sub-segments whose segment duration is greater than the preset missing threshold, to obtain the service sub-video.
18. The computer-readable storage medium according to claim 17, wherein the extracting a target object image from the first video image set comprises:
    extracting first image frames from the first video image set, and detecting the number of persons in each first image frame;
    extracting, from the first image frames, multi-person image frames in which the number of persons is greater than one;
    performing face detection on the multi-person image frames according to service face images pre-stored in a service face database, and detecting whether the multi-person image frames contain a face image matching none of the service face images; and
    if a face image matching none of the service face images is detected, extracting, from the corresponding multi-person image frame, the facial image matching the target face image as the target object image.
19. The computer-readable storage medium according to claim 18, wherein the extracting, from the service sub-video, a second video image containing a service object comprises:
    acquiring second video sub-segments whose segment duration does not exceed the preset missing threshold, and extracting, from the second video sub-segments, first facial images that do not match the service face images;
    extracting, from the multi-person image frames, second facial images that do not match the service face images; and
    obtaining the second video image according to the first facial images and the second facial images.
20. The computer-readable storage medium according to claim 19, wherein the performing micro-expression analysis on each set image in the service image set to obtain a preset micro-expression matching each set image comprises:
    extracting facial feature points from each set image, and calculating facial action features according to the facial feature points;
    inputting the facial action features into a micro-expression analysis model to obtain a matching probability value for each preset micro-expression; and
    selecting, according to the matching probability values, the preset micro-expression matching the set image.
PCT/CN2020/087694 2019-07-04 2020-04-29 Video processing method and apparatus, computer device and storage medium WO2021000644A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910599356.4 2019-07-04
CN201910599356.4A CN110458008A (en) 2019-07-04 2019-07-04 Method for processing video frequency, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2021000644A1 true WO2021000644A1 (en) 2021-01-07

Family

ID=68482120

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/087694 WO2021000644A1 (en) 2019-07-04 2020-04-29 Video processing method and apparatus, computer device and storage medium

Country Status (2)

Country Link
CN (1) CN110458008A (en)
WO (1) WO2021000644A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110458008A (en) * 2019-07-04 2019-11-15 深圳壹账通智能科技有限公司 Method for processing video frequency, device, computer equipment and storage medium
CN112052357B (en) * 2020-04-15 2022-04-01 上海摩象网络科技有限公司 Video clip marking method and device and handheld camera
CN113642357A (en) * 2020-04-27 2021-11-12 阿里巴巴集团控股有限公司 Monitoring method, system, device, storage medium and processor
CN111935453A (en) * 2020-07-27 2020-11-13 浙江大华技术股份有限公司 Learning supervision method and device, electronic equipment and storage medium
CN112017339A (en) * 2020-09-24 2020-12-01 柳州柳工挖掘机有限公司 Excavator control system
CN113392271A (en) * 2021-05-25 2021-09-14 珠海格力电器股份有限公司 Cat eye data processing method, module, electronic device and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108665147A (en) * 2018-04-18 2018-10-16 深圳市云领天下科技有限公司 A kind of method and device of children education credit early warning
CN109168052A (en) * 2018-10-31 2019-01-08 杭州比智科技有限公司 The determination method, apparatus and calculating equipment of service satisfaction
CN109190601A (en) * 2018-10-19 2019-01-11 银河水滴科技(北京)有限公司 Recongnition of objects method and device under a kind of monitoring scene
CN109858949A (en) * 2018-12-26 2019-06-07 秒针信息技术有限公司 A kind of customer satisfaction appraisal procedure and assessment system based on monitoring camera
CN110458008A (en) * 2019-07-04 2019-11-15 深圳壹账通智能科技有限公司 Method for processing video frequency, device, computer equipment and storage medium

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108830633A (en) * 2018-04-26 2018-11-16 华慧视科技(天津)有限公司 A kind of friendly service evaluation method based on smiling face's detection
CN109766766A (en) * 2018-12-18 2019-05-17 深圳壹账通智能科技有限公司 Employee work condition monitoring method, device, computer equipment and storage medium
CN109766770A (en) * 2018-12-18 2019-05-17 深圳壹账通智能科技有限公司 QoS evaluating method, device, computer equipment and storage medium
CN109871751A (en) * 2019-01-04 2019-06-11 平安科技(深圳)有限公司 Attitude appraisal procedure, device and storage medium based on facial expression recognition
CN109829388A (en) * 2019-01-07 2019-05-31 平安科技(深圳)有限公司 Video data handling procedure, device and computer equipment based on micro- expression
CN109766859B (en) * 2019-01-17 2023-12-19 平安科技(深圳)有限公司 Campus monitoring method, device, equipment and storage medium based on micro-expressions
CN109886111A (en) * 2019-01-17 2019-06-14 深圳壹账通智能科技有限公司 Match monitoring method, device, computer equipment and storage medium based on micro- expression
CN109858410A (en) * 2019-01-18 2019-06-07 深圳壹账通智能科技有限公司 Service evaluation method, apparatus, equipment and storage medium based on Expression analysis

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112818960A (en) * 2021-03-25 2021-05-18 平安科技(深圳)有限公司 Waiting time processing method, device, equipment and medium based on face recognition
CN112818960B (en) * 2021-03-25 2023-09-05 平安科技(深圳)有限公司 Waiting time processing method, device, equipment and medium based on face recognition
CN113873191A (en) * 2021-10-12 2021-12-31 苏州万店掌软件技术有限公司 Video backtracking method, device and system based on voice
CN113873191B (en) * 2021-10-12 2023-11-28 苏州万店掌软件技术有限公司 Video backtracking method, device and system based on voice
CN113925511A (en) * 2021-11-08 2022-01-14 北京九州安华信息安全技术有限公司 Muscle nerve vibration time-frequency image processing method and device
CN114445896A (en) * 2022-01-28 2022-05-06 北京百度网讯科技有限公司 Method and device for evaluating confidence degree of human statement content in video
CN114445896B (en) * 2022-01-28 2024-04-05 北京百度网讯科技有限公司 Method and device for evaluating confidence of content of person statement in video
CN114866843A (en) * 2022-05-06 2022-08-05 杭州登虹科技有限公司 Video data encryption system for network video monitoring
CN114866843B (en) * 2022-05-06 2023-08-11 杭州登虹科技有限公司 Video data encryption system for network video monitoring
CN115512427A (en) * 2022-11-04 2022-12-23 北京城建设计发展集团股份有限公司 User face registration method and system combined with matched biopsy

Also Published As

Publication number Publication date
CN110458008A (en) 2019-11-15

Legal Events

Date Code Title Description
121: Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 20835076; Country of ref document: EP; Kind code of ref document: A1)
NENP: Non-entry into the national phase (Ref country code: DE)
32PN: Ep: public notification in the EP bulletin as the address of the addressee cannot be established (Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 17.05.2022))
122: Ep: PCT application non-entry in the European phase (Ref document number: 20835076; Country of ref document: EP; Kind code of ref document: A1)