WO2021000644A1 - Video processing method and apparatus, computer device and storage medium - Google Patents

Video processing method and apparatus, computer device and storage medium

Info

Publication number
WO2021000644A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
service
video
target
images
Prior art date
Application number
PCT/CN2020/087694
Other languages
French (fr)
Chinese (zh)
Inventor
刘丽珍
吕小立
Original Assignee
深圳壹账通智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳壹账通智能科技有限公司
Publication of WO2021000644A1 publication Critical patent/WO2021000644A1/en

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
            • G06F 16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
              • G06F 16/53 Querying
                • G06F 16/535 Filtering based on additional data, e.g. user or group profiles
              • G06F 16/55 Clustering; Classification
        • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
          • G06V 20/00 Scenes; Scene-specific elements
            • G06V 20/40 Scenes; Scene-specific elements in video content
              • G06V 20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
                • G06V 20/47 Detecting features for summarising video content
          • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
            • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
              • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
                • G06V 40/174 Facial expression recognition

Definitions

  • This application relates to the field of artificial intelligence technology, in particular to a video processing method, device, computer equipment and storage medium.
  • a video processing method includes: obtaining a surveillance video, and intercepting a service sub-video of a target monitoring object from the surveillance video; extracting a first video image set containing the target monitoring object from the service sub-video, and extracting a target object image from the first video image set; extracting a second video image containing a service object from the service sub-video, and obtaining a service image set according to the target object image and the second video image; performing micro-expression analysis on each set image in the service image set to obtain a preset micro-expression matching each set image; and generating a service information file of the target monitoring object according to the service image set and the preset micro-expression.
  • a video processing device comprising:
  • the video interception module is used to obtain surveillance video, and intercept the service sub-video of the target surveillance object from the surveillance video;
  • a target image extraction module configured to extract a first video image set containing the target monitoring object from the service sub-video, and extract a target object image from the first video image set;
  • An image collection generating module configured to extract a second video image containing a service object from the service sub-video, and obtain a service image collection according to the target object image and the second video image;
  • An expression analysis module configured to perform micro-expression analysis on each set of images in the service image set to obtain preset micro-expressions that match the set of images;
  • the file generating module is used to generate the service information file of the target monitored object according to the service image collection and the preset micro-expression.
  • a computer device includes a memory and a processor, the memory stores a computer program, and the processor implements the steps of the foregoing method when the computer program is executed.
  • a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the steps of the above method are realized.
  • the above-mentioned video processing method, apparatus, computer device and storage medium can intercept, from the surveillance video, the service sub-video of the target monitoring object at the service location, and detect from the service sub-video the images containing the service object and the target object, thereby automatically filtering out useless redundant image information; they can also perform micro-expression analysis on the filtered images to analyze the micro-expressions of the target object and the service object in the images, further processing the image information into effective information that helps evaluate the target monitoring object, and thereby greatly improving the efficiency of obtaining effective video information.
  • Figure 1 is an application scenario diagram of a video processing method in an embodiment
  • Figure 2 is a schematic flowchart of a video processing method in an embodiment
  • FIG. 3 is a schematic flowchart of a method for generating an expression comparison graph in an embodiment
  • Figure 4 is a structural block diagram of a video processing device in an embodiment
  • Fig. 5 is an internal structure diagram of a computer device in an embodiment.
  • the video processing method provided in this application can be applied to the application environment shown in FIG. 1.
  • the terminal 102 communicates with the server 104 through the network.
  • the terminal 102 sends the surveillance video to the server 104.
  • after the server 104 receives the surveillance video, it intercepts the service sub-video of the target monitoring object from the surveillance video; extracts a first video image set containing the target monitoring object from the service sub-video, and extracts a target object image from the first video image set; extracts a second video image containing a service object from the service sub-video, and obtains a service image set based on the target object image and the second video image; performs micro-expression analysis on each set image in the service image set to obtain a preset micro-expression matching each set image; and generates the service information file of the target monitoring object according to the service image set and the preset micro-expressions.
  • the server 104 returns the generated service information file to the terminal 102.
  • the terminal 102 may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices.
  • the server 104 may be implemented as an independent server or a server cluster composed of multiple servers.
  • a video processing method is provided. Taking the method applied to the server 104 in FIG. 1 as an example for description, the method includes the following steps:
  • Step 210 Obtain surveillance video, and intercept the service sub-video of the target surveillance object from the surveillance video.
  • the target monitoring objects are the persons whose service quality needs to be monitored and evaluated, such as customer service personnel.
  • the surveillance video is video captured at the service location of the target monitoring object.
  • the people captured in the surveillance video may also include service objects such as customers, or other service personnel.
  • the shooting duration of the surveillance video is generally a fixed time period, such as one day or one week.
  • the terminal used for monitoring can periodically send the captured surveillance video to the server; after the server receives the recorded surveillance video, it can process it immediately or at scheduled times.
  • the server stores in advance the information of the target monitoring objects that need to be monitored, such as each target monitoring object's face image, service time and other information.
  • the server intercepts, from the acquired surveillance video and according to the information of the target monitoring object, the service sub-video corresponding to the target monitoring object; the service sub-video is the video in which the target monitoring object appears and performs activities at the service location.
  • Step 220 Extract a first video image set containing the target monitoring object from the service sub-video, and extract the target object image from the first video image set.
  • the service sub-video is only a preliminary, rough screening of the target monitoring object at the service location, and may still contain information about other persons or redundant information.
  • the service sub-video may also include images of the service object, images of other service personnel, or images of multiple people and objects caused by rotation and changes of the shooting angle during recording of the surveillance video.
  • the server performs face detection on the service sub-video according to the face information of the target monitoring object, and recognizes the first video image set containing the target monitoring object, that is, the set of video images in which the face of the target monitoring object can be detected.
  • the server may first extract video image frames from the service sub-video at a fixed time interval, thereby reducing the image processing volume; however, the extraction interval must also take the amount of video information into account, and cannot be set so large that too much effective information is lost.
  • after the server extracts the video image set containing the target monitoring object, it further filters the images in the set to determine whether the target monitoring object in each image is in a service behavior state, and extracts from the first video image set the target object images in which the target monitoring object is in the service behavior state.
  • the service behavior state can be the state of a customer service person serving a customer.
  • the target monitoring object may also be in other states, such as communicating with other service personnel, or idle with no service under way.
  • the server can judge whether the target monitoring object is in the service behavior state based on information such as the number of people in the image and the presence of people other than the target monitoring object.
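  • the sampling-and-filtering step described above can be sketched as follows. This is a minimal illustration assuming OpenCV and the open-source face_recognition library; the names (extract_target_frames, target_encoding, the 1-second interval) are hypothetical and not taken from the patent.

```python
# Minimal sketch: sample frames at a fixed interval and keep those in
# which the target monitoring object's face is detected.
import cv2
import face_recognition

def extract_target_frames(video_path, target_encoding, interval_s=1.0):
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25      # fall back if FPS unknown
    step = max(1, int(fps * interval_s))       # frames between samples
    kept, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            encodings = face_recognition.face_encodings(rgb)
            # keep the frame if any detected face matches the target
            if any(face_recognition.compare_faces([target_encoding], e)[0]
                   for e in encodings):
                kept.append((idx / fps, frame))  # (timestamp, image)
        idx += 1
    cap.release()
    return kept
```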
  • Step 230 Extract a second video image containing a service object from the service sub-video, and obtain a service image set according to the target object image and the second video image.
  • the service object is the person being served by the target monitoring object, such as a customer.
  • the server extracts the second video images containing the service object from the service sub-video.
  • the server may store the face information of all service personnel in advance and perform face detection on each image; if a face that does not match any service person is detected, it can be determined that a service object is present in the image. Further, whether the service object is in the serviced state can be determined according to the number of people in the image and similar cues, and the images that contain a service object in the serviced state are extracted as the second video images.
  • the server obtains the service image set according to the extracted target object images and the second video images.
  • the server can mark the object category of each image in the service image set, such as service object or target monitoring object, and can sort and organize the service image set according to the video time corresponding to each image.
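  • as an illustration of assembling the service image set, the following sketch tags each image with an object category and sorts by video time; the ServiceImage structure and field names are assumptions for this example only, reusing the (timestamp, frame) pairs from the sketch above.

```python
# Illustrative sketch: merge target-object images and service-object
# images into one collection, ordered by shooting time.
from dataclasses import dataclass

@dataclass
class ServiceImage:
    timestamp: float   # seconds into the service sub-video
    category: str      # "target" or "service_object"
    frame: object      # the image data, e.g. a numpy array

def build_service_image_set(target_images, second_images):
    merged = (
        [ServiceImage(t, "target", f) for t, f in target_images] +
        [ServiceImage(t, "service_object", f) for t, f in second_images]
    )
    return sorted(merged, key=lambda s: s.timestamp)
```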
  • Step 240 Perform micro-expression analysis on each set of images in the service image set to obtain preset micro-expressions matching each set of images.
  • after obtaining the service image set, the server performs micro-expression analysis on each set image in the service image set. Specifically, the server can extract the facial image of each object from a set image, extract facial features from the facial image, and then find the preset micro-expression matching those facial features.
  • a variety of preset micro-expressions are stored in the database of the server in advance.
  • the preset micro-expressions can be set according to the facial parts.
  • the preset micro-expressions of the eyes can include squinting, staring, and the like, so that a matching preset micro-expression can be obtained for each set image part by part.
  • the preset micro-expression can also be a micro-expression synthesized from the features of multiple facial parts; the server matches a comprehensive facial micro-expression, such as smiling or laughing, according to the facial features of multiple parts.
  • Step 250 Generate a service information file of the target monitored object according to the service image collection and the preset micro-expression.
  • the server can classify and sort the set images according to their object categories, shooting times and matched preset micro-expressions, and can compare and analyze the preset micro-expressions matched to the target monitoring object and to the service object in the same time period.
  • a service score of the target monitoring object is obtained from the preset micro-expressions of the two parties, and the service scores of the individual time periods can be combined to calculate the overall service evaluation score of the target monitoring object over the whole period.
  • the server can also compare the service score of each time period with a service warning threshold, to determine whether the score for that period is qualified and whether service quality warning prompt information needs to be issued.
  • the server can generate the service information file from one or a combination of the above: the service image set, the set images with their preset micro-expressions, the service scores of each time period, the warning prompt information, and so on.
  • the server can also apply other processing methods to the set images and the preset micro-expressions to obtain other analysis information for the service information file.
  • when the server calculates the service score of each time period, it can preset a service score for each preset micro-expression; the preset micro-expressions matched to the service object and to the target monitoring object can be given different service scores and different weights, and the service score of each time period is calculated from the scores and weights of both parties.
  • the server may also use other methods to calculate the service score.
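  • one possible reading of the weighted scoring described above is sketched below; the score table, weights and category names are invented placeholders, since the patent leaves the concrete values open.

```python
# Hedged sketch of a per-period weighted service score.
EXPRESSION_SCORES = {          # per-expression base scores (assumed values)
    "smile": 1.0, "laugh": 1.0, "neutral": 0.5,
    "frown": 0.0, "glare": -0.5,
}
WEIGHTS = {"target": 0.6, "service_object": 0.4}  # assumed party weights

def period_service_score(matched):
    """matched: list of (category, expression) pairs for one time period."""
    total, norm = 0.0, 0.0
    for category, expression in matched:
        w = WEIGHTS[category]
        total += w * EXPRESSION_SCORES.get(expression, 0.5)
        norm += w
    return total / norm if norm else 0.0
```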
  • in this way, the server can intercept from the surveillance video the service sub-video of the target monitoring object at the service location, and detect from the service sub-video the images containing the service object and the target object, thereby automatically filtering out useless redundant image information; it can also perform micro-expression analysis on the filtered images to analyze the micro-expressions of the target object and the service object, further processing the image information into effective information that helps evaluate the target monitoring object, thereby greatly improving the efficiency of obtaining effective video information.
  • the step of intercepting the service sub-video of the target monitoring object from the surveillance video may include: obtaining the service identifier of the target monitoring object, and searching for the service time and target face image corresponding to the service identifier; extracting from the surveillance video the surveillance video clips whose shooting time matches the service time; performing face detection on the surveillance video clips according to the target face image, and extracting from the surveillance video clips the video sub-segments in which no face matching the target face image is detected; obtaining the segment duration of each video sub-segment, and comparing the segment duration with a preset missing threshold; and deleting from the video clips the first video sub-segments whose segment duration is greater than the preset missing threshold, to obtain the service sub-video.
  • the service identifier is used to uniquely identify the service personnel.
  • the service identifier can be an employee code, name, job number, etc.
  • the mapping relationship between the service identifier of each service person and that person's basic information is stored in the server in advance; the basic information may include service hours, such as a customer service person's shift hours, and personal information such as gender, age and facial image information.
  • the server obtains the service identifier of the target monitoring object and finds the service time and target face image corresponding to the service identifier; it then compares the service time with the shooting time of the surveillance video, and extracts from the surveillance video the surveillance video clips whose shooting time matches the service time. For example, it can intercept the corresponding clips from the surveillance video according to the start of the service time, obtain fixed rest times such as meal breaks, and remove the video clips corresponding to the fixed rest times to obtain the surveillance video clips.
  • the server performs face detection on the surveillance video clips based on the found target face image; it can extract image frames from the surveillance video clips at regular intervals, detect whether an image frame contains a face matching the target face image, extract the image frames in which no matching face is detected, and assemble the consecutive unmatched frames, in extraction order, into video sub-segments.
  • multiple video sub-segments may be obtained.
  • the server obtains the start time and end time of each video sub-segment, and calculates the segment duration of each video sub-segment from the end time and start time.
  • the server obtains the preset missing threshold, a time threshold used to judge whether the service person has left the post: if the time for which the service person is missing from the surveillance video exceeds the preset missing threshold, it is determined that the service person has left the post.
  • the server compares the segment duration of each video sub-segment with the preset missing threshold, and deletes from the video clips the first video sub-segments whose segment duration is greater than the preset missing threshold, to obtain the service sub-video.
  • the algorithm for face detection and recognition can be a template-matching method, principal component analysis, a method based on singular value features, subspace analysis, locality preserving projections, or other algorithms.
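  • the absence-filtering logic can be illustrated as follows, under the assumption that frames have already been sampled and tested for the target face; timestamps are in seconds and the data shapes are hypothetical.

```python
# Simplified sketch: consecutive sampled frames with no matching face
# form sub-segments, and sub-segments longer than the missing threshold
# (the target has left the post) are dropped.

def filter_absences(samples, missing_threshold_s=60.0):
    """samples: list of (timestamp, target_present: bool), in time order.
    Returns the timestamps to keep in the service sub-video."""
    kept, gap = [], []
    for ts, present in samples:
        if present:
            # a short gap (target briefly off-camera) is kept
            if gap and (gap[-1] - gap[0]) <= missing_threshold_s:
                kept.extend(gap)
            gap = []
            kept.append(ts)
        else:
            gap.append(ts)
    if gap and (gap[-1] - gap[0]) <= missing_threshold_s:
        kept.extend(gap)   # trailing short gap
    return kept
```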
  • the step of extracting the target object image from the first video image set may include: extracting first image frames from the first video image set, and detecting the number of portraits in each first image frame; extracting from the first image frames the multi-person image frames whose number of portraits is greater than one; performing face detection on the multi-person image frames according to the service face images pre-stored in a service face database, and detecting whether a multi-person image frame contains a face image that does not match the service face images; and if a face image that does not match the service face images is detected, extracting from the corresponding multi-person image frame the face image matching the target face image as the target object image.
  • the first video image set is an image set containing face images of the target monitoring object; the server may extract the first image frames from the first video image set at a fixed time interval to reduce the amount of image data to process.
  • the server detects the number of portraits in each extracted first image frame. Portrait detection differs from face detection: it only needs to count the people present in each first image frame, without accurately recognizing faces, for example by detecting human body outlines.
  • the server extracts from the first image frames the multi-person image frames whose number of portraits is greater than one, that is, it excludes the non-service-state image frames in which the target monitoring object appears alone in the video.
  • the service face database is a face information database in the server.
  • the face images of all service personnel, including the target monitored object, are stored in the service face database.
  • the server performs face detection on each multi-person image frame, compares each detected facial feature with the service face images of all service personnel pre-stored in the service face database, and determines whether the detected facial feature matches a service face image in the database.
  • when it is detected that a multi-person image frame contains a face image that does not match any service face image, it is determined that a service object exists in the multi-person image frame, and the face image in that frame which matches the target face image is extracted as the target object image.
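  • a sketch of this multi-person screening, again using the face_recognition library, is shown below. The patent counts portraits by body outline; this sketch approximates the count with detected faces, and service_encodings/target_encoding are assumed to be precomputed from the service face database.

```python
# Illustrative sketch: keep the target's face crop only when the frame
# also contains a face matching no service person (a service object).
import face_recognition

def extract_target_object_image(rgb_frame, service_encodings, target_encoding):
    locations = face_recognition.face_locations(rgb_frame)
    if len(locations) <= 1:          # not a multi-person frame
        return None
    encodings = face_recognition.face_encodings(rgb_frame, locations)
    has_outsider = any(
        not any(face_recognition.compare_faces(service_encodings, e))
        for e in encodings)
    if not has_outsider:
        return None
    for loc, e in zip(locations, encodings):
        if face_recognition.compare_faces([target_encoding], e)[0]:
            top, right, bottom, left = loc
            return rgb_frame[top:bottom, left:right]   # target face crop
    return None
```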
  • the step of extracting the second video image containing the service object from the service sub-video may include: obtaining the second video sub-segments whose segment duration does not exceed the preset missing threshold, and extracting from the second video sub-segments the first facial images that do not match the service face images; extracting from the multi-person image frames the second facial images that do not match the service face images; and obtaining the second video images according to the first facial images and the second facial images.
  • the server obtains the second video sub-segment whose segment duration does not exceed the preset missing threshold from the service sub-videos.
  • the second video sub-segments are the videos in which the target monitoring object does not appear but whose missing duration does not exceed the preset missing threshold.
  • the server first identifies, from the second video sub-segment, a face image matching the service face image in the service face library, and then extracts other face images as the first facial image of the service object.
  • the multi-person image frame is a multi-person image containing the target monitoring object.
  • the server first screens the multi-person image frames to identify the face images that match service face images in the service face database, and then extracts the other face images as the second facial images of the service object.
  • the server generates the second video images jointly from the first facial images and the second facial images. Further, the server may mark the second video images with the service object category, and may mark the shooting time of each second video image.
  • the server extracts all images of the service object, whether or not they also include the target monitoring object, which avoids losing the facial expression information of the service object.
  • the step of performing micro-expression analysis on each set image in the service image set to obtain the preset micro-expression matching each set image may include: extracting facial feature points from each set image, and calculating facial action features according to the facial feature points; inputting the facial action features into a micro-expression analysis model to obtain the matching probability value of each preset micro-expression; and selecting the preset micro-expression matching the set image according to the matching probability values.
  • each set image is a facial image of the target monitoring object or of the service object.
  • the server extracts facial feature points from the set of images.
  • the facial feature points are feature points of facial features and facial contours, such as feature coordinates of eyes, mouth, nose, eyebrows, etc.
  • the server may perform facial feature point extraction on the current facial image through a pre-trained 3D face model or a deep learning neural network.
  • the server can extract facial action features from a set image through a pre-trained 3D face model or deep-learning neural network model, or it can classify the extracted facial feature points by facial part and input them into the corresponding facial action feature calculation model to obtain the corresponding facial action features; for example, feature points belonging to the eyes are input into an eye movement model to obtain eye action features such as blinking, squinting and staring.
  • the 3D face model, deep learning neural network model, and facial motion feature calculation model are all obtained by deep learning training on multiple face images in advance.
  • the server can calculate the value of each facial action feature according to the 3D face model, the deep-learning neural network model or the facial action feature calculation model, and input the facial action features and their values into a pre-trained micro-expression classification model to obtain the probability value of each preset micro-expression.
  • the micro-expression classification model can use SVM classifiers, deep neural network learning models, decision tree classification models and other models for classification.
  • the micro-expression classification model is obtained by pre-training the facial motion features of multiple facial images.
  • the server can select the preset micro-expression with the largest probability value according to the output result of the model.
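  • as a hedged sketch of this matching step, a scikit-learn SVM classifier (one of the classifier types named above) can stand in for the unspecified micro-expression classification model; the feature layout and labels are illustrative.

```python
# Sketch: action features go into a pre-trained probabilistic classifier,
# and the preset micro-expression with the largest probability wins.
import numpy as np
from sklearn.svm import SVC

def match_expression(clf: SVC, action_features: np.ndarray) -> str:
    """clf must be trained with probability=True on labeled expression
    data; action_features is one sample's feature vector."""
    probs = clf.predict_proba(action_features.reshape(1, -1))[0]
    return clf.classes_[int(np.argmax(probs))]   # highest-probability label
```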
  • the step of generating the service information file of the target monitoring object according to the service image set and the preset micro-expressions may include: associating each preset micro-expression with the corresponding set image in the service image set; obtaining the object category corresponding to each set image; searching for the expression tag corresponding to each preset micro-expression, and determining the emotion category corresponding to the preset micro-expression according to the tag; and dividing the set images in the service image set into multiple image subsets according to the object categories and emotion categories, and generating the service information file based on the image subsets.
  • the server associates each collection image with the corresponding preset micro-expression, for example, can mark each collection image with the preset micro-expression, or record the mapping relationship between the collection image and the corresponding preset micro-expression, etc.
  • the server obtains the object category corresponding to each set of images.
  • the object category is divided according to the face objects in the image.
  • the object category can include two categories, namely the target monitoring object category and the service object category.
  • the expression tag is the emotion-mode tag corresponding to a preset micro-expression.
  • an expression tag can be happy, excited, contemptuous, angry, peaceful, and so on.
  • one expression tag can correspond to multiple preset micro-expressions; for example, the preset micro-expressions corresponding to the happy tag can include squinting eyes, raised mouth corners, and the like.
  • the mapping relationship between expression tags and preset micro-expressions is stored in the server in advance; the server searches for the expression tag corresponding to each preset micro-expression.
  • expression tags can be divided into multiple emotion categories, and one emotion category can correspond to multiple expression tags.
  • the emotion categories of the expression tags can be divided into three classes: positive, neutral and negative.
  • expression tags such as happy and excited belong to the positive emotion category, tags such as contemptuous and angry belong to the negative emotion category, and the peaceful tag belongs to the neutral emotion category.
  • emotion categories can also be divided in other ways.
  • the mapping relationship between emotion categories and expression tags can be stored in the server in advance, and the server obtains the emotion categories corresponding to the preset micro-expressions of each collection image, and associates the found emotion categories with the corresponding collection images.
  • the server can classify the set images according to the object category and emotion category corresponding to each set image: the set images are first divided by object category into service image sets of multiple object categories, and the service image set of each object category is then divided into multiple smaller image subsets according to the emotion category of each image.
  • the mapping relationships between the shooting time, preset micro-expression, object category and emotion category of each set image are organized into an image information table for each service image set, and the service information file is generated based on the divided image subsets and the corresponding image information tables.
  • the server can push the service information file to the terminal, so that the terminal can reasonably evaluate the service quality of the target monitoring object based on the service information file.
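  • the subset division can be pictured with the following sketch; the expression-to-tag and tag-to-emotion tables are invented placeholders standing in for the mappings the server stores in advance.

```python
# Compact sketch: group set images by (object category, emotion category).
from collections import defaultdict

EXPRESSION_TO_TAG = {"smile": "happy", "laugh": "happy",
                     "glare": "angry", "neutral": "peaceful"}   # assumed
TAG_TO_EMOTION = {"happy": "positive", "angry": "negative",
                  "peaceful": "neutral"}                         # assumed

def divide_subsets(images):
    """images: list of dicts with 'category', 'expression', 'timestamp'.
    Returns {(object_category, emotion_category): [images...]}."""
    subsets = defaultdict(list)
    for img in images:
        tag = EXPRESSION_TO_TAG.get(img["expression"], "peaceful")
        emotion = TAG_TO_EMOTION[tag]
        subsets[(img["category"], emotion)].append(img)
    return dict(subsets)
```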
  • FIG. 3 is a flowchart of a method for generating an expression comparison graph, which may specifically include the following steps:
  • Step 310 Associate the first set of images of the target object category whose shooting time matches with the second set of images of the service object category in the service image set.
  • the set images in the service image set are divided into different image subsets according to the object category, and each image subset has a corresponding image information table.
  • the server obtains the shooting time of each set image from the image information tables of the image subsets of the target object category, namely the target monitoring object category, and of the service object category; it finds the first set images of the target object category and the second set images of the service object category whose shooting times match, and associates the two matching classes of images with each other.
  • matching shooting times need not be exactly identical; images can also be determined to match when they fall within the same time range, whose length can be set to 10 seconds, 20 seconds, 30 seconds, and so on.
  • Step 320 Determine whether the preset micro-expressions associated with the first set of images and the preset micro-expressions associated with the second set of images correspond to the same emotion category.
  • the server separately obtains the matched preset micro-expressions of the first set image and the second set image, searches for the emotion category corresponding to each preset micro-expression, and determines whether the two emotion categories are the same, that is, whether the emotional states expressed by the target monitoring object and the customer at that moment match. The emotional changes of the target monitoring object and the customer during communication are generally fairly synchronized, in which case the service attitude of the target monitoring object is easy to evaluate; but when the emotional states of the two are inconsistent or in conflict, the service status of the target monitoring object cannot be evaluated from the customer side alone, and it is often necessary to determine the true service situation at that time manually.
  • step 330 when corresponding to different emotion categories, stitching the associated first set of images and second set of images to obtain an expression comparison map.
  • the server stitches the associated first set of images and second set of images to obtain an expression comparison map.
  • the stitching position and form of the two images can be set according to the needs of the monitoring personnel.
  • the server may further record and mark the shooting time corresponding to the expression comparison graph.
  • the server may also sort the expression comparison images of multiple shooting times according to the shooting time, and then generate an expression comparison animation.
  • the server may send the generated expression comparison image or expression comparison animation to the terminal to provide an early warning reminder of conflict expressions to the terminal.
  • in this way, the images of the target monitoring object and the service object at the same shooting time are associated, the emotional states of the persons in the two images are automatically matched and checked, and the images with conflicting emotional states can be spliced together, which facilitates comparative analysis by the monitoring personnel.
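  • a compact sketch of this comparison-map flow is given below: images of the two parties whose timestamps fall within the same window are paired, and a pair is stitched side by side when its emotion categories differ. OpenCV and NumPy do the splicing; the 10-second window is one of the example range lengths mentioned above, and the dictionary layout is assumed.

```python
# Sketch: pair images by time window, stitch pairs with conflicting
# emotion categories into an expression comparison map.
import numpy as np
import cv2

def expression_comparison_maps(target_imgs, service_imgs, window_s=10.0):
    """Each item: dict with 'timestamp', 'emotion', 'frame' (HxWx3)."""
    maps = []
    for t_img in target_imgs:
        for s_img in service_imgs:
            in_window = abs(t_img["timestamp"] - s_img["timestamp"]) <= window_s
            if in_window and t_img["emotion"] != s_img["emotion"]:
                # resize both crops to a common height before stitching
                h = min(t_img["frame"].shape[0], s_img["frame"].shape[0])
                left = cv2.resize(t_img["frame"], (t_img["frame"].shape[1], h))
                right = cv2.resize(s_img["frame"], (s_img["frame"].shape[1], h))
                maps.append((t_img["timestamp"], np.hstack([left, right])))
    return maps
```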
  • a video processing device including: a video interception module 410, a target image extraction module 420, an image collection generation module 430, an expression analysis module 440, and an archive generation module 450, wherein :
  • the video interception module 410 is configured to obtain surveillance videos, and intercept service sub-videos of the target surveillance object from the surveillance videos.
  • the target image extraction module 420 is configured to extract a first video image set containing the target monitoring object from the service sub-video, and extract a target object image from the first video image set.
  • the image set generating module 430 is configured to extract a second video image containing a service object from the service sub-video, and obtain a service image set according to the target object image and the second video image.
  • the expression analysis module 440 is configured to perform micro-expression analysis on each set of images in the service image set to obtain preset micro-expressions that match the set of images.
  • the file generating module 450 is configured to generate a service information file of the target monitored object according to the service image set and the preset micro-expression.
  • the video interception module 410 may include:
  • the information searching unit is used to obtain the service identification of the target monitoring object, and search for the service time and the target face image corresponding to the service identification.
  • the segment extraction unit is configured to extract, from the surveillance video, a surveillance video segment whose shooting time matches the service time.
  • the re-screening unit is configured to perform face detection on the surveillance video clip according to the target face image, and extract from the surveillance video clip a video whose face matching the target face image is not detected Sub-fragment.
  • the duration comparison unit is used to obtain the segment duration of each of the video sub-segments, and compare the segment duration with a preset missing threshold.
  • the video deletion unit is configured to delete the first video sub-segment whose segment duration is greater than the preset missing threshold from the video segment to obtain a service sub-video.
  • the target image extraction module 420 may include:
  • the portrait detection unit is configured to extract first image frames from the first video image set, and detect the number of portraits in each of the first image frames.
  • the multi-person detection unit is configured to extract a multi-person image frame with more than one portrait from the first image frame.
  • the face matching unit is configured to perform face detection on the multi-person image frame according to the service face images pre-stored in the service face database, and detect whether the multi-person image frame contains a face image that does not match the service face images.
  • the target object extraction unit is configured to, if a face image that does not match the service face image is detected, extract the face image matching the target face image in the corresponding multi-person image frame as The target object image.
  • the image collection generating module 430 may include:
  • the first extraction unit is configured to obtain a second video sub-segment whose segment duration does not exceed the preset missing threshold, and extract from the second video sub-segment a first facial image that does not match the service face images.
  • a second extraction unit, configured to extract from the multi-person image frame a second facial image that does not match the service face images.
  • the image summary unit is configured to obtain a second video image according to the first facial image and the second facial image.
  • the expression analysis module 440 may include:
  • the feature extraction unit is configured to extract facial feature points from each of the collective images, and calculate facial action features based on the facial feature points.
  • the probability calculation unit is used to input the facial motion features into the micro-expression analysis model to obtain the matching probability value of each preset micro-expression.
  • the expression selecting unit is configured to select a preset micro expression matching the set image according to the matching probability value.
  • the archive generation module 450 may include:
  • the associating unit is configured to associate the preset micro-expression with the corresponding collection image in the service image collection.
  • the category obtaining unit is configured to obtain the object category corresponding to each of the set images.
  • the emotion determination unit is configured to search for an expression tag corresponding to the preset micro-expression, and determine the emotion category corresponding to the preset micro-expression according to the tag.
  • the subset dividing unit is configured to divide the set image in the service image set into multiple image subsets according to the object category and the emotion category, and generate a service information file based on the image subset.
  • the apparatus may further include:
  • the image associating module is used to associate the first set of images of the target object category whose shooting time matches with the second set of images of the service object category in the service image set.
  • the category matching module is used to determine whether the preset micro-expressions associated with the first set of images and the preset micro-expressions associated with the second set of images correspond to the same emotion category.
  • the image splicing module is used for splicing the associated first set of images and the second set of images to obtain an expression comparison map when corresponding to different emotion categories.
  • Each module in the above-mentioned video processing device may be implemented in whole or in part by software, hardware, and a combination thereof.
  • the foregoing modules may be embedded in the form of hardware or independent of the processor in the computer device, or may be stored in the memory of the computer device in the form of software, so that the processor can call and execute the operations corresponding to the foregoing modules.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure diagram may be as shown in FIG. 5.
  • the computer equipment includes a processor, a memory, a network interface and a database connected through a system bus. Among them, the processor of the computer device is used to provide calculation and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, a computer program, and a database.
  • the internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium.
  • the database of the computer device is used to store video processing data.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer program is executed by the processor to realize a video processing method.
  • FIG. 5 is only a block diagram of a part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution of the present application is applied.
  • a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
  • a computer device is provided, including a memory and a processor, the memory storing a computer program; when the processor executes the computer program, the following steps are implemented: obtaining a surveillance video, and intercepting the service sub-video of the target monitoring object from the surveillance video; extracting from the service sub-video a first video image set containing the target monitoring object, and extracting a target object image from the first video image set; extracting from the service sub-video a second video image containing a service object, and obtaining a service image set according to the target object image and the second video image; performing micro-expression analysis on each set image in the service image set to obtain the preset micro-expression matching each set image; and generating the service information file of the target monitoring object according to the service image set and the preset micro-expressions.
  • when the processor executes the computer program to implement the step of intercepting the service sub-video of the target monitoring object from the surveillance video, it is further configured to: obtain the service identifier of the target monitoring object, and search for the service time and target face image corresponding to the service identifier; extract from the surveillance video the surveillance video clips whose shooting time matches the service time; perform face detection on the surveillance video clips according to the target face image, and extract from the surveillance video clips the video sub-segments in which no face matching the target face image is detected; obtain the segment duration of each video sub-segment, and compare the segment duration with the preset missing threshold; and delete from the video clips the first video sub-segments whose segment duration is greater than the preset missing threshold, to obtain the service sub-video.
  • when the processor executes the computer program to implement the step of extracting the target object image from the first video image set, it is further configured to: extract first image frames from the first video image set, and detect the number of portraits in each first image frame; extract from the first image frames the multi-person image frames whose number of portraits is greater than one; perform face detection on the multi-person image frames according to the service face images pre-stored in the service face database, and detect whether a multi-person image frame contains a face image that does not match the service face images; and if such a face image is detected, extract from the corresponding multi-person image frame the face image matching the target face image as the target object image.
  • when the processor executes the computer program to implement the step of extracting the second video image containing the service object from the service sub-video, it is further configured to: obtain the second video sub-segments whose segment duration does not exceed the preset missing threshold, and extract from the second video sub-segments the first facial images that do not match the service face images; extract from the multi-person image frames the second facial images that do not match the service face images; and obtain the second video images according to the first facial images and the second facial images.
  • when the processor executes the computer program to implement the step of performing micro-expression analysis on each set image in the service image set to obtain the preset micro-expression matching each set image, it is further configured to: extract facial feature points from each set image, and calculate facial action features based on the facial feature points; input the facial action features into the micro-expression analysis model to obtain the matching probability value of each preset micro-expression; and select the preset micro-expression matching the set image according to the matching probability values.
  • when the processor executes the computer program to implement the step of generating the service information file of the target monitoring object according to the service image set and the preset micro-expressions, it is further configured to: associate the preset micro-expressions with the corresponding set images in the service image set; obtain the object category corresponding to each set image; search for the expression tag corresponding to each preset micro-expression, and determine the emotion category corresponding to the preset micro-expression according to the tag; and divide the set images in the service image set into multiple image subsets according to the object categories and emotion categories, and generate the service information file according to the image subsets.
  • the processor further implements the following steps when executing the computer program: associating, in the service image set, the first set images of the target object category with the second set images of the service object category whose shooting times match; determining whether the preset micro-expression associated with a first set image and the preset micro-expression associated with the corresponding second set image correspond to the same emotion category; and, when they correspond to different emotion categories, splicing the associated first set image and second set image to obtain an expression comparison map.
  • a computer-readable storage medium is provided.
  • the computer-readable storage medium may be a volatile or non-volatile storage medium on which a computer program is stored; when the computer program is executed by a processor, the following steps are implemented: acquiring a surveillance video, and intercepting the service sub-video of the target monitoring object from the surveillance video; extracting from the service sub-video a first video image set containing the target monitoring object, and extracting a target object image from the first video image set; extracting from the service sub-video a second video image containing a service object, and obtaining a service image set according to the target object image and the second video image; performing micro-expression analysis on each set image in the service image set to obtain the preset micro-expression matching each set image; and generating the service information file of the target monitoring object according to the service image set and the preset micro-expressions.
  • when the computer program is executed by the processor to implement the step of intercepting the service sub-video of the target monitoring object from the surveillance video, it is further used to: obtain the service identifier of the target monitoring object, and search for the service time and target face image corresponding to the service identifier; extract from the surveillance video the surveillance video clips whose shooting time matches the service time; perform face detection on the surveillance video clips according to the target face image, and extract from the surveillance video clips the video sub-segments in which no face matching the target face image is detected; obtain the segment duration of each video sub-segment, and compare the segment duration with the preset missing threshold; and delete from the video clips the first video sub-segments whose segment duration is greater than the preset missing threshold, to obtain the service sub-video.
  • when the computer program is executed by the processor to implement the step of extracting the target object image from the first video image set, it is further used to: extract first image frames from the first video image set, and detect the number of portraits in each first image frame; extract from the first image frames the multi-person image frames whose number of portraits is greater than one; perform face detection on the multi-person image frames according to the service face images pre-stored in the service face database, and detect whether a multi-person image frame contains a face image that does not match the service face images; and if such a face image is detected, extract from the corresponding multi-person image frame the face image matching the target face image as the target object image.
  • when the computer program is executed by the processor to implement the step of extracting the second video image containing the service object from the service sub-video, it is further used to: obtain the second video sub-segments whose segment duration does not exceed the preset missing threshold, and extract from the second video sub-segments the first facial images that do not match the service face images; extract from the multi-person image frames the second facial images that do not match the service face images; and obtain the second video images according to the first facial images and the second facial images.
  • when the computer program is executed by the processor to implement the step of performing micro-expression analysis on each set image in the service image set to obtain the preset micro-expression matching each set image, it is further used to: extract facial feature points from each set image, and calculate facial action features based on the facial feature points; input the facial action features into a micro-expression analysis model to obtain the matching probability value of each preset micro-expression; and select the preset micro-expression matching the set image according to the matching probability values.
  • when the computer program is executed by the processor to implement the step of generating the service information file of the target monitoring object according to the service image set and the preset micro-expressions, it is further used to: associate the preset micro-expressions with the corresponding set images in the service image set; obtain the object category corresponding to each set image; search for the expression tag corresponding to each preset micro-expression, and determine the emotion category corresponding to the preset micro-expression based on the tag; and divide the set images in the service image set into multiple image subsets according to the object categories and emotion categories, and generate the service information file according to the image subsets.
  • when the computer program is executed by the processor, the following steps are further implemented: associating, in the service image set, the first set images of the target object category with the second set images of the service object category whose shooting times match; determining whether the preset micro-expression associated with a first set image and the preset micro-expression associated with the corresponding second set image correspond to the same emotion category; and, when they correspond to different emotion categories, splicing the associated first set image and second set image to obtain an expression comparison map.
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present application relates to the field of image processing, and in particular to a video processing method and apparatus, a computer device and a storage medium. The method comprises: acquiring a monitoring video, and capturing a service sub-video of a target monitoring object from the monitoring video; extracting, from the service sub-video, a set of first video images containing the target monitoring object, and extracting a target object image from the set of first video images; extracting, from the service sub-video, a second video image containing a service object, and obtaining a set of service images according to the target object image and the second video image; respectively carrying out subtle expression analysis on each set image in the set of service images to obtain a preset subtle expression matching each set image; and generating a service information profile of the target monitoring object according to the set of service images and the preset subtle expression. By means of the method, the efficiency of acquiring effective video information can be improved.

Description

视频处理方法、装置、计算机设备和存储介质Video processing method, device, computer equipment and storage medium
本申请要求于2019年7月4日提交中国专利局、申请号为201910599356.4,发明名称为“视频处理方法、装置、计算机设备和存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。This application claims the priority of a Chinese patent application filed with the Chinese Patent Office on July 4, 2019, the application number is 201910599356.4, and the invention title is "video processing methods, devices, computer equipment and storage media", the entire contents of which are incorporated by reference In this application.
Technical Field
This application relates to the field of artificial intelligence technology, and in particular to a video processing method and apparatus, a computer device, and a storage medium.
Background
At present, society pays increasing attention to the service attitude of the service industry, and consumers expect to receive high-quality service everywhere. It is therefore necessary to monitor the service quality of service personnel, and to issue timely reminders and corrections when problems with their service quality arise.
However, the inventors have realized that the amount of information contained in surveillance video is often very large. If quality monitors want to evaluate the service quality of service personnel from surveillance video, they have to replay the video, which contains a huge number of image frames and much redundant information. Extracting information that can actually be used to evaluate service quality therefore takes a great deal of time, and the efficiency of acquiring effective video information is very low.
Summary
In view of this, it is necessary to provide, in response to the above technical problems, a video processing method, apparatus, computer device and storage medium capable of improving the efficiency of acquiring effective video information.
A video processing method, the method including:
acquiring a surveillance video, and intercepting a service sub-video of a target monitored object from the surveillance video;
extracting a first video image set containing the target monitored object from the service sub-video, and extracting target object images from the first video image set;
extracting second video images containing a service object from the service sub-video, and obtaining a service image set according to the target object images and the second video images;
performing micro-expression analysis on each collection image in the service image set to obtain a preset micro-expression matching each collection image; and
generating a service information file of the target monitored object according to the service image set and the preset micro-expressions.
A video processing apparatus, the apparatus including:
a video interception module, configured to acquire a surveillance video and intercept a service sub-video of a target monitored object from the surveillance video;
a target image extraction module, configured to extract a first video image set containing the target monitored object from the service sub-video, and to extract target object images from the first video image set;
an image set generation module, configured to extract second video images containing a service object from the service sub-video, and to obtain a service image set according to the target object images and the second video images;
an expression analysis module, configured to perform micro-expression analysis on each collection image in the service image set to obtain a preset micro-expression matching each collection image; and
a file generation module, configured to generate a service information file of the target monitored object according to the service image set and the preset micro-expressions.
A computer device, including a memory and a processor, where the memory stores a computer program, and the processor implements the steps of the above method when executing the computer program.
A computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the above method.
The above video processing method, apparatus, computer device and storage medium can intercept, from the surveillance video, the service sub-video in which the target monitored object is at its service position, and can detect in the service sub-video the images containing the service object and the images containing the target object, thereby automatically filtering out useless redundant image information. They can also perform micro-expression analysis on the filtered images to identify the micro-expressions of the target object and the service object in the images, further processing the image information into effective information that helps evaluate the target monitored object, and thus greatly improving the efficiency of acquiring effective video information.
Brief Description of the Drawings
Fig. 1 is a diagram of an application scenario of a video processing method in one embodiment;
Fig. 2 is a schematic flowchart of a video processing method in one embodiment;
Fig. 3 is a schematic flowchart of a method for generating an expression comparison image in one embodiment;
Fig. 4 is a structural block diagram of a video processing apparatus in one embodiment;
Fig. 5 is a diagram of the internal structure of a computer device in one embodiment.
Detailed Description of the Embodiments
The video processing method provided in this application can be applied in the application environment shown in Fig. 1, in which a terminal 102 communicates with a server 104 through a network. The terminal 102 sends a surveillance video to the server 104. After receiving the surveillance video, the server 104 intercepts a service sub-video of a target monitored object from the surveillance video; extracts a first video image set containing the target monitored object from the service sub-video, and extracts target object images from the first video image set; extracts second video images containing a service object from the service sub-video, and obtains a service image set according to the target object images and the second video images; performs micro-expression analysis on each collection image in the service image set to obtain a preset micro-expression matching each image; and generates a service information file of the target monitored object according to the service image set and the preset micro-expressions. The server 104 then returns the generated service information file to the terminal 102.
The terminal 102 may be, but is not limited to, a personal computer, a notebook computer, a smartphone, a tablet computer, or a portable wearable device; the server 104 may be implemented as an independent server or as a server cluster composed of multiple servers.
In one embodiment, as shown in Fig. 2, a video processing method is provided. Taking the method applied to the server 104 in Fig. 1 as an example, the method includes the following steps:
Step 210: acquire a surveillance video, and intercept a service sub-video of the target monitored object from the surveillance video.
The target monitored object is a person whose service quality needs to be monitored and evaluated, such as a customer service agent. The surveillance video is a video shot at the service location of the target monitored object; besides the target monitored object, the people captured in it may include service objects such as customers, or other service personnel. The shooting duration of the surveillance video is generally a fixed period, such as one day or one week. The monitoring terminal can periodically send captured surveillance video to the server, and after receiving it the server can process it immediately or at scheduled times. The server stores in advance the information of the target monitored objects to be monitored, such as their face images and service times. Based on this information, the server intercepts from the acquired surveillance video the service sub-video corresponding to the target monitored object, i.e. the video in which the target monitored object appears and acts at its service location.
Step 220: extract a first video image set containing the target monitored object from the service sub-video, and extract target object images from the first video image set.
The service sub-video is only a preliminary, rough screening of the target monitored object at its service position, and may still contain information about other persons or redundant information. Besides images of the target monitored object, and because the shooting angle of the surveillance camera may rotate and change, the service sub-video may also contain images of service objects, images of other service personnel, or images in which multiple persons appear together. The server performs face detection on the service sub-video according to the face information of the target monitored object, and identifies the first video image set containing the target monitored object, that is, the set of video images in which the face of the target monitored object can be detected.
Further, since the service sub-video contains a large number of video image frames, before processing it the server may first extract video image frames from the service sub-video at a fixed time interval, which reduces the image processing load. The extraction interval, however, must also take the amount of video information into account: it cannot be set so large that too much effective information is lost.
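Purely as an illustration of this sampling step (not part of the claimed method), the fixed-interval frame extraction can be sketched with OpenCV; the one-second interval and the 25 fps fallback below are assumptions, not values fixed by the method:

```python
import cv2

def sample_frames(video_path, interval_sec=1.0):
    """Yield (timestamp_sec, frame) pairs, one frame per interval_sec."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0  # fall back if FPS metadata is missing
    step = max(int(fps * interval_sec), 1)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            yield idx / fps, frame  # BGR image at its video timestamp
        idx += 1
    cap.release()
```

A larger `interval_sec` shrinks the processing load but, as noted above, risks dropping frames that carry useful expression information.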
After the server extracts the video image set containing the target monitored object, it further filters the images in the set, judging whether the target monitored object in each image is in a service-behavior state, and extracts from the first video image set the images in which it is, as the target object images. For example, the service-behavior state may be the state in which a customer service agent is serving a customer. The target monitored object may also be in other states, such as communicating with other service personnel, or idle without serving anyone; the server can judge the state from information such as the number of persons in the image and the presence of persons other than the target monitored object.
Step 230: extract second video images containing a service object from the service sub-video, and obtain a service image set according to the target object images and the second video images.
A service object is a person served by the target monitored object, such as a customer. The server extracts the second video images containing service objects from the service sub-video. The server may store the face information of all service personnel in advance and perform face detection on each image; if a face matching no service person is detected, it can be determined that a service object is present in the image. Further, whether the service object is actually being served can be judged from information such as the number of persons in the image, and the images that contain a service object in the being-served state are extracted as the second video images.
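A minimal sketch of this matching logic is given below, assuming the open-source `face_recognition` package as the detection backend (the method itself does not mandate any particular library); `staff_encodings` stands for the pre-stored face encodings of all service personnel:

```python
import face_recognition  # one possible backend; an assumption, not prescribed by the method

def classify_faces(frame_rgb, staff_encodings, tolerance=0.5):
    """Split detected faces in one RGB frame into staff and non-staff (service objects)."""
    locations = face_recognition.face_locations(frame_rgb)
    encodings = face_recognition.face_encodings(frame_rgb, locations)
    staff, visitors = [], []
    for loc, enc in zip(locations, encodings):
        matches = face_recognition.compare_faces(staff_encodings, enc, tolerance)
        (staff if any(matches) else visitors).append(loc)
    return staff, visitors  # any non-empty `visitors` list indicates a service object
```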
The server obtains the service image set from the extracted target object images and the second video images. The server can label the object category of each image in the set, i.e. whether it shows a service object or the target monitored object, and can sort and organize the service image set according to the video time corresponding to each image.
Step 240: perform micro-expression analysis on each collection image in the service image set to obtain a preset micro-expression matching each collection image.
After obtaining the service image set, the server performs micro-expression analysis on each image in it. Specifically, the server can extract the facial image of each person from a collection image, extract facial features from the facial image, and then look up the preset micro-expressions matching those features. A variety of preset micro-expressions are stored in the server's database in advance. Preset micro-expressions can be defined per facial part, for example eye micro-expressions such as squinting or glaring; a single collection image may therefore match several preset micro-expressions, one per part. A preset micro-expression may also be a whole-face expression synthesized from the features of several parts, with the server matching one composite facial expression, such as a smile or a laugh, from the features of multiple parts.
Step 250: generate a service information file of the target monitored object according to the service image set and the preset micro-expressions.
The server can classify and organize the images according to the object category, shooting time and matched preset micro-expression of each collection image, and can compare the preset micro-expressions matched for the target monitored object and for the service object within the same time period. From the preset micro-expressions of the two parties the server obtains a service score for the target monitored object; the scores of the individual time periods can then be combined to obtain an overall service evaluation score for the whole monitoring period. The server can also compare each period's service score with a service warning threshold, to judge whether the score for that period is acceptable and whether a service quality warning should be issued. The server can generate the service information file from one or a combination of the above: the service image set, the preset micro-expressions of the collection images, the service scores of each time period, and the warning information; the server may also process the collection images and preset micro-expressions in other ways to obtain further analysis information for the file.
Specifically, when calculating the service score of each time period, the server can use preset score values for each preset micro-expression; the preset micro-expressions matched for the service object and for the target monitored object can be given different score values, and different weights can be set for the two parties' preset micro-expressions, with the period's service score computed from both parties' score values and weights. In other embodiments, the server may calculate the service score in other ways.
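One possible reading of this weighted scoring is sketched below; the expression scores and party weights are hypothetical placeholders, since the method leaves the concrete values open:

```python
# Hypothetical per-expression scores and per-party weights (illustrative only)
EXPRESSION_SCORE = {"smile": 5, "neutral": 3, "frown": 1}
WEIGHTS = {"staff": 0.6, "customer": 0.4}

def period_score(records):
    """records: iterable of (party, expression) pairs observed in one time period.
    Returns the weighted service score, or None if the period is empty."""
    total, weight_sum = 0.0, 0.0
    for party, expression in records:
        w = WEIGHTS[party]
        total += w * EXPRESSION_SCORE.get(expression, 3)  # unknown expressions score neutral
        weight_sum += w
    return total / weight_sum if weight_sum else None
```

For example, `period_score([("staff", "smile"), ("customer", "frown")])` yields a score pulled down by the customer's negative expression, which could then be compared against the warning threshold.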
In this embodiment, the server can intercept from the surveillance video the service sub-video of the target monitored object at its service position, and detect in it the images containing the service object and the images containing the target object, automatically filtering out useless redundant image information. It can also perform micro-expression analysis on the filtered images to identify the micro-expressions of the target object and the service object in the images, further processing the image information into effective information that helps evaluate the target monitored object, and thus greatly improving the efficiency of acquiring effective video information.
In one embodiment, intercepting the service sub-video of the target monitored object from the surveillance video may include: acquiring the service identifier of the target monitored object, and looking up the service time and target face image corresponding to the service identifier; extracting from the surveillance video the surveillance video segment whose shooting time matches the service time; performing face detection on the surveillance video segment according to the target face image, and extracting from it the video sub-segments in which no face matching the target face image is detected; obtaining the segment duration of each video sub-segment and comparing it with a preset missing threshold; and deleting from the video segment the first video sub-segments whose duration exceeds the preset missing threshold, to obtain the service sub-video.
The service identifier uniquely identifies a service person and may be an employee code, a name, a job number, etc. The mapping between each service person's identifier and their basic information is stored in the server in advance; the basic information may include service times, such as on-duty customer service hours, and personal information such as gender, age, and face image. The server obtains the service identifier of the target monitored object and looks up the corresponding service time and target face image, then compares the service time with the shooting time of the surveillance video and extracts from the surveillance video the segment whose shooting time matches the service time; for example, it can cut the corresponding segment from the surveillance video according to the start of the service time, obtain the fixed break times such as meal times, and remove the video corresponding to those breaks to obtain the surveillance video segment.
The server performs face detection on the surveillance video segment according to the target face image found. It can extract image frames from the segment at fixed time intervals and detect whether a face matching the target face image is present in each frame; the frames in which no matching face is detected are extracted, and consecutive such frames (in extraction order) form video sub-segments, of which there may be several. The server obtains the start and end time of each video sub-segment and computes its duration from them. The server then obtains the preset missing threshold, the time threshold for judging whether the service person has left their post: if the person's absence from the surveillance video exceeds the threshold, they are judged to have left. The server compares each sub-segment's duration with the preset missing threshold and deletes from the video segment the first video sub-segments whose duration exceeds it, obtaining the service sub-video. The face recognition here may use algorithms such as template-matching-based recognition, principal component analysis, singular-value-feature-based methods, subspace analysis, or locality preserving projections.
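A minimal sketch of the threshold comparison is given below, assuming the absences have already been reduced to `(start, end)` second offsets within the segment; the 120-second threshold is a placeholder, not a value defined by the method:

```python
def drop_long_absences(absences, missing_threshold_sec=120):
    """absences: list of (start_sec, end_sec) spans where the target face was NOT detected.
    Short absences are kept (they may still show the service object's expressions);
    long ones are treated as off-duty and marked for deletion from the segment."""
    keep, drop = [], []
    for start, end in absences:
        (drop if end - start > missing_threshold_sec else keep).append((start, end))
    return keep, drop
```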
In this embodiment, video matching and face detection allow a preliminary screening of the video segments containing the target monitored object, effectively reducing redundant video unrelated to the target. Setting the preset missing threshold also makes it possible to distinguish a long absence, such as going to the toilet or going out, from a short one, such as consulting other service personnel or fetching materials; the video of short absences, which may contain the service object's expression information, is retained, reducing the loss of effective information.
In one embodiment, extracting the target object images from the first video image set may include: extracting first image frames from the first video image set and detecting the number of persons in each; extracting from the first image frames the multi-person image frames containing more than one person; performing face detection on the multi-person image frames according to the service face images pre-stored in a service face library, to detect whether a face matching none of the service face images is present; and, if such a face is detected, extracting the facial image matching the target face image in the corresponding multi-person image frame as a target object image.
The first video image set is the set of images containing the face of the target monitored object. The server can extract first image frames from it at a fixed time interval to reduce the amount of image data to process. The server then detects the number of persons in each extracted frame. Person detection differs from face detection: it only needs to count how many people are present in each frame, not to precisely recognize faces, and can for example be done by detecting human body contours. The server extracts from the first image frames the multi-person frames containing more than one person, excluding the non-service frames in which the target monitored object appears alone.
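For the contour-based person counting mentioned above, one possible implementation is OpenCV's default HOG people detector; this is an illustrative choice, not the detector prescribed by the method:

```python
import cv2

# Pre-trained pedestrian detector shipped with OpenCV
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

def count_people(frame_bgr):
    """Return the number of person-shaped regions detected in one frame."""
    rects, _weights = hog.detectMultiScale(frame_bgr, winStride=(8, 8))
    return len(rects)

# Frames with count_people(frame) > 1 would be kept as multi-person image frames.
```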
The service face library is a face database in the server; the face images of all service personnel, including the target monitored object, are stored in it. The server performs face detection on each multi-person image frame, compares the detected facial features against the service face images of all personnel pre-stored in the library, and judges whether each detected face matches some service face image. When a face matching none of the service face images is detected in a multi-person frame, it is determined that a service object is present in that frame, and the facial image matching the target face image in that frame is extracted as a target object image.
In this embodiment, multi-person detection excludes the frames in which the target monitored object appears alone, and matching against the service face images of the personnel excludes the non-service frames containing only the target monitored object and other service personnel, further narrowing the range of video images and effectively reducing redundant image information.
In one embodiment, extracting the second video images containing the service object from the service sub-video may include: obtaining the second video sub-segments whose duration does not exceed the preset missing threshold, and extracting from them the first facial images that match no service face image; extracting from the multi-person image frames the second facial images that match no service face image; and obtaining the second video images from the first facial images and the second facial images.
The server obtains from the service sub-video the second video sub-segments whose duration does not exceed the preset missing threshold, i.e. the video in which the target monitored object does not appear but whose absence does not exceed the threshold. From these sub-segments, the server first recognizes the faces matching the service face images in the service face library, then extracts the remaining face images as the first facial images of service objects.
The multi-person image frames are the multi-person images containing the target monitored object. Likewise, the server first recognizes in them the faces matching the service face images in the library, then extracts the remaining face images as the second facial images of service objects. The server generates the second video images from the first and second facial images together. Further, the server can label the second video images with the service object category, and can record the shooting time of each second video image.
In this embodiment, the server extracts the images of service objects whether or not the target monitored object is present, avoiding the loss of the service objects' facial expression information.
In one embodiment, performing micro-expression analysis on each collection image in the service image set to obtain the matching preset micro-expressions may include: extracting facial feature points from each collection image and computing facial action features from them; inputting the facial action features into a micro-expression analysis model to obtain the matching probability value of each preset micro-expression; and selecting the preset micro-expression matching the collection image according to the matching probability values.
Each collection image is a facial image of the target monitored object or of a service object. The server extracts facial feature points from the collection image; facial feature points are the feature points of the facial organs and facial contour, such as the feature coordinates of the eyes, mouth, nose, and eyebrows. Specifically, the server can extract them from the current facial image with a pre-trained 3D face model or a deep learning neural network.
Based on the extracted feature points, the server can extract facial action features from the collection image with a pre-trained 3D face model or deep neural network model; alternatively, it can classify the feature points by part and input each group into the corresponding facial-action-feature computation model to obtain the corresponding action features. For example, inputting the eye feature points into an eye-movement model yields eye action features such as blinking, squinting, or glaring. The 3D face model, the deep neural network model, and the facial-action-feature computation models are all obtained in advance by deep learning training on many face images.
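As one concrete example of an eye action feature of this kind, the widely used eye aspect ratio (EAR) can be computed from six eye landmarks; this specific formula is an illustration, not a feature defined by the method:

```python
import numpy as np

def eye_aspect_ratio(eye):
    """eye: six (x, y) landmarks ordered around one eye.
    A low EAR suggests a closed or squinting eye; one possible action feature."""
    eye = np.asarray(eye, dtype=float)
    v1 = np.linalg.norm(eye[1] - eye[5])  # first vertical distance
    v2 = np.linalg.norm(eye[2] - eye[4])  # second vertical distance
    h = np.linalg.norm(eye[0] - eye[3])   # horizontal eye width
    return (v1 + v2) / (2.0 * h)
```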
The server can compute the value of each facial action feature with the 3D face model, the deep neural network model, or the facial-action-feature computation model, and input the action features and their values into a pre-trained micro-expression classification model to obtain the probability value of each preset micro-expression. The micro-expression classification model may be any of several classifiers, such as an SVM classifier, a deep neural network learning model, or a decision tree classification model, trained in advance on the facial action features of many face images. The server can select the preset micro-expression with the largest probability value from the model's output.
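A minimal sketch of selecting the highest-probability preset micro-expression is given below, using a scikit-learn SVM as one of the classifier options named above; the action-feature encoding and training data are assumed:

```python
import numpy as np
from sklearn.svm import SVC

def train_expression_model(X, y):
    """X: action-feature vectors; y: preset micro-expression labels.
    probability=True enables per-class probability estimates."""
    return SVC(probability=True).fit(X, y)

def match_expression(clf, action_features):
    """Return (preset micro-expression, probability) with the largest probability value."""
    features = np.asarray(action_features, dtype=float).reshape(1, -1)
    probs = clf.predict_proba(features)[0]
    best = int(np.argmax(probs))
    return clf.classes_[best], float(probs[best])
```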
In this embodiment, extracting facial features and training a feature classifier yields more accurate preset micro-expressions, which provide an important data reference for evaluating the service attitude and quality of the service personnel and the satisfaction of the service objects.
In one embodiment, generating the service information file of the target monitored object according to the service image set and the preset micro-expressions may include: associating each preset micro-expression with the corresponding collection image in the service image set; obtaining the object category corresponding to each collection image; looking up the expression tag corresponding to each preset micro-expression, and determining from the tag the emotion category corresponding to the preset micro-expression; and dividing the collection images in the service image set into multiple image subsets according to the object category and the emotion category, and generating the service information file from the image subsets.
The server associates each collection image with its corresponding preset micro-expression, for example by labeling the image with the preset micro-expression or by recording the mapping between the image and the micro-expression. The server obtains the object category corresponding to each collection image. In this embodiment, object categories are divided by the face shown in the image and include two classes, the target monitored object category and the service object category; when the collection images undergo face detection and matching, each image is labeled with the category of the person whose face was detected.
The expression tag is the emotional-mode label corresponding to a preset micro-expression; tags may include happy, excited, contemptuous, angry, calm, etc. One expression tag can correspond to several preset micro-expressions; for example, the happy tag may correspond to micro-expressions such as squinting eyes and raised mouth corners. The mapping between expression tags and preset micro-expressions is stored in the server in advance, and the server looks up the expression tag corresponding to each preset micro-expression.
Expression tags can be divided into several emotion categories, and one emotion category can correspond to several expression tags. In one embodiment, three emotion categories are used: positive, neutral, and negative; tags such as happy and excited belong to the positive category, tags such as contemptuous and angry belong to the negative category, and the calm tag belongs to the neutral category. In other embodiments, the emotion categories may be divided differently. The mapping between emotion categories and expression tags can be stored in the server in advance; the server obtains the emotion category corresponding to each collection image's preset micro-expression and associates it with the image.
The server can classify the collection images by their object category and emotion category, for example first dividing the service image set into per-category service image sets by object category, then dividing each of those into smaller image subsets by the emotion category of its images, while also organizing the mappings between each image's shooting time, preset micro-expression, object category and emotion category into an image information table for each service image set. The service information file is generated from the divided image subsets and the corresponding image information tables. The server can push the file to the terminal, so that the terminal can reasonably evaluate the service quality of the target monitored object from it.
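A sketch of this two-level grouping is given below, with hypothetical tag and category tables standing in for the mappings the server stores:

```python
from collections import defaultdict

# Hypothetical mapping tables; in the method these are stored server-side in advance
TAG_OF_EXPRESSION = {"smile": "happy", "glare": "angry", "neutral": "calm"}
CATEGORY_OF_TAG = {"happy": "positive", "angry": "negative", "calm": "neutral"}

def build_subsets(images):
    """images: iterable of dicts with 'object_category' and 'expression' keys.
    Groups collection images by (object category, emotion category)."""
    subsets = defaultdict(list)
    for img in images:
        tag = TAG_OF_EXPRESSION.get(img["expression"], "calm")
        emotion = CATEGORY_OF_TAG.get(tag, "neutral")
        subsets[(img["object_category"], emotion)].append(img)
    return subsets
```

Each `(object category, emotion category)` key would correspond to one image subset, with the per-image metadata forming the image information table.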
In this embodiment, judging the emotion category of the preset micro-expressions and sorting the collection images by object category and emotion category make it easy to look up the collection images and to obtain the expression information of the persons in them.
In one embodiment, as shown in Fig. 3, which is a flowchart of a method for generating an expression comparison image, the method may include the following steps:
Step 310: in the service image set, associate the first collection images of the target object category with the second collection images of the service object category whose shooting times match.
In the above embodiment, the collection images in the service image set are divided into different image subsets by object category, and each image subset has a corresponding image information table. The server obtains the shooting time of each collection image from the image information tables of the target object category (i.e. the target monitored object category) and of the service object category, finds the first collection images of the target object category and the second collection images of the service object category whose shooting times match, and associates the matched pairs. Matching shooting times need not be identical: images can also be judged to match if they fall within the same time range, whose length can be set to, for example, 10, 20, or 30 seconds.
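A minimal sketch of this time-window matching, assuming each image record carries a `timestamp` field in seconds; the 10-second window is one of the example lengths above:

```python
def match_by_time(staff_images, customer_images, window_sec=10):
    """Pair each target-object image with the service-object images
    shot within the same time window."""
    pairs = []
    for s in staff_images:
        for c in customer_images:
            if abs(s["timestamp"] - c["timestamp"]) <= window_sec:
                pairs.append((s, c))
    return pairs
```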
Step 320: judge whether the preset micro-expression associated with a first collection image and the preset micro-expression associated with the matched second collection image correspond to the same emotion category.
The server obtains the preset micro-expressions of each matched pair of first and second collection images, looks up the emotion category corresponding to each preset micro-expression, and judges whether the two images' emotion categories are the same, i.e. whether the expressed emotional states of the target monitored object and the service object at that moment agree. The emotional changes of the two parties during an interaction are generally fairly synchronized, in which case the service attitude of the target monitored object is easy to evaluate; but when the two emotional states are inconsistent and conflict, the service state of the target monitored object cannot be analyzed and evaluated objectively, and the actual service situation at that time often has to be determined manually.
Step 330: when the two correspond to different emotion categories, stitch the associated first and second collection images together to obtain an expression comparison image.
When the server determines that the two images correspond to different emotion categories, it stitches the associated first and second collection images together into an expression comparison image; the position and form of the stitching can be set according to the monitoring personnel's needs. Further, the server can record and mark the shooting time corresponding to each comparison image, and can also sort the comparison images of multiple shooting times by time to generate an animated expression comparison. The server can send the generated comparison image or animation to the terminal as an early warning of conflicting expressions.
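A sketch of one possible side-by-side stitching, assuming the two associated images are available as NumPy/BGR arrays; padding to a common height is an illustrative layout choice, since the method leaves the stitching form open:

```python
import cv2
import numpy as np

def stitch_pair(img_a, img_b):
    """Place two face crops side by side, padding the shorter one to a common height."""
    h = max(img_a.shape[0], img_b.shape[0])

    def pad(img):
        bottom = h - img.shape[0]  # black padding below the shorter image
        return cv2.copyMakeBorder(img, 0, bottom, 0, 0, cv2.BORDER_CONSTANT, value=0)

    return np.hstack([pad(img_a), pad(img_b)])
```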
In this embodiment, associating the images of the target monitored object and the service object shot at the same time, and automatically checking whether the emotional states of the persons in the two images match, make it possible to stitch together the images whose emotional states conflict, which facilitates comparison and analysis by the monitoring personnel.
It should be understood that, although the steps in the flowcharts of Figs. 2-3 are shown in the order indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, the execution of these steps is not strictly ordered, and they may be executed in other orders. Moreover, at least some of the steps in Figs. 2-3 may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different times; their execution order is not necessarily sequential, and they may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
In one embodiment, as shown in Fig. 4, a video processing apparatus is provided, including a video interception module 410, a target image extraction module 420, an image set generation module 430, an expression analysis module 440, and a file generation module 450, in which:
the video interception module 410 is configured to acquire a surveillance video and intercept the service sub-video of the target monitored object from the surveillance video;
the target image extraction module 420 is configured to extract the first video image set containing the target monitored object from the service sub-video, and to extract the target object images from the first video image set;
the image set generation module 430 is configured to extract the second video images containing the service object from the service sub-video, and to obtain the service image set according to the target object images and the second video images;
the expression analysis module 440 is configured to perform micro-expression analysis on each collection image in the service image set to obtain the preset micro-expression matching each collection image;
the file generation module 450 is configured to generate the service information file of the target monitored object according to the service image set and the preset micro-expressions.
In one embodiment, the video interception module 410 may include:
an information lookup unit, configured to acquire the service identifier of the target monitored object and look up the service time and target face image corresponding to the service identifier;
a segment extraction unit, configured to extract from the surveillance video the surveillance video segment whose shooting time matches the service time;
a re-screening unit, configured to perform face detection on the surveillance video segment according to the target face image, and to extract from it the video sub-segments in which no face matching the target face image is detected;
a duration comparison unit, configured to obtain the segment duration of each video sub-segment and compare it with the preset missing threshold;
a video deletion unit, configured to delete from the video segment the first video sub-segments whose duration exceeds the preset missing threshold, to obtain the service sub-video.
In one embodiment, the target image extraction module 420 may include:
a person detection unit, configured to extract the first image frames from the first video image set and detect the number of persons in each first image frame;
a multi-person detection unit, configured to extract from the first image frames the multi-person image frames containing more than one person;
a face matching unit, configured to perform face detection on the multi-person image frames according to the service face images pre-stored in the service face library, and to detect whether a face matching none of the service face images is present in the multi-person image frames;
a target object extraction unit, configured to, if a face matching none of the service face images is detected, extract the facial image matching the target face image in the corresponding multi-person image frame as a target object image.
In one embodiment, the image set generation module 430 may include:
a first extraction unit, configured to obtain the second video sub-segments whose duration does not exceed the preset missing threshold, and to extract from them the first facial images that match no service face image;
a second extraction unit, configured to extract from the multi-person image frames the second facial images that match no service face image;
an image aggregation unit, configured to obtain the second video images from the first facial images and the second facial images.
In one embodiment, the expression analysis module 440 may include:
a feature extraction unit, configured to extract facial feature points from each collection image and compute facial action features from the facial feature points;
a probability computation unit, configured to input the facial action features into the micro-expression analysis model to obtain the matching probability value of each preset micro-expression;
an expression selection unit, configured to select the preset micro-expression matching the collection image according to the matching probability values.
In one embodiment, the file generation module 450 may include:
an association unit, configured to associate each preset micro-expression with the corresponding collection image in the service image set;
a category acquisition unit, configured to obtain the object category corresponding to each collection image;
an emotion determination unit, configured to look up the expression tag corresponding to each preset micro-expression and determine from the tag the emotion category corresponding to the preset micro-expression;
a subset division unit, configured to divide the collection images in the service image set into multiple image subsets according to the object category and the emotion category, and to generate the service information file from the image subsets.
In one embodiment, the apparatus may further include:
an image association module, configured to associate, in the service image set, the first collection images of the target object category with the second collection images of the service object category whose shooting times match;
a category matching module, configured to judge whether the preset micro-expression associated with a first collection image and the preset micro-expression associated with the matched second collection image correspond to the same emotion category;
an image stitching module, configured to, when they correspond to different emotion categories, stitch the associated first and second collection images together to obtain an expression comparison image.
For the specific limitations of the video processing apparatus, reference may be made to the limitations of the video processing method above, which are not repeated here. Each module in the above video processing apparatus may be implemented in whole or in part by software, hardware, or a combination of the two. The modules may be embedded in, or independent of, the processor of a computer device in hardware form, or stored in the memory of the computer device in software form, so that the processor can invoke and execute the operations corresponding to each module.
In one embodiment, a computer device is provided, which may be a server whose internal structure may be as shown in Fig. 5. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device provides computing and control capability. The memory includes a non-volatile storage medium and an internal memory; the non-volatile storage medium stores an operating system, a computer program, and a database, and the internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The database of the computer device stores the video processing data. The network interface communicates with external terminals through a network connection. When executed by the processor, the computer program implements a video processing method.
Those skilled in the art can understand that the structure shown in Fig. 5 is only a block diagram of part of the structure related to the solution of this application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
在一个实施例中,提供了一种计算机设备,包括存储器和处理器,该存储器存储有计算机程序,该处理器执行计算机程序时实现以下步骤:获取监控视频,从所述监控视频中截取目标监控对象的服务子视频;从所述服务子视频中抽取包含所述目标监控对象的第一视频图像集合,从所述第一视频图像集合中提取目标对象图像;从所述服务子视频中提取包含服务对象的第二视频图像,根据所述目标对象图像和所述第二视频图像得到服务图像集合;分别对所述服务图像集合中的各集合图像进行微表情分析,得到与所述各集合图像匹配的预设微表情;根据所述服务图像集合和所述预设微表情生成所述目标监控对象的服务信息档案。In one embodiment, a computer device is provided, including a memory and a processor, the memory stores a computer program, and when the processor executes the computer program, the following steps are implemented: Obtain surveillance video, and intercept target surveillance from the surveillance video The service sub-video of the object; extract from the service sub-video a first video image set containing the target monitoring object, extract the target object image from the first video image set; extract from the service sub-video containing The second video image of the service object obtains a service image set according to the target object image and the second video image; the micro-expression analysis is performed on each set image in the service image set to obtain the same Matching preset micro-expressions; generating the service information file of the target monitoring object according to the service image collection and the preset micro-expressions.
In one embodiment, when executing the computer program to implement the step of intercepting the service sub-video of the target monitored object from the surveillance video, the processor is further configured to: acquire a service identifier of the target monitored object, and look up a service time and a target face image corresponding to the service identifier; extract, from the surveillance video, a surveillance video clip whose shooting time matches the service time; perform face detection on the surveillance video clip according to the target face image, and extract, from the surveillance video clip, video sub-segments in which no face matching the target face image is detected; acquire the segment duration of each video sub-segment, and compare the segment duration with a preset missing threshold; and delete, from the video clip, first video sub-segments whose segment duration is greater than the preset missing threshold, to obtain the service sub-video.
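A minimal sketch of the sub-segment trimming logic described above, assuming an upstream face detector has already produced the time ranges in which the target face was absent; the helper name and the five-second threshold are illustrative assumptions, not values from the disclosure.

```python
from typing import List, Tuple

Range = Tuple[float, float]  # (start_s, end_s)

def trim_missing_segments(clip: Range, absent: List[Range],
                          missing_threshold_s: float = 5.0) -> List[Range]:
    """Delete absences longer than the threshold; the remaining ranges,
    in order, form the service sub-video."""
    kept: List[Range] = []
    cursor = clip[0]
    for start, end in sorted(absent):
        if end - start > missing_threshold_s:  # only long absences are removed
            if start > cursor:
                kept.append((cursor, start))
            cursor = max(cursor, end)
    if cursor < clip[1]:
        kept.append((cursor, clip[1]))
    return kept

# A 60 s clip with a 2 s and a 10 s absence keeps everything except the 10 s gap:
# [(0.0, 20.0), (30.0, 60.0)]
print(trim_missing_segments((0.0, 60.0), [(5.0, 7.0), (20.0, 30.0)]))
```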
In one embodiment, when executing the computer program to implement the step of extracting a target object image from the first video image set, the processor is further configured to: extract first image frames from the first video image set, and detect the number of persons in each first image frame; extract, from the first image frames, multi-person image frames in which the number of persons is greater than one; perform face detection on the multi-person image frames according to service face images pre-stored in a service face database, and detect whether the multi-person image frames contain a face image matching none of the service face images; and if such a face image is detected, extract, from the corresponding multi-person image frame, the facial image matching the target face image as the target object image.
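The multi-person filtering step might look like the sketch below, where face detection is assumed to have produced an embedding per face and matching is reduced to a cosine-similarity threshold; the Face type, the 0.8 threshold, and the embedding representation are all assumptions, not details from the disclosure.

```python
from dataclasses import dataclass
from typing import List, Sequence

@dataclass
class Face:
    embedding: List[float]  # feature vector from an assumed upstream face detector
    crop: object            # the cropped facial image region

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embeddings."""
    num = sum(x * y for x, y in zip(a, b))
    den = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return num / den if den else 0.0

def extract_target_images(frames: List[List[Face]],
                          staff_bank: List[List[float]],
                          target: List[float], thr: float = 0.8) -> List[object]:
    """Keep the target's face crop from multi-person frames that also contain
    at least one face matching none of the stored service-staff images."""
    crops: List[object] = []
    for faces in frames:
        if len(faces) < 2:  # only multi-person frames qualify
            continue
        has_non_staff = any(all(cosine(f.embedding, s) < thr for s in staff_bank)
                            for f in faces)
        if has_non_staff:
            crops += [f.crop for f in faces if cosine(f.embedding, target) >= thr]
    return crops
```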
In one embodiment, when executing the computer program to implement the step of extracting, from the service sub-video, a second video image containing a service object, the processor is further configured to: acquire second video sub-segments whose segment duration does not exceed the preset missing threshold, and extract, from the second video sub-segments, first facial images that do not match the service face images; extract, from the multi-person image frames, second facial images that do not match the service face images; and obtain the second video image according to the first facial images and the second facial images.
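Under the same assumptions (reusing the Face type and the cosine helper from the previous sketch), the service-object images are simply the faces that match none of the stored staff images, gathered first from the short sub-segments and then from the multi-person frames:

```python
from typing import List

def extract_service_images(short_segment_faces: List["Face"],
                           multi_person_faces: List["Face"],
                           staff_bank: List[List[float]],
                           thr: float = 0.8) -> List[object]:
    """Second video images: faces matching no service-staff embedding, taken
    first from sub-segments under the missing threshold, then from the
    multi-person frames."""
    def non_staff(faces: List["Face"]) -> List[object]:
        return [f.crop for f in faces
                if all(cosine(f.embedding, s) < thr for s in staff_bank)]
    return non_staff(short_segment_faces) + non_staff(multi_person_faces)
```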
In one embodiment, when executing the computer program to implement the step of performing micro-expression analysis on each set image in the service image set to obtain a preset micro-expression matching each set image, the processor is further configured to: extract facial feature points from each set image, and calculate facial action features according to the facial feature points; input the facial action features into a micro-expression analysis model to obtain a matching probability value for each preset micro-expression; and select, according to the matching probability values, the preset micro-expression matching the set image.
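The disclosure does not specify the feature computation or the model, so the sketch below uses deliberately simple stand-ins: landmark distances as the "facial action features" and any callable returning a probability per preset micro-expression as the "micro-expression analysis model".

```python
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, float]

def action_features(landmarks: Sequence[Point]) -> List[float]:
    """Toy facial-action features: the distance of each landmark from the
    first one. A production system would track AU-style displacements."""
    x0, y0 = landmarks[0]
    return [((x - x0) ** 2 + (y - y0) ** 2) ** 0.5 for x, y in landmarks[1:]]

def match_expression(landmarks: Sequence[Point],
                     model: Callable[[List[float]], Dict[str, float]]) -> str:
    """Feed the features to the model and select the preset micro-expression
    with the highest matching probability value."""
    probs = model(action_features(landmarks))
    return max(probs, key=probs.get)
```

Any classifier exposing this interface would do; for instance, a stub model returning {"smile": 0.7, "neutral": 0.3} makes match_expression return "smile".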
In one embodiment, when executing the computer program to implement the step of generating the service information file of the target monitored object according to the service image set and the preset micro-expressions, the processor is further configured to: associate the preset micro-expressions with the corresponding set images in the service image set; acquire an object category corresponding to each set image; look up an expression tag corresponding to each preset micro-expression, and determine, according to the tag, an emotion category corresponding to the preset micro-expression; and divide, according to the object categories and the emotion categories, the set images in the service image set into a plurality of image subsets, and generate the service information file according to the image subsets.
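Partitioning into image subsets can be expressed as grouping by a composite key; the tag-to-emotion mapping below is an invented example, since the disclosure only states that emotion categories are looked up from expression tags.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple

# Assumed mapping from expression tags to emotion categories (illustrative).
EMOTION_OF = {"smile": "positive", "neutral": "neutral",
              "frown": "negative", "lip_press": "negative"}

def build_profile(images: Iterable[Tuple[Any, str, str]]) -> Dict[Tuple[str, str], List[Any]]:
    """images: (image, object_category, expression_tag) triples. Subsets are
    keyed by (object category, emotion category); the resulting dict is the
    backbone of the service information file."""
    profile: Dict[Tuple[str, str], List[Any]] = defaultdict(list)
    for img, category, tag in images:
        profile[(category, EMOTION_OF[tag])].append(img)
    return dict(profile)
```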
In one embodiment, the processor, when executing the computer program, further implements the following steps: associating, in the service image set, first set images of the target object category with second set images of the service object category whose shooting times match; determining whether the preset micro-expression associated with a first set image and the preset micro-expression associated with the corresponding second set image correspond to the same emotion category; and, when they correspond to different emotion categories, splicing the associated first set image and second set image to obtain an expression comparison map.
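The comparison-map step reduces to pairing staff and customer images by shooting time and keeping the pairs whose emotion categories disagree; the one-second tolerance is an assumption, and the actual bitmap stitching is left out.

```python
from typing import Any, List, Tuple

Record = Tuple[float, Any, str]  # (shooting_time_s, image, emotion_category)

def expression_pairs(staff: List[Record], customers: List[Record],
                     tolerance_s: float = 1.0) -> List[Tuple[Any, Any]]:
    """Pairs whose shooting times match within the tolerance but whose
    emotion categories differ; each pair would then be spliced into a
    single expression comparison image."""
    return [(s_img, c_img)
            for s_t, s_img, s_emo in staff
            for c_t, c_img, c_emo in customers
            if abs(s_t - c_t) <= tolerance_s and s_emo != c_emo]
```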
In one embodiment, a computer-readable storage medium is provided. The computer-readable storage medium is a volatile or non-volatile storage medium on which a computer program is stored, and the computer program, when executed by a processor, implements the following steps: acquiring a surveillance video, and intercepting a service sub-video of a target monitored object from the surveillance video; extracting, from the service sub-video, a first video image set containing the target monitored object, and extracting a target object image from the first video image set; extracting, from the service sub-video, a second video image containing a service object, and obtaining a service image set according to the target object image and the second video image; performing micro-expression analysis on each set image in the service image set, to obtain a preset micro-expression matching each set image; and generating a service information file of the target monitored object according to the service image set and the preset micro-expressions.
In one embodiment, when the computer program is executed by the processor to implement the step of intercepting the service sub-video of the target monitored object from the surveillance video, the computer program further causes the processor to: acquire a service identifier of the target monitored object, and look up a service time and a target face image corresponding to the service identifier; extract, from the surveillance video, a surveillance video clip whose shooting time matches the service time; perform face detection on the surveillance video clip according to the target face image, and extract, from the surveillance video clip, video sub-segments in which no face matching the target face image is detected; acquire the segment duration of each video sub-segment, and compare the segment duration with a preset missing threshold; and delete, from the video clip, first video sub-segments whose segment duration is greater than the preset missing threshold, to obtain the service sub-video.
In one embodiment, when the computer program is executed by the processor to implement the step of extracting a target object image from the first video image set, the computer program further causes the processor to: extract first image frames from the first video image set, and detect the number of persons in each first image frame; extract, from the first image frames, multi-person image frames in which the number of persons is greater than one; perform face detection on the multi-person image frames according to service face images pre-stored in a service face database, and detect whether the multi-person image frames contain a face image matching none of the service face images; and if such a face image is detected, extract, from the corresponding multi-person image frame, the facial image matching the target face image as the target object image.
In one embodiment, when the computer program is executed by the processor to implement the step of extracting, from the service sub-video, a second video image containing a service object, the computer program further causes the processor to: acquire second video sub-segments whose segment duration does not exceed the preset missing threshold, and extract, from the second video sub-segments, first facial images that do not match the service face images; extract, from the multi-person image frames, second facial images that do not match the service face images; and obtain the second video image according to the first facial images and the second facial images.
In one embodiment, when the computer program is executed by the processor to implement the step of performing micro-expression analysis on each set image in the service image set to obtain a preset micro-expression matching each set image, the computer program further causes the processor to: extract facial feature points from each set image, and calculate facial action features according to the facial feature points; input the facial action features into a micro-expression analysis model to obtain a matching probability value for each preset micro-expression; and select, according to the matching probability values, the preset micro-expression matching the set image.
In one embodiment, when the computer program is executed by the processor to implement the step of generating the service information file of the target monitored object according to the service image set and the preset micro-expressions, the computer program further causes the processor to: associate the preset micro-expressions with the corresponding set images in the service image set; acquire an object category corresponding to each set image; look up an expression tag corresponding to each preset micro-expression, and determine, according to the tag, an emotion category corresponding to the preset micro-expression; and divide, according to the object categories and the emotion categories, the set images in the service image set into a plurality of image subsets, and generate the service information file according to the image subsets.
In one embodiment, when executed by the processor, the computer program further implements the following steps: associating, in the service image set, first set images of the target object category with second set images of the service object category whose shooting times match; determining whether the preset micro-expression associated with a first set image and the preset micro-expression associated with the corresponding second set image correspond to the same emotion category; and, when they correspond to different emotion categories, splicing the associated first set image and second set image to obtain an expression comparison map.
A person of ordinary skill in the art can understand that all or part of the processes in the above method embodiments can be implemented by a computer program instructing relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, a database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Claims (20)

1. A video processing method, wherein the method comprises:
    acquiring a surveillance video, and intercepting a service sub-video of a target monitored object from the surveillance video;
    extracting, from the service sub-video, a first video image set containing the target monitored object, and extracting a target object image from the first video image set;
    extracting, from the service sub-video, a second video image containing a service object, and obtaining a service image set according to the target object image and the second video image;
    performing micro-expression analysis on each set image in the service image set, to obtain a preset micro-expression matching each set image; and
    generating a service information file of the target monitored object according to the service image set and the preset micro-expressions.
2. The method according to claim 1, wherein the intercepting a service sub-video of a target monitored object from the surveillance video comprises:
    acquiring a service identifier of the target monitored object, and looking up a service time and a target face image corresponding to the service identifier;
    extracting, from the surveillance video, a surveillance video clip whose shooting time matches the service time;
    performing face detection on the surveillance video clip according to the target face image, and extracting, from the surveillance video clip, video sub-segments in which no face matching the target face image is detected;
    acquiring a segment duration of each video sub-segment, and comparing the segment duration with a preset missing threshold; and
    deleting, from the video clip, first video sub-segments whose segment duration is greater than the preset missing threshold, to obtain the service sub-video.
3. The method according to claim 2, wherein the extracting a target object image from the first video image set comprises:
    extracting first image frames from the first video image set, and detecting the number of persons in each first image frame;
    extracting, from the first image frames, multi-person image frames in which the number of persons is greater than one;
    performing face detection on the multi-person image frames according to service face images pre-stored in a service face database, and detecting whether the multi-person image frames contain a face image matching none of the service face images; and
    if a face image matching none of the service face images is detected, extracting, from the corresponding multi-person image frame, the facial image matching the target face image as the target object image.
4. The method according to claim 3, wherein the extracting, from the service sub-video, a second video image containing a service object comprises:
    acquiring second video sub-segments whose segment duration does not exceed the preset missing threshold, and extracting, from the second video sub-segments, first facial images that do not match the service face images;
    extracting, from the multi-person image frames, second facial images that do not match the service face images; and
    obtaining the second video image according to the first facial images and the second facial images.
5. The method according to claim 4, wherein the performing micro-expression analysis on each set image in the service image set to obtain a preset micro-expression matching each set image comprises:
    extracting facial feature points from each set image, and calculating facial action features according to the facial feature points;
    inputting the facial action features into a micro-expression analysis model to obtain a matching probability value for each preset micro-expression; and
    selecting, according to the matching probability values, the preset micro-expression matching the set image.
6. The method according to claim 1, wherein the generating a service information file of the target monitored object according to the service image set and the preset micro-expressions comprises:
    associating the preset micro-expressions with the corresponding set images in the service image set;
    acquiring an object category corresponding to each set image;
    looking up an expression tag corresponding to each preset micro-expression, and determining, according to the tag, an emotion category corresponding to the preset micro-expression; and
    dividing, according to the object categories and the emotion categories, the set images in the service image set into a plurality of image subsets, and generating the service information file according to the image subsets.
7. The method according to claim 6, wherein the method further comprises:
    associating, in the service image set, first set images of the target object category with second set images of the service object category whose shooting times match;
    determining whether the preset micro-expression associated with a first set image and the preset micro-expression associated with the corresponding second set image correspond to the same emotion category; and
    when they correspond to different emotion categories, splicing the associated first set image and second set image to obtain an expression comparison map.
8. A video processing apparatus, wherein the apparatus comprises:
    a video interception module, configured to acquire a surveillance video and intercept a service sub-video of a target monitored object from the surveillance video;
    a target image extraction module, configured to extract, from the service sub-video, a first video image set containing the target monitored object, and extract a target object image from the first video image set;
    an image set generation module, configured to extract, from the service sub-video, a second video image containing a service object, and obtain a service image set according to the target object image and the second video image;
    an expression analysis module, configured to perform micro-expression analysis on each set image in the service image set to obtain a preset micro-expression matching each set image; and
    a file generation module, configured to generate a service information file of the target monitored object according to the service image set and the preset micro-expressions.
9. A computer device, comprising:
    one or more processors;
    a memory; and
    one or more computer programs, wherein the one or more computer programs are stored in the memory and configured to be executed by the one or more processors, and the one or more computer programs are configured to perform a video processing method, wherein the video processing method comprises:
    acquiring a surveillance video, and intercepting a service sub-video of a target monitored object from the surveillance video;
    extracting, from the service sub-video, a first video image set containing the target monitored object, and extracting a target object image from the first video image set;
    extracting, from the service sub-video, a second video image containing a service object, and obtaining a service image set according to the target object image and the second video image;
    performing micro-expression analysis on each set image in the service image set, to obtain a preset micro-expression matching each set image; and
    generating a service information file of the target monitored object according to the service image set and the preset micro-expressions.
10. The computer device according to claim 9, wherein the intercepting a service sub-video of a target monitored object from the surveillance video comprises:
    acquiring a service identifier of the target monitored object, and looking up a service time and a target face image corresponding to the service identifier;
    extracting, from the surveillance video, a surveillance video clip whose shooting time matches the service time;
    performing face detection on the surveillance video clip according to the target face image, and extracting, from the surveillance video clip, video sub-segments in which no face matching the target face image is detected;
    acquiring a segment duration of each video sub-segment, and comparing the segment duration with a preset missing threshold; and
    deleting, from the video clip, first video sub-segments whose segment duration is greater than the preset missing threshold, to obtain the service sub-video.
11. The computer device according to claim 10, wherein the extracting a target object image from the first video image set comprises:
    extracting first image frames from the first video image set, and detecting the number of persons in each first image frame;
    extracting, from the first image frames, multi-person image frames in which the number of persons is greater than one;
    performing face detection on the multi-person image frames according to service face images pre-stored in a service face database, and detecting whether the multi-person image frames contain a face image matching none of the service face images; and
    if a face image matching none of the service face images is detected, extracting, from the corresponding multi-person image frame, the facial image matching the target face image as the target object image.
12. The computer device according to claim 11, wherein the extracting, from the service sub-video, a second video image containing a service object comprises:
    acquiring second video sub-segments whose segment duration does not exceed the preset missing threshold, and extracting, from the second video sub-segments, first facial images that do not match the service face images;
    extracting, from the multi-person image frames, second facial images that do not match the service face images; and
    obtaining the second video image according to the first facial images and the second facial images.
13. The computer device according to claim 12, wherein the performing micro-expression analysis on each set image in the service image set to obtain a preset micro-expression matching each set image comprises:
    extracting facial feature points from each set image, and calculating facial action features according to the facial feature points;
    inputting the facial action features into a micro-expression analysis model to obtain a matching probability value for each preset micro-expression; and
    selecting, according to the matching probability values, the preset micro-expression matching the set image.
14. The computer device according to claim 9, wherein the generating a service information file of the target monitored object according to the service image set and the preset micro-expressions comprises:
    associating the preset micro-expressions with the corresponding set images in the service image set;
    acquiring an object category corresponding to each set image;
    looking up an expression tag corresponding to each preset micro-expression, and determining, according to the tag, an emotion category corresponding to the preset micro-expression; and
    dividing, according to the object categories and the emotion categories, the set images in the service image set into a plurality of image subsets, and generating the service information file according to the image subsets.
15. The computer device according to claim 14, wherein the method further comprises:
    associating, in the service image set, first set images of the target object category with second set images of the service object category whose shooting times match;
    determining whether the preset micro-expression associated with a first set image and the preset micro-expression associated with the corresponding second set image correspond to the same emotion category; and
    when they correspond to different emotion categories, splicing the associated first set image and second set image to obtain an expression comparison map.
16. A computer-readable storage medium on which a computer program is stored, wherein when the computer program is executed by a processor, a video processing method is implemented, and the video processing method comprises the following steps:
    acquiring a surveillance video, and intercepting a service sub-video of a target monitored object from the surveillance video;
    extracting, from the service sub-video, a first video image set containing the target monitored object, and extracting a target object image from the first video image set;
    extracting, from the service sub-video, a second video image containing a service object, and obtaining a service image set according to the target object image and the second video image;
    performing micro-expression analysis on each set image in the service image set, to obtain a preset micro-expression matching each set image; and
    generating a service information file of the target monitored object according to the service image set and the preset micro-expressions.
17. The computer-readable storage medium according to claim 16, wherein the intercepting a service sub-video of a target monitored object from the surveillance video comprises:
    acquiring a service identifier of the target monitored object, and looking up a service time and a target face image corresponding to the service identifier;
    extracting, from the surveillance video, a surveillance video clip whose shooting time matches the service time;
    performing face detection on the surveillance video clip according to the target face image, and extracting, from the surveillance video clip, video sub-segments in which no face matching the target face image is detected;
    acquiring a segment duration of each video sub-segment, and comparing the segment duration with a preset missing threshold; and
    deleting, from the video clip, first video sub-segments whose segment duration is greater than the preset missing threshold, to obtain the service sub-video.
18. The computer-readable storage medium according to claim 17, wherein the extracting a target object image from the first video image set comprises:
    extracting first image frames from the first video image set, and detecting the number of persons in each first image frame;
    extracting, from the first image frames, multi-person image frames in which the number of persons is greater than one;
    performing face detection on the multi-person image frames according to service face images pre-stored in a service face database, and detecting whether the multi-person image frames contain a face image matching none of the service face images; and
    if a face image matching none of the service face images is detected, extracting, from the corresponding multi-person image frame, the facial image matching the target face image as the target object image.
19. The computer-readable storage medium according to claim 18, wherein the extracting, from the service sub-video, a second video image containing a service object comprises:
    acquiring second video sub-segments whose segment duration does not exceed the preset missing threshold, and extracting, from the second video sub-segments, first facial images that do not match the service face images;
    extracting, from the multi-person image frames, second facial images that do not match the service face images; and
    obtaining the second video image according to the first facial images and the second facial images.
20. The computer-readable storage medium according to claim 19, wherein the performing micro-expression analysis on each set image in the service image set to obtain a preset micro-expression matching each set image comprises:
    extracting facial feature points from each set image, and calculating facial action features according to the facial feature points;
    inputting the facial action features into a micro-expression analysis model to obtain a matching probability value for each preset micro-expression; and
    selecting, according to the matching probability values, the preset micro-expression matching the set image.
PCT/CN2020/087694 2019-07-04 2020-04-29 Video processing method and apparatus, computer device and storage medium WO2021000644A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910599356.4 2019-07-04
CN201910599356.4A CN110458008A (en) 2019-07-04 2019-07-04 Method for processing video frequency, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2021000644A1 true WO2021000644A1 (en) 2021-01-07

Family

ID=68482120

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/087694 WO2021000644A1 (en) 2019-07-04 2020-04-29 Video processing method and apparatus, computer device and storage medium

Country Status (2)

Country Link
CN (1) CN110458008A (en)
WO (1) WO2021000644A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110458008A (en) * 2019-07-04 2019-11-15 深圳壹账通智能科技有限公司 Method for processing video frequency, device, computer equipment and storage medium
CN112052357B (en) * 2020-04-15 2022-04-01 上海摩象网络科技有限公司 Video clip marking method and device and handheld camera
CN113642357A (en) * 2020-04-27 2021-11-12 阿里巴巴集团控股有限公司 Monitoring method, system, device, storage medium and processor
CN111935453A (en) * 2020-07-27 2020-11-13 浙江大华技术股份有限公司 Learning supervision method and device, electronic equipment and storage medium
CN112017339A (en) * 2020-09-24 2020-12-01 柳州柳工挖掘机有限公司 Excavator control system
CN113392271A (en) * 2021-05-25 2021-09-14 珠海格力电器股份有限公司 Cat eye data processing method, module, electronic device and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108665147A (en) * 2018-04-18 2018-10-16 深圳市云领天下科技有限公司 A kind of method and device of children education credit early warning
CN109168052A (en) * 2018-10-31 2019-01-08 杭州比智科技有限公司 The determination method, apparatus and calculating equipment of service satisfaction
CN109190601A (en) * 2018-10-19 2019-01-11 银河水滴科技(北京)有限公司 Recongnition of objects method and device under a kind of monitoring scene
CN109858949A (en) * 2018-12-26 2019-06-07 秒针信息技术有限公司 A kind of customer satisfaction appraisal procedure and assessment system based on monitoring camera
CN110458008A (en) * 2019-07-04 2019-11-15 深圳壹账通智能科技有限公司 Method for processing video frequency, device, computer equipment and storage medium

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108830633A (en) * 2018-04-26 2018-11-16 华慧视科技(天津)有限公司 A kind of friendly service evaluation method based on smiling face's detection
CN109766766A (en) * 2018-12-18 2019-05-17 深圳壹账通智能科技有限公司 Employee work condition monitoring method, device, computer equipment and storage medium
CN109766770A (en) * 2018-12-18 2019-05-17 深圳壹账通智能科技有限公司 QoS evaluating method, device, computer equipment and storage medium
CN109871751A (en) * 2019-01-04 2019-06-11 平安科技(深圳)有限公司 Attitude appraisal procedure, device and storage medium based on facial expression recognition
CN109829388A (en) * 2019-01-07 2019-05-31 平安科技(深圳)有限公司 Video data handling procedure, device and computer equipment based on micro- expression
CN109766859B (en) * 2019-01-17 2023-12-19 平安科技(深圳)有限公司 Campus monitoring method, device, equipment and storage medium based on micro-expressions
CN109886111A (en) * 2019-01-17 2019-06-14 深圳壹账通智能科技有限公司 Match monitoring method, device, computer equipment and storage medium based on micro- expression
CN109858410A (en) * 2019-01-18 2019-06-07 深圳壹账通智能科技有限公司 Service evaluation method, apparatus, equipment and storage medium based on Expression analysis

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112818960A (en) * 2021-03-25 2021-05-18 平安科技(深圳)有限公司 Waiting time processing method, device, equipment and medium based on face recognition
CN112818960B (en) * 2021-03-25 2023-09-05 平安科技(深圳)有限公司 Waiting time processing method, device, equipment and medium based on face recognition
CN113873191A (en) * 2021-10-12 2021-12-31 苏州万店掌软件技术有限公司 Video backtracking method, device and system based on voice
CN113873191B (en) * 2021-10-12 2023-11-28 苏州万店掌软件技术有限公司 Video backtracking method, device and system based on voice
CN113925511A (en) * 2021-11-08 2022-01-14 北京九州安华信息安全技术有限公司 Muscle nerve vibration time-frequency image processing method and device
CN114445896A (en) * 2022-01-28 2022-05-06 北京百度网讯科技有限公司 Method and device for evaluating confidence degree of human statement content in video
CN114445896B (en) * 2022-01-28 2024-04-05 北京百度网讯科技有限公司 Method and device for evaluating confidence of content of person statement in video
CN114866843A (en) * 2022-05-06 2022-08-05 杭州登虹科技有限公司 Video data encryption system for network video monitoring
CN114866843B (en) * 2022-05-06 2023-08-11 杭州登虹科技有限公司 Video data encryption system for network video monitoring
CN115512427A (en) * 2022-11-04 2022-12-23 北京城建设计发展集团股份有限公司 User face registration method and system combined with matched biopsy

Also Published As

Publication number Publication date
CN110458008A (en) 2019-11-15

Legal Events

Date Code Title Description
121: Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 20835076; Country of ref document: EP; Kind code of ref document: A1)
NENP: Non-entry into the national phase (Ref country code: DE)
32PN: Ep: public notification in the EP bulletin as the address of the addressee cannot be established (Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 17.05.2022))
122: Ep: PCT application non-entry in the European phase (Ref document number: 20835076; Country of ref document: EP; Kind code of ref document: A1)