WO2021159609A1 - Video lag identification method and apparatus, and terminal device - Google Patents

Video lag identification method and apparatus, and terminal device

Info

Publication number
WO2021159609A1
Authority
WO
WIPO (PCT)
Prior art keywords
face
video
organ
image
state
Prior art date
Application number
PCT/CN2020/087381
Other languages
French (fr)
Chinese (zh)
Inventor
胡甜敏
Original Assignee
深圳壹账通智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳壹账通智能科技有限公司 filed Critical 深圳壹账通智能科技有限公司
Publication of WO2021159609A1 publication Critical patent/WO2021159609A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V20/48 Matching video sequences
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation

Definitions

  • This application belongs to the technical field of computer vision in artificial intelligence, and in particular relates to a video freeze identification method and terminal device.
  • In the prior art, a tester manually reviews the recorded video files of a call and determines the time periods in which the video freezes; the efficiency of such identification is extremely low.
  • The embodiments of the present application provide a video freeze identification method and terminal device, which can solve the problem of low efficiency in identifying video call freezes.
  • The first aspect of the embodiments of the present application provides a video freeze identification method, including:
  • when the monitoring state corresponding to the video during a real-time call is the first state, performing face detection on the video, and when a face is detected in the video, modifying the monitoring state corresponding to the video to the second state;
  • if the monitoring state is the second state, sampling the video at a first frequency to obtain N first frame images and comparing the N first frame images, where N is a positive integer greater than 1;
  • if the comparison result is that the image difference degree between the N first frame images is less than a first difference threshold, taking the earliest sampling time among the sampling times corresponding to the first frame images as the freeze start time of the video, and setting the monitoring state to the third state;
  • if the monitoring state is the third state, sampling the video at a second frequency to obtain M second frame images and comparing the M second frame images, where M is a positive integer greater than 1;
  • if the comparison result is that the image difference degree between the M second frame images is greater than or equal to the first difference threshold, taking the latest sampling time among the sampling times corresponding to the second frame images as the freeze end time of the video, setting the monitoring state to the first state, and identifying the freeze time period of the video based on the freeze start time and the freeze end time.
  • The second aspect of the embodiments of the present application provides a video freeze identification device, including:
  • a face detection module, configured to perform face detection on the video when the monitoring state corresponding to the video during a real-time call is the first state, and, when a face is detected in the video, modify the monitoring state corresponding to the video to the second state;
  • a first image comparison module, configured to, if the monitoring state is the second state, sample the video at a first frequency to obtain N first frame images and compare the N first frame images, where N is a positive integer greater than 1;
  • a freeze start identification module, configured to, if the comparison result is that the image difference degree between the N first frame images is less than a first difference threshold, take the earliest sampling time among the sampling times corresponding to the first frame images as the freeze start time of the video, and set the monitoring state to the third state;
  • a second image comparison module, configured to, if the monitoring state is the third state, sample the video at a second frequency to obtain M second frame images and compare the M second frame images, where M is a positive integer greater than 1;
  • a freeze end identification module, configured to, if the comparison result is that the image difference degree between the M second frame images is greater than or equal to the first difference threshold, take the latest sampling time among the sampling times corresponding to the second frame images as the freeze end time of the video, set the monitoring state to the first state, and identify the freeze time period of the video based on the freeze start time and the freeze end time.
  • The third aspect of the embodiments of the present application provides a terminal device.
  • The terminal device includes a memory and a processor.
  • The memory stores a computer program that can run on the processor.
  • When the processor executes the computer program, the steps of the video freeze identification method described in any one of the first aspect above are implemented, namely: when the monitoring state corresponding to the video during a real-time call is the first state, performing face detection on the video, and when a face is detected in the video, modifying the monitoring state corresponding to the video to the second state; if the monitoring state is the second state, sampling the video at a first frequency to obtain N first frame images and comparing the N first frame images, where N is a positive integer greater than 1; if the comparison result is that the image difference degree between the N first frame images is less than a first difference threshold, taking the earliest sampling time among the sampling times corresponding to the first frame images as the freeze start time of the video, and setting the monitoring state to the third state; if the monitoring state is the third state, sampling the video at a second frequency to obtain M second frame images and comparing the M second frame images, where M is a positive integer greater than 1; and if the comparison result is that the image difference degree between the M second frame images is greater than or equal to the first difference threshold, taking the latest sampling time among the sampling times corresponding to the second frame images as the freeze end time of the video, setting the monitoring state to the first state, and identifying the freeze time period of the video based on the freeze start time and the freeze end time.
  • The fourth aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program, where the computer program, when executed by a processor, implements the steps of the video freeze identification method described in any one of the first aspect above, namely: when the monitoring state corresponding to the video during a real-time call is the first state, performing face detection on the video, and when a face is detected in the video, modifying the monitoring state corresponding to the video to the second state; if the monitoring state is the second state, sampling the video at a first frequency to obtain N first frame images and comparing the N first frame images, where N is a positive integer greater than 1; if the comparison result is that the image difference degree between the N first frame images is less than a first difference threshold, taking the earliest sampling time among the sampling times corresponding to the first frame images as the freeze start time of the video, and setting the monitoring state to the third state; if the monitoring state is the third state, sampling the video at a second frequency to obtain M second frame images and comparing the M second frame images, where M is a positive integer greater than 1; and if the comparison result is that the image difference degree between the M second frame images is greater than or equal to the first difference threshold, taking the latest sampling time among the sampling times corresponding to the second frame images as the freeze end time of the video, setting the monitoring state to the first state, and identifying the freeze time period of the video based on the freeze start time and the freeze end time.
  • The fifth aspect of the embodiments of the present application provides a computer program product which, when run on a terminal device, causes the terminal device to execute the video freeze identification method described in any one of the first aspect above.
  • Compared with the prior art, the embodiments of the present application have the following beneficial effects: on the one hand, they realize efficient and accurate identification of the start and end of a freeze; on the other hand, by setting different monitoring states for the different stages of a freeze, each freeze is effectively distinguished and processed, which guarantees the accuracy of identifying every freeze.
  • FIG. 1 is a schematic diagram of the implementation process of the video freeze identification method provided in Embodiment 1 of the present application;
  • FIG. 2 is a schematic diagram of the implementation process of the video freeze identification method provided in Embodiment 2 of the present application;
  • FIG. 3 is a schematic diagram of the implementation process of the video freeze identification method provided in Embodiment 3 of the present application;
  • FIG. 4 is a schematic diagram of the implementation process of the video freeze identification method provided in Embodiment 4 of the present application;
  • FIG. 5 is a schematic diagram of the implementation process of the video freeze identification method provided in Embodiment 5 of the present application;
  • FIG. 6 is a schematic diagram of the implementation process of the video freeze identification method provided in Embodiment 6 of the present application;
  • FIG. 7 is a schematic structural diagram of the video freeze identification device provided in Embodiment 7 of the present application;
  • FIG. 8 is a schematic diagram of the terminal device provided in Embodiment 8 of the present application.
  • In the embodiments of the present application, the monitoring state of the call video is set to the first state in advance, and the video generated during the real-time call is then monitored in real time.
  • When a face is found in the video, the monitoring state of the video is set to the second state; while the video is in the second state, frame images of the video are sampled, and it is identified whether multiple consecutive frame images remain unchanged. If so, it indicates that the video has started to freeze, and the monitoring state of the video is simultaneously marked as the third state.
  • While the video is in the third state, frame image sampling and comparison continue; if there is a large difference between consecutive frame images, it means that the picture of the video has resumed changing, i.e., the freeze has ended.
  • The embodiments of the present application thus realize accurate identification of the start and end of a freeze.
  • At the same time, by distinguishing the stages with monitoring states, each freeze is accurately distinguished and processed. This improves the accuracy of identifying every freeze, so that the embodiments of the present application can realize continuous freeze identification of the video.
  • The execution subject of the video freeze identification method in the embodiments of this application is a terminal device with a certain video processing capability.
  • Here, a certain video processing capability refers to the ability to extract frame images from the video and compare them.
  • The specific type of the terminal device is not limited here and can be selected by technicians according to the actual needs of the scene; it includes, but is not limited to, terminal devices used for the video call itself, such as mobile phones and computers, as well as third-party devices connected to those terminal devices, such as servers.
  • FIG. 1 shows a flow chart of the video freeze identification method provided in Embodiment 1 of the present application; the details are as follows:
  • The monitoring state is used to mark the freeze stage of the video in real time. The monitoring state includes the first state, the second state, and the third state, which correspond respectively to the freeze-unknown stage, the freeze-start stage, and the freeze-end stage of the video.
  • The embodiments of this application design different identification strategies according to the characteristics and actual needs of each freeze stage, so as to accurately identify the beginning and end of a freeze.
  • The embodiments of this application do not limit the specific way of marking the monitoring state, which can be set by technicians. For example, the monitoring state can be marked by adding different marks to the video, such as marking the first, second, and third states with the numbers 1, 2, and 3, respectively.
  • At the beginning of a call, all real-time call videos are marked as the first state by default, so as to ensure correct selection of the video freeze state and the freeze identification strategy from the start of the call.
  • During the call, the monitoring state of the call video is detected in real time.
  • A video call is meaningful only when there are users on both sides of the call (when there is no user, there is no need for freeze analysis and optimization). Therefore, to improve the effectiveness of the freeze analysis, when the embodiment of the application detects that the video is in the first state, it simultaneously starts real-time detection of whether a human face appears in the video; only when a human face is present is the monitoring state switched to the second state and freeze-start identification begun, as sketched below.
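  • As a minimal sketch of this state gate (a Python illustration assuming OpenCV's bundled Haar face cascade; the names MonitorState, contains_face, and update_state_on_frame are illustrative, not from the patent):

```python
# Minimal sketch of the monitoring-state gate described above (illustrative
# names; assumes opencv-python with its bundled Haar face cascade).
from enum import Enum

import cv2


class MonitorState(Enum):
    FIRST = 1   # freeze status unknown; watch for a face
    SECOND = 2  # face present; watch for a freeze starting
    THIRD = 3   # freeze in progress; watch for it ending


face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")


def contains_face(frame):
    """Return True if at least one face is detected in a BGR frame."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return len(faces) > 0


def update_state_on_frame(state, frame):
    """In the first state, switch to the second state once a face appears."""
    if state is MonitorState.FIRST and contains_face(frame):
        return MonitorState.SECOND
    return state
```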
  • S102 If the monitoring state is the second state, sample the video at the first frequency to obtain N first frame images, and compare the N first frame images, where N is a positive integer greater than 1.
  • When the monitoring state is the second state, the embodiment of this application starts to analyze whether the video freezes; specifically:
  • The embodiment of this application samples frame images of the video at the first frequency to obtain the first frame images to be analyzed. At the same time, considering that a freeze does not necessarily occupy the entire call process but may start at any time during the call, sampling and comparison are carried out continuously while the second state holds.
  • The embodiment of the present application does not limit the specific values of the first frequency and N, which can be set by technicians according to actual needs. For example, the first frequency can be set to 1 to 5 frames per second, and N can be set to 5 to 12.
  • In practical applications, a conversion relationship between the first frequency and N can be set according to the length of time used to judge whether the video is frozen; after setting a specific value for either the first frequency or N, the other can be derived from the conversion relationship and the known value. The length of time used to judge a freeze can likewise be set by technicians according to the needs of the actual scene and is not limited here.
  • The embodiment of the present application does not limit the image comparison method between the first frame images, which can be selected or set by technicians according to actual needs, including but not limited to calculating the similarity between the N first frame images pairwise.
  • If the image comparison result is that the image difference degree between the N first frame images is less than the first difference threshold, it means that the picture of the video has shown no, or almost no, change for a period of time. The embodiment of the application then directly determines that the video has started to freeze, and takes the earliest sampling time of the N first frame images processed this time as the specific moment at which the freeze starts, thereby accurately identifying and precisely locating the start of the freeze.
  • The specific value of the first difference threshold can be selected or set by technicians according to the requirements of the actual scene; for example, it can be set to 5% to 15%.
  • When determining the freeze start time, the embodiment of the application also modifies the monitoring state of the video to the third state. Since the monitoring state is no longer the second state, the freeze-start identification operations of S102 and S103 terminate, and the subsequent identification operation for the end of the freeze is enabled, thereby ensuring normal identification of both the start and the end of the video freeze. A sketch of the freeze-start check follows.
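  • A minimal sketch of S102/S103 under one such choice, using a mean absolute pixel difference as the image difference measure (the patent leaves the comparison method to the implementer; the helper names and the 0.10 threshold are illustrative):

```python
# Sketch of the S102/S103 freeze-start check using a mean absolute pixel
# difference as the (implementer-chosen) image difference measure.
import cv2
import numpy as np


def image_difference(frame_a, frame_b):
    """Mean absolute difference between two frames, normalized to [0, 1]."""
    diff = cv2.absdiff(frame_a, frame_b)
    return float(np.mean(diff)) / 255.0


def freeze_started(samples, first_diff_threshold=0.10):
    """samples: list of (timestamp, frame) pairs, N > 1, taken at the first
    frequency. Returns the freeze start time (earliest sampling time) if all
    pairwise differences stay below the threshold, else None."""
    frames = [f for _, f in samples]
    for i in range(len(frames)):
        for j in range(i + 1, len(frames)):
            if image_difference(frames[i], frames[j]) >= first_diff_threshold:
                return None  # the picture changed: no freeze detected
    return min(t for t, _ in samples)
```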
  • S104 If the monitoring state is the third state, sample the video at the second frequency to obtain M second frame images, and compare the M second frame images, where M is a positive integer greater than 1.
  • When the monitoring state is the third state, it means that the real-time video is currently in a freeze and may return to normal at any time, i.e., the freeze may end at any moment. Therefore, the embodiment of this application starts to determine whether the video freeze has ended.
  • The principle of sampling and image comparison in S104 is basically the same as in S102.
  • The second frequency and the first frequency may be the same or different, and M and N may be the same or different; the specific values of these parameters can be selected and set by technicians according to actual needs and are not limited here.
  • Optionally, the first frequency may be set greater than the second frequency, and M greater than N, so as to ensure a sufficient amount of sampled data when the freeze is being identified.
  • If the comparison result is that the image difference degree between the M second frame images is greater than or equal to the first difference threshold, the embodiment of this application directly determines that the video freeze is over, and takes the latest sampling time among the sampling times corresponding to the M second frame images as the specific moment at which the freeze ends, thereby precisely locating both the start and end times of the freeze. The time period between the freeze start time and the freeze end time is the freeze time period of the video.
  • At this point, the embodiment of the present application also restores the monitoring state of the video to the first state and returns to the operation of S101 to identify the next video freeze. The embodiments of the present application therefore keep looping during the video call and do not terminate until the call ends, so that all freezes during the entire call are accurately identified; the loop is sketched below.
  • After a freeze time period is identified, it can also be sent to a third-party device for subsequent freeze analysis and improvement; for example, it can be sent to a specific server, and the server performs operations such as freeze analysis and optimization.
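  • Tying S101 to S105 together, the monitor reduces to a small loop over the three states. A sketch reusing the helpers above (sample_frames, video.is_active, and video.latest_frame are assumed interfaces, not from the patent; the frequencies, N, M, and the threshold are the illustrative values discussed earlier):

```python
# Sketch of the full S101-S105 loop. sample_frames(video, freq, count) is an
# assumed helper returning `count` (timestamp, frame) pairs taken at `freq` Hz.
def monitor_call(video, first_freq=2.0, second_freq=1.0, n=5, m=3,
                 first_diff_threshold=0.10):
    state = MonitorState.FIRST  # all call videos start in the first state
    freeze_start = None
    freeze_periods = []
    while video.is_active():
        if state is MonitorState.FIRST:
            state = update_state_on_frame(state, video.latest_frame())
        elif state is MonitorState.SECOND:
            samples = sample_frames(video, first_freq, n)
            start = freeze_started(samples, first_diff_threshold)
            if start is not None:
                freeze_start, state = start, MonitorState.THIRD
        else:  # MonitorState.THIRD: watch for the freeze ending
            samples = sample_frames(video, second_freq, m)
            frames = [f for _, f in samples]
            # Here "difference >= threshold" is checked over adjacent pairs,
            # one reasonable reading of the comparison in S105.
            if any(image_difference(a, b) >= first_diff_threshold
                   for a, b in zip(frames, frames[1:])):
                freeze_end = max(t for t, _ in samples)
                freeze_periods.append((freeze_start, freeze_end))
                state = MonitorState.FIRST  # loop back for the next freeze
    return freeze_periods
```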
  • In summary, the embodiments of the present application set the first monitoring state for the call video in advance, monitor the video generated during the real-time call, and switch among the monitoring states as a face appears, a freeze starts, and a freeze ends, so that the corresponding freeze end time can be obtained as soon as the picture resumes changing.
  • The embodiments of the present application thus realize real-time and accurate identification of the start and end moments of each freeze, and can determine the time period of the current freeze at the moment the freeze ends. For scenarios with high requirements on call fluency, a solution that analyzes and optimizes the cause of a freeze in real time can therefore perform real-time freeze optimization of the video call and ensure its real-time fluency. Compared with solutions in which the freeze time periods can only be found after the video call is over, the quality of the video call can be greatly improved.
  • In some embodiments, when performing image comparison, the embodiment of the application does not compare the entire frame images, but only analyzes and compares the facial organs in them, so as to improve comparison efficiency. As shown in FIG. 2, the image comparison steps in Embodiment 2 of the present application include:
  • S201 Perform face organ coordinate analysis on each first frame image to obtain N first face organ coordinate sets.
  • The facial organs in each first frame image are recognized, and the coordinates of each facial organ in the first frame image are extracted, so as to obtain the face organ coordinate set corresponding to each first frame image. The specific number and types of recognized face organs can be set by technicians, including but not limited to any one or more of the mouth, eyes, eyebrows, and nose.
  • The face organ recognition method is likewise not limited here and can be selected or designed by technicians according to actual needs, including but not limited to face organ recognition based on geometric features, neural network models, or elastic models; reference may also be made to Embodiments 3 to 6 of this application.
  • S202 Use the N first face organ coordinate sets to compare the N first frame images.
  • The coordinate sets are compared to obtain the difference degree between the N first face organ coordinate sets, and this difference degree is taken as the image difference degree between the N first frame images, thereby realizing the image comparison of Embodiment 1 of the present application.
  • The specific method of comparing the coordinate set data is not limited here and can be selected or set by technicians, including but not limited to calculating the Euclidean distance between the coordinate sets and computing the corresponding difference degree from the Euclidean distance.
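  • For instance, with each coordinate set arranged as a NumPy array of landmark points, the difference degree can be derived from pairwise Euclidean distances, one of the options mentioned above (the averaging and face-diagonal normalization here are assumptions, not fixed by the patent):

```python
# Sketch of S202: compare face-organ coordinate sets by Euclidean distance.
# Each coordinate set is assumed to be a (K, 2) array of K landmark points
# extracted in the same organ order for every frame.
import numpy as np


def coordinate_set_difference(set_a, set_b, face_diag):
    """Euclidean distance between corresponding landmarks, averaged and
    normalized by the face diagonal so values are comparable across sizes."""
    per_point = np.linalg.norm(set_a - set_b, axis=1)
    return float(per_point.mean()) / face_diag


def frames_difference_degree(coord_sets, face_diag):
    """Maximum pairwise difference over the N coordinate sets; compare this
    against the first difference threshold from Embodiment 1."""
    n = len(coord_sets)
    return max(coordinate_set_difference(coord_sets[i], coord_sets[j], face_diag)
               for i in range(n) for j in range(i + 1, n))
```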
  • In some embodiments, each face organ is taken as an independent analysis object for analysis and coordinate extraction, so as to obtain the coordinate group corresponding to each face organ; that is, the first face organ coordinate set is a collection of multiple coordinate groups. As shown in FIG. 3, the analysis operations in Embodiment 3 of the present application include:
  • S301 Use the first frame image to be analyzed as a target image, and draw the face contour in the target image to obtain the corresponding face contour graphic.
  • The embodiment of the present application presets the relative positions of the various facial organs within a face. Face recognition is first performed on the target image to locate the human face in it, and the outline of the face is then drawn, so as to obtain the corresponding face contour graphic for the subsequent coarse positioning of the facial organs.
  • Based on the preset relative positions, the first relative position of each face organ in the face contour graphic can be determined, so as to realize coarse positioning of the face organs.
  • Considering that there are many face shapes in real life, Embodiment 4 of the present application analyzes in advance some real-life face shapes and the distribution of facial organs under each face shape, draws multiple sample contour graphics of different face shapes according to the analysis, and determines the facial organ relative position data corresponding to each sample contour graphic; the relative positions of actual facial organs are then recognized against these sample contour graphics.
  • The step of obtaining the first relative position in Embodiment 4 of the present application specifically includes:
  • S401 Perform graphic matching between the face contour graphic and a plurality of sample contour graphics in a face contour library.
  • A plurality of drawn sample contour graphics are stored in the face contour library in advance; after the face contour graphic is drawn, graphic matching is performed against the face contour library to filter out the appropriate sample contour graphic.
  • If the matching succeeds, the embodiment of the application directly reads the relative position data of the facial organs corresponding to the matched sample contour graphic and uses it directly as the relative positions of the face organs of the face contour graphic in Embodiment 3 of the present application.
  • In practice, the graphic matching may fail. In order to prevent a matching failure from blocking the identification, a default relative position set is preset, in which the relative position data corresponding to each face organ is stored; when the matching fails, the default relative position set is simply read, so that a usable relative position of the face organs in the target image can still be obtained.
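  • A sketch of this matching with the default-set fallback (the similarity measure, score threshold, and library layout are assumptions; the patent does not fix a matching method):

```python
# Sketch of Embodiment 4: match the drawn face contour against a library of
# sample contours; fall back to a default relative-position set on failure.
import cv2


def contour_similarity(contour_a, contour_b):
    """Similarity in (0, 1], built on cv2.matchShapes (an assumed measure;
    a matchShapes distance of 0 means identical shapes)."""
    distance = cv2.matchShapes(contour_a, contour_b, cv2.CONTOURS_MATCH_I1, 0.0)
    return 1.0 / (1.0 + distance)


def lookup_relative_positions(face_contour, contour_library,
                              default_positions, min_score=0.8):
    """contour_library: list of (sample_contour, organ_relative_positions)."""
    best_score, best_positions = 0.0, None
    for sample_contour, positions in contour_library:
        score = contour_similarity(face_contour, sample_contour)
        if score > best_score:
            best_score, best_positions = score, positions
    if best_positions is not None and best_score >= min_score:
        return best_positions  # second relative positions, reused directly
    return default_positions   # matching failed: fall back to the preset defaults
```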
  • After the first relative positions are obtained, the embodiment of the present application determines the position of each face organ in the target image according to the actual position of the face in the target image and the relative position of the organ within the face, and uses this position as the corresponding organ center coordinate, thereby achieving coarse positioning in the target image.
  • In practice, faces differ from person to person, so the obtained face images will differ, and the positions of the facial organs will even shift somewhat under different expressions; for example, the coordinates of the mouth change between expressions. The coarse positioning coordinate therefore cannot fully represent the facial organ, and may not even fall inside the region image of the facial organ.
  • For this reason, the embodiment of the present application uses the organ center coordinate as a starting point and performs organ recognition on the surrounding image area to achieve precise positioning of each facial organ. For example, using the organ center coordinate corresponding to the mouth as a starting point, mouth recognition is performed on the image area around that coordinate, so as to determine the first image area corresponding to the actual mouth and thus precisely locate the mouth.
  • The steps of precise positioning of facial organs in Embodiment 5 of the present application specifically include:
  • S501 Obtain the retrieval rectangle size corresponding to each face organ, and identify the second image area corresponding to each face organ in the target image according to the retrieval rectangle size and the organ center coordinates, the shape of the second image area being a rectangle.
  • The approximate proportion of each face organ within the face is relatively fixed; for example, the height of the nose is generally about 1/3 of the face length, and its width about 1/5 of the face width. Using such proportions, the image area in which a face organ roughly lies can be quickly located within the face.
  • The ratio of each face organ to the face size is set in advance, and the retrieval rectangle size corresponding to each face organ is then determined from the actual face size in the target image and the set ratio data. Taking the nose example above: assuming the face in the target image is 10 cm long and 7 cm wide, and computing the height as 1/3 of the face length and the width as 1/5 of the face width, the corresponding retrieval rectangle size is determined to be 3.33 cm x 1.4 cm.
  • After determining the retrieval rectangle size, the organ center coordinate of the face organ is taken as the center point of the rectangle, and a second image area whose length and width equal the retrieval rectangle size is determined in the target image, giving the approximate area corresponding to each face organ. The sketch after this paragraph illustrates the arithmetic.
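  • A sketch of this proportional sizing (the ratio table and return format are illustrative; only the nose ratios come from the example above):

```python
# Sketch of S501: derive each organ's retrieval rectangle (second image area)
# from the face size and preset ratios, centered on the coarse organ coordinate.
ORGAN_RATIOS = {
    # (height ratio of face length, width ratio of face width); the nose
    # values follow the example in the text, the mouth entry is a placeholder.
    "nose": (1 / 3, 1 / 5),
    "mouth": (1 / 5, 1 / 3),
}


def retrieval_rectangle(organ, face_length, face_width, center_xy):
    h_ratio, w_ratio = ORGAN_RATIOS[organ]
    height = face_length * h_ratio   # e.g. 10 cm face -> 3.33 cm nose box
    width = face_width * w_ratio     # e.g. 7 cm face  -> 1.4 cm nose box
    cx, cy = center_xy
    # Return the second image area as (left, top, right, bottom).
    return (cx - width / 2, cy - height / 2, cx + width / 2, cy + height / 2)
```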
  • S502 Perform face organ detection on each second image area, and identify the first image area corresponding to the face organ from the second image area according to the detection result.
  • After the second image area corresponding to each face organ is determined, the corresponding facial organ can be searched for within that second image area, achieving precise positioning of the facial organ.
  • Continuing the nose example, the embodiment of the present application performs a nose recognition search on the second image area to determine the nose contained in it and the actual first image area occupied by the nose within the second image area.
  • S304 Perform coordinate extraction on each first image area to obtain the coordinate group corresponding to each face organ in the target image.
  • After precisely locating the first image area in which each face organ lies, the embodiment of the present application extracts coordinates from that first image area. Since a first image area contains a large amount of image information, each first image area corresponds to multiple coordinate data after extraction; the embodiment of the present application stores all the coordinate data corresponding to a single first image area in one coordinate group, thereby obtaining the coordinate group for each facial organ and, in turn, the first face organ coordinate set of the target image required in Embodiment 2 of the present application.
  • By first coarsely positioning the facial organs and then quickly retrieving each facial organ within the surrounding image area on the basis of the coarse positioning, Embodiment 3 of the present application achieves fast and accurate positioning and recognition of facial organs; compared with performing organ retrieval directly on the whole face image, the identification efficiency is higher and the amount of computation smaller.
  • The step of extracting coordinates from the first image area in Embodiment 6 of the present application specifically includes:
  • S601 Acquire the number of sampling points corresponding to each face organ, where the number of sampling points corresponding to the mouth, eyes, eyebrows, and nose is sequentially reduced.
  • The mouth, eyes, eyebrows, and nose are ranked according to how frequently each face organ is used during a call, and a corresponding number of sampling points is set for each face organ according to that frequency: the higher the frequency, the greater the number of sampling points, so that the subsequent feature point sampling and coordinate extraction can differentiate the number of extracted coordinates.
  • the specific number of sampling points can be set by technicians according to actual needs, and is not limited here.
  • S602 Perform feature point sampling on the first image area corresponding to each face organ, and obtain the coordinates of each feature point, so as to obtain the coordinate group corresponding to each face organ in the target image; the number of sampling points used for the feature point sampling is the number of sampling points corresponding to that face organ.
  • After the numbers of sampling points are determined, feature point sampling of the first image areas begins. For each first image area, only the corresponding number of feature points is sampled; for example, assuming the number of sampling points corresponding to first image area A is 20, only 20 feature points are sampled from first image area A. After the required number of feature points is sampled, the coordinate data of these feature points are obtained, thereby yielding the coordinate group corresponding to each first image area, i.e., the coordinate group of each face organ in the target image.
  • The specific feature point sampling method is not limited here and can be set by technicians according to actual needs, including but not limited to the SIFT algorithm and the SUSAN algorithm. In order to accurately control the number of extracted sampling points, if the feature point extraction algorithm itself cannot be configured with a sample count, feature points can be deleted or added after normal feature point extraction until the corresponding number of sampling points is met, as in the sketch below.
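  • A sketch of S601/S602 using OpenCV's SIFT detector and trimming surplus keypoints by response strength (the per-organ counts and helper names are illustrative; only the mouth > eyes > eyebrows > nose ordering comes from the text):

```python
# Sketch of Embodiment 6: sample a fixed number of feature points per organ
# region, trimming surplus SIFT keypoints by response strength.
import cv2

# Mouth > eyes > eyebrows > nose, as required; the counts are examples.
SAMPLING_POINTS = {"mouth": 20, "eyes": 16, "eyebrows": 12, "nose": 8}

sift = cv2.SIFT_create()


def organ_coordinate_group(gray_image, first_image_area, organ):
    """first_image_area: (left, top, right, bottom) in image coordinates.
    Returns up to SAMPLING_POINTS[organ] (x, y) coordinates for the organ."""
    l, t, r, b = (int(v) for v in first_image_area)
    region = gray_image[t:b, l:r]
    keypoints = sift.detect(region, None)
    # SIFT cannot be told an exact count, so keep the strongest responses
    # and drop the rest, as the text describes.
    keypoints = sorted(keypoints, key=lambda kp: kp.response, reverse=True)
    keypoints = keypoints[:SAMPLING_POINTS[organ]]
    return [(kp.pt[0] + l, kp.pt[1] + t) for kp in keypoints]
```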
  • It should be noted that Embodiments 2 to 6 of the present application are refinements or optimizations of the comparison of the first frame images; they can equally be applied to the comparison of the second frame images in S104 of Embodiment 1 of this application. When doing so, it is only necessary to replace the processed object from the first frame images to the second frame images.
  • FIG. 7 shows a structural block diagram of a video freeze identification device provided in an embodiment of the present application.
  • The video freeze identification device illustrated in FIG. 7 may be the execution subject of the video freeze identification method provided in Embodiment 1.
  • the device for identifying video freezes includes:
  • The face detection module 71 is configured to perform face detection on the video when the monitoring state corresponding to the video during a real-time call is the first state, and, when a face is detected in the video, modify the monitoring state corresponding to the video to the second state.
  • The first image comparison module 72 is configured to, if the monitoring state is the second state, sample the video at a first frequency to obtain N first frame images and compare the N first frame images, where N is a positive integer greater than 1.
  • The freeze start identification module 73 is configured to, if the comparison result is that the image difference degree between the N first frame images is less than a first difference threshold, take the earliest sampling time among the sampling times corresponding to the first frame images as the freeze start time of the video, and set the monitoring state to the third state.
  • The second image comparison module 74 is configured to, if the monitoring state is the third state, sample the video at a second frequency to obtain M second frame images and compare the M second frame images, where M is a positive integer greater than 1.
  • The freeze end identification module 75 is configured to, if the comparison result is that the image difference degree between the M second frame images is greater than or equal to the first difference threshold, take the latest sampling time among the sampling times corresponding to the second frame images as the freeze end time of the video, set the monitoring state to the first state, and identify the freeze time period of the video based on the freeze start time and the freeze end time.
  • the first image comparison module 72 includes:
  • the coordinate analysis module is used to analyze the coordinates of the face organs for each of the first frame images to obtain N first face organ coordinate sets.
  • The coordinate comparison module is configured to compare the N first frame images by using the N first face organ coordinate sets.
  • In some embodiments, the first face organ coordinate set is a collection of multiple coordinate groups, each coordinate group corresponds to one type of face organ and contains multiple coordinates of the corresponding face organ, and the coordinate analysis module includes:
  • the contour drawing module is used to take the first frame image to be analyzed as a target image, and draw a face contour on the target image to obtain a corresponding face contour graphic.
  • the position obtaining module is used to obtain the first relative position of each face organ in the face contour graph.
  • the organ image search module is used for locating the face organs of the target image by using the first relative position to obtain the organ center coordinates of each face organ in the target image, and based on a plurality of the organ centers The coordinates are used to identify the first image area corresponding to each face organ in the target image.
  • the coordinate extraction module is configured to extract the coordinates of each of the first image regions to obtain the coordinate groups corresponding to each of the facial organs in the target image.
  • The position obtaining module is specifically configured to: perform graphic matching between the face contour graphic and a plurality of sample contour graphics in the face contour library; if the matching succeeds, read the relative position set corresponding to the successfully matched sample contour graphic, the relative position set including the second relative position of each face organ in the sample contour graphic; and take the second relative position corresponding to each face organ in the relative position set as the first relative position of that face organ in the face contour graphic.
  • The organ image search module is specifically configured to: obtain the retrieval rectangle size corresponding to each face organ, and identify the second image area corresponding to each face organ in the target image according to the retrieval rectangle size and the organ center coordinates; and perform face organ detection on each second image area, and identify the first image area corresponding to the face organ from the second image area according to the detection result.
  • The coordinate extraction module is specifically configured to: acquire the number of sampling points corresponding to each face organ, where the numbers of sampling points corresponding to the mouth, eyes, eyebrows, and nose decrease in that order; and perform feature point sampling on the first image area corresponding to each face organ and obtain the coordinates of each feature point, the number of sampling points used for the feature point sampling being the number corresponding to that face organ.
  • The video freeze identification method provided by the embodiments of this application can be applied to terminal devices such as mobile phones, tablet computers, wearable devices, in-vehicle devices, augmented reality (AR)/virtual reality (VR) devices, notebook computers, ultra-mobile personal computers (UMPC), netbooks, and personal digital assistants (PDA).
  • The terminal device may be a cellular phone, a cordless phone, a Session Initiation Protocol (SIP) phone, a wireless local loop (WLL) station, a personal digital assistant (PDA) device, a handheld device with wireless communication functions, a computing device or another processing device connected to a wireless modem, a vehicle-mounted device, an Internet-of-Vehicles terminal, a computer, a laptop computer, a handheld communication device, a handheld computing device, a satellite wireless device, a wireless modem card, a TV set-top box (STB), customer premises equipment (CPE), and/or other equipment used for communication on wireless systems and next-generation communication systems, such as a mobile terminal in a 5G network or a mobile terminal in a future evolved public land mobile network (PLMN).
  • A wearable device is a portable device that is worn directly on the body or integrated into the user's clothes or accessories. Wearable device is also a general term for devices developed by applying wearable technology to the intelligent design of daily wear, such as glasses, gloves, watches, clothing, and shoes.
  • Wearable devices are not merely hardware devices; they also realize powerful functions through software support, data interaction, and cloud interaction.
  • In a broad sense, wearable smart devices include devices that are full-featured and large-sized and can realize complete or partial functions without relying on a smartphone, such as smart watches or smart glasses, as well as devices that focus on only a certain type of application function and need to be used together with other devices such as smartphones, for example, various smart bracelets and smart jewelry for vital sign monitoring.
  • FIG. 8 is a schematic structural diagram of a terminal device provided by an embodiment of the present application.
  • The terminal device 8 of this embodiment includes: at least one processor 80 (only one is shown in FIG. 8), a memory 81, and a computer program 82 that is stored in the memory 81 and can run on the at least one processor 80.
  • When the processor 80 executes the computer program 82, the steps in the foregoing embodiments of the video freeze identification method, such as steps 101 to 105 shown in FIG. 1, are implemented.
  • Alternatively, when the processor 80 executes the computer program 82, the functions of the modules/units in the foregoing device embodiments, such as the functions of modules 71 to 75 shown in FIG. 7, are realized.
  • the terminal device 8 may be a computing device such as a desktop computer, a notebook, a palmtop computer, and a cloud server.
  • the terminal device may include, but is not limited to, a processor 80 and a memory 81.
  • FIG. 8 is only an example of the terminal device 8 and does not constitute a limitation on the terminal device 8; the terminal device may include more or fewer components than shown in the figure, combine certain components, or have different components.
  • For example, the terminal device may also include input and output devices, network access devices, a bus, and the like.
  • The so-called processor 80 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the memory 81 may be an internal storage unit of the terminal device 8 in some embodiments, such as a hard disk or memory of the terminal device 8.
  • The memory 81 may also be an external storage device of the terminal device 8, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the terminal device 8.
  • the memory 81 may also include both an internal storage unit of the terminal device 8 and an external storage device.
  • the memory 81 is used to store an operating system, an application program, a boot loader (BootLoader), data, and other programs, such as the program code of the computer program.
  • the memory 81 can also be used to temporarily store data that has been sent or will be sent.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
  • the embodiments of the present application also provide a computer-readable storage medium.
  • the computer-readable storage medium may be non-volatile or volatile.
  • The computer-readable storage medium stores a computer program; when the computer program is executed by a processor, the steps in each of the above method embodiments can be realized, namely: when the monitoring state corresponding to the video during a real-time call is the first state, performing face detection on the video, and when a face is detected in the video, modifying the monitoring state corresponding to the video to the second state; if the monitoring state is the second state, sampling the video at a first frequency to obtain N first frame images and comparing the N first frame images, where N is a positive integer greater than 1; if the comparison result is that the image difference degree between the N first frame images is less than a first difference threshold, taking the earliest sampling time among the sampling times corresponding to the first frame images as the freeze start time of the video, and setting the monitoring state to the third state; if the monitoring state is the third state, sampling the video at a second frequency to obtain M second frame images and comparing the M second frame images, where M is a positive integer greater than 1; and if the comparison result is that the image difference degree between the M second frame images is greater than or equal to the first difference threshold, taking the latest sampling time among the sampling times corresponding to the second frame images as the freeze end time of the video, setting the monitoring state to the first state, and identifying the freeze time period of the video based on the freeze start time and the freeze end time.
  • The embodiments of the present application also provide a computer program product which, when run on a mobile terminal, enables the mobile terminal to realize the steps in the foregoing method embodiments.
  • If the integrated module/unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • All or part of the processes in the methods of the above embodiments of the present application may also be completed by a computer program instructing relevant hardware.
  • the computer program can be stored in a computer-readable storage medium. When the program is executed by the processor, it can implement the steps of the foregoing method embodiments.
  • the computer program includes computer program code, and the computer program code may be in the form of source code, object code, executable file, or some intermediate forms.
  • the computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, U disk, mobile hard disk, magnetic disk, optical disk, computer memory, read-only memory (Read-Only Memory, ROM) , Random Access Memory (RAM), electrical carrier signal, telecommunications signal, and software distribution media, etc.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The present application is suitable for the technical field of video processing and provides a video lag identification method and apparatus, and a terminal device. The method comprises: when a monitoring state of a video is a first state and a face exists in the video, modifying the monitoring state into a second state; if the monitoring state is the second state, sampling the video to obtain N first image frames; if an image difference degree among the N first image frames is smaller than a first difference threshold, taking the earliest sampling moment as a lag starting moment, and setting the monitoring state as a third state; if the monitoring state is the third state, sampling the video to obtain M second image frames; and if an image difference degree among the M second image frames is greater than or equal to the first difference threshold, taking the latest sampling moment as a lag termination moment of the video, setting the monitoring state as the first state, and identifying a lag time period of the video. According to the embodiment of the present application, precise identification on the start and end of lag is achieved, and precise identification of a lag time period is achieved.

Description

Video lag identification method and apparatus, and terminal device
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on February 11, 2020, with application number 202010087225.0 and invention title "Video lag identification method and apparatus, and terminal device", the entire contents of which are incorporated herein by reference.
Technical Field
This application belongs to the technical field of computer vision in artificial intelligence, and in particular relates to a video freeze identification method and terminal device.
Background
With the continuous advancement of technology, video calls over the Internet (hereinafter referred to as calls) have become a common scene in life and work. The quality of a real-time call is affected by the real-time network status, the video equipment status, the video server resources, and so on; a problem in any one of these links may cause the video to freeze during the call. The inventor found that, in order to improve the freezing situation, it is first necessary to determine exactly when freezes occurred during the call, and then analyze the state of each link at the time of the freeze to locate its cause, so as to precisely improve the video freezing situation.
In the prior art, a tester manually reviews the recorded video files of a call and determines the time periods in which the video freezes; the efficiency of such identification is extremely low.
Technical Problem
In view of this, the embodiments of the present application provide a video freeze identification method and terminal device, which can solve the problem of low efficiency in identifying video call freezes.
Technical Solutions
The first aspect of the embodiments of the present application provides a video freeze identification method, including:
when the monitoring state corresponding to the video during a real-time call is the first state, performing face detection on the video, and when a face is detected in the video, modifying the monitoring state corresponding to the video to the second state;
if the monitoring state is the second state, sampling the video at a first frequency to obtain N first frame images and comparing the N first frame images, where N is a positive integer greater than 1;
if the comparison result is that the image difference degree between the N first frame images is less than a first difference threshold, taking the earliest sampling time among the sampling times corresponding to the first frame images as the freeze start time of the video, and setting the monitoring state to the third state;
if the monitoring state is the third state, sampling the video at a second frequency to obtain M second frame images and comparing the M second frame images, where M is a positive integer greater than 1;
if the comparison result is that the image difference degree between the M second frame images is greater than or equal to the first difference threshold, taking the latest sampling time among the sampling times corresponding to the second frame images as the freeze end time of the video, setting the monitoring state to the first state, and identifying the freeze time period of the video based on the freeze start time and the freeze end time.
The second aspect of the embodiments of the present application provides a video freeze identification device, including:
a face detection module, configured to perform face detection on the video when the monitoring state corresponding to the video during a real-time call is the first state, and, when a face is detected in the video, modify the monitoring state corresponding to the video to the second state;
a first image comparison module, configured to, if the monitoring state is the second state, sample the video at a first frequency to obtain N first frame images and compare the N first frame images, where N is a positive integer greater than 1;
a freeze start identification module, configured to, if the comparison result is that the image difference degree between the N first frame images is less than a first difference threshold, take the earliest sampling time among the sampling times corresponding to the first frame images as the freeze start time of the video, and set the monitoring state to the third state;
a second image comparison module, configured to, if the monitoring state is the third state, sample the video at a second frequency to obtain M second frame images and compare the M second frame images, where M is a positive integer greater than 1;
a freeze end identification module, configured to, if the comparison result is that the image difference degree between the M second frame images is greater than or equal to the first difference threshold, take the latest sampling time among the sampling times corresponding to the second frame images as the freeze end time of the video, set the monitoring state to the first state, and identify the freeze time period of the video based on the freeze start time and the freeze end time.
本申请实施例的第三方面提供了一种终端设备,所述终端设备包括存储器、处理器,所述存储器上存储有可在所述处理器上运行的计算机程序,所述处理器执行所述计算机程序时实现如上述第一方面中任一项所述视频卡顿识别方法的步骤,即包括:当实时通话过程中的视频对应的监控状态为第一状态时,对所述视频进行人脸检测,并在检测到所述视频中存在人脸时,将所述视频对应的监控状态修改为第二状态;若所述监控状态为第二状态,以第一频率对所述视频进行采样得到N张第一帧图像,并对N张所述第一帧图像进行比对,其中,N为大于1的正整数;若比对结果为N张所述第一帧图像之间的图像差异度小于第一差异阈值,将各张所述第一帧图像对应的采样时刻中最早的采样时刻,作为所述视频的卡顿起始时刻,并将所述监控状态设置为第三状态;若所述监控状态为第三状态,以第二频率对所述视频进行采样得到M张第二帧图像,并对M张所述第二帧图像进行比对,其中,M为大于1的正整数;若比对结果为M张所述第二帧图像之间的图像差异度大于或等于所述第一差异阈值,将各张所述第二帧图像对应的采样时刻中最晚的采样时刻,作为所述视频的卡顿终止时刻,将所述监控状态设置为第一状态,并基于所述卡顿起始时刻和所述卡顿终止时刻识别所述视频的卡顿时间段。A third aspect of the embodiments of the present application provides a terminal device. The terminal device includes a memory and a processor. The memory stores a computer program that can run on the processor. The processor executes the The computer program implements the steps of the video freeze recognition method as described in any one of the first aspect above, which includes: when the monitoring state corresponding to the video during the real-time call is the first state, performing facial recognition on the video Detect, and when a face is detected in the video, modify the monitoring state corresponding to the video to the second state; if the monitoring state is the second state, sample the video at the first frequency to obtain N first-frame images, and N first-frame images are compared, where N is a positive integer greater than 1, if the comparison result is the image difference between N first-frame images Is less than the first difference threshold, the earliest sampling moment among the sampling moments corresponding to each of the first frame images is used as the starting moment of the video freeze, and the monitoring state is set to the third state; The monitoring state is the third state, the video is sampled at the second frequency to obtain M second frame images, and M second frame images are compared, where M is a positive integer greater than 1; If the comparison result is that the degree of image difference between the M images of the second frame is greater than or equal to the first difference threshold, the latest sampling time among the sampling times corresponding to each of the second frame images is taken as At the end time of the video freeze, the monitoring state is set to the first state, and the freeze time period of the video is identified based on the start time of the freeze and the end time of the freeze.
A fourth aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program, wherein when the computer program is executed by a processor, the steps of the video freeze identification method according to any one of the above first aspect are implemented, namely: when the monitoring state corresponding to the video during a real-time call is the first state, performing face detection on the video, and when a face is detected in the video, modifying the monitoring state corresponding to the video to the second state; if the monitoring state is the second state, sampling the video at a first frequency to obtain N first frame images, and comparing the N first frame images, where N is a positive integer greater than 1; if the comparison result is that the image difference degree among the N first frame images is less than a first difference threshold, taking the earliest sampling time among the sampling times corresponding to the first frame images as the freeze start time of the video, and setting the monitoring state to the third state; if the monitoring state is the third state, sampling the video at a second frequency to obtain M second frame images, and comparing the M second frame images, where M is a positive integer greater than 1; and if the comparison result is that the image difference degree among the M second frame images is greater than or equal to the first difference threshold, taking the latest sampling time among the sampling times corresponding to the second frame images as the freeze end time of the video, setting the monitoring state to the first state, and identifying the freeze time period of the video based on the freeze start time and the freeze end time.
A fifth aspect of the embodiments of the present application provides a computer program product which, when run on a terminal device, causes the terminal device to execute the video freeze identification method according to any one of the above first aspect.
Beneficial Effects
Compared with the prior art, the embodiments of the present application have the following beneficial effects: on the one hand, the embodiments achieve efficient and accurate identification of the start and end of a freeze; on the other hand, by setting different monitoring states for different freeze stages, each freeze is distinguished and handled effectively, ensuring the accuracy of every freeze identification.
Description of the Drawings
Fig. 1 is a schematic flowchart of the implementation of the video freeze identification method provided in Embodiment 1 of the present application;
Fig. 2 is a schematic flowchart of the implementation of the video freeze identification method provided in Embodiment 2 of the present application;
Fig. 3 is a schematic flowchart of the implementation of the video freeze identification method provided in Embodiment 3 of the present application;
Fig. 4 is a schematic flowchart of the implementation of the video freeze identification method provided in Embodiment 4 of the present application;
Fig. 5 is a schematic flowchart of the implementation of the video freeze identification method provided in Embodiment 5 of the present application;
Fig. 6 is a schematic flowchart of the implementation of the video freeze identification method provided in Embodiment 6 of the present application;
Fig. 7 is a schematic structural diagram of the video freeze identification apparatus provided in Embodiment 7 of the present application;
Fig. 8 is a schematic diagram of the terminal device provided in Embodiment 8 of the present application.
Embodiments of the Present Invention
To facilitate understanding of the present application, its embodiments are first briefly explained here. Because the quality of a real-time call is affected by every link in the call network, a problem in any single link may cause the call video to freeze. To identify the time period in which the video froze, technicians currently review the recorded video manually after the call ends. On the one hand, such review and localization is costly and extremely inefficient, and cannot keep up with the growing demands on the volume, cost and efficiency of video freeze analysis. On the other hand, in special scenarios with high requirements on call fluency, such as video-based face-to-face review for bank loans, identifying freezes after the call ends may provide some analytical assurance for later calls, but has little practical value for the current call.
To improve the identification of video freezes during a call, in the embodiments of the present application the monitoring state of the call video is set to the first state in advance, and the video produced during the real-time call is then monitored in real time. When a face is found in the video and freeze analysis is required, the monitoring state of the video is set to the second state. While the video is in the second state, frame images are sampled from the video and it is checked whether several consecutive frame images remain unchanged; if so, the video has started to freeze, and the monitoring state of the video is simultaneously marked as the third state. After the freeze has started, frame image sampling and comparison continue; once a large difference appears between consecutive frame images, the video picture has resumed changing, and the corresponding freeze end time can be obtained. The time period in which the video froze can then be identified from the recorded freeze start time and freeze end time. Finally, the monitoring state of the video is restored to the first state, ending the identification of the current freeze. On the one hand, the embodiments of the present application achieve accurate identification of the start and end of a freeze; on the other hand, by setting different monitoring states for the different identification and freeze stages, each freeze is distinguished and handled effectively, ensuring the accuracy of every freeze identification and enabling the embodiments to identify consecutive freezes in the video.
Meanwhile, the video freeze identification method in the embodiments of the present application is executed by a terminal device with a certain video processing capability, where "a certain video processing capability" means the ability to extract frame images from a video and to compare frame images. The specific type of terminal device is not limited here and may be chosen by technicians according to the actual scenario; it includes, but is not limited to, the terminal devices used for the video call itself, such as mobile phones and computers, and may also be a third-party device communicatively connected to those terminal devices, such as a server.
The embodiments of the present application are described in detail as follows:
Fig. 1 shows the implementation flowchart of the video freeze identification method provided in Embodiment 1 of the present application, detailed as follows:
S101: when the monitoring state corresponding to the video during a real-time call is the first state, perform face detection on the video, and when a face is detected in the video, modify the monitoring state corresponding to the video to the second state.
In the embodiments of the present application, the monitoring state marks the freeze stage the video is in at any moment. The monitoring state comprises a first state, a second state and a third state, corresponding respectively to the stage in which the freeze status of the video is unknown, the freeze start stage, and the freeze end stage. Different identification strategies are designed for the characteristics or actual needs of each stage so that the start and end of a freeze can be identified accurately. The embodiments do not limit how the monitoring state is marked; it may be set by technicians, for example by attaching different identifiers to the video, such as using the digits 1, 2 and 3 for the first, second and third states respectively, in which case the monitoring state of the video can be set flexibly simply by adding or modifying the digital identifier. The embodiments mark all real-time call videos as the first state by default, so that the freeze state of the video and the freeze identification strategy are selected correctly when the call starts.
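For illustration only, the following is a minimal sketch of the three monitoring states using the numeric markers 1, 2 and 3 mentioned above; the state names are hypothetical, not terms from the specification.

```python
from enum import IntEnum

class MonitorState(IntEnum):
    IDLE = 1          # first state: freeze status unknown, no analysis running
    WATCH_START = 2   # second state: face present, watching for freeze onset
    IN_FREEZE = 3     # third state: freeze detected, watching for recovery

state = MonitorState.IDLE  # every call video starts in the first state by default
```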
In the embodiments of the present application, the monitoring state of the call video is checked in real time during the call. In practice, a video call is only meaningful when users are present on both sides (when no user is present, there is no need to analyse or optimise freezes even if they occur). Therefore, to improve the effectiveness of the freeze analysis, when the video is detected to be in the first state, the embodiments also start detecting in real time whether a face appears in the video, switch the monitoring state to the second state only when a face is present, and only then start identifying whether a freeze has begun.
S102: if the monitoring state is the second state, sample the video at a first frequency to obtain N first frame images, and compare the N first frame images, where N is a positive integer greater than 1.
When the monitoring state is the second state, the real-time video may start to freeze at this stage, so the embodiments of the present application begin analysing whether a video freeze has started. Specifically:
To identify whether the video has started to freeze, the embodiments of the present application sample frame images from the video at the first frequency to obtain the first frame images to be analysed. Since a freeze does not necessarily span the whole call but starts at some moment during it, each time a new first frame image is sampled, only the N most recently sampled first frame images are compared, so that the start of a freeze can be identified precisely. For example, suppose the first frequency is 1 image per second, N = 5, and sampling starts at 00:00. In the one minute from 00:00 to 01:00, one new first frame image can in theory be collected every second, so 60 first frame images have been collected by 01:00. Every time a new frame image is collected, the latest 5 first frame images are compared; for instance, when the 60th first frame image is collected at 01:00, the 56th to 60th images (5 in total) are compared. This ensures that identifying the start of a freeze is not affected by historical images sampled too long ago, safeguarding the accuracy of the identification.
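A minimal sketch of this sliding window, assuming the parameters of the example above (1 image per second, N = 5); `compare_window` is a hypothetical callback standing in for the comparison of S103.

```python
from collections import deque

N = 5                     # window size, as in the example above
window = deque(maxlen=N)  # keeps only the N most recently sampled images

def on_sampled_frame(frame, timestamp, compare_window):
    """Called once per sampling tick at the first frequency; compares
    only the latest N first frame images (e.g. frames 56-60 at 01:00)."""
    window.append((timestamp, frame))
    if len(window) == N:
        times, frames = zip(*window)
        compare_window(frames, times)  # comparison callback, see S103
```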
The embodiments of the present application do not limit the specific values of the first frequency and N; they may be set by technicians according to actual needs. For example, the first frequency may be set to 1-5 images per second and N to 5-12. Alternatively, the conversion relation between the first frequency and N may be set with reference to the duration used in practice to decide whether the video is frozen: fix either the first frequency or N, then compute the other from the conversion relation and the known value. For example, if the video is considered frozen when the picture does not change for 5 seconds, the conversion relation is N = 5 seconds × first frequency; if the first frequency is then set to 1 sample per second, N = 5 follows from the relation. The duration used to decide whether the video is frozen may likewise be set by technicians according to the needs of the actual scenario and is not limited here.
The embodiments of the present application also do not limit the method used to compare the first frame images; it may be chosen or designed by technicians according to actual needs, including but not limited to: computing the Euclidean distance between every pair of adjacent images among the N first frame images and taking the mean or maximum of those distances as the image difference degree; or randomly combining the N first frame images into image pairs, performing cross-correlation on the two first frame images within each pair, and computing the difference degree from the cross-correlation results. Reference may also be made to the descriptions of Embodiments 2 to 6 of the present application.
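As a sketch of one of the unconstrained options named above (mean Euclidean distance between adjacent frames), assuming frames are NumPy arrays with values in 0-255:

```python
import numpy as np

def window_difference(frames):
    """Mean per-pixel Euclidean (RMS) distance between adjacent frames,
    normalised to [0, 1] so it can be compared against a percentage
    threshold such as the 5%-15% mentioned below."""
    diffs = []
    for a, b in zip(frames, frames[1:]):
        a = a.astype(np.float32) / 255.0
        b = b.astype(np.float32) / 255.0
        diffs.append(np.sqrt(np.mean((a - b) ** 2)))  # RMS distance per pair
    return float(np.mean(diffs))  # or max(diffs) for a stricter measure
```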
S103: if the comparison result is that the image difference degree among the N first frame images is less than a first difference threshold, take the earliest sampling time among the sampling times corresponding to the first frame images as the freeze start time of the video, and set the monitoring state to the third state.
When the image comparison result shows that the image difference degree among the N first frame images is less than the first difference threshold, the video picture has not changed, or has barely changed, for a period of time. The embodiments of the present application then directly determine that the video has started to freeze, and take the earliest sampling time among the N first frame images processed this time as the exact moment the freeze started, thereby accurately identifying the start of the freeze and precisely locating the freeze start time. The specific value of the first difference threshold may be chosen or set by technicians according to the needs of the actual scenario; for example, it may be set to 5%-15%.
Since a freeze generally lasts for some time, continuing to process in the manner of S102 and S103 during this period would produce a large number of freeze start times and cause abnormal freeze identification. Therefore, while determining the freeze start time, the embodiments of the present application also change the monitoring state of the video to the third state. Because the monitoring state is no longer the second state, the freeze-start identification of S102 and S103 is terminated and the subsequent freeze-end identification is enabled, ensuring normal identification of both the start and the end of the video freeze.
S104: if the monitoring state is the third state, sample the video at a second frequency to obtain M second frame images, and compare the M second frame images, where M is a positive integer greater than 1.
When the monitoring state is the third state, the real-time video is currently frozen and may return to normal at any moment, i.e. the freeze may end at any time, so the embodiments of the present application begin analysing whether the video freeze has ended. The sampling and image comparison principles of S104 are essentially the same as those of S102; refer to the description of S102 for details, which are not repeated here. It should be noted, however, that the second frequency may be the same as or different from the first frequency, and M may be the same as or different from N; the specific values of these parameters may be chosen and set by technicians according to actual needs and are not limited here.
As a specific embodiment of the present application, in practice several consecutive frame images may be highly similar even when no freeze has occurred; for example, when both parties to the video call are pondering a question, they may remain essentially motionless for a short while. Therefore, to improve the accuracy of freeze-start identification, the embodiments of the present application set the first frequency greater than the second frequency and M greater than N, to guarantee the amount of sampled data when identifying the start of a video freeze.
S105: if the comparison result is that the image difference degree among the M second frame images is greater than or equal to the first difference threshold, take the latest sampling time among the sampling times corresponding to the second frame images as the freeze end time of the video, set the monitoring state to the first state, and identify the freeze time period of the video based on the freeze start time and the freeze end time.
When the image comparison result shows a large difference among the M second frame images, the video picture has returned to normal. The embodiments of the present application then directly determine that the video freeze has ended, and take the latest sampling time among all the sampling times of the M second frame images processed this time as the exact moment the freeze ended, thereby precisely locating both the start and end of this freeze; the period between the freeze start time and the freeze end time is the freeze time period of the video. At the same time, the embodiments restore the monitoring state of the video to the first state, so execution returns to S101 to identify the next video freeze. The embodiment therefore keeps cycling throughout the video call and only terminates when the call ends, so that every freeze in the whole call is identified accurately.
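Putting S101-S105 together, the following is a sketch of one monitoring tick of the state cycle, reusing the `MonitorState` and `window_difference` names from the earlier sketches; the 10% threshold is an assumed value within the 5%-15% range above, not a figure from the specification.

```python
FIRST_DIFF_THRESHOLD = 0.10  # assumed 10%, within the 5%-15% range above

def process_tick(state, freeze_start, frames, timestamps, has_face):
    """One tick of the S101-S105 cycle. `frames`/`timestamps` hold the
    latest N (or M) samples; `has_face` is the face-detection result.
    Returns the new state, the pending freeze start, and a completed
    (start, end) freeze span, if any."""
    freeze_span = None
    if state == MonitorState.IDLE and has_face:
        state = MonitorState.WATCH_START                        # S101
    elif state == MonitorState.WATCH_START:
        if window_difference(frames) < FIRST_DIFF_THRESHOLD:    # S102/S103
            freeze_start = timestamps[0]   # earliest sampling time in window
            state = MonitorState.IN_FREEZE
    elif state == MonitorState.IN_FREEZE:
        if window_difference(frames) >= FIRST_DIFF_THRESHOLD:   # S104/S105
            freeze_span = (freeze_start, timestamps[-1])  # latest sampling time
            state, freeze_start = MonitorState.IDLE, None  # ready for next freeze
    return state, freeze_start, freeze_span
```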
As an embodiment of the present application, after the freeze time period is obtained, it may also be sent to a third-party device for subsequent freeze analysis, improvement and other operations; for example, it may be sent to a specific server, which then performs operations such as freeze analysis and optimisation.
To improve the identification of video freezes during a call, in the embodiments of the present application the monitoring state of the call video is set to the first state in advance, and the video produced during the real-time call is then monitored in real time. When a face is found in the video and freeze analysis is required, the monitoring state of the video is set to the second state. While the video is in the second state, frame images are sampled from the video and it is checked whether several consecutive frame images remain unchanged; if so, the video has started to freeze, and the monitoring state of the video is simultaneously marked as the third state. After the freeze has started, frame image sampling and comparison continue; once a large difference appears between consecutive frame images, the video picture has resumed changing, and the corresponding freeze end time can be obtained. The time period in which the video froze can then be identified from the recorded freeze start time and freeze end time. Finally, the monitoring state of the video is restored to the first state, ending the identification of the current freeze.
On the one hand, the embodiments of the present application achieve real-time, precise identification of the start and end time of each freeze, and can determine the time period of the current freeze the moment it ends. For scenarios with high requirements on call fluency, the embodiments need only be combined with a scheme capable of analysing and optimising the cause of a freeze in real time to achieve real-time freeze optimisation of the video call and guarantee its fluency. Compared with approaches that can only locate freeze periods after the video call has ended, this greatly improves the quality of the video call.
On the other hand, since a video call may contain multiple freezes and the duration of each freeze cannot be predicted, setting different monitoring states for different freeze stages effectively distinguishes the start and end of each freeze. The embodiments of the present application can therefore quickly move on to the next freeze identification as soon as one identification is completed, without mutual interference between two identifications, ensuring the accuracy of every freeze identification and enabling continuous freeze identification of the video.
As a specific implementation of the image comparison in Embodiment 1 of the present application, considering that in practice the focus of a video call is usually the user's face, and the core of the face is the activity of its organs, the embodiments of the present application do not compare whole frame images; instead, only the face organs in them are analysed and compared, improving comparison efficiency. As shown in Fig. 2, the image comparison steps in Embodiment 2 of the present application include:
S201: perform face organ coordinate analysis on each first frame image to obtain N first face organ coordinate sets.
In the embodiments of the present application, face organs are recognized in each first frame image and the coordinates of each face organ in the image are extracted, yielding the face organ coordinate set corresponding to each first frame image. The specific number and types of recognized face organs may be set by technicians, including but not limited to any one or more of the mouth, eyes, eyebrows and nose. The specific face organ recognition method is likewise not limited here and may be chosen or designed by technicians according to actual needs, including but not limited to recognition based on geometric features, neural network models or elastic models; reference may also be made to Embodiments 3 to 6 of the present application.
S202: compare the N first frame images using the N first face organ coordinate sets.
After the first face organ coordinate set corresponding to each first frame image is obtained, these coordinate sets are compared to obtain the difference degree among the N first face organ coordinate sets, which is taken as the image difference degree among the N first frame images, thereby realizing the image comparison of Embodiment 1 of the present application. The specific method of comparing coordinate-set data is not limited here and may be chosen or set by technicians, including but not limited to computing the Euclidean distance between coordinate sets and deriving the corresponding difference degree from that distance.
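As a sketch of the Euclidean-distance option named above, assuming each coordinate set is an (n, 2) array of (x, y) landmarks sampled in the same order from each frame:

```python
import numpy as np

def coordinate_set_difference(set_a, set_b):
    """Mean Euclidean distance between two face organ coordinate sets;
    smaller means the two frames are more similar."""
    a = np.asarray(set_a, dtype=np.float32)
    b = np.asarray(set_b, dtype=np.float32)
    return float(np.linalg.norm(a - b, axis=1).mean())

def window_coordinate_difference(coordinate_sets):
    """Difference degree across the N coordinate sets, taken here
    as the mean over adjacent pairs."""
    pairs = zip(coordinate_sets, coordinate_sets[1:])
    return float(np.mean([coordinate_set_difference(a, b) for a, b in pairs]))
```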
As a specific implementation of the face organ coordinate set extraction in Embodiment 2 of the present application, in this embodiment each type of face organ is analysed as an independent object for analysis and coordinate extraction, yielding a coordinate group for each face organ; that is, the first face organ coordinate set is a collection of several coordinate groups. As shown in Fig. 3, in Embodiment 3 of the present application, the face organ analysis of a single first frame image to be analysed specifically includes:
S301: take the first frame image to be analysed as the target image, and draw the face contour of the target image to obtain the corresponding face contour graphic.
In practice, the position of each face organ within the face is relatively fixed; for example, the mouth sits at roughly 1/4 of the face's length and 1/2 of its width, while the nose sits at roughly 1/2 of the face's length and 1/2 of its width. Therefore, to locate face organs more efficiently, the embodiments of the present application preset the relative position of each face organ in the face, perform face recognition on the target image to locate the face in it, and then draw the contour of the face, obtaining the corresponding face contour graphic for the subsequent coarse localization of face organs.
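The specification does not name a face detector or contour algorithm, so the following sketch uses a stock OpenCV Haar cascade and edge-based contour as stand-ins; both are assumptions for illustration only.

```python
import cv2

# Stand-in face detector; the haarcascade file ships with opencv-python.
_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def face_contour(target_image):
    """Locate the face in the target image and return (face_box, contour)."""
    gray = cv2.cvtColor(target_image, cv2.COLOR_BGR2GRAY)
    faces = _cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None, None
    x, y, w, h = faces[0]
    roi = gray[y:y + h, x:x + w]
    edges = cv2.Canny(roi, 80, 160)  # edge map of the face region
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    contour = max(contours, key=cv2.contourArea) if contours else None
    return (x, y, w, h), contour
```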
S302: obtain the first relative position of each face organ in the face contour graphic.
After the face contour graphic is drawn, the first relative position of each face organ in the graphic can be determined from the pre-stored relative positions, achieving coarse localization of the face organs.
As a specific implementation of obtaining the first relative position of a face organ in the face contour graphic in Embodiment 3 of the present application: although the approximate position of a face organ in the face is known, different users' faces may differ to some extent, so the actual face shape and the positions of face organs in the face may vary between users. To improve the accuracy of the coarse localization of face organs, Embodiment 4 of the present application analyses in advance the face shapes found in real life and the distribution of face organs under each face shape, draws several sample contour graphics of different face shapes according to that analysis, and determines the relative position data of the face organs for each sample contour graphic. The relative positions of the actual face organs are then recognized from these sample contour graphics. As shown in Fig. 4, the step of obtaining the first relative position in Embodiment 4 specifically includes:
S401: match the face contour graphic against multiple sample contour images in a face contour library.
In the embodiments of the present application, the drawn sample contour graphics are stored in advance in a face contour library; after the face contour graphic is drawn, it is matched against the library to filter out a suitable sample contour graphic.
S402: if the matching succeeds, obtain the relative position set corresponding to the matched sample contour graphic, the relative position set containing the second relative position of each face organ in the sample contour graphic; take the second relative position of each face organ in the relative position set as the first relative position of that face organ in the face contour graphic.
If a sample contour graphic matches the face contour graphic successfully, the drawn face contour graphic has a face shape close to that sample. In that case, the embodiments of the present application directly read the relative position data of the face organs corresponding to that sample contour graphic and use them as the relative positions of the face organs for the face contour graphic in Embodiment 3.
As an embodiment of the present application, building on Embodiment 4: since the number of pre-stored sample contour graphics is usually limited in practice, graphic matching may sometimes fail. So that accurate relative positions of the face organs in the target image can still be obtained when matching fails, a default relative position set storing the relative position data of each face organ is preset in this embodiment; when matching fails, this default relative position set is simply read.
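A sketch of this lookup with a default fallback, assuming the library pairs each sample contour with its relative-position data; `cv2.matchShapes` stands in for the unspecified graphic matching, and the default positions and distance threshold are illustrative values only.

```python
import cv2

# Assumed fallback set used when matching fails; fractions of face
# width/height, purely illustrative.
DEFAULT_RELATIVE_POSITIONS = {
    "mouth": (0.50, 0.75), "nose": (0.50, 0.50),
    "left_eye": (0.30, 0.35), "right_eye": (0.70, 0.35),
}

def lookup_relative_positions(contour, library, max_distance=0.15):
    """Match the drawn contour against (sample_contour, positions) pairs;
    return the best sample's relative positions, or the default on failure."""
    best = None
    for sample_contour, positions in library:
        d = cv2.matchShapes(contour, sample_contour,
                            cv2.CONTOURS_MATCH_I1, 0.0)
        if best is None or d < best[0]:
            best = (d, positions)
    if best is not None and best[0] <= max_distance:
        return best[1]                  # matched sample's relative positions
    return DEFAULT_RELATIVE_POSITIONS   # matching failed: use the default set
```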
S303: locate the face organs in the target image using the first relative positions to obtain the organ center coordinates of each face organ in the target image, and, based on the multiple organ center coordinates, identify the first image area corresponding to each face organ in the target image.
After the relative positions of the face organs are obtained, the embodiments of the present application determine the coordinate position of each face organ in the target image from the actual position of the face in the target image and the relative position of the organ in the face, and use that position as the corresponding organ center coordinate, thereby achieving coarse localization in the target image.
Because face images taken in different shooting environments differ to some extent, and the positions of face organs shift slightly under different expressions (for example, the coordinates of the mouth change somewhat when the lips curl), the coarsely localized coordinate cannot represent the face organ and may not even lie within the organ's image region. Therefore, after the organ center coordinate is obtained, the embodiments of the present application use it as a starting point and perform organ recognition on the surrounding image area to locate each face organ precisely. For example, starting from the organ center coordinate of the mouth, mouth recognition is performed on the image area around that coordinate to determine the first image area corresponding to the actual mouth, thereby locating the mouth precisely.
As a specific implementation of the precise localization of face organs in Embodiment 4 of the present application: in practice, even when the organ center coordinate of a face organ is known, without a concrete search range it still takes many attempts to determine a suitable surrounding search range and recognize the face organ in it, which consumes considerable computing resources and is inefficient. Therefore, to improve the efficiency of searching for face organs and achieve precise, fast localization, as shown in Fig. 5, the precise localization steps in Embodiment 5 of the present application specifically include:
S501: obtain the retrieval rectangle size corresponding to each face organ, and identify, from the retrieval rectangle size and the organ center coordinate, the second image area corresponding to each face organ in the target image, the second image area being rectangular in shape.
In practice, the approximate proportions of a face organ within the face are relatively fixed; for example, the height of the nose is generally about 1/3 of the face's length and its width about 1/5 of the face's width. Given the organ center coordinate of a face organ, combined with the size data of the face and the ratio of the organ's size to the face's size, the image area in which the organ roughly sits can be located quickly. Based on this principle, in the embodiments of the present application the ratio data between each face organ and the face size are preset, and the retrieval rectangle size of each organ is determined from the actual face size in the target image and the preset ratios. Taking the nose example above, if the face in the target image is 10 cm long and 7 cm wide, then, computing the height as 1/3 of the face length and the width as 1/5 of the face width, the corresponding retrieval rectangle size is 3.33 cm × 1.4 cm.
After the retrieval rectangle size is determined, the second image area whose length and width equal the retrieval rectangle size is determined in the target image with the organ center coordinate of the face organ as the rectangle's center point, giving the approximate area in which each face organ sits.
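A sketch of this computation, following the nose example above; the ratio table and the mouth entry are assumed values for illustration.

```python
# Assumed (height, width) ratios of organ size to face size; the nose
# entry follows the example above, the mouth entry is illustrative.
ORGAN_RATIOS = {"nose": (1 / 3, 1 / 5), "mouth": (1 / 6, 1 / 2)}

def search_rectangle(organ, organ_center, face_size):
    """Second image area: a rectangle of the retrieval size centred
    on the organ's coarse centre coordinate. Returns (x, y, w, h)."""
    face_h, face_w = face_size               # e.g. 10 x 7 (cm, or pixels)
    ratio_h, ratio_w = ORGAN_RATIOS[organ]
    rect_h = face_h * ratio_h                # e.g. nose: 10 * 1/3 = 3.33
    rect_w = face_w * ratio_w                # e.g. nose: 7 * 1/5 = 1.4
    cx, cy = organ_center
    return (cx - rect_w / 2, cy - rect_h / 2, rect_w, rect_h)
```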
S502: perform face organ detection on each second image area, and identify from the second image area, according to the detection result, the first image area corresponding to the face organ.
After the second image area to be searched is determined, searching for the corresponding face organ within that area achieves precise localization of the organ. In the nose example above, after the 3.33 cm × 1.4 cm second image area corresponding to the nose is determined, the embodiments of the present application perform nose recognition and retrieval on that area, determining the nose contained in it and the first image area the nose actually occupies within the second image area.
S304: extract coordinates from each first image area to obtain the coordinate group corresponding to each face organ in the target image.
After the first image area in which each face organ sits is accurately located, the embodiments of the present application extract coordinates from the first image area. Since a first image area contains a considerable amount of image information, each first image area yields multiple coordinate data after extraction. In the embodiments, all coordinate data corresponding to a single first image area are stored in one coordinate group, yielding the coordinate group corresponding to each face organ and hence the first face organ coordinate set of the target image required in Embodiment 2 of the present application.
In Embodiment 3 of the present application, by first coarsely locating the face organs and then quickly searching the surrounding image areas for them on the basis of the coarse localization, the face organs can be located and recognized quickly and precisely. Compared with recognizing each organ directly across the whole face, the recognition in this embodiment is more efficient and requires less computation.
As a specific implementation of the coordinate extraction from the first image area in Embodiment 5 of the present application: during an actual video call, different face organs are used with different frequencies, so they change at very different rates in the video. For example, people usually talk a great deal during a video call, so the mouth is used extremely frequently and changes extremely frequently in the video; by contrast, if the user's head stays still during the call, the basic state of the nose and its coordinates barely change. Different face organs therefore differ considerably in their reference value when comparing first frame images to judge whether the video is frozen. To improve the effectiveness of coordinate extraction and ensure that subsequent image comparison results are accurate and reliable, as shown in Fig. 6, the coordinate extraction steps for the first image area in Embodiment 6 of the present application specifically include:
S601: obtain the number of sampling points corresponding to each face organ, where the numbers of sampling points corresponding to the mouth, eyes, eyebrows and nose decrease in that order.
In the embodiments of the present application, the mouth, eyes, eyebrows and nose are ranked by how frequently each face organ is used during a call, and a corresponding number of sampling points is set for each organ according to that frequency: the higher the frequency, the larger the number of sampling points, so that the numbers of extracted coordinates are differentiated in the subsequent feature point sampling and coordinate extraction. The specific numbers of sampling points may be set by technicians according to actual needs and are not limited here.
S602: perform feature point sampling on the first image area corresponding to each face organ and obtain the coordinates of each feature point, yielding the coordinate group corresponding to each face organ in the target image, the number of sampled points being the number of sampling points corresponding to each face organ.
After the number of sampling points corresponding to each first image area is determined, feature point sampling of the first image areas begins; for each first image area, only the corresponding number of feature points is sampled. For example, if the number of sampling points for first image area A is 20, only 20 feature points are sampled from area A. After the required number of feature points has been sampled, their coordinate data are obtained, giving the coordinate group corresponding to each first image area, i.e. the coordinate group of each face organ in the target image. The specific feature point sampling method is not limited here and may be set by technicians according to actual needs, including but not limited to the SIFT algorithm and the SUSAN algorithm. To control the number of extracted sampling points precisely, if the feature point extraction algorithm itself cannot set the number of sampled points, feature points may be deleted or added after normal extraction until the corresponding number of sampling points is met.
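A sketch of fixed-count sampling with trimming and padding as described above, using OpenCV's corner detector as a stand-in for the unspecified feature point algorithm; the per-organ counts are assumed values.

```python
import cv2

# Assumed per-organ sample counts, decreasing with usage frequency:
# mouth > eyes > eyebrows > nose.
SAMPLE_COUNTS = {"mouth": 20, "eyes": 14, "eyebrows": 8, "nose": 4}

def sample_organ_coordinates(organ, region_gray):
    """Sample exactly SAMPLE_COUNTS[organ] feature points from the organ's
    first image area (a grayscale patch) and return one coordinate group."""
    n = SAMPLE_COUNTS[organ]
    pts = cv2.goodFeaturesToTrack(region_gray, maxCorners=n,
                                  qualityLevel=0.01, minDistance=2)
    pts = [] if pts is None else [tuple(p.ravel()) for p in pts]
    while len(pts) < n:          # pad by repeating points if too few found
        pts.append(pts[-1] if pts else (0.0, 0.0))
    return pts[:n]               # trim in case the detector returned extra
```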
It should be understood that although Embodiments 2 to 6 of the present application above refine or optimize the comparison of first frame images, they are equally applicable to the comparison of second frame images; that is, Embodiments 2 to 6 may also be applied in combination with S104 of Embodiment 1, in which case the processed object is simply changed from the first frame images to the second frame images. Refer to the descriptions above for details, which are not repeated here.
Corresponding to the method of the above embodiments, Fig. 7 shows a structural block diagram of the video freeze identification apparatus provided in an embodiment of the present application; for ease of description, only the parts related to the embodiment are shown. The video freeze identification apparatus illustrated in Fig. 7 may be the execution subject of the video freeze identification method provided in Embodiment 1 above.
Referring to Fig. 7, the video freeze identification apparatus includes:
a face detection module 71, configured to perform face detection on the video when the monitoring state corresponding to the video during a real-time call is the first state, and, when a face is detected in the video, modify the monitoring state corresponding to the video to the second state;
a first image comparison module 72, configured to, if the monitoring state is the second state, sample the video at a first frequency to obtain N first frame images and compare the N first frame images, where N is a positive integer greater than 1;
a freeze start recognition module 73, configured to, if the comparison result is that the image difference degree among the N first frame images is less than the first difference threshold, take the earliest sampling time among the sampling times corresponding to the first frame images as the freeze start time of the video, and set the monitoring state to the third state;
a second image comparison module 74, configured to, if the monitoring state is the third state, sample the video at a second frequency to obtain M second frame images and compare the M second frame images, where M is a positive integer greater than 1;
a freeze end recognition module 75, configured to, if the comparison result is that the image difference degree among the M second frame images is greater than or equal to the first difference threshold, take the latest sampling time among the sampling times corresponding to the second frame images as the freeze end time of the video, set the monitoring state to the first state, and identify the freeze time period of the video based on the freeze start time and the freeze end time.
Further, the first image comparison module 72 includes:
a coordinate analysis module, configured to perform face organ coordinate analysis on each first frame image to obtain N first face organ coordinate sets;
a coordinate comparison module, configured to compare the N first frame images using the N first face organ coordinate sets.
Further, the first face organ coordinate set is a collection of multiple coordinate groups, each coordinate group corresponding to one type of face organ and containing multiple coordinates of the corresponding face organ; the coordinate analysis module includes:
a contour drawing module, configured to take the first frame image to be analysed as the target image and draw the face contour of the target image to obtain the corresponding face contour graphic;
a position obtaining module, configured to obtain the first relative position of each face organ in the face contour graphic;
an organ image search module, configured to locate the face organs in the target image using the first relative positions to obtain the organ center coordinates of each face organ in the target image, and, based on the multiple organ center coordinates, identify the first image area corresponding to each face organ in the target image;
a coordinate extraction module, configured to extract coordinates from each first image area to obtain the coordinate group corresponding to each face organ in the target image.
Further, the position obtaining module is configured to:
match the face contour graphic against multiple sample contour images in a face contour library; and
if the matching succeeds, obtain the relative position set corresponding to the matched sample contour graphic, the relative position set containing the second relative position of each face organ in the sample contour graphic, and take the second relative position of each face organ in the relative position set as the first relative position of that face organ in the face contour graphic.
Further, the organ image search module is configured to:
obtain the retrieval rectangle size corresponding to each face organ and identify, from the retrieval rectangle size and the organ center coordinate, the second image area corresponding to each face organ in the target image, the second image area being rectangular in shape; and
perform face organ detection on each second image area and identify from the second image area, according to the detection result, the first image area corresponding to the face organ.
Further, the coordinate extraction module is configured to:
obtain the number of sampling points corresponding to each face organ, where the numbers of sampling points corresponding to the mouth, eyes, eyebrows and nose decrease in that order; and
perform feature point sampling on the first image area corresponding to each face organ and obtain the coordinates of each feature point, yielding the coordinate group corresponding to each face organ in the target image, the number of sampled points being the number of sampling points corresponding to each face organ.
For the process by which each module of the video freeze identification apparatus provided in the embodiments of the present application implements its function, reference may be made to the description of Embodiment 1 shown in FIG. 1, which will not be repeated here.
The video freeze identification method provided in the embodiments of the present application can be applied to terminal devices such as mobile phones, tablet computers, wearable devices, in-vehicle devices, augmented reality (AR)/virtual reality (VR) devices, notebook computers, ultra-mobile personal computers (UMPC), netbooks, and personal digital assistants (PDA); the embodiments of the present application impose no restriction on the specific type of terminal device.
For example, the terminal device may be a cellular phone, a cordless phone, a Session Initiation Protocol (SIP) phone, a Wireless Local Loop (WLL) station, a Personal Digital Assistant (PDA) device, a handheld device with wireless communication functions, a computing device or other processing device connected to a wireless modem, an in-vehicle device, an Internet-of-Vehicles terminal, a computer, a laptop computer, a handheld communication device, a handheld computing device, a satellite wireless device, a wireless modem card, a TV set top box (STB), customer premises equipment (CPE), and/or another device for communicating over a wireless system, as well as a next-generation communication system, for example, a mobile terminal in a 5G network or a mobile terminal in a future evolved Public Land Mobile Network (PLMN).
By way of example and not limitation, when the terminal device is a wearable device, the wearable device may also be a general term for wearable devices developed by applying wearable technology to the intelligent design of everyday wear, such as glasses, gloves, watches, clothing, and shoes. A wearable device is a portable device worn directly on the body or integrated into the user's clothes or accessories. A wearable device is not merely a hardware device; it also delivers powerful functions through software support, data interaction, and cloud interaction. In a broad sense, wearable smart devices include devices that are full-featured and large-sized and can implement complete or partial functions without relying on a smartphone, such as smart watches or smart glasses, as well as devices that focus on only one type of application function and need to be used with other devices such as smartphones, for example, various smart bracelets and smart jewelry for monitoring physical signs.
FIG. 8 is a schematic structural diagram of a terminal device provided by an embodiment of the present application. As shown in FIG. 8, the terminal device 8 of this embodiment includes at least one processor 80 (only one is shown in FIG. 8) and a memory 81, the memory 81 storing a computer program 82 that can run on the processor 80. When the processor 80 executes the computer program 82, the steps in the foregoing embodiments of the video freeze identification method are implemented, for example, steps 101 to 105 shown in FIG. 1. Alternatively, when the processor 80 executes the computer program 82, the functions of the modules/units in the foregoing apparatus embodiments are implemented, for example, the functions of modules 71 to 75 shown in FIG. 7.
The terminal device 8 may be a computing device such as a desktop computer, a notebook, a palmtop computer, or a cloud server. The terminal device may include, but is not limited to, the processor 80 and the memory 81. Those skilled in the art will understand that FIG. 8 is merely an example of the terminal device 8 and does not constitute a limitation on it; the terminal device may include more or fewer components than shown, combine certain components, or use different components, and may, for example, further include input and transmission devices, a network access device, a bus, and the like.
The processor 80 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 81 may, in some embodiments, be an internal storage unit of the terminal device 8, such as a hard disk or memory of the terminal device 8. The memory 81 may also be an external storage device of the terminal device 8, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the terminal device 8. Further, the memory 81 may include both an internal storage unit of the terminal device 8 and an external storage device. The memory 81 is used to store an operating system, application programs, a boot loader (BootLoader), data, and other programs, such as the program code of the computer program. The memory 81 may also be used to temporarily store data that has been or will be sent.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
The embodiments of the present application also provide a computer-readable storage medium, which may be non-volatile or volatile. The computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps in each of the foregoing method embodiments, namely: when the monitoring state corresponding to a video during a real-time call is a first state, performing face detection on the video, and when a face is detected in the video, modifying the monitoring state corresponding to the video to a second state; if the monitoring state is the second state, sampling the video at a first frequency to obtain N first frame images, and comparing the N first frame images, where N is a positive integer greater than 1; if the comparison result is that the image difference degree between the N first frame images is less than a first difference threshold, taking the earliest sampling time among the sampling times corresponding to the first frame images as the freeze start time of the video, and setting the monitoring state to a third state; if the monitoring state is the third state, sampling the video at a second frequency to obtain M second frame images, and comparing the M second frame images, where M is a positive integer greater than 1; if the comparison result is that the image difference degree between the M second frame images is greater than or equal to the first difference threshold, taking the latest sampling time among the sampling times corresponding to the second frame images as the freeze end time of the video, setting the monitoring state to the first state, and identifying the freeze time period of the video based on the freeze start time and the freeze end time.
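For illustration only, the following is a minimal, self-contained Python sketch of this three-state monitoring loop, under stated assumptions: it iterates over frames already sampled from a recording rather than sampling the live stream at the first and second frequencies; it uses mean absolute pixel difference as a stand-in for the image difference degree (the application compares face organ coordinate sets); and `has_face` stands in for the face detector. It is a sketch of the control flow, not a definitive implementation.

```python
import numpy as np

# Monitoring states mirroring the method: 1 = waiting for a face,
# 2 = watching for a freeze start, 3 = watching for a freeze end.
FIRST, SECOND, THIRD = 1, 2, 3

def frame_difference(a, b):
    """Stand-in image difference degree: mean absolute pixel difference.
    The application instead compares frames via face organ coordinate sets."""
    return float(np.mean(np.abs(a.astype(np.int16) - b.astype(np.int16))))

def detect_freezes(frames, timestamps, has_face, n=3, m=3, diff_threshold=1.0):
    """Single-pass sketch of the monitoring state machine over a pre-sampled
    frame sequence; the N-/M-frame comparison windows emulate sampling at the
    first/second frequencies. Returns (freeze_start, freeze_end) periods."""
    state, window, periods, start = FIRST, [], [], None
    for frame, t in zip(frames, timestamps):
        if state == FIRST:
            if has_face(frame):          # face detected: enter the second state
                state, window = SECOND, []
            continue
        window.append((frame, t))
        k = n if state == SECOND else m
        if len(window) >= k:
            recent = window[-k:]
            diffs = [frame_difference(a, b)
                     for (a, _), (b, _) in zip(recent, recent[1:])]
            if state == SECOND and max(diffs) < diff_threshold:
                # All k frames nearly identical: a freeze has started.
                start = recent[0][1]     # earliest sampling time in the window
                state, window = THIRD, []
            elif state == THIRD and max(diffs) >= diff_threshold:
                # Motion has resumed: close the freeze period.
                periods.append((start, recent[-1][1]))  # latest sampling time
                state, start = FIRST, None
    return periods
```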
The embodiments of the present application also provide a computer program product which, when run on a mobile terminal, causes the mobile terminal to implement the steps in each of the foregoing method embodiments.
If the integrated module/unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the present application may implement all or part of the processes in the foregoing method embodiments by instructing the relevant hardware through a computer program, which may be stored in a computer-readable storage medium and which, when executed by a processor, implements the steps of each of the foregoing method embodiments. The computer program includes computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include: any entity or apparatus capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and the like.
In the foregoing embodiments, the description of each embodiment has its own emphasis; for parts not detailed or described in one embodiment, reference may be made to the related descriptions of other embodiments.
A person of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are executed by hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the present application.
The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
The foregoing embodiments are merely intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and shall all fall within the protection scope of the present application.

Claims (20)

  1. A video freeze identification method, comprising:
    when the monitoring state corresponding to a video during a real-time call is a first state, performing face detection on the video, and when a face is detected in the video, modifying the monitoring state corresponding to the video to a second state;
    if the monitoring state is the second state, sampling the video at a first frequency to obtain N first frame images, and comparing the N first frame images, where N is a positive integer greater than 1;
    if the comparison result is that the image difference degree between the N first frame images is less than a first difference threshold, taking the earliest sampling time among the sampling times corresponding to the first frame images as the freeze start time of the video, and setting the monitoring state to a third state;
    if the monitoring state is the third state, sampling the video at a second frequency to obtain M second frame images, and comparing the M second frame images, where M is a positive integer greater than 1;
    if the comparison result is that the image difference degree between the M second frame images is greater than or equal to the first difference threshold, taking the latest sampling time among the sampling times corresponding to the second frame images as the freeze end time of the video, setting the monitoring state to the first state, and identifying the freeze time period of the video based on the freeze start time and the freeze end time.
  2. The video freeze identification method according to claim 1, wherein comparing the N first frame images comprises:
    performing face organ coordinate analysis on each first frame image to obtain N first face organ coordinate sets;
    comparing the N first frame images using the N first face organ coordinate sets.
  3. The video freeze identification method according to claim 2, wherein the first face organ coordinate set is a set of multiple coordinate groups, each coordinate group corresponding to one type of face organ and containing multiple coordinates of the corresponding face organ;
    performing face organ coordinate analysis on each first frame image comprises:
    taking the first frame image to be analyzed as a target image, and drawing a face contour on the target image to obtain a corresponding face contour graphic;
    obtaining a first relative position of each face organ in the face contour graphic;
    locating the face organs in the target image using the first relative positions to obtain organ center coordinates of each face organ in the target image, and identifying, based on the plurality of organ center coordinates, a first image region corresponding to each face organ in the target image;
    extracting coordinates from each first image region to obtain the coordinate group corresponding to each face organ in the target image.
  4. The video freeze identification method according to claim 3, wherein obtaining the first relative position of each face organ in the face contour graphic comprises:
    graphically matching the face contour graphic with a plurality of sample contour graphics in a face contour library;
    if the matching is successful, obtaining a relative position set corresponding to the successfully matched sample contour graphic, the relative position set containing a second relative position of each face organ in the sample contour graphic; and taking the second relative position corresponding to each face organ in the relative position set as the first relative position of that face organ in the face contour graphic.
  5. The video freeze identification method according to claim 3, wherein identifying, based on the plurality of organ center coordinates, the first image region corresponding to each face organ in the target image comprises:
    obtaining a retrieval rectangle size corresponding to each face organ, and identifying, according to the retrieval rectangle size and the organ center coordinates, a second image region corresponding to each face organ in the target image, the second image region being rectangular in shape;
    performing face organ detection on each second image region, and identifying, according to the detection results, the first image region corresponding to the face organ from the second image region.
  6. The video freeze identification method according to claim 3, wherein extracting coordinates from each first image region to obtain the coordinate group corresponding to each face organ in the target image comprises:
    obtaining the number of sampling points corresponding to each face organ, wherein the numbers of sampling points corresponding to the mouth, eyes, eyebrows, and nose decrease in that order;
    performing feature point sampling on the first image region corresponding to each face organ, and obtaining the coordinates of each feature point to obtain the coordinate group corresponding to each face organ in the target image, the number of sampling points used in the feature point sampling being the number of sampling points corresponding to each face organ.
  7. A video freeze identification apparatus, comprising:
    a face detection module, configured to perform face detection on a video when the monitoring state corresponding to the video during a real-time call is a first state, and, when a face is detected in the video, modify the monitoring state corresponding to the video to a second state;
    a first image comparison module, configured to, if the monitoring state is the second state, sample the video at a first frequency to obtain N first frame images, and compare the N first frame images, where N is a positive integer greater than 1;
    a freeze start identification module, configured to, if the comparison result is that the image difference degree between the N first frame images is less than a first difference threshold, take the earliest sampling time among the sampling times corresponding to the first frame images as the freeze start time of the video, and set the monitoring state to a third state;
    a second image comparison module, configured to, if the monitoring state is the third state, sample the video at a second frequency to obtain M second frame images, and compare the M second frame images, where M is a positive integer greater than 1;
    a freeze end identification module, configured to, if the comparison result is that the image difference degree between the M second frame images is greater than or equal to the first difference threshold, take the latest sampling time among the sampling times corresponding to the second frame images as the freeze end time of the video, set the monitoring state to the first state, and identify the freeze time period of the video based on the freeze start time and the freeze end time.
  8. The video freeze identification apparatus according to claim 7, wherein the first image comparison module comprises:
    a coordinate analysis module, configured to perform face organ coordinate analysis on each first frame image to obtain N first face organ coordinate sets;
    a coordinate comparison module, configured to compare the N first frame images using the N first face organ coordinate sets.
  9. A terminal device, comprising a memory and a processor, the memory storing a computer program executable on the processor, wherein the processor, when executing the computer program, implements the steps of a video freeze identification method, the method comprising:
    when the monitoring state corresponding to a video during a real-time call is a first state, performing face detection on the video, and when a face is detected in the video, modifying the monitoring state corresponding to the video to a second state;
    if the monitoring state is the second state, sampling the video at a first frequency to obtain N first frame images, and comparing the N first frame images, where N is a positive integer greater than 1;
    if the comparison result is that the image difference degree between the N first frame images is less than a first difference threshold, taking the earliest sampling time among the sampling times corresponding to the first frame images as the freeze start time of the video, and setting the monitoring state to a third state;
    if the monitoring state is the third state, sampling the video at a second frequency to obtain M second frame images, and comparing the M second frame images, where M is a positive integer greater than 1;
    if the comparison result is that the image difference degree between the M second frame images is greater than or equal to the first difference threshold, taking the latest sampling time among the sampling times corresponding to the second frame images as the freeze end time of the video, setting the monitoring state to the first state, and identifying the freeze time period of the video based on the freeze start time and the freeze end time.
  10. The terminal device according to claim 9, wherein comparing the N first frame images comprises:
    performing face organ coordinate analysis on each first frame image to obtain N first face organ coordinate sets;
    comparing the N first frame images using the N first face organ coordinate sets.
  11. The terminal device according to claim 10, wherein the first face organ coordinate set is a set of multiple coordinate groups, each coordinate group corresponding to one type of face organ and containing multiple coordinates of the corresponding face organ;
    performing face organ coordinate analysis on each first frame image comprises:
    taking the first frame image to be analyzed as a target image, and drawing a face contour on the target image to obtain a corresponding face contour graphic;
    obtaining a first relative position of each face organ in the face contour graphic;
    locating the face organs in the target image using the first relative positions to obtain organ center coordinates of each face organ in the target image, and identifying, based on the plurality of organ center coordinates, a first image region corresponding to each face organ in the target image;
    extracting coordinates from each first image region to obtain the coordinate group corresponding to each face organ in the target image.
  12. The terminal device according to claim 11, wherein obtaining the first relative position of each face organ in the face contour graphic comprises:
    graphically matching the face contour graphic with a plurality of sample contour graphics in a face contour library;
    if the matching is successful, obtaining a relative position set corresponding to the successfully matched sample contour graphic, the relative position set containing a second relative position of each face organ in the sample contour graphic; and taking the second relative position corresponding to each face organ in the relative position set as the first relative position of that face organ in the face contour graphic.
  13. The terminal device according to claim 11, wherein identifying, based on the plurality of organ center coordinates, the first image region corresponding to each face organ in the target image comprises:
    obtaining a retrieval rectangle size corresponding to each face organ, and identifying, according to the retrieval rectangle size and the organ center coordinates, a second image region corresponding to each face organ in the target image, the second image region being rectangular in shape;
    performing face organ detection on each second image region, and identifying, according to the detection results, the first image region corresponding to the face organ from the second image region.
  14. The terminal device according to claim 11, wherein extracting coordinates from each first image region to obtain the coordinate group corresponding to each face organ in the target image comprises:
    obtaining the number of sampling points corresponding to each face organ, wherein the numbers of sampling points corresponding to the mouth, eyes, eyebrows, and nose decrease in that order;
    performing feature point sampling on the first image region corresponding to each face organ, and obtaining the coordinates of each feature point to obtain the coordinate group corresponding to each face organ in the target image, the number of sampling points used in the feature point sampling being the number of sampling points corresponding to each face organ.
  15. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of a video freeze identification method, the method comprising:
    when the monitoring state corresponding to a video during a real-time call is a first state, performing face detection on the video, and when a face is detected in the video, modifying the monitoring state corresponding to the video to a second state;
    if the monitoring state is the second state, sampling the video at a first frequency to obtain N first frame images, and comparing the N first frame images, where N is a positive integer greater than 1;
    if the comparison result is that the image difference degree between the N first frame images is less than a first difference threshold, taking the earliest sampling time among the sampling times corresponding to the first frame images as the freeze start time of the video, and setting the monitoring state to a third state;
    if the monitoring state is the third state, sampling the video at a second frequency to obtain M second frame images, and comparing the M second frame images, where M is a positive integer greater than 1;
    if the comparison result is that the image difference degree between the M second frame images is greater than or equal to the first difference threshold, taking the latest sampling time among the sampling times corresponding to the second frame images as the freeze end time of the video, setting the monitoring state to the first state, and identifying the freeze time period of the video based on the freeze start time and the freeze end time.
  16. The computer-readable storage medium according to claim 15, wherein comparing the N first frame images comprises:
    performing face organ coordinate analysis on each first frame image to obtain N first face organ coordinate sets;
    comparing the N first frame images using the N first face organ coordinate sets.
  17. The computer-readable storage medium according to claim 16, wherein the first face organ coordinate set is a set of multiple coordinate groups, each coordinate group corresponding to one type of face organ and containing multiple coordinates of the corresponding face organ;
    performing face organ coordinate analysis on each first frame image comprises:
    taking the first frame image to be analyzed as a target image, and drawing a face contour on the target image to obtain a corresponding face contour graphic;
    obtaining a first relative position of each face organ in the face contour graphic;
    locating the face organs in the target image using the first relative positions to obtain organ center coordinates of each face organ in the target image, and identifying, based on the plurality of organ center coordinates, a first image region corresponding to each face organ in the target image;
    extracting coordinates from each first image region to obtain the coordinate group corresponding to each face organ in the target image.
  18. The computer-readable storage medium according to claim 17, wherein obtaining the first relative position of each face organ in the face contour graphic comprises:
    graphically matching the face contour graphic with a plurality of sample contour graphics in a face contour library;
    if the matching is successful, obtaining a relative position set corresponding to the successfully matched sample contour graphic, the relative position set containing a second relative position of each face organ in the sample contour graphic; and taking the second relative position corresponding to each face organ in the relative position set as the first relative position of that face organ in the face contour graphic.
  19. The computer-readable storage medium according to claim 17, wherein identifying, based on the plurality of organ center coordinates, the first image region corresponding to each face organ in the target image comprises:
    obtaining a retrieval rectangle size corresponding to each face organ, and identifying, according to the retrieval rectangle size and the organ center coordinates, a second image region corresponding to each face organ in the target image, the second image region being rectangular in shape;
    performing face organ detection on each second image region, and identifying, according to the detection results, the first image region corresponding to the face organ from the second image region.
  20. The computer-readable storage medium according to claim 17, wherein extracting coordinates from each first image region to obtain the coordinate group corresponding to each face organ in the target image comprises:
    obtaining the number of sampling points corresponding to each face organ, wherein the numbers of sampling points corresponding to the mouth, eyes, eyebrows, and nose decrease in that order;
    performing feature point sampling on the first image region corresponding to each face organ, and obtaining the coordinates of each feature point to obtain the coordinate group corresponding to each face organ in the target image, the number of sampling points used in the feature point sampling being the number of sampling points corresponding to each face organ.
PCT/CN2020/087381 2020-02-11 2020-04-28 Video lag identification method and apparatus, and terminal device WO2021159609A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010087225.0 2020-02-11
CN202010087225.0A CN111339842A (en) 2020-02-11 2020-02-11 Video jamming identification method and device and terminal equipment

Publications (1)

Publication Number Publication Date
WO2021159609A1 true WO2021159609A1 (en) 2021-08-19

Family

ID=71183337

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/087381 WO2021159609A1 (en) 2020-02-11 2020-04-28 Video lag identification method and apparatus, and terminal device

Country Status (2)

Country Link
CN (1) CN111339842A (en)
WO (1) WO2021159609A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113296858B (en) * 2021-04-14 2024-03-19 惠州市德赛西威汽车电子股份有限公司 Method for detecting and recovering picture dead of vehicle-mounted system and storage medium
CN113627387A (en) * 2021-08-30 2021-11-09 平安国际融资租赁有限公司 Parallel identity authentication method, device, equipment and medium based on face recognition
CN114125436A (en) * 2021-11-30 2022-03-01 中国电信股份有限公司 Soft probe monitoring level testing method and device, storage medium and electronic equipment
CN114419018A (en) * 2022-01-25 2022-04-29 重庆紫光华山智安科技有限公司 Image sampling method, system, device and medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160353057A1 (en) * 2015-06-01 2016-12-01 Apple Inc. Techniques to overcome communication lag between terminals performing video mirroring and annotation operations
CN106789555A (en) * 2016-11-25 2017-05-31 努比亚技术有限公司 Method of transmitting video data and device
CN108737885A (en) * 2018-06-07 2018-11-02 北京奇艺世纪科技有限公司 A kind of analysis Online Video plays the method and device of interim card
CN109145878A (en) * 2018-09-30 2019-01-04 北京小米移动软件有限公司 image extraction method and device
CN110430425A (en) * 2019-07-31 2019-11-08 北京奇艺世纪科技有限公司 A kind of video fluency determines method, apparatus, electronic equipment and medium
CN110475124A (en) * 2019-09-06 2019-11-19 广州虎牙科技有限公司 Video cardton detection method and device

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113784216A (en) * 2021-08-24 2021-12-10 咪咕音乐有限公司 Video jamming identification method and device, terminal equipment and storage medium
CN115484454A (en) * 2022-09-09 2022-12-16 深圳健路网络科技有限责任公司 Detection method and device for video blockage acquired by non-root mobile phone
CN116110037A (en) * 2023-04-11 2023-05-12 深圳市华图测控系统有限公司 Book checking method and device based on visual identification and terminal equipment
CN116110037B (en) * 2023-04-11 2023-06-23 深圳市华图测控系统有限公司 Book checking method and device based on visual identification and terminal equipment

Also Published As

Publication number Publication date
CN111339842A (en) 2020-06-26

Similar Documents

Publication Publication Date Title
WO2021159609A1 (en) Video lag identification method and apparatus, and terminal device
CN110147717B (en) Human body action recognition method and device
CN110135246B (en) Human body action recognition method and device
CN109376596B (en) Face matching method, device, equipment and storage medium
CN107679448B (en) Eyeball action-analysing method, device and storage medium
US20140010409A1 (en) Object tracking device, object tracking method, and control program
CN110163096B (en) Person identification method, person identification device, electronic equipment and computer readable medium
WO2023173646A1 (en) Expression recognition method and apparatus
CN110197149B (en) Ear key point detection method and device, storage medium and electronic equipment
CN113011403B (en) Gesture recognition method, system, medium and device
WO2019184605A1 (en) Multi-target tracking method and terminal device
WO2019062347A1 (en) Facial recognition method and related product
CN112381071A (en) Behavior analysis method of target in video stream, terminal device and medium
JP2022542199A (en) KEYPOINT DETECTION METHOD, APPARATUS, ELECTRONICS AND STORAGE MEDIA
WO2019033567A1 (en) Method for capturing eyeball movement, device and storage medium
CN113657195A (en) Face image recognition method, face image recognition equipment, electronic device and storage medium
CN111091106A (en) Image clustering method and device, storage medium and electronic device
CN112492383A (en) Video frame generation method and device, storage medium and electronic equipment
WO2021012513A1 (en) Gesture operation method and apparatus, and computer device
EP3200092A1 (en) Method and terminal for implementing image sequencing
CN111783677A (en) Face recognition method, face recognition device, server and computer readable medium
CN116761020A (en) Video processing method, device, equipment and medium
WO2022134700A1 (en) Method and apparatus for identifying target object
US8509541B2 (en) Method for eye detection for a given face
CN111507289A (en) Video matching method, computer device and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20918939

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 13.12.2022)

122 Ep: pct application non-entry in european phase

Ref document number: 20918939

Country of ref document: EP

Kind code of ref document: A1