WO2021159609A1 - Video lag identification method and apparatus, and terminal device - Google Patents

Video lag identification method and apparatus, and terminal device

Info

Publication number
WO2021159609A1
Authority
WO
WIPO (PCT)
Prior art keywords
face
video
organ
image
state
Prior art date
Application number
PCT/CN2020/087381
Other languages
French (fr)
Chinese (zh)
Inventor
胡甜敏
Original Assignee
深圳壹账通智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳壹账通智能科技有限公司 filed Critical 深圳壹账通智能科技有限公司
Publication of WO2021159609A1 publication Critical patent/WO2021159609A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V20/48 Matching video sequences
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation

Definitions

  • This application belongs to the technical field of computer vision in artificial intelligence, and in particular relates to a video freeze identification method and terminal device.
  • In the prior art, a tester manually reviews the recorded video files of a call and determines the time periods in which the video freezes; the efficiency of such identification is extremely low.
  • The embodiments of the present application provide a video freeze identification method and terminal device, which can solve the problem of low efficiency in identifying video call freezes.
  • The first aspect of the embodiments of the present application provides a video freeze identification method, including:
  • when the monitoring state corresponding to the video during a real-time call is the first state, performing face detection on the video, and when a face is detected in the video, modifying the monitoring state corresponding to the video to the second state;
  • if the monitoring state is the second state, sampling the video at a first frequency to obtain N first frame images and comparing the N first frame images, where N is a positive integer greater than 1;
  • if the comparison result is that the image difference degree between the N first frame images is less than a first difference threshold, taking the earliest sampling time among the sampling times corresponding to the first frame images as the freeze start time of the video, and setting the monitoring state to the third state;
  • if the monitoring state is the third state, sampling the video at a second frequency to obtain M second frame images and comparing the M second frame images, where M is a positive integer greater than 1;
  • if the comparison result is that the image difference degree between the M second frame images is greater than or equal to the first difference threshold, taking the latest sampling time among the sampling times corresponding to the second frame images as the freeze end time of the video, setting the monitoring state to the first state, and identifying the freeze time period of the video based on the freeze start time and the freeze end time.
  • The second aspect of the embodiments of the present application provides a video freeze identification device, including:
  • a face detection module, configured to perform face detection on the video when the monitoring state corresponding to the video during a real-time call is the first state, and, when a face is detected in the video, modify the monitoring state corresponding to the video to the second state;
  • a first image comparison module, configured to, if the monitoring state is the second state, sample the video at a first frequency to obtain N first frame images and compare the N first frame images, where N is a positive integer greater than 1;
  • a freeze start identification module, configured to, if the comparison result is that the image difference degree between the N first frame images is less than a first difference threshold, take the earliest sampling time among the sampling times corresponding to the first frame images as the freeze start time of the video, and set the monitoring state to the third state;
  • a second image comparison module, configured to, if the monitoring state is the third state, sample the video at a second frequency to obtain M second frame images and compare the M second frame images, where M is a positive integer greater than 1;
  • a freeze end identification module, configured to, if the comparison result is that the image difference degree between the M second frame images is greater than or equal to the first difference threshold, take the latest sampling time among the sampling times corresponding to the second frame images as the freeze end time of the video, set the monitoring state to the first state, and identify the freeze time period of the video based on the freeze start time and the freeze end time.
  • The third aspect of the embodiments of the present application provides a terminal device.
  • The terminal device includes a memory and a processor.
  • The memory stores a computer program that can run on the processor.
  • When the processor executes the computer program, the steps of the video freeze identification method described in any one of the first aspect above are implemented, namely: when the monitoring state corresponding to the video during a real-time call is the first state, performing face detection on the video, and when a face is detected in the video, modifying the monitoring state corresponding to the video to the second state; if the monitoring state is the second state, sampling the video at a first frequency to obtain N first frame images and comparing the N first frame images, where N is a positive integer greater than 1; if the comparison result is that the image difference degree between the N first frame images is less than a first difference threshold, taking the earliest sampling time among the sampling times corresponding to the first frame images as the freeze start time of the video, and setting the monitoring state to the third state; if the monitoring state is the third state, sampling the video at a second frequency to obtain M second frame images and comparing the M second frame images, where M is a positive integer greater than 1; and if the comparison result is that the image difference degree between the M second frame images is greater than or equal to the first difference threshold, taking the latest sampling time among the sampling times corresponding to the second frame images as the freeze end time of the video, setting the monitoring state to the first state, and identifying the freeze time period of the video based on the freeze start time and the freeze end time.
  • The fourth aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program, where the computer program, when executed by a processor, implements the steps of the video freeze identification method described in any one of the first aspect above, namely: when the monitoring state corresponding to the video during a real-time call is the first state, performing face detection on the video, and when a face is detected in the video, modifying the monitoring state corresponding to the video to the second state; if the monitoring state is the second state, sampling the video at a first frequency to obtain N first frame images and comparing the N first frame images, where N is a positive integer greater than 1; if the comparison result is that the image difference degree between the N first frame images is less than a first difference threshold, taking the earliest sampling time among the sampling times corresponding to the first frame images as the freeze start time of the video, and setting the monitoring state to the third state; if the monitoring state is the third state, sampling the video at a second frequency to obtain M second frame images and comparing the M second frame images, where M is a positive integer greater than 1; and if the comparison result is that the image difference degree between the M second frame images is greater than or equal to the first difference threshold, taking the latest sampling time among the sampling times corresponding to the second frame images as the freeze end time of the video, setting the monitoring state to the first state, and identifying the freeze time period of the video based on the freeze start time and the freeze end time.
  • The fifth aspect of the embodiments of the present application provides a computer program product which, when run on a terminal device, causes the terminal device to execute the video freeze identification method described in any one of the first aspect above.
  • Compared with the prior art, the embodiments of the present application have the following beneficial effects: on the one hand, they realize efficient and accurate identification of the start and end of a freeze; on the other hand, by setting different monitoring states for the different stages of a freeze, each freeze is effectively distinguished and processed, which guarantees the accuracy of identifying every freeze.
  • FIG. 1 is a schematic diagram of the implementation process of the video freeze identification method provided in Embodiment 1 of the present application;
  • FIG. 2 is a schematic diagram of the implementation process of the video freeze identification method provided in Embodiment 2 of the present application;
  • FIG. 3 is a schematic diagram of the implementation process of the video freeze identification method provided in Embodiment 3 of the present application;
  • FIG. 4 is a schematic diagram of the implementation process of the video freeze identification method provided in Embodiment 4 of the present application;
  • FIG. 5 is a schematic diagram of the implementation process of the video freeze identification method provided in Embodiment 5 of the present application;
  • FIG. 6 is a schematic diagram of the implementation process of the video freeze identification method provided in Embodiment 6 of the present application;
  • FIG. 7 is a schematic structural diagram of the video freeze identification device provided in Embodiment 7 of the present application;
  • FIG. 8 is a schematic diagram of the terminal device provided in Embodiment 8 of the present application.
  • In the embodiments of the present application, the monitoring state of the call video is set to the first state in advance, and the video generated during the real-time call is then monitored in real time.
  • When a face is found in the video, the monitoring state of the video is set to the second state; while the video is in the second state, frame images of the video are sampled, and it is identified whether multiple consecutive frame images remain unchanged. If so, it indicates that the video has started to freeze, and the monitoring state of the video is simultaneously marked as the third state.
  • While the video is in the third state, frame image sampling and comparison continue; if there is a large difference between consecutive frame images, it means that the picture of the video has resumed changing, i.e., the freeze has ended.
  • The embodiments of the present application thus realize accurate identification of the start and end of a freeze.
  • At the same time, by distinguishing the stages with monitoring states, each freeze is accurately distinguished and processed. This improves the accuracy of identifying every freeze, so that the embodiments of the present application can realize continuous freeze identification of the video.
  • The execution subject of the video freeze identification method in the embodiments of this application is a terminal device with a certain video processing capability.
  • Here, a certain video processing capability refers to the ability to extract frame images from the video and compare them.
  • The specific type of the terminal device is not limited here and can be selected by technicians according to the actual needs of the scene; it includes, but is not limited to, terminal devices used for the video call itself, such as mobile phones and computers, as well as third-party devices connected to those terminal devices, such as servers.
  • FIG. 1 shows a flow chart of the video freeze identification method provided in Embodiment 1 of the present application; the details are as follows:
  • The monitoring state is used to mark the freeze stage of the video in real time. The monitoring state includes the first state, the second state, and the third state, which correspond respectively to the freeze-unknown stage, the freeze-start stage, and the freeze-end stage of the video.
  • The embodiments of this application design different identification strategies according to the characteristics and actual needs of each freeze stage, so as to accurately identify the beginning and end of a freeze.
  • The embodiments of this application do not limit the specific way of marking the monitoring state, which can be set by technicians. For example, the monitoring state can be marked by adding different marks to the video, such as marking the first, second, and third states with the numbers 1, 2, and 3, respectively.
  • At the beginning of a call, all real-time call videos are marked as the first state by default, so as to ensure correct selection of the video freeze state and the freeze identification strategy from the start of the call.
  • During the call, the monitoring state of the call video is detected in real time.
  • A video call is meaningful only when there are users on both sides of the call (when there is no user, there is no need for freeze analysis and optimization). Therefore, to improve the effectiveness of the freeze analysis, when the embodiment of the application detects that the video is in the first state, it simultaneously starts real-time detection of whether a human face appears in the video; only when a human face is present is the monitoring state switched to the second state and freeze-start identification begun, as sketched below.
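  • As a minimal sketch of this state gate (a Python illustration assuming OpenCV's bundled Haar face cascade; the names MonitorState, contains_face, and update_state_on_frame are illustrative, not from the patent):

```python
# Minimal sketch of the monitoring-state gate described above (illustrative
# names; assumes opencv-python with its bundled Haar face cascade).
from enum import Enum

import cv2


class MonitorState(Enum):
    FIRST = 1   # freeze status unknown; watch for a face
    SECOND = 2  # face present; watch for a freeze starting
    THIRD = 3   # freeze in progress; watch for it ending


face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")


def contains_face(frame):
    """Return True if at least one face is detected in a BGR frame."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return len(faces) > 0


def update_state_on_frame(state, frame):
    """In the first state, switch to the second state once a face appears."""
    if state is MonitorState.FIRST and contains_face(frame):
        return MonitorState.SECOND
    return state
```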
  • S102 If the monitoring state is the second state, sample the video at the first frequency to obtain N first frame images, and compare the N first frame images, where N is a positive integer greater than 1.
  • When the monitoring state is the second state, the embodiment of this application starts to analyze whether the video freezes; specifically:
  • The embodiment of this application samples frame images of the video at the first frequency to obtain the first frame images to be analyzed. At the same time, considering that a freeze does not necessarily occupy the entire call process but may start at any time during the call, sampling and comparison are carried out continuously while the second state holds.
  • The embodiment of the present application does not limit the specific values of the first frequency and N, which can be set by technicians according to actual needs. For example, the first frequency can be set to 1 to 5 frames per second, and N can be set to 5 to 12.
  • In practical applications, a conversion relationship between the first frequency and N can be set according to the length of time used to judge whether the video is frozen; after setting a specific value for either the first frequency or N, the other can be derived from the conversion relationship and the known value. The length of time used to judge a freeze can likewise be set by technicians according to the needs of the actual scene and is not limited here.
  • The embodiment of the present application does not limit the image comparison method between the first frame images, which can be selected or set by technicians according to actual needs, including but not limited to calculating the similarity between the N first frame images pairwise.
  • If the image comparison result is that the image difference degree between the N first frame images is less than the first difference threshold, it means that the picture of the video has shown no, or almost no, change for a period of time. The embodiment of the application then directly determines that the video has started to freeze, and takes the earliest sampling time of the N first frame images processed this time as the specific moment at which the freeze starts, thereby accurately identifying and precisely locating the start of the freeze.
  • The specific value of the first difference threshold can be selected or set by technicians according to the requirements of the actual scene; for example, it can be set to 5% to 15%.
  • When determining the freeze start time, the embodiment of the application also modifies the monitoring state of the video to the third state. Since the monitoring state is no longer the second state, the freeze-start identification operations of S102 and S103 terminate, and the subsequent identification operation for the end of the freeze is enabled, thereby ensuring normal identification of both the start and the end of the video freeze. A sketch of the freeze-start check follows.
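  • A minimal sketch of S102/S103 under one such choice, using a mean absolute pixel difference as the image difference measure (the patent leaves the comparison method to the implementer; the helper names and the 0.10 threshold are illustrative):

```python
# Sketch of the S102/S103 freeze-start check using a mean absolute pixel
# difference as the (implementer-chosen) image difference measure.
import cv2
import numpy as np


def image_difference(frame_a, frame_b):
    """Mean absolute difference between two frames, normalized to [0, 1]."""
    diff = cv2.absdiff(frame_a, frame_b)
    return float(np.mean(diff)) / 255.0


def freeze_started(samples, first_diff_threshold=0.10):
    """samples: list of (timestamp, frame) pairs, N > 1, taken at the first
    frequency. Returns the freeze start time (earliest sampling time) if all
    pairwise differences stay below the threshold, else None."""
    frames = [f for _, f in samples]
    for i in range(len(frames)):
        for j in range(i + 1, len(frames)):
            if image_difference(frames[i], frames[j]) >= first_diff_threshold:
                return None  # the picture changed: no freeze detected
    return min(t for t, _ in samples)
```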
  • S104 If the monitoring state is the third state, sample the video at the second frequency to obtain M second frame images, and compare the M second frame images, where M is a positive integer greater than 1.
  • When the monitoring state is the third state, it means that the real-time video is currently in a freeze and may return to normal at any time, i.e., the freeze may end at any moment. Therefore, the embodiment of this application starts to determine whether the video freeze has ended.
  • The principle of sampling and image comparison in S104 is basically the same as in S102.
  • The second frequency and the first frequency may be the same or different, and M and N may be the same or different; the specific values of these parameters can be selected and set by technicians according to actual needs and are not limited here.
  • Optionally, the first frequency may be set greater than the second frequency, and M greater than N, so as to ensure a sufficient amount of sampled data when the freeze is being identified.
  • If the comparison result is that the image difference degree between the M second frame images is greater than or equal to the first difference threshold, the embodiment of this application directly determines that the video freeze is over, and takes the latest sampling time among the sampling times corresponding to the M second frame images as the specific moment at which the freeze ends, thereby precisely locating both the start and end times of the freeze. The time period between the freeze start time and the freeze end time is the freeze time period of the video.
  • At this point, the embodiment of the present application also restores the monitoring state of the video to the first state and returns to the operation of S101 to identify the next video freeze. The embodiments of the present application therefore keep looping during the video call and do not terminate until the call ends, so that all freezes during the entire call are accurately identified; the loop is sketched below.
  • After a freeze time period is identified, it can also be sent to a third-party device for subsequent freeze analysis and improvement; for example, it can be sent to a specific server, and the server performs operations such as freeze analysis and optimization.
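  • Tying S101 to S105 together, the monitor reduces to a small loop over the three states. A sketch reusing the helpers above (sample_frames, video.is_active, and video.latest_frame are assumed interfaces, not from the patent; the frequencies, N, M, and the threshold are the illustrative values discussed earlier):

```python
# Sketch of the full S101-S105 loop. sample_frames(video, freq, count) is an
# assumed helper returning `count` (timestamp, frame) pairs taken at `freq` Hz.
def monitor_call(video, first_freq=2.0, second_freq=1.0, n=5, m=3,
                 first_diff_threshold=0.10):
    state = MonitorState.FIRST  # all call videos start in the first state
    freeze_start = None
    freeze_periods = []
    while video.is_active():
        if state is MonitorState.FIRST:
            state = update_state_on_frame(state, video.latest_frame())
        elif state is MonitorState.SECOND:
            samples = sample_frames(video, first_freq, n)
            start = freeze_started(samples, first_diff_threshold)
            if start is not None:
                freeze_start, state = start, MonitorState.THIRD
        else:  # MonitorState.THIRD: watch for the freeze ending
            samples = sample_frames(video, second_freq, m)
            frames = [f for _, f in samples]
            # Here "difference >= threshold" is checked over adjacent pairs,
            # one reasonable reading of the comparison in S105.
            if any(image_difference(a, b) >= first_diff_threshold
                   for a, b in zip(frames, frames[1:])):
                freeze_end = max(t for t, _ in samples)
                freeze_periods.append((freeze_start, freeze_end))
                state = MonitorState.FIRST  # loop back for the next freeze
    return freeze_periods
```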
  • In summary, the embodiments of the present application set the first monitoring state for the call video in advance, monitor the video generated during the real-time call, and switch among the monitoring states as a face appears, a freeze starts, and a freeze ends, so that the corresponding freeze end time can be obtained as soon as the picture resumes changing.
  • The embodiments of the present application thus realize real-time and accurate identification of the start and end moments of each freeze, and can determine the time period of the current freeze at the moment the freeze ends. For scenarios with high requirements on call fluency, a solution that analyzes and optimizes the cause of a freeze in real time can therefore perform real-time freeze optimization of the video call and ensure its real-time fluency. Compared with solutions in which the freeze time periods can only be found after the video call is over, the quality of the video call can be greatly improved.
  • In some embodiments, when performing image comparison, the embodiment of the application does not compare the entire frame images, but only analyzes and compares the facial organs in them, so as to improve comparison efficiency. As shown in FIG. 2, the image comparison steps in Embodiment 2 of the present application include:
  • S201 Perform face organ coordinate analysis on each first frame image to obtain N first face organ coordinate sets.
  • The facial organs in each first frame image are recognized, and the coordinates of each facial organ in the first frame image are extracted, so as to obtain the face organ coordinate set corresponding to each first frame image. The specific number and types of recognized face organs can be set by technicians, including but not limited to any one or more of the mouth, eyes, eyebrows, and nose.
  • The face organ recognition method is likewise not limited here and can be selected or designed by technicians according to actual needs, including but not limited to face organ recognition based on geometric features, neural network models, or elastic models; reference may also be made to Embodiments 3 to 6 of this application.
  • S202 Use the N first face organ coordinate sets to compare the N first frame images.
  • The coordinate sets are compared to obtain the difference degree between the N first face organ coordinate sets, and this difference degree is taken as the image difference degree between the N first frame images, thereby realizing the image comparison of Embodiment 1 of the present application.
  • The specific method of comparing the coordinate set data is not limited here and can be selected or set by technicians, including but not limited to calculating the Euclidean distance between the coordinate sets and computing the corresponding difference degree from the Euclidean distance.
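  • For instance, with each coordinate set arranged as a NumPy array of landmark points, the difference degree can be derived from pairwise Euclidean distances, one of the options mentioned above (the averaging and face-diagonal normalization here are assumptions, not fixed by the patent):

```python
# Sketch of S202: compare face-organ coordinate sets by Euclidean distance.
# Each coordinate set is assumed to be a (K, 2) array of K landmark points
# extracted in the same organ order for every frame.
import numpy as np


def coordinate_set_difference(set_a, set_b, face_diag):
    """Euclidean distance between corresponding landmarks, averaged and
    normalized by the face diagonal so values are comparable across sizes."""
    per_point = np.linalg.norm(set_a - set_b, axis=1)
    return float(per_point.mean()) / face_diag


def frames_difference_degree(coord_sets, face_diag):
    """Maximum pairwise difference over the N coordinate sets; compare this
    against the first difference threshold from Embodiment 1."""
    n = len(coord_sets)
    return max(coordinate_set_difference(coord_sets[i], coord_sets[j], face_diag)
               for i in range(n) for j in range(i + 1, n))
```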
  • In some embodiments, each face organ is taken as an independent analysis object for analysis and coordinate extraction, so as to obtain the coordinate group corresponding to each face organ; that is, the first face organ coordinate set is a collection of multiple coordinate groups. As shown in FIG. 3, the analysis operations in Embodiment 3 of the present application include:
  • S301 Use the first frame image to be analyzed as a target image, and draw the face contour in the target image to obtain the corresponding face contour graphic.
  • The embodiment of the present application presets the relative positions of the various facial organs within a face. Face recognition is first performed on the target image to locate the human face in it, and the outline of the face is then drawn, so as to obtain the corresponding face contour graphic for the subsequent coarse positioning of the facial organs.
  • Based on the preset relative positions, the first relative position of each face organ in the face contour graphic can be determined, so as to realize coarse positioning of the face organs.
  • Considering that there are many face shapes in real life, Embodiment 4 of the present application analyzes in advance some real-life face shapes and the distribution of facial organs under each face shape, draws multiple sample contour graphics of different face shapes according to the analysis, and determines the facial organ relative position data corresponding to each sample contour graphic; the relative positions of actual facial organs are then recognized against these sample contour graphics.
  • The step of obtaining the first relative position in Embodiment 4 of the present application specifically includes:
  • S401 Perform graphic matching between the face contour graphic and a plurality of sample contour graphics in a face contour library.
  • A plurality of drawn sample contour graphics are stored in the face contour library in advance; after the face contour graphic is drawn, graphic matching is performed against the face contour library to filter out the appropriate sample contour graphic.
  • If the matching succeeds, the embodiment of the application directly reads the relative position data of the facial organs corresponding to the matched sample contour graphic and uses it directly as the relative positions of the face organs of the face contour graphic in Embodiment 3 of the present application.
  • In practice, the graphic matching may fail. In order to prevent a matching failure from blocking the identification, a default relative position set is preset, in which the relative position data corresponding to each face organ is stored; when the matching fails, the default relative position set is simply read, so that a usable relative position of the face organs in the target image can still be obtained.
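  • A sketch of this matching with the default-set fallback (the similarity measure, score threshold, and library layout are assumptions; the patent does not fix a matching method):

```python
# Sketch of Embodiment 4: match the drawn face contour against a library of
# sample contours; fall back to a default relative-position set on failure.
import cv2


def contour_similarity(contour_a, contour_b):
    """Similarity in (0, 1], built on cv2.matchShapes (an assumed measure;
    a matchShapes distance of 0 means identical shapes)."""
    distance = cv2.matchShapes(contour_a, contour_b, cv2.CONTOURS_MATCH_I1, 0.0)
    return 1.0 / (1.0 + distance)


def lookup_relative_positions(face_contour, contour_library,
                              default_positions, min_score=0.8):
    """contour_library: list of (sample_contour, organ_relative_positions)."""
    best_score, best_positions = 0.0, None
    for sample_contour, positions in contour_library:
        score = contour_similarity(face_contour, sample_contour)
        if score > best_score:
            best_score, best_positions = score, positions
    if best_positions is not None and best_score >= min_score:
        return best_positions  # second relative positions, reused directly
    return default_positions   # matching failed: fall back to the preset defaults
```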
  • After the first relative positions are obtained, the embodiment of the present application determines the position of each face organ in the target image according to the actual position of the face in the target image and the relative position of the organ within the face, and uses this position as the corresponding organ center coordinate, thereby achieving coarse positioning in the target image.
  • In practice, faces differ from person to person, so the obtained face images will differ, and the positions of the facial organs will even shift somewhat under different expressions; for example, the coordinates of the mouth change between expressions. The coarse positioning coordinate therefore cannot fully represent the facial organ, and may not even fall inside the region image of the facial organ.
  • For this reason, the embodiment of the present application uses the organ center coordinate as a starting point and performs organ recognition on the surrounding image area to achieve precise positioning of each facial organ. For example, using the organ center coordinate corresponding to the mouth as a starting point, mouth recognition is performed on the image area around that coordinate, so as to determine the first image area corresponding to the actual mouth and thus precisely locate the mouth.
  • The steps of precise positioning of facial organs in Embodiment 5 of the present application specifically include:
  • S501 Obtain the retrieval rectangle size corresponding to each face organ, and identify the second image area corresponding to each face organ in the target image according to the retrieval rectangle size and the organ center coordinates, the shape of the second image area being a rectangle.
  • The approximate proportion of each face organ within the face is relatively fixed; for example, the height of the nose is generally about 1/3 of the face length, and its width about 1/5 of the face width. Using such proportions, the image area in which a face organ roughly lies can be quickly located within the face.
  • The ratio of each face organ to the face size is set in advance, and the retrieval rectangle size corresponding to each face organ is then determined from the actual face size in the target image and the set ratio data. Taking the nose example above: assuming the face in the target image is 10 cm long and 7 cm wide, and computing the height as 1/3 of the face length and the width as 1/5 of the face width, the corresponding retrieval rectangle size is determined to be 3.33 cm x 1.4 cm.
  • After determining the retrieval rectangle size, the organ center coordinate of the face organ is taken as the center point of the rectangle, and a second image area whose length and width equal the retrieval rectangle size is determined in the target image, giving the approximate area corresponding to each face organ. The sketch after this paragraph illustrates the arithmetic.
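  • A sketch of this proportional sizing (the ratio table and return format are illustrative; only the nose ratios come from the example above):

```python
# Sketch of S501: derive each organ's retrieval rectangle (second image area)
# from the face size and preset ratios, centered on the coarse organ coordinate.
ORGAN_RATIOS = {
    # (height ratio of face length, width ratio of face width); the nose
    # values follow the example in the text, the mouth entry is a placeholder.
    "nose": (1 / 3, 1 / 5),
    "mouth": (1 / 5, 1 / 3),
}


def retrieval_rectangle(organ, face_length, face_width, center_xy):
    h_ratio, w_ratio = ORGAN_RATIOS[organ]
    height = face_length * h_ratio   # e.g. 10 cm face -> 3.33 cm nose box
    width = face_width * w_ratio     # e.g. 7 cm face  -> 1.4 cm nose box
    cx, cy = center_xy
    # Return the second image area as (left, top, right, bottom).
    return (cx - width / 2, cy - height / 2, cx + width / 2, cy + height / 2)
```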
  • S502 Perform face organ detection on each second image area, and identify the first image area corresponding to the face organ from the second image area according to the detection result.
  • After the second image area corresponding to each face organ is determined, the corresponding facial organ can be searched for within that second image area, achieving precise positioning of the facial organ.
  • Continuing the nose example, the embodiment of the present application performs a nose recognition search on the second image area to determine the nose contained in it and the actual first image area occupied by the nose within the second image area.
  • S304 Perform coordinate extraction on each first image area to obtain the coordinate group corresponding to each face organ in the target image.
  • After precisely locating the first image area in which each face organ lies, the embodiment of the present application extracts coordinates from that first image area. Since a first image area contains a large amount of image information, each first image area corresponds to multiple coordinate data after extraction; the embodiment of the present application stores all the coordinate data corresponding to a single first image area in one coordinate group, thereby obtaining the coordinate group for each facial organ and, in turn, the first face organ coordinate set of the target image required in Embodiment 2 of the present application.
  • By first coarsely positioning the facial organs and then quickly retrieving each facial organ within the surrounding image area on the basis of the coarse positioning, Embodiment 3 of the present application achieves fast and accurate positioning and recognition of facial organs; compared with performing organ retrieval directly on the whole face image, the identification efficiency is higher and the amount of computation smaller.
  • The step of extracting coordinates from the first image area in Embodiment 6 of the present application specifically includes:
  • S601 Acquire the number of sampling points corresponding to each face organ, where the number of sampling points corresponding to the mouth, eyes, eyebrows, and nose is sequentially reduced.
  • The mouth, eyes, eyebrows, and nose are ranked according to how frequently each face organ is used during a call, and a corresponding number of sampling points is set for each face organ according to that frequency: the higher the frequency, the greater the number of sampling points, so that the subsequent feature point sampling and coordinate extraction can differentiate the number of extracted coordinates.
  • the specific number of sampling points can be set by technicians according to actual needs, and is not limited here.
  • S602 Perform feature point sampling on the first image area corresponding to each face organ, and obtain the coordinates of each feature point, so as to obtain the coordinate group corresponding to each face organ in the target image; the number of sampling points used for the feature point sampling is the number of sampling points corresponding to that face organ.
  • After the numbers of sampling points are determined, feature point sampling of the first image areas begins. For each first image area, only the corresponding number of feature points is sampled; for example, assuming the number of sampling points corresponding to first image area A is 20, only 20 feature points are sampled from first image area A. After the required number of feature points is sampled, the coordinate data of these feature points are obtained, thereby yielding the coordinate group corresponding to each first image area, i.e., the coordinate group of each face organ in the target image.
  • The specific feature point sampling method is not limited here and can be set by technicians according to actual needs, including but not limited to the SIFT algorithm and the SUSAN algorithm. In order to accurately control the number of extracted sampling points, if the feature point extraction algorithm itself cannot be configured with a sample count, feature points can be deleted or added after normal feature point extraction until the corresponding number of sampling points is met, as in the sketch below.
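  • A sketch of S601/S602 using OpenCV's SIFT detector and trimming surplus keypoints by response strength (the per-organ counts and helper names are illustrative; only the mouth > eyes > eyebrows > nose ordering comes from the text):

```python
# Sketch of Embodiment 6: sample a fixed number of feature points per organ
# region, trimming surplus SIFT keypoints by response strength.
import cv2

# Mouth > eyes > eyebrows > nose, as required; the counts are examples.
SAMPLING_POINTS = {"mouth": 20, "eyes": 16, "eyebrows": 12, "nose": 8}

sift = cv2.SIFT_create()


def organ_coordinate_group(gray_image, first_image_area, organ):
    """first_image_area: (left, top, right, bottom) in image coordinates.
    Returns up to SAMPLING_POINTS[organ] (x, y) coordinates for the organ."""
    l, t, r, b = (int(v) for v in first_image_area)
    region = gray_image[t:b, l:r]
    keypoints = sift.detect(region, None)
    # SIFT cannot be told an exact count, so keep the strongest responses
    # and drop the rest, as the text describes.
    keypoints = sorted(keypoints, key=lambda kp: kp.response, reverse=True)
    keypoints = keypoints[:SAMPLING_POINTS[organ]]
    return [(kp.pt[0] + l, kp.pt[1] + t) for kp in keypoints]
```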
  • It should be noted that Embodiments 2 to 6 of the present application are refinements or optimizations of the comparison of the first frame images; they can equally be applied to the comparison of the second frame images in S104 of Embodiment 1 of this application. When doing so, it is only necessary to replace the processed object from the first frame images to the second frame images.
  • FIG. 7 shows a structural block diagram of a video freeze identification device provided in an embodiment of the present application.
  • The video freeze identification device illustrated in FIG. 7 may be the execution subject of the video freeze identification method provided in Embodiment 1.
  • the device for identifying video freezes includes:
  • The face detection module 71 is configured to perform face detection on the video when the monitoring state corresponding to the video during a real-time call is the first state, and, when a face is detected in the video, modify the monitoring state corresponding to the video to the second state.
  • The first image comparison module 72 is configured to, if the monitoring state is the second state, sample the video at a first frequency to obtain N first frame images and compare the N first frame images, where N is a positive integer greater than 1.
  • The freeze start identification module 73 is configured to, if the comparison result is that the image difference degree between the N first frame images is less than a first difference threshold, take the earliest sampling time among the sampling times corresponding to the first frame images as the freeze start time of the video, and set the monitoring state to the third state.
  • The second image comparison module 74 is configured to, if the monitoring state is the third state, sample the video at a second frequency to obtain M second frame images and compare the M second frame images, where M is a positive integer greater than 1.
  • The freeze end identification module 75 is configured to, if the comparison result is that the image difference degree between the M second frame images is greater than or equal to the first difference threshold, take the latest sampling time among the sampling times corresponding to the second frame images as the freeze end time of the video, set the monitoring state to the first state, and identify the freeze time period of the video based on the freeze start time and the freeze end time.
  • the first image comparison module 72 includes:
  • the coordinate analysis module is used to analyze the coordinates of the face organs for each of the first frame images to obtain N first face organ coordinate sets.
  • The coordinate comparison module is configured to compare the N first frame images by using the N first face organ coordinate sets.
  • In some embodiments, the first face organ coordinate set is a collection of multiple coordinate groups, each coordinate group corresponds to one type of face organ and contains multiple coordinates of the corresponding face organ, and the coordinate analysis module includes:
  • the contour drawing module is used to take the first frame image to be analyzed as a target image, and draw a face contour on the target image to obtain a corresponding face contour graphic.
  • the position obtaining module is used to obtain the first relative position of each face organ in the face contour graph.
  • the organ image search module is used for locating the face organs of the target image by using the first relative position to obtain the organ center coordinates of each face organ in the target image, and based on a plurality of the organ centers The coordinates are used to identify the first image area corresponding to each face organ in the target image.
  • the coordinate extraction module is configured to extract the coordinates of each of the first image regions to obtain the coordinate groups corresponding to each of the facial organs in the target image.
  • The position obtaining module is specifically configured to: perform graphic matching between the face contour graphic and a plurality of sample contour graphics in the face contour library; if the matching succeeds, read the relative position set corresponding to the successfully matched sample contour graphic, the relative position set including the second relative position of each face organ in the sample contour graphic; and take the second relative position corresponding to each face organ in the relative position set as the first relative position of that face organ in the face contour graphic.
  • The organ image search module is specifically configured to: obtain the retrieval rectangle size corresponding to each face organ, and identify the second image area corresponding to each face organ in the target image according to the retrieval rectangle size and the organ center coordinates; and perform face organ detection on each second image area, and identify the first image area corresponding to the face organ from the second image area according to the detection result.
  • The coordinate extraction module is specifically configured to: acquire the number of sampling points corresponding to each face organ, where the numbers of sampling points corresponding to the mouth, eyes, eyebrows, and nose decrease in that order; and perform feature point sampling on the first image area corresponding to each face organ and obtain the coordinates of each feature point, the number of sampling points used for the feature point sampling being the number corresponding to that face organ.
  • The video freeze identification method provided by the embodiments of this application can be applied to terminal devices such as mobile phones, tablet computers, wearable devices, in-vehicle devices, augmented reality (AR)/virtual reality (VR) devices, notebook computers, ultra-mobile personal computers (UMPC), netbooks, and personal digital assistants (PDA).
  • The terminal device may be a cellular phone, a cordless phone, a Session Initiation Protocol (SIP) phone, a wireless local loop (WLL) station, a personal digital assistant (PDA) device, a handheld device with wireless communication functions, a computing device or another processing device connected to a wireless modem, a vehicle-mounted device, an Internet-of-Vehicles terminal, a computer, a laptop computer, a handheld communication device, a handheld computing device, a satellite wireless device, a wireless modem card, a TV set-top box (STB), customer premises equipment (CPE), and/or other equipment used for communication on wireless systems and next-generation communication systems, such as a mobile terminal in a 5G network or a mobile terminal in a future evolved public land mobile network (PLMN).
  • A wearable device is a portable device that is worn directly on the body or integrated into the user's clothes or accessories. Wearable device is also a general term for devices developed by applying wearable technology to the intelligent design of daily wear, such as glasses, gloves, watches, clothing, and shoes.
  • Wearable devices are not merely hardware devices; they also realize powerful functions through software support, data interaction, and cloud interaction.
  • In a broad sense, wearable smart devices include devices that are full-featured and large-sized and can realize complete or partial functions without relying on a smartphone, such as smart watches or smart glasses, as well as devices that focus on only a certain type of application function and need to be used together with other devices such as smartphones, for example, various smart bracelets and smart jewelry for vital sign monitoring.
  • FIG. 8 is a schematic structural diagram of a terminal device provided by an embodiment of the present application.
  • The terminal device 8 of this embodiment includes: at least one processor 80 (only one is shown in FIG. 8), a memory 81, and a computer program 82 that is stored in the memory 81 and can run on the at least one processor 80.
  • When the processor 80 executes the computer program 82, the steps in the foregoing embodiments of the video freeze identification method, such as steps 101 to 105 shown in FIG. 1, are implemented.
  • Alternatively, when the processor 80 executes the computer program 82, the functions of the modules/units in the foregoing device embodiments, such as the functions of modules 71 to 75 shown in FIG. 7, are realized.
  • the terminal device 8 may be a computing device such as a desktop computer, a notebook, a palmtop computer, and a cloud server.
  • the terminal device may include, but is not limited to, a processor 80 and a memory 81.
  • FIG. 8 is only an example of the terminal device 8 and does not constitute a limitation on the terminal device 8; the terminal device may include more or fewer components than shown in the figure, combine certain components, or have different components.
  • For example, the terminal device may also include input and output devices, network access devices, a bus, and the like.
  • The so-called processor 80 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the memory 81 may be an internal storage unit of the terminal device 8 in some embodiments, such as a hard disk or memory of the terminal device 8.
  • The memory 81 may also be an external storage device of the terminal device 8, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the terminal device 8.
  • the memory 81 may also include both an internal storage unit of the terminal device 8 and an external storage device.
  • the memory 81 is used to store an operating system, an application program, a boot loader (BootLoader), data, and other programs, such as the program code of the computer program.
  • the memory 81 can also be used to temporarily store data that has been sent or will be sent.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
  • the embodiments of the present application also provide a computer-readable storage medium.
  • the computer-readable storage medium may be non-volatile or volatile.
  • The computer-readable storage medium stores a computer program; when the computer program is executed by a processor, the steps in each of the above method embodiments can be realized, namely: when the monitoring state corresponding to the video during a real-time call is the first state, performing face detection on the video, and when a face is detected in the video, modifying the monitoring state corresponding to the video to the second state; if the monitoring state is the second state, sampling the video at a first frequency to obtain N first frame images and comparing the N first frame images, where N is a positive integer greater than 1; if the comparison result is that the image difference degree between the N first frame images is less than a first difference threshold, taking the earliest sampling time among the sampling times corresponding to the first frame images as the freeze start time of the video, and setting the monitoring state to the third state; if the monitoring state is the third state, sampling the video at a second frequency to obtain M second frame images and comparing the M second frame images, where M is a positive integer greater than 1; and if the comparison result is that the image difference degree between the M second frame images is greater than or equal to the first difference threshold, taking the latest sampling time among the sampling times corresponding to the second frame images as the freeze end time of the video, setting the monitoring state to the first state, and identifying the freeze time period of the video based on the freeze start time and the freeze end time.
  • The embodiments of the present application also provide a computer program product which, when run on a mobile terminal, enables the mobile terminal to realize the steps in the foregoing method embodiments.
  • If the integrated module/unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • All or part of the processes in the methods of the above embodiments of the present application may also be completed by a computer program instructing relevant hardware.
  • the computer program can be stored in a computer-readable storage medium. When the program is executed by the processor, it can implement the steps of the foregoing method embodiments.
  • the computer program includes computer program code, and the computer program code may be in the form of source code, object code, executable file, or some intermediate forms.
  • the computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, U disk, mobile hard disk, magnetic disk, optical disk, computer memory, read-only memory (Read-Only Memory, ROM) , Random Access Memory (RAM), electrical carrier signal, telecommunications signal, and software distribution media, etc.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The present application is suitable for the technical field of video processing and provides a video lag identification method and apparatus, and a terminal device. The method comprises: when a monitoring state of a video is a first state and a face exists in the video, modifying the monitoring state into a second state; if the monitoring state is the second state, sampling the video to obtain N first image frames; if an image difference degree among the N first image frames is smaller than a first difference threshold, taking the earliest sampling moment as a lag starting moment, and setting the monitoring state as a third state; if the monitoring state is the third state, sampling the video to obtain M second image frames; and if an image difference degree among the M second image frames is greater than or equal to the first difference threshold, taking the latest sampling moment as a lag termination moment of the video, setting the monitoring state as the first state, and identifying a lag time period of the video. According to the embodiment of the present application, precise identification on the start and end of lag is achieved, and precise identification of a lag time period is achieved.

Description

Video lag identification method and apparatus, and terminal device
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on February 11, 2020, with application number 202010087225.0 and invention title "Video lag identification method and apparatus, and terminal device", the entire contents of which are incorporated herein by reference.
Technical Field
This application belongs to the technical field of computer vision in artificial intelligence, and in particular relates to a video freeze identification method and terminal device.
Background
With the continuous advancement of technology, video calls over the Internet (hereinafter referred to as calls) have become a common scene in life and work. The quality of a real-time call is affected by the real-time network status, the video equipment status, the video server resources, and so on; a problem in any one of these links may cause the video to freeze during the call. The inventor found that, in order to improve the freezing situation, it is first necessary to determine exactly when freezes occurred during the call, and then analyze the state of each link at the time of the freeze to locate its cause, so as to precisely improve the video freezing situation.
In the prior art, a tester manually reviews the recorded video files of a call and determines the time periods in which the video freezes; the efficiency of such identification is extremely low.
Technical Problem
In view of this, the embodiments of the present application provide a video freeze identification method and terminal device, which can solve the problem of low efficiency in identifying video call freezes.
Technical Solutions
The first aspect of the embodiments of the present application provides a video freeze identification method, including:
when the monitoring state corresponding to the video during a real-time call is the first state, performing face detection on the video, and when a face is detected in the video, modifying the monitoring state corresponding to the video to the second state;
if the monitoring state is the second state, sampling the video at a first frequency to obtain N first frame images and comparing the N first frame images, where N is a positive integer greater than 1;
if the comparison result is that the image difference degree between the N first frame images is less than a first difference threshold, taking the earliest sampling time among the sampling times corresponding to the first frame images as the freeze start time of the video, and setting the monitoring state to the third state;
if the monitoring state is the third state, sampling the video at a second frequency to obtain M second frame images and comparing the M second frame images, where M is a positive integer greater than 1;
if the comparison result is that the image difference degree between the M second frame images is greater than or equal to the first difference threshold, taking the latest sampling time among the sampling times corresponding to the second frame images as the freeze end time of the video, setting the monitoring state to the first state, and identifying the freeze time period of the video based on the freeze start time and the freeze end time.
The second aspect of the embodiments of the present application provides a video freeze identification device, including:
a face detection module, configured to perform face detection on the video when the monitoring state corresponding to the video during a real-time call is the first state, and, when a face is detected in the video, modify the monitoring state corresponding to the video to the second state;
a first image comparison module, configured to, if the monitoring state is the second state, sample the video at a first frequency to obtain N first frame images and compare the N first frame images, where N is a positive integer greater than 1;
a freeze start identification module, configured to, if the comparison result is that the image difference degree between the N first frame images is less than a first difference threshold, take the earliest sampling time among the sampling times corresponding to the first frame images as the freeze start time of the video, and set the monitoring state to the third state;
a second image comparison module, configured to, if the monitoring state is the third state, sample the video at a second frequency to obtain M second frame images and compare the M second frame images, where M is a positive integer greater than 1;
a freeze end identification module, configured to, if the comparison result is that the image difference degree between the M second frame images is greater than or equal to the first difference threshold, take the latest sampling time among the sampling times corresponding to the second frame images as the freeze end time of the video, set the monitoring state to the first state, and identify the freeze time period of the video based on the freeze start time and the freeze end time.
本申请实施例的第三方面提供了一种终端设备,所述终端设备包括存储器、处理器,所述存储器上存储有可在所述处理器上运行的计算机程序,所述处理器执行所述计算机程序时实现如上述第一方面中任一项所述视频卡顿识别方法的步骤,即包括:当实时通话过程中的视频对应的监控状态为第一状态时,对所述视频进行人脸检测,并在检测到所述视频中存在人脸时,将所述视频对应的监控状态修改为第二状态;若所述监控状态为第二状态,以第一频率对所述视频进行采样得到N张第一帧图像,并对N张所述第一帧图像进行比对,其中,N为大于1的正整数;若比对结果为N张所述第一帧图像之间的图像差异度小于第一差异阈值,将各张所述第一帧图像对应的采样时刻中最早的采样时刻,作为所述视频的卡顿起始时刻,并将所述监控状态设置为第三状态;若所述监控状态为第三状态,以第二频率对所述视频进行采样得到M张第二帧图像,并对M张所述第二帧图像进行比对,其中,M为大于1的正整数;若比对结果为M张所述第二帧图像之间的图像差异度大于或等于所述第一差异阈值,将各张所述第二帧图像对应的采样时刻中最晚的采样时刻,作为所述视频的卡顿终止时刻,将所述监控状态设置为第一状态,并基于所述卡顿起始时刻和所述卡顿终止时刻识别所述视频的卡顿时间段。A third aspect of the embodiments of the present application provides a terminal device. The terminal device includes a memory and a processor. The memory stores a computer program that can run on the processor. The processor executes the The computer program implements the steps of the video freeze recognition method as described in any one of the first aspect above, which includes: when the monitoring state corresponding to the video during the real-time call is the first state, performing facial recognition on the video Detect, and when a face is detected in the video, modify the monitoring state corresponding to the video to the second state; if the monitoring state is the second state, sample the video at the first frequency to obtain N first-frame images, and N first-frame images are compared, where N is a positive integer greater than 1, if the comparison result is the image difference between N first-frame images Is less than the first difference threshold, the earliest sampling moment among the sampling moments corresponding to each of the first frame images is used as the starting moment of the video freeze, and the monitoring state is set to the third state; The monitoring state is the third state, the video is sampled at the second frequency to obtain M second frame images, and M second frame images are compared, where M is a positive integer greater than 1; If the comparison result is that the degree of image difference between the M images of the second frame is greater than or equal to the first difference threshold, the latest sampling time among the sampling times corresponding to each of the second frame images is taken as At the end time of the video freeze, the monitoring state is set to the first state, and the freeze time period of the video is identified based on the start time of the freeze and the end time of the freeze.
A fourth aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program, wherein when the computer program is executed by a processor, the steps of the video freeze identification method according to any one of the above first aspect are implemented, namely: when the monitoring state corresponding to the video during a real-time call is the first state, performing face detection on the video, and when a face is detected in the video, modifying the monitoring state corresponding to the video to the second state; if the monitoring state is the second state, sampling the video at a first frequency to obtain N first frame images, and comparing the N first frame images, where N is a positive integer greater than 1; if the comparison result is that the image difference degree among the N first frame images is less than a first difference threshold, taking the earliest sampling time among the sampling times corresponding to the first frame images as the freeze start time of the video, and setting the monitoring state to the third state; if the monitoring state is the third state, sampling the video at a second frequency to obtain M second frame images, and comparing the M second frame images, where M is a positive integer greater than 1; and if the comparison result is that the image difference degree among the M second frame images is greater than or equal to the first difference threshold, taking the latest sampling time among the sampling times corresponding to the second frame images as the freeze end time of the video, setting the monitoring state to the first state, and identifying the freeze time period of the video based on the freeze start time and the freeze end time.
A fifth aspect of the embodiments of the present application provides a computer program product which, when run on a terminal device, causes the terminal device to execute the video freeze identification method according to any one of the above first aspect.
Beneficial Effects
Compared with the prior art, the embodiments of the present application have the following beneficial effects: on the one hand, the embodiments achieve efficient and accurate identification of the start and end of a freeze; on the other hand, by setting different monitoring states for different freeze stages, each freeze is distinguished and handled effectively, ensuring the accuracy of every freeze identification.
Description of the Drawings
Fig. 1 is a schematic flowchart of the implementation of the video freeze identification method provided in Embodiment 1 of the present application;
Fig. 2 is a schematic flowchart of the implementation of the video freeze identification method provided in Embodiment 2 of the present application;
Fig. 3 is a schematic flowchart of the implementation of the video freeze identification method provided in Embodiment 3 of the present application;
Fig. 4 is a schematic flowchart of the implementation of the video freeze identification method provided in Embodiment 4 of the present application;
Fig. 5 is a schematic flowchart of the implementation of the video freeze identification method provided in Embodiment 5 of the present application;
Fig. 6 is a schematic flowchart of the implementation of the video freeze identification method provided in Embodiment 6 of the present application;
Fig. 7 is a schematic structural diagram of the video freeze identification apparatus provided in Embodiment 7 of the present application;
Fig. 8 is a schematic diagram of the terminal device provided in Embodiment 8 of the present application.
Embodiments of the Present Invention
To facilitate understanding of the present application, its embodiments are first briefly explained here. Because the quality of a real-time call is affected by every link in the call network, a problem in any single link may cause the call video to freeze. To identify the time period in which the video froze, technicians currently review the recorded video manually after the call ends. On the one hand, such review and localization is costly and extremely inefficient, and cannot keep up with the growing demands on the volume, cost and efficiency of video freeze analysis. On the other hand, in special scenarios with high requirements on call fluency, such as video-based face-to-face review for bank loans, identifying freezes after the call ends may provide some analytical assurance for later calls, but has little practical value for the current call.
To improve the identification of video freezes during a call, in the embodiments of the present application the monitoring state of the call video is set to the first state in advance, and the video produced during the real-time call is then monitored in real time. When a face is found in the video and freeze analysis is required, the monitoring state of the video is set to the second state. While the video is in the second state, frame images are sampled from the video and it is checked whether several consecutive frame images remain unchanged; if so, the video has started to freeze, and the monitoring state of the video is simultaneously marked as the third state. After the freeze has started, frame image sampling and comparison continue; once a large difference appears between consecutive frame images, the video picture has resumed changing, and the corresponding freeze end time can be obtained. The time period in which the video froze can then be identified from the recorded freeze start time and freeze end time. Finally, the monitoring state of the video is restored to the first state, ending the identification of the current freeze. On the one hand, the embodiments of the present application achieve accurate identification of the start and end of a freeze; on the other hand, by setting different monitoring states for the different identification and freeze stages, each freeze is distinguished and handled effectively, ensuring the accuracy of every freeze identification and enabling the embodiments to identify consecutive freezes in the video.
Meanwhile, the video freeze identification method in the embodiments of the present application is executed by a terminal device with a certain video processing capability, where "a certain video processing capability" means the ability to extract frame images from a video and to compare frame images. The specific type of terminal device is not limited here and may be chosen by technicians according to the actual scenario; it includes, but is not limited to, the terminal devices used for the video call itself, such as mobile phones and computers, and may also be a third-party device communicatively connected to those terminal devices, such as a server.
The embodiments of the present application are described in detail as follows:
Fig. 1 shows the implementation flowchart of the video freeze identification method provided in Embodiment 1 of the present application, detailed as follows:
S101: when the monitoring state corresponding to the video during a real-time call is the first state, perform face detection on the video, and when a face is detected in the video, modify the monitoring state corresponding to the video to the second state.
In the embodiments of the present application, the monitoring state marks the freeze stage the video is in at any moment. The monitoring state comprises a first state, a second state and a third state, corresponding respectively to the stage in which the freeze status of the video is unknown, the freeze start stage, and the freeze end stage. Different identification strategies are designed for the characteristics or actual needs of each stage so that the start and end of a freeze can be identified accurately. The embodiments do not limit how the monitoring state is marked; it may be set by technicians, for example by attaching different identifiers to the video, such as using the digits 1, 2 and 3 for the first, second and third states respectively, in which case the monitoring state of the video can be set flexibly simply by adding or modifying the digital identifier. The embodiments mark all real-time call videos as the first state by default, so that the freeze state of the video and the freeze identification strategy are selected correctly when the call starts.
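For illustration only, the following is a minimal sketch of the three monitoring states using the numeric markers 1, 2 and 3 mentioned above; the state names are hypothetical, not terms from the specification.

```python
from enum import IntEnum

class MonitorState(IntEnum):
    IDLE = 1          # first state: freeze status unknown, no analysis running
    WATCH_START = 2   # second state: face present, watching for freeze onset
    IN_FREEZE = 3     # third state: freeze detected, watching for recovery

state = MonitorState.IDLE  # every call video starts in the first state by default
```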
In the embodiments of the present application, the monitoring state of the call video is checked in real time during the call. In practice, a video call is only meaningful when users are present on both sides (when no user is present, there is no need to analyse or optimise freezes even if they occur). Therefore, to improve the effectiveness of the freeze analysis, when the video is detected to be in the first state, the embodiments also start detecting in real time whether a face appears in the video, switch the monitoring state to the second state only when a face is present, and only then start identifying whether a freeze has begun.
S102: if the monitoring state is the second state, sample the video at a first frequency to obtain N first frame images, and compare the N first frame images, where N is a positive integer greater than 1.
When the monitoring state is the second state, the real-time video may start to freeze at this stage, so the embodiments of the present application begin analysing whether a video freeze has started. Specifically:
To identify whether the video has started to freeze, the embodiments of the present application sample frame images from the video at the first frequency to obtain the first frame images to be analysed. Since a freeze does not necessarily span the whole call but starts at some moment during it, each time a new first frame image is sampled, only the N most recently sampled first frame images are compared, so that the start of a freeze can be identified precisely. For example, suppose the first frequency is 1 image per second, N = 5, and sampling starts at 00:00. In the one minute from 00:00 to 01:00, one new first frame image can in theory be collected every second, so 60 first frame images have been collected by 01:00. Every time a new frame image is collected, the latest 5 first frame images are compared; for instance, when the 60th first frame image is collected at 01:00, the 56th to 60th images (5 in total) are compared. This ensures that identifying the start of a freeze is not affected by historical images sampled too long ago, safeguarding the accuracy of the identification.
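A minimal sketch of this sliding window, assuming the parameters of the example above (1 image per second, N = 5); `compare_window` is a hypothetical callback standing in for the comparison of S103.

```python
from collections import deque

N = 5                     # window size, as in the example above
window = deque(maxlen=N)  # keeps only the N most recently sampled images

def on_sampled_frame(frame, timestamp, compare_window):
    """Called once per sampling tick at the first frequency; compares
    only the latest N first frame images (e.g. frames 56-60 at 01:00)."""
    window.append((timestamp, frame))
    if len(window) == N:
        times, frames = zip(*window)
        compare_window(frames, times)  # comparison callback, see S103
```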
The embodiments of the present application do not limit the specific values of the first frequency and N; they may be set by technicians according to actual needs. For example, the first frequency may be set to 1-5 images per second and N to 5-12. Alternatively, the conversion relation between the first frequency and N may be set with reference to the duration used in practice to decide whether the video is frozen: fix either the first frequency or N, then compute the other from the conversion relation and the known value. For example, if the video is considered frozen when the picture does not change for 5 seconds, the conversion relation is N = 5 seconds × first frequency; if the first frequency is then set to 1 sample per second, N = 5 follows from the relation. The duration used to decide whether the video is frozen may likewise be set by technicians according to the needs of the actual scenario and is not limited here.
The embodiments of the present application also do not limit the method used to compare the first frame images; it may be chosen or designed by technicians according to actual needs, including but not limited to: computing the Euclidean distance between every pair of adjacent images among the N first frame images and taking the mean or maximum of those distances as the image difference degree; or randomly combining the N first frame images into image pairs, performing cross-correlation on the two first frame images within each pair, and computing the difference degree from the cross-correlation results. Reference may also be made to the descriptions of Embodiments 2 to 6 of the present application.
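As a sketch of one of the unconstrained options named above (mean Euclidean distance between adjacent frames), assuming frames are NumPy arrays with values in 0-255:

```python
import numpy as np

def window_difference(frames):
    """Mean per-pixel Euclidean (RMS) distance between adjacent frames,
    normalised to [0, 1] so it can be compared against a percentage
    threshold such as the 5%-15% mentioned below."""
    diffs = []
    for a, b in zip(frames, frames[1:]):
        a = a.astype(np.float32) / 255.0
        b = b.astype(np.float32) / 255.0
        diffs.append(np.sqrt(np.mean((a - b) ** 2)))  # RMS distance per pair
    return float(np.mean(diffs))  # or max(diffs) for a stricter measure
```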
S103: if the comparison result is that the image difference degree among the N first frame images is less than a first difference threshold, take the earliest sampling time among the sampling times corresponding to the first frame images as the freeze start time of the video, and set the monitoring state to the third state.
When the image comparison result shows that the image difference degree among the N first frame images is less than the first difference threshold, the video picture has not changed, or has barely changed, for a period of time. The embodiments of the present application then directly determine that the video has started to freeze, and take the earliest sampling time among the N first frame images processed this time as the exact moment the freeze started, thereby accurately identifying the start of the freeze and precisely locating the freeze start time. The specific value of the first difference threshold may be chosen or set by technicians according to the needs of the actual scenario; for example, it may be set to 5%-15%.
Since a freeze generally lasts for some time, continuing to process in the manner of S102 and S103 during this period would produce a large number of freeze start times and cause abnormal freeze identification. Therefore, while determining the freeze start time, the embodiments of the present application also change the monitoring state of the video to the third state. Because the monitoring state is no longer the second state, the freeze-start identification of S102 and S103 is terminated and the subsequent freeze-end identification is enabled, ensuring normal identification of both the start and the end of the video freeze.
S104: if the monitoring state is the third state, sample the video at a second frequency to obtain M second frame images, and compare the M second frame images, where M is a positive integer greater than 1.
When the monitoring state is the third state, the real-time video is currently frozen and may return to normal at any moment, i.e. the freeze may end at any time, so the embodiments of the present application begin analysing whether the video freeze has ended. The sampling and image comparison principles of S104 are essentially the same as those of S102; refer to the description of S102 for details, which are not repeated here. It should be noted, however, that the second frequency may be the same as or different from the first frequency, and M may be the same as or different from N; the specific values of these parameters may be chosen and set by technicians according to actual needs and are not limited here.
As a specific embodiment of the present application, in practice several consecutive frame images may be highly similar even when no freeze has occurred; for example, when both parties to the video call are pondering a question, they may remain essentially motionless for a short while. Therefore, to improve the accuracy of freeze-start identification, the embodiments of the present application set the first frequency greater than the second frequency and M greater than N, to guarantee the amount of sampled data when identifying the start of a video freeze.
S105: if the comparison result is that the image difference degree among the M second frame images is greater than or equal to the first difference threshold, take the latest sampling time among the sampling times corresponding to the second frame images as the freeze end time of the video, set the monitoring state to the first state, and identify the freeze time period of the video based on the freeze start time and the freeze end time.
When the image comparison result shows a large difference among the M second frame images, the video picture has returned to normal. The embodiments of the present application then directly determine that the video freeze has ended, and take the latest sampling time among all the sampling times of the M second frame images processed this time as the exact moment the freeze ended, thereby precisely locating both the start and end of this freeze; the period between the freeze start time and the freeze end time is the freeze time period of the video. At the same time, the embodiments restore the monitoring state of the video to the first state, so execution returns to S101 to identify the next video freeze. The embodiment therefore keeps cycling throughout the video call and only terminates when the call ends, so that every freeze in the whole call is identified accurately.
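Putting S101-S105 together, the following is a sketch of one monitoring tick of the state cycle, reusing the `MonitorState` and `window_difference` names from the earlier sketches; the 10% threshold is an assumed value within the 5%-15% range above, not a figure from the specification.

```python
FIRST_DIFF_THRESHOLD = 0.10  # assumed 10%, within the 5%-15% range above

def process_tick(state, freeze_start, frames, timestamps, has_face):
    """One tick of the S101-S105 cycle. `frames`/`timestamps` hold the
    latest N (or M) samples; `has_face` is the face-detection result.
    Returns the new state, the pending freeze start, and a completed
    (start, end) freeze span, if any."""
    freeze_span = None
    if state == MonitorState.IDLE and has_face:
        state = MonitorState.WATCH_START                        # S101
    elif state == MonitorState.WATCH_START:
        if window_difference(frames) < FIRST_DIFF_THRESHOLD:    # S102/S103
            freeze_start = timestamps[0]   # earliest sampling time in window
            state = MonitorState.IN_FREEZE
    elif state == MonitorState.IN_FREEZE:
        if window_difference(frames) >= FIRST_DIFF_THRESHOLD:   # S104/S105
            freeze_span = (freeze_start, timestamps[-1])  # latest sampling time
            state, freeze_start = MonitorState.IDLE, None  # ready for next freeze
    return state, freeze_start, freeze_span
```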
As an embodiment of the present application, after the freeze time period is obtained, it may also be sent to a third-party device for subsequent freeze analysis, improvement and other operations; for example, it may be sent to a specific server, which then performs operations such as freeze analysis and optimisation.
To improve the identification of video freezes during a call, in the embodiments of the present application the monitoring state of the call video is set to the first state in advance, and the video produced during the real-time call is then monitored in real time. When a face is found in the video and freeze analysis is required, the monitoring state of the video is set to the second state. While the video is in the second state, frame images are sampled from the video and it is checked whether several consecutive frame images remain unchanged; if so, the video has started to freeze, and the monitoring state of the video is simultaneously marked as the third state. After the freeze has started, frame image sampling and comparison continue; once a large difference appears between consecutive frame images, the video picture has resumed changing, and the corresponding freeze end time can be obtained. The time period in which the video froze can then be identified from the recorded freeze start time and freeze end time. Finally, the monitoring state of the video is restored to the first state, ending the identification of the current freeze.
On the one hand, the embodiments of the present application achieve real-time, precise identification of the start and end time of each freeze, and can determine the time period of the current freeze the moment it ends. For scenarios with high requirements on call fluency, the embodiments need only be combined with a scheme capable of analysing and optimising the cause of a freeze in real time to achieve real-time freeze optimisation of the video call and guarantee its fluency. Compared with approaches that can only locate freeze periods after the video call has ended, this greatly improves the quality of the video call.
On the other hand, since a video call may contain multiple freezes and the duration of each freeze cannot be predicted, setting different monitoring states for different freeze stages effectively distinguishes the start and end of each freeze. The embodiments of the present application can therefore quickly move on to the next freeze identification as soon as one identification is completed, without mutual interference between two identifications, ensuring the accuracy of every freeze identification and enabling continuous freeze identification of the video.
As a specific implementation of the image comparison in Embodiment 1 of the present application, considering that in practice the focus of a video call is usually the user's face, and the core of the face is the activity of its organs, the embodiments of the present application do not compare whole frame images; instead, only the face organs in them are analysed and compared, improving comparison efficiency. As shown in Fig. 2, the image comparison steps in Embodiment 2 of the present application include:
S201: perform face organ coordinate analysis on each first frame image to obtain N first face organ coordinate sets.
In the embodiments of the present application, face organs are recognized in each first frame image and the coordinates of each face organ in the image are extracted, yielding the face organ coordinate set corresponding to each first frame image. The specific number and types of recognized face organs may be set by technicians, including but not limited to any one or more of the mouth, eyes, eyebrows and nose. The specific face organ recognition method is likewise not limited here and may be chosen or designed by technicians according to actual needs, including but not limited to recognition based on geometric features, neural network models or elastic models; reference may also be made to Embodiments 3 to 6 of the present application.
S202: compare the N first frame images using the N first face organ coordinate sets.
After the first face organ coordinate set corresponding to each first frame image is obtained, these coordinate sets are compared to obtain the difference degree among the N first face organ coordinate sets, which is taken as the image difference degree among the N first frame images, thereby realizing the image comparison of Embodiment 1 of the present application. The specific method of comparing coordinate-set data is not limited here and may be chosen or set by technicians, including but not limited to computing the Euclidean distance between coordinate sets and deriving the corresponding difference degree from that distance.
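As a sketch of the Euclidean-distance option named above, assuming each coordinate set is an (n, 2) array of (x, y) landmarks sampled in the same order from each frame:

```python
import numpy as np

def coordinate_set_difference(set_a, set_b):
    """Mean Euclidean distance between two face organ coordinate sets;
    smaller means the two frames are more similar."""
    a = np.asarray(set_a, dtype=np.float32)
    b = np.asarray(set_b, dtype=np.float32)
    return float(np.linalg.norm(a - b, axis=1).mean())

def window_coordinate_difference(coordinate_sets):
    """Difference degree across the N coordinate sets, taken here
    as the mean over adjacent pairs."""
    pairs = zip(coordinate_sets, coordinate_sets[1:])
    return float(np.mean([coordinate_set_difference(a, b) for a, b in pairs]))
```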
As a specific implementation of the face organ coordinate set extraction in Embodiment 2 of the present application, in this embodiment each type of face organ is analysed as an independent object for analysis and coordinate extraction, yielding a coordinate group for each face organ; that is, the first face organ coordinate set is a collection of several coordinate groups. As shown in Fig. 3, in Embodiment 3 of the present application, the face organ analysis of a single first frame image to be analysed specifically includes:
S301: take the first frame image to be analysed as the target image, and draw the face contour of the target image to obtain the corresponding face contour graphic.
In practice, the position of each face organ within the face is relatively fixed; for example, the mouth sits at roughly 1/4 of the face's length and 1/2 of its width, while the nose sits at roughly 1/2 of the face's length and 1/2 of its width. Therefore, to locate face organs more efficiently, the embodiments of the present application preset the relative position of each face organ in the face, perform face recognition on the target image to locate the face in it, and then draw the contour of the face, obtaining the corresponding face contour graphic for the subsequent coarse localization of face organs.
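The specification does not name a face detector or contour algorithm, so the following sketch uses a stock OpenCV Haar cascade and edge-based contour as stand-ins; both are assumptions for illustration only.

```python
import cv2

# Stand-in face detector; the haarcascade file ships with opencv-python.
_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def face_contour(target_image):
    """Locate the face in the target image and return (face_box, contour)."""
    gray = cv2.cvtColor(target_image, cv2.COLOR_BGR2GRAY)
    faces = _cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None, None
    x, y, w, h = faces[0]
    roi = gray[y:y + h, x:x + w]
    edges = cv2.Canny(roi, 80, 160)  # edge map of the face region
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    contour = max(contours, key=cv2.contourArea) if contours else None
    return (x, y, w, h), contour
```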
S302: obtain the first relative position of each face organ in the face contour graphic.
After the face contour graphic is drawn, the first relative position of each face organ in the graphic can be determined from the pre-stored relative positions, achieving coarse localization of the face organs.
As a specific implementation of obtaining the first relative position of a face organ in the face contour graphic in Embodiment 3 of the present application: although the approximate position of a face organ in the face is known, different users' faces may differ to some extent, so the actual face shape and the positions of face organs in the face may vary between users. To improve the accuracy of the coarse localization of face organs, Embodiment 4 of the present application analyses in advance the face shapes found in real life and the distribution of face organs under each face shape, draws several sample contour graphics of different face shapes according to that analysis, and determines the relative position data of the face organs for each sample contour graphic. The relative positions of the actual face organs are then recognized from these sample contour graphics. As shown in Fig. 4, the step of obtaining the first relative position in Embodiment 4 specifically includes:
S401: match the face contour graphic against multiple sample contour images in a face contour library.
In the embodiments of the present application, the drawn sample contour graphics are stored in advance in a face contour library; after the face contour graphic is drawn, it is matched against the library to filter out a suitable sample contour graphic.
S402: if the matching succeeds, obtain the relative position set corresponding to the matched sample contour graphic, the relative position set containing the second relative position of each face organ in the sample contour graphic; take the second relative position of each face organ in the relative position set as the first relative position of that face organ in the face contour graphic.
If a sample contour graphic matches the face contour graphic successfully, the drawn face contour graphic has a face shape close to that sample. In that case, the embodiments of the present application directly read the relative position data of the face organs corresponding to that sample contour graphic and use them as the relative positions of the face organs for the face contour graphic in Embodiment 3.
As an embodiment of the present application, building on Embodiment 4: since the number of pre-stored sample contour graphics is usually limited in practice, graphic matching may sometimes fail. So that accurate relative positions of the face organs in the target image can still be obtained when matching fails, a default relative position set storing the relative position data of each face organ is preset in this embodiment; when matching fails, this default relative position set is simply read.
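A sketch of this lookup with a default fallback, assuming the library pairs each sample contour with its relative-position data; `cv2.matchShapes` stands in for the unspecified graphic matching, and the default positions and distance threshold are illustrative values only.

```python
import cv2

# Assumed fallback set used when matching fails; fractions of face
# width/height, purely illustrative.
DEFAULT_RELATIVE_POSITIONS = {
    "mouth": (0.50, 0.75), "nose": (0.50, 0.50),
    "left_eye": (0.30, 0.35), "right_eye": (0.70, 0.35),
}

def lookup_relative_positions(contour, library, max_distance=0.15):
    """Match the drawn contour against (sample_contour, positions) pairs;
    return the best sample's relative positions, or the default on failure."""
    best = None
    for sample_contour, positions in library:
        d = cv2.matchShapes(contour, sample_contour,
                            cv2.CONTOURS_MATCH_I1, 0.0)
        if best is None or d < best[0]:
            best = (d, positions)
    if best is not None and best[0] <= max_distance:
        return best[1]                  # matched sample's relative positions
    return DEFAULT_RELATIVE_POSITIONS   # matching failed: use the default set
```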
S303: locate the face organs in the target image using the first relative positions to obtain the organ center coordinates of each face organ in the target image, and, based on the multiple organ center coordinates, identify the first image area corresponding to each face organ in the target image.
After the relative positions of the face organs are obtained, the embodiments of the present application determine the coordinate position of each face organ in the target image from the actual position of the face in the target image and the relative position of the organ in the face, and use that position as the corresponding organ center coordinate, thereby achieving coarse localization in the target image.
Because face images taken in different shooting environments differ to some extent, and the positions of face organs shift slightly under different expressions (for example, the coordinates of the mouth change somewhat when the lips curl), the coarsely localized coordinate cannot represent the face organ and may not even lie within the organ's image region. Therefore, after the organ center coordinate is obtained, the embodiments of the present application use it as a starting point and perform organ recognition on the surrounding image area to locate each face organ precisely. For example, starting from the organ center coordinate of the mouth, mouth recognition is performed on the image area around that coordinate to determine the first image area corresponding to the actual mouth, thereby locating the mouth precisely.
As a specific implementation of the precise localization of face organs in Embodiment 4 of the present application: in practice, even when the organ center coordinate of a face organ is known, without a concrete search range it still takes many attempts to determine a suitable surrounding search range and recognize the face organ in it, which consumes considerable computing resources and is inefficient. Therefore, to improve the efficiency of searching for face organs and achieve precise, fast localization, as shown in Fig. 5, the precise localization steps in Embodiment 5 of the present application specifically include:
S501: obtain the retrieval rectangle size corresponding to each face organ, and identify, from the retrieval rectangle size and the organ center coordinate, the second image area corresponding to each face organ in the target image, the second image area being rectangular in shape.
In practice, the approximate proportions of a face organ within the face are relatively fixed; for example, the height of the nose is generally about 1/3 of the face's length and its width about 1/5 of the face's width. Given the organ center coordinate of a face organ, combined with the size data of the face and the ratio of the organ's size to the face's size, the image area in which the organ roughly sits can be located quickly. Based on this principle, in the embodiments of the present application the ratio data between each face organ and the face size are preset, and the retrieval rectangle size of each organ is determined from the actual face size in the target image and the preset ratios. Taking the nose example above, if the face in the target image is 10 cm long and 7 cm wide, then, computing the height as 1/3 of the face length and the width as 1/5 of the face width, the corresponding retrieval rectangle size is 3.33 cm × 1.4 cm.
After the retrieval rectangle size is determined, the second image area whose length and width equal the retrieval rectangle size is determined in the target image with the organ center coordinate of the face organ as the rectangle's center point, giving the approximate area in which each face organ sits.
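A sketch of this computation, following the nose example above; the ratio table and the mouth entry are assumed values for illustration.

```python
# Assumed (height, width) ratios of organ size to face size; the nose
# entry follows the example above, the mouth entry is illustrative.
ORGAN_RATIOS = {"nose": (1 / 3, 1 / 5), "mouth": (1 / 6, 1 / 2)}

def search_rectangle(organ, organ_center, face_size):
    """Second image area: a rectangle of the retrieval size centred
    on the organ's coarse centre coordinate. Returns (x, y, w, h)."""
    face_h, face_w = face_size               # e.g. 10 x 7 (cm, or pixels)
    ratio_h, ratio_w = ORGAN_RATIOS[organ]
    rect_h = face_h * ratio_h                # e.g. nose: 10 * 1/3 = 3.33
    rect_w = face_w * ratio_w                # e.g. nose: 7 * 1/5 = 1.4
    cx, cy = organ_center
    return (cx - rect_w / 2, cy - rect_h / 2, rect_w, rect_h)
```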
S502: perform face organ detection on each second image area, and identify from the second image area, according to the detection result, the first image area corresponding to the face organ.
After the second image area to be searched is determined, searching for the corresponding face organ within that area achieves precise localization of the organ. In the nose example above, after the 3.33 cm × 1.4 cm second image area corresponding to the nose is determined, the embodiments of the present application perform nose recognition and retrieval on that area, determining the nose contained in it and the first image area the nose actually occupies within the second image area.
S304: extract coordinates from each first image area to obtain the coordinate group corresponding to each face organ in the target image.
After the first image area in which each face organ sits is accurately located, the embodiments of the present application extract coordinates from the first image area. Since a first image area contains a considerable amount of image information, each first image area yields multiple coordinate data after extraction. In the embodiments, all coordinate data corresponding to a single first image area are stored in one coordinate group, yielding the coordinate group corresponding to each face organ and hence the first face organ coordinate set of the target image required in Embodiment 2 of the present application.
In Embodiment 3 of the present application, by first coarsely locating the face organs and then quickly searching the surrounding image areas for them on the basis of the coarse localization, the face organs can be located and recognized quickly and precisely. Compared with recognizing each organ directly across the whole face, the recognition in this embodiment is more efficient and requires less computation.
As a specific implementation of the coordinate extraction from the first image area in Embodiment 5 of the present application: during an actual video call, different face organs are used with different frequencies, so they change at very different rates in the video. For example, people usually talk a great deal during a video call, so the mouth is used extremely frequently and changes extremely frequently in the video; by contrast, if the user's head stays still during the call, the basic state of the nose and its coordinates barely change. Different face organs therefore differ considerably in their reference value when comparing first frame images to judge whether the video is frozen. To improve the effectiveness of coordinate extraction and ensure that subsequent image comparison results are accurate and reliable, as shown in Fig. 6, the coordinate extraction steps for the first image area in Embodiment 6 of the present application specifically include:
S601: obtain the number of sampling points corresponding to each face organ, where the numbers of sampling points corresponding to the mouth, eyes, eyebrows and nose decrease in that order.
In the embodiments of the present application, the mouth, eyes, eyebrows and nose are ranked by how frequently each face organ is used during a call, and a corresponding number of sampling points is set for each organ according to that frequency: the higher the frequency, the larger the number of sampling points, so that the numbers of extracted coordinates are differentiated in the subsequent feature point sampling and coordinate extraction. The specific numbers of sampling points may be set by technicians according to actual needs and are not limited here.
S602: perform feature point sampling on the first image area corresponding to each face organ and obtain the coordinates of each feature point, yielding the coordinate group corresponding to each face organ in the target image, the number of sampled points being the number of sampling points corresponding to each face organ.
After the number of sampling points corresponding to each first image area is determined, feature point sampling of the first image areas begins; for each first image area, only the corresponding number of feature points is sampled. For example, if the number of sampling points for first image area A is 20, only 20 feature points are sampled from area A. After the required number of feature points has been sampled, their coordinate data are obtained, giving the coordinate group corresponding to each first image area, i.e. the coordinate group of each face organ in the target image. The specific feature point sampling method is not limited here and may be set by technicians according to actual needs, including but not limited to the SIFT algorithm and the SUSAN algorithm. To control the number of extracted sampling points precisely, if the feature point extraction algorithm itself cannot set the number of sampled points, feature points may be deleted or added after normal extraction until the corresponding number of sampling points is met.
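A sketch of fixed-count sampling with trimming and padding as described above, using OpenCV's corner detector as a stand-in for the unspecified feature point algorithm; the per-organ counts are assumed values.

```python
import cv2

# Assumed per-organ sample counts, decreasing with usage frequency:
# mouth > eyes > eyebrows > nose.
SAMPLE_COUNTS = {"mouth": 20, "eyes": 14, "eyebrows": 8, "nose": 4}

def sample_organ_coordinates(organ, region_gray):
    """Sample exactly SAMPLE_COUNTS[organ] feature points from the organ's
    first image area (a grayscale patch) and return one coordinate group."""
    n = SAMPLE_COUNTS[organ]
    pts = cv2.goodFeaturesToTrack(region_gray, maxCorners=n,
                                  qualityLevel=0.01, minDistance=2)
    pts = [] if pts is None else [tuple(p.ravel()) for p in pts]
    while len(pts) < n:          # pad by repeating points if too few found
        pts.append(pts[-1] if pts else (0.0, 0.0))
    return pts[:n]               # trim in case the detector returned extra
```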
It should be understood that although Embodiments 2 to 6 of the present application above refine or optimize the comparison of first frame images, they are equally applicable to the comparison of second frame images; that is, Embodiments 2 to 6 may also be applied in combination with S104 of Embodiment 1, in which case the processed object is simply changed from the first frame images to the second frame images. Refer to the descriptions above for details, which are not repeated here.
Corresponding to the method of the above embodiments, Fig. 7 shows a structural block diagram of the video freeze identification apparatus provided in an embodiment of the present application; for ease of description, only the parts related to the embodiment are shown. The video freeze identification apparatus illustrated in Fig. 7 may be the execution subject of the video freeze identification method provided in Embodiment 1 above.
Referring to Fig. 7, the video freeze identification apparatus includes:
a face detection module 71, configured to perform face detection on the video when the monitoring state corresponding to the video during a real-time call is the first state, and, when a face is detected in the video, modify the monitoring state corresponding to the video to the second state;
a first image comparison module 72, configured to, if the monitoring state is the second state, sample the video at a first frequency to obtain N first frame images and compare the N first frame images, where N is a positive integer greater than 1;
a freeze start recognition module 73, configured to, if the comparison result is that the image difference degree among the N first frame images is less than the first difference threshold, take the earliest sampling time among the sampling times corresponding to the first frame images as the freeze start time of the video, and set the monitoring state to the third state;
a second image comparison module 74, configured to, if the monitoring state is the third state, sample the video at a second frequency to obtain M second frame images and compare the M second frame images, where M is a positive integer greater than 1;
a freeze end recognition module 75, configured to, if the comparison result is that the image difference degree among the M second frame images is greater than or equal to the first difference threshold, take the latest sampling time among the sampling times corresponding to the second frame images as the freeze end time of the video, set the monitoring state to the first state, and identify the freeze time period of the video based on the freeze start time and the freeze end time.
Further, the first image comparison module 72 includes:
a coordinate analysis module, configured to perform face organ coordinate analysis on each first frame image to obtain N first face organ coordinate sets;
a coordinate comparison module, configured to compare the N first frame images using the N first face organ coordinate sets.
Further, the first face organ coordinate set is a collection of multiple coordinate groups, each coordinate group corresponding to one type of face organ and containing multiple coordinates of the corresponding face organ; the coordinate analysis module includes:
a contour drawing module, configured to take the first frame image to be analysed as the target image and draw the face contour of the target image to obtain the corresponding face contour graphic;
a position obtaining module, configured to obtain the first relative position of each face organ in the face contour graphic;
an organ image search module, configured to locate the face organs in the target image using the first relative positions to obtain the organ center coordinates of each face organ in the target image, and, based on the multiple organ center coordinates, identify the first image area corresponding to each face organ in the target image;
a coordinate extraction module, configured to extract coordinates from each first image area to obtain the coordinate group corresponding to each face organ in the target image.
Further, the position obtaining module is configured to:
match the face contour graphic against multiple sample contour images in a face contour library; and
if the matching succeeds, obtain the relative position set corresponding to the matched sample contour graphic, the relative position set containing the second relative position of each face organ in the sample contour graphic, and take the second relative position of each face organ in the relative position set as the first relative position of that face organ in the face contour graphic.
Further, the organ image search module is configured to:
obtain the retrieval rectangle size corresponding to each face organ and identify, from the retrieval rectangle size and the organ center coordinate, the second image area corresponding to each face organ in the target image, the second image area being rectangular in shape; and
perform face organ detection on each second image area and identify from the second image area, according to the detection result, the first image area corresponding to the face organ.
Further, the coordinate extraction module is configured to:
obtain the number of sampling points corresponding to each face organ, where the numbers of sampling points corresponding to the mouth, eyes, eyebrows and nose decrease in that order; and
perform feature point sampling on the first image area corresponding to each face organ and obtain the coordinates of each feature point, yielding the coordinate group corresponding to each face organ in the target image, the number of sampled points being the number of sampling points corresponding to each face organ.
For the process by which each module of the video freeze identification apparatus provided in the embodiments of the present application implements its function, reference may be made to the description of Embodiment 1 shown in FIG. 1, which will not be repeated here.
The video freeze identification method provided in the embodiments of the present application can be applied to terminal devices such as mobile phones, tablet computers, wearable devices, in-vehicle devices, augmented reality (AR)/virtual reality (VR) devices, notebook computers, ultra-mobile personal computers (UMPC), netbooks, and personal digital assistants (PDA); the embodiments of the present application impose no restriction on the specific type of terminal device.
For example, the terminal device may be a cellular phone, a cordless phone, a Session Initiation Protocol (SIP) phone, a Wireless Local Loop (WLL) station, a Personal Digital Assistant (PDA) device, a handheld device with wireless communication functions, a computing device or other processing device connected to a wireless modem, an in-vehicle device, an Internet-of-Vehicles terminal, a computer, a laptop computer, a handheld communication device, a handheld computing device, a satellite wireless device, a wireless modem card, a TV set top box (STB), customer premises equipment (CPE), and/or another device for communicating over a wireless system, as well as a next-generation communication system, for example, a mobile terminal in a 5G network or a mobile terminal in a future evolved Public Land Mobile Network (PLMN).
By way of example and not limitation, when the terminal device is a wearable device, the wearable device may also be a general term for wearable devices developed by applying wearable technology to the intelligent design of everyday wear, such as glasses, gloves, watches, clothing, and shoes. A wearable device is a portable device worn directly on the body or integrated into the user's clothes or accessories. A wearable device is not merely a hardware device; it also delivers powerful functions through software support, data interaction, and cloud interaction. In a broad sense, wearable smart devices include devices that are full-featured and large-sized and can implement complete or partial functions without relying on a smartphone, such as smart watches or smart glasses, as well as devices that focus on only one type of application function and need to be used with other devices such as smartphones, for example, various smart bracelets and smart jewelry for monitoring physical signs.
FIG. 8 is a schematic structural diagram of a terminal device provided by an embodiment of the present application. As shown in FIG. 8, the terminal device 8 of this embodiment includes at least one processor 80 (only one is shown in FIG. 8) and a memory 81, the memory 81 storing a computer program 82 that can run on the processor 80. When the processor 80 executes the computer program 82, the steps in the foregoing embodiments of the video freeze identification method are implemented, for example, steps 101 to 105 shown in FIG. 1. Alternatively, when the processor 80 executes the computer program 82, the functions of the modules/units in the foregoing apparatus embodiments are implemented, for example, the functions of modules 71 to 75 shown in FIG. 7.
The terminal device 8 may be a computing device such as a desktop computer, a notebook, a palmtop computer, or a cloud server. The terminal device may include, but is not limited to, the processor 80 and the memory 81. Those skilled in the art will understand that FIG. 8 is merely an example of the terminal device 8 and does not constitute a limitation on it; the terminal device may include more or fewer components than shown, combine certain components, or use different components, and may, for example, further include input and transmission devices, a network access device, a bus, and the like.
The processor 80 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 81 may, in some embodiments, be an internal storage unit of the terminal device 8, such as a hard disk or memory of the terminal device 8. The memory 81 may also be an external storage device of the terminal device 8, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the terminal device 8. Further, the memory 81 may include both an internal storage unit of the terminal device 8 and an external storage device. The memory 81 is used to store an operating system, application programs, a boot loader (BootLoader), data, and other programs, such as the program code of the computer program. The memory 81 may also be used to temporarily store data that has been or will be sent.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
The embodiments of the present application also provide a computer-readable storage medium, which may be non-volatile or volatile. The computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps in each of the foregoing method embodiments, namely: when the monitoring state corresponding to a video during a real-time call is a first state, performing face detection on the video, and when a face is detected in the video, modifying the monitoring state corresponding to the video to a second state; if the monitoring state is the second state, sampling the video at a first frequency to obtain N first frame images, and comparing the N first frame images, where N is a positive integer greater than 1; if the comparison result is that the image difference degree between the N first frame images is less than a first difference threshold, taking the earliest sampling time among the sampling times corresponding to the first frame images as the freeze start time of the video, and setting the monitoring state to a third state; if the monitoring state is the third state, sampling the video at a second frequency to obtain M second frame images, and comparing the M second frame images, where M is a positive integer greater than 1; if the comparison result is that the image difference degree between the M second frame images is greater than or equal to the first difference threshold, taking the latest sampling time among the sampling times corresponding to the second frame images as the freeze end time of the video, setting the monitoring state to the first state, and identifying the freeze time period of the video based on the freeze start time and the freeze end time.
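For illustration only, the following is a minimal, self-contained Python sketch of this three-state monitoring loop, under stated assumptions: it iterates over frames already sampled from a recording rather than sampling the live stream at the first and second frequencies; it uses mean absolute pixel difference as a stand-in for the image difference degree (the application compares face organ coordinate sets); and `has_face` stands in for the face detector. It is a sketch of the control flow, not a definitive implementation.

```python
import numpy as np

# Monitoring states mirroring the method: 1 = waiting for a face,
# 2 = watching for a freeze start, 3 = watching for a freeze end.
FIRST, SECOND, THIRD = 1, 2, 3

def frame_difference(a, b):
    """Stand-in image difference degree: mean absolute pixel difference.
    The application instead compares frames via face organ coordinate sets."""
    return float(np.mean(np.abs(a.astype(np.int16) - b.astype(np.int16))))

def detect_freezes(frames, timestamps, has_face, n=3, m=3, diff_threshold=1.0):
    """Single-pass sketch of the monitoring state machine over a pre-sampled
    frame sequence; the N-/M-frame comparison windows emulate sampling at the
    first/second frequencies. Returns (freeze_start, freeze_end) periods."""
    state, window, periods, start = FIRST, [], [], None
    for frame, t in zip(frames, timestamps):
        if state == FIRST:
            if has_face(frame):          # face detected: enter the second state
                state, window = SECOND, []
            continue
        window.append((frame, t))
        k = n if state == SECOND else m
        if len(window) >= k:
            recent = window[-k:]
            diffs = [frame_difference(a, b)
                     for (a, _), (b, _) in zip(recent, recent[1:])]
            if state == SECOND and max(diffs) < diff_threshold:
                # All k frames nearly identical: a freeze has started.
                start = recent[0][1]     # earliest sampling time in the window
                state, window = THIRD, []
            elif state == THIRD and max(diffs) >= diff_threshold:
                # Motion has resumed: close the freeze period.
                periods.append((start, recent[-1][1]))  # latest sampling time
                state, start = FIRST, None
    return periods
```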
The embodiments of the present application also provide a computer program product which, when run on a mobile terminal, causes the mobile terminal to implement the steps in each of the foregoing method embodiments.
If the integrated module/unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the present application may implement all or part of the processes in the foregoing method embodiments by instructing the relevant hardware through a computer program, which may be stored in a computer-readable storage medium and which, when executed by a processor, implements the steps of each of the foregoing method embodiments. The computer program includes computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include: any entity or apparatus capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and the like.
In the foregoing embodiments, the description of each embodiment has its own emphasis; for parts not detailed or described in one embodiment, reference may be made to the related descriptions of other embodiments.
A person of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are executed by hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the present application.
The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
The foregoing embodiments are merely intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and shall all fall within the protection scope of the present application.

Claims (20)

  1. A video freeze identification method, comprising:
    when the monitoring state corresponding to a video during a real-time call is a first state, performing face detection on the video, and when a face is detected in the video, modifying the monitoring state corresponding to the video to a second state;
    if the monitoring state is the second state, sampling the video at a first frequency to obtain N first frame images, and comparing the N first frame images, where N is a positive integer greater than 1;
    if the comparison result is that the image difference degree between the N first frame images is less than a first difference threshold, taking the earliest sampling time among the sampling times corresponding to the first frame images as the freeze start time of the video, and setting the monitoring state to a third state;
    if the monitoring state is the third state, sampling the video at a second frequency to obtain M second frame images, and comparing the M second frame images, where M is a positive integer greater than 1;
    if the comparison result is that the image difference degree between the M second frame images is greater than or equal to the first difference threshold, taking the latest sampling time among the sampling times corresponding to the second frame images as the freeze end time of the video, setting the monitoring state to the first state, and identifying the freeze time period of the video based on the freeze start time and the freeze end time.
  2. The video freeze identification method according to claim 1, wherein comparing the N first frame images comprises:
    performing face organ coordinate analysis on each first frame image to obtain N first face organ coordinate sets;
    comparing the N first frame images using the N first face organ coordinate sets.
  3. The video freeze identification method according to claim 2, wherein the first face organ coordinate set is a set of multiple coordinate groups, each coordinate group corresponding to one type of face organ and containing multiple coordinates of the corresponding face organ;
    performing face organ coordinate analysis on each first frame image comprises:
    taking the first frame image to be analyzed as a target image, and drawing a face contour on the target image to obtain a corresponding face contour graphic;
    obtaining a first relative position of each face organ in the face contour graphic;
    locating the face organs in the target image using the first relative positions to obtain organ center coordinates of each face organ in the target image, and identifying, based on the plurality of organ center coordinates, a first image region corresponding to each face organ in the target image;
    extracting coordinates from each first image region to obtain the coordinate group corresponding to each face organ in the target image.
  4. The video freeze identification method according to claim 3, wherein obtaining the first relative position of each face organ in the face contour graphic comprises:
    graphically matching the face contour graphic with a plurality of sample contour graphics in a face contour library;
    if the matching is successful, obtaining a relative position set corresponding to the successfully matched sample contour graphic, the relative position set containing a second relative position of each face organ in the sample contour graphic; and taking the second relative position corresponding to each face organ in the relative position set as the first relative position of that face organ in the face contour graphic.
  5. The video freeze identification method according to claim 3, wherein identifying, based on the plurality of organ center coordinates, the first image region corresponding to each face organ in the target image comprises:
    obtaining a retrieval rectangle size corresponding to each face organ, and identifying, according to the retrieval rectangle size and the organ center coordinates, a second image region corresponding to each face organ in the target image, the second image region being rectangular in shape;
    performing face organ detection on each second image region, and identifying, according to the detection results, the first image region corresponding to the face organ from the second image region.
  6. The video freeze identification method according to claim 3, wherein extracting coordinates from each first image region to obtain the coordinate group corresponding to each face organ in the target image comprises:
    obtaining the number of sampling points corresponding to each face organ, wherein the numbers of sampling points corresponding to the mouth, eyes, eyebrows, and nose decrease in that order;
    performing feature point sampling on the first image region corresponding to each face organ, and obtaining the coordinates of each feature point to obtain the coordinate group corresponding to each face organ in the target image, the number of sampling points used in the feature point sampling being the number of sampling points corresponding to each face organ.
  7. A video freeze identification apparatus, comprising:
    a face detection module, configured to perform face detection on a video when the monitoring state corresponding to the video during a real-time call is a first state, and, when a face is detected in the video, modify the monitoring state corresponding to the video to a second state;
    a first image comparison module, configured to, if the monitoring state is the second state, sample the video at a first frequency to obtain N first frame images, and compare the N first frame images, where N is a positive integer greater than 1;
    a freeze start identification module, configured to, if the comparison result is that the image difference degree between the N first frame images is less than a first difference threshold, take the earliest sampling time among the sampling times corresponding to the first frame images as the freeze start time of the video, and set the monitoring state to a third state;
    a second image comparison module, configured to, if the monitoring state is the third state, sample the video at a second frequency to obtain M second frame images, and compare the M second frame images, where M is a positive integer greater than 1;
    a freeze end identification module, configured to, if the comparison result is that the image difference degree between the M second frame images is greater than or equal to the first difference threshold, take the latest sampling time among the sampling times corresponding to the second frame images as the freeze end time of the video, set the monitoring state to the first state, and identify the freeze time period of the video based on the freeze start time and the freeze end time.
  8. The video freeze identification apparatus according to claim 7, wherein the first image comparison module comprises:
    a coordinate analysis module, configured to perform face organ coordinate analysis on each first frame image to obtain N first face organ coordinate sets;
    a coordinate comparison module, configured to compare the N first frame images using the N first face organ coordinate sets.
  9. A terminal device, comprising a memory and a processor, the memory storing a computer program executable on the processor, wherein the processor, when executing the computer program, implements the steps of a video freeze identification method, the method comprising:
    when the monitoring state corresponding to a video during a real-time call is a first state, performing face detection on the video, and when a face is detected in the video, modifying the monitoring state corresponding to the video to a second state;
    if the monitoring state is the second state, sampling the video at a first frequency to obtain N first frame images, and comparing the N first frame images, where N is a positive integer greater than 1;
    if the comparison result is that the image difference degree between the N first frame images is less than a first difference threshold, taking the earliest sampling time among the sampling times corresponding to the first frame images as the freeze start time of the video, and setting the monitoring state to a third state;
    if the monitoring state is the third state, sampling the video at a second frequency to obtain M second frame images, and comparing the M second frame images, where M is a positive integer greater than 1;
    if the comparison result is that the image difference degree between the M second frame images is greater than or equal to the first difference threshold, taking the latest sampling time among the sampling times corresponding to the second frame images as the freeze end time of the video, setting the monitoring state to the first state, and identifying the freeze time period of the video based on the freeze start time and the freeze end time.
  10. The terminal device according to claim 9, wherein comparing the N first frame images comprises:
    performing face organ coordinate analysis on each first frame image to obtain N first face organ coordinate sets;
    comparing the N first frame images using the N first face organ coordinate sets.
  11. The terminal device according to claim 10, wherein the first face organ coordinate set is a set of multiple coordinate groups, each coordinate group corresponding to one type of face organ and containing multiple coordinates of the corresponding face organ;
    performing face organ coordinate analysis on each first frame image comprises:
    taking the first frame image to be analyzed as a target image, and drawing a face contour on the target image to obtain a corresponding face contour graphic;
    obtaining a first relative position of each face organ in the face contour graphic;
    locating the face organs in the target image using the first relative positions to obtain organ center coordinates of each face organ in the target image, and identifying, based on the plurality of organ center coordinates, a first image region corresponding to each face organ in the target image;
    extracting coordinates from each first image region to obtain the coordinate group corresponding to each face organ in the target image.
  12. The terminal device according to claim 11, wherein obtaining the first relative position of each face organ in the face contour graphic comprises:
    graphically matching the face contour graphic with a plurality of sample contour graphics in a face contour library;
    if the matching is successful, obtaining a relative position set corresponding to the successfully matched sample contour graphic, the relative position set containing a second relative position of each face organ in the sample contour graphic; and taking the second relative position corresponding to each face organ in the relative position set as the first relative position of that face organ in the face contour graphic.
  13. The terminal device according to claim 11, wherein identifying, based on the plurality of organ center coordinates, the first image region corresponding to each face organ in the target image comprises:
    obtaining a retrieval rectangle size corresponding to each face organ, and identifying, according to the retrieval rectangle size and the organ center coordinates, a second image region corresponding to each face organ in the target image, the second image region being rectangular in shape;
    performing face organ detection on each second image region, and identifying, according to the detection results, the first image region corresponding to the face organ from the second image region.
  14. The terminal device according to claim 11, wherein extracting coordinates from each first image region to obtain the coordinate group corresponding to each face organ in the target image comprises:
    obtaining the number of sampling points corresponding to each face organ, wherein the numbers of sampling points corresponding to the mouth, eyes, eyebrows, and nose decrease in that order;
    performing feature point sampling on the first image region corresponding to each face organ, and obtaining the coordinates of each feature point to obtain the coordinate group corresponding to each face organ in the target image, the number of sampling points used in the feature point sampling being the number of sampling points corresponding to each face organ.
  15. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of a video freeze identification method, the method comprising:
    when the monitoring state corresponding to a video during a real-time call is a first state, performing face detection on the video, and when a face is detected in the video, modifying the monitoring state corresponding to the video to a second state;
    if the monitoring state is the second state, sampling the video at a first frequency to obtain N first frame images, and comparing the N first frame images, where N is a positive integer greater than 1;
    if the comparison result is that the image difference degree between the N first frame images is less than a first difference threshold, taking the earliest sampling time among the sampling times corresponding to the first frame images as the freeze start time of the video, and setting the monitoring state to a third state;
    if the monitoring state is the third state, sampling the video at a second frequency to obtain M second frame images, and comparing the M second frame images, where M is a positive integer greater than 1;
    if the comparison result is that the image difference degree between the M second frame images is greater than or equal to the first difference threshold, taking the latest sampling time among the sampling times corresponding to the second frame images as the freeze end time of the video, setting the monitoring state to the first state, and identifying the freeze time period of the video based on the freeze start time and the freeze end time.
  16. The computer-readable storage medium according to claim 15, wherein comparing the N first frame images comprises:
    performing face organ coordinate analysis on each first frame image to obtain N first face organ coordinate sets;
    comparing the N first frame images using the N first face organ coordinate sets.
  17. The computer-readable storage medium according to claim 16, wherein the first face organ coordinate set is a set of multiple coordinate groups, each coordinate group corresponding to one type of face organ and containing multiple coordinates of the corresponding face organ;
    performing face organ coordinate analysis on each first frame image comprises:
    taking the first frame image to be analyzed as a target image, and drawing a face contour on the target image to obtain a corresponding face contour graphic;
    obtaining a first relative position of each face organ in the face contour graphic;
    locating the face organs in the target image using the first relative positions to obtain organ center coordinates of each face organ in the target image, and identifying, based on the plurality of organ center coordinates, a first image region corresponding to each face organ in the target image;
    extracting coordinates from each first image region to obtain the coordinate group corresponding to each face organ in the target image.
  18. The computer-readable storage medium according to claim 17, wherein obtaining the first relative position of each face organ in the face contour graphic comprises:
    graphically matching the face contour graphic with a plurality of sample contour graphics in a face contour library;
    if the matching is successful, obtaining a relative position set corresponding to the successfully matched sample contour graphic, the relative position set containing a second relative position of each face organ in the sample contour graphic; and taking the second relative position corresponding to each face organ in the relative position set as the first relative position of that face organ in the face contour graphic.
  19. The computer-readable storage medium according to claim 17, wherein identifying, based on the plurality of organ center coordinates, the first image region corresponding to each face organ in the target image comprises:
    obtaining a retrieval rectangle size corresponding to each face organ, and identifying, according to the retrieval rectangle size and the organ center coordinates, a second image region corresponding to each face organ in the target image, the second image region being rectangular in shape;
    performing face organ detection on each second image region, and identifying, according to the detection results, the first image region corresponding to the face organ from the second image region.
  20. The computer-readable storage medium according to claim 17, wherein extracting coordinates from each first image region to obtain the coordinate group corresponding to each face organ in the target image comprises:
    obtaining the number of sampling points corresponding to each face organ, wherein the numbers of sampling points corresponding to the mouth, eyes, eyebrows, and nose decrease in that order;
    performing feature point sampling on the first image region corresponding to each face organ, and obtaining the coordinates of each feature point to obtain the coordinate group corresponding to each face organ in the target image, the number of sampling points used in the feature point sampling being the number of sampling points corresponding to each face organ.
PCT/CN2020/087381 2020-02-11 2020-04-28 Video lag identification method and apparatus, and terminal device WO2021159609A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010087225.0 2020-02-11
CN202010087225.0A CN111339842A (en) 2020-02-11 2020-02-11 Video jamming identification method and device and terminal equipment

Publications (1)

Publication Number Publication Date
WO2021159609A1 true WO2021159609A1 (en) 2021-08-19

Family

ID=71183337

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/087381 WO2021159609A1 (en) 2020-02-11 2020-04-28 Video lag identification method and apparatus, and terminal device

Country Status (2)

Country Link
CN (1) CN111339842A (en)
WO (1) WO2021159609A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113296858B (en) * 2021-04-14 2024-03-19 惠州市德赛西威汽车电子股份有限公司 Method for detecting and recovering picture dead of vehicle-mounted system and storage medium
CN113627387A (en) * 2021-08-30 2021-11-09 平安国际融资租赁有限公司 Parallel identity authentication method, device, equipment and medium based on face recognition
CN114125436A (en) * 2021-11-30 2022-03-01 中国电信股份有限公司 Soft probe monitoring level testing method and device, storage medium and electronic equipment
CN114419018A (en) * 2022-01-25 2022-04-29 重庆紫光华山智安科技有限公司 Image sampling method, system, device and medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160353057A1 (en) * 2015-06-01 2016-12-01 Apple Inc. Techniques to overcome communication lag between terminals performing video mirroring and annotation operations
CN106789555A (en) * 2016-11-25 2017-05-31 努比亚技术有限公司 Method of transmitting video data and device
CN108737885A (en) * 2018-06-07 2018-11-02 北京奇艺世纪科技有限公司 A kind of analysis Online Video plays the method and device of interim card
CN109145878A (en) * 2018-09-30 2019-01-04 北京小米移动软件有限公司 image extraction method and device
CN110430425A (en) * 2019-07-31 2019-11-08 北京奇艺世纪科技有限公司 A kind of video fluency determines method, apparatus, electronic equipment and medium
CN110475124A (en) * 2019-09-06 2019-11-19 广州虎牙科技有限公司 Video cardton detection method and device

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113784216A (en) * 2021-08-24 2021-12-10 咪咕音乐有限公司 Video jamming identification method and device, terminal equipment and storage medium
CN115484454A (en) * 2022-09-09 2022-12-16 深圳健路网络科技有限责任公司 Detection method and device for video blockage acquired by non-root mobile phone
CN116110037A (en) * 2023-04-11 2023-05-12 深圳市华图测控系统有限公司 Book checking method and device based on visual identification and terminal equipment
CN116110037B (en) * 2023-04-11 2023-06-23 深圳市华图测控系统有限公司 Book checking method and device based on visual identification and terminal equipment

Also Published As

Publication number Publication date
CN111339842A (en) 2020-06-26

Similar Documents

Publication Publication Date Title
WO2021159609A1 (en) Video lag identification method and apparatus, and terminal device
CN110147717B (en) Human body action recognition method and device
CN110135246B (en) Human body action recognition method and device
CN109376596B (en) Face matching method, device, equipment and storage medium
CN107679448B (en) Eyeball action-analysing method, device and storage medium
US20140010409A1 (en) Object tracking device, object tracking method, and control program
CN110163096B (en) Person identification method, person identification device, electronic equipment and computer readable medium
WO2023173646A1 (en) Expression recognition method and apparatus
CN110197149B (en) Ear key point detection method and device, storage medium and electronic equipment
CN113011403B (en) Gesture recognition method, system, medium and device
WO2019184605A1 (en) Multi-target tracking method and terminal device
WO2019062347A1 (en) Facial recognition method and related product
CN112381071A (en) Behavior analysis method of target in video stream, terminal device and medium
JP2022542199A (en) KEYPOINT DETECTION METHOD, APPARATUS, ELECTRONICS AND STORAGE MEDIA
WO2019033567A1 (en) Method for capturing eyeball movement, device and storage medium
CN113657195A (en) Face image recognition method, face image recognition equipment, electronic device and storage medium
CN111091106A (en) Image clustering method and device, storage medium and electronic device
CN112492383A (en) Video frame generation method and device, storage medium and electronic equipment
WO2021012513A1 (en) Gesture operation method and apparatus, and computer device
EP3200092A1 (en) Method and terminal for implementing image sequencing
CN111783677A (en) Face recognition method, face recognition device, server and computer readable medium
CN116761020A (en) Video processing method, device, equipment and medium
WO2022134700A1 (en) Method and apparatus for identifying target object
US8509541B2 (en) Method for eye detection for a given face
CN111507289A (en) Video matching method, computer device and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20918939

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 13.12.2022)

122 Ep: pct application non-entry in european phase

Ref document number: 20918939

Country of ref document: EP

Kind code of ref document: A1