CN112101105A - Training method and device for face key point detection model and storage medium


Info

Publication number: CN112101105A (granted as CN112101105B)
Application number: CN202010794471.XA
Authority: CN (China)
Original language: Chinese (zh)
Prior art keywords: face, target, data sequence, key point, position data
Inventors: 马啸, 张阿强
Applicant and current assignee: Shenzhen Shuliantianxia Intelligent Technology Co Ltd
Legal status: Granted; Active

Classifications

    • G06V 40/161 Human faces: detection, localisation, normalisation
    • G06F 18/214 Pattern recognition: generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/20 Image or video recognition or understanding: image preprocessing
    • G06V 20/41 Video scenes: higher-level, semantic clustering, classification or understanding, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V 40/168 Human faces: feature extraction, face representation

Abstract

The embodiment of the invention discloses a training method, a device and a storage medium for a face key point detection model. Consecutive multi-frame face video frame images in a face video whose face key points have been labeled are label-smoothed to remove jitter, and a first face key point detection model is trained with the re-labeled multi-frame face video frame images. Because training therefore uses the positional relationship of each face key point across consecutive, de-jittered face video frame images, the stability and accuracy of the trained target face key point detection model are effectively improved. The method, device and storage medium are particularly suitable for detecting face key points in face video, where jitter is effectively reduced.

Description

Training method and device for face key point detection model and storage medium
Technical Field
The invention relates to the technical field of image processing, and in particular to a training method and device for a face key point detection model, and to a storage medium.
Background
Face key points serve to precisely locate and segment the parts of a human face, such as the exact outer contours of the eyes, eyebrows and mouth, and the outer contour of the face itself. They are applied in many fields, such as face deformation (face thinning, eye enlarging and the like), virtual makeup and animated films.
As the application fields of face key points multiply, the accuracy required of face key point detection keeps rising. A common approach at present is to train a face key point detection model and use it to detect the face key points. However, such models still suffer from low detection stability and accuracy; in particular, detecting a face in a video can cause the detected face key points to jitter.
Disclosure of Invention
The invention mainly aims to provide a training method, a device and a storage medium for a face key point detection model, which can solve the prior-art problem that the face key points detected by a face key point detection model are inaccurate and jitter.
In order to achieve the above object, a first aspect of the present invention provides a method for training a face keypoint detection model, where the method includes:
acquiring a target face sample data set with marked face key points, wherein the target face sample data set comprises at least one sample subset, and the sample subset comprises continuous multi-frame face video frame images in a face video;
performing annotation smoothing processing on face key points of a face video frame image in a target sample subset to obtain a re-annotated target sample subset, wherein the target sample subset is any one sample subset;
and training the first face key point detection model by using the target face sample data set containing the re-labeled sample subset to obtain the trained target face key point detection model.
In order to achieve the above object, a second aspect of the present invention provides a device for training a face keypoint detection model, the device comprising:
the acquisition module is used for acquiring a target face sample data set with labeled face key points, wherein the target face sample data set comprises at least one sample subset, and the sample subset comprises continuous multi-frame face video frame images in a face video;
the smoothing module is used for carrying out labeling smoothing processing on the face key points of the face video frame images in the target sample subset to obtain a re-labeled target sample subset, wherein the target sample subset is any sample subset;
and the training module is used for training the first face key point detection model by using the target face sample data set containing the re-labeled sample subset to obtain the trained target face key point detection model.
To achieve the above object, a third aspect of the present invention provides a computer-readable storage medium storing a computer program, which, when executed by a processor, causes the processor to perform the steps as described in the first aspect.
The embodiment of the invention has the following beneficial effects:
the invention provides a training method for a face key point detection model, the method comprising: acquiring a target face sample data set with labeled face key points, wherein the target face sample data set comprises at least one sample subset and the sample subset comprises continuous multi-frame face video frame images in a face video; carrying out labeling smoothing processing on the face key points of the face video frame images in the sample subsets to obtain re-labeled sample subsets; and training the first face key point detection model by using the target face sample data set containing the re-labeled sample subsets to obtain the trained target face key point detection model. Continuous multi-frame face video frame images in a face video containing labeled face key points are label-smoothed to remove jitter, and the first face key point detection model is trained with the re-labeled multi-frame face video frame images, so that the positional relationship of the same face key point across the de-jittered continuous face video frame images is used during training. This effectively improves the stability and accuracy of the trained target face key point detection model; the method is particularly suitable for detecting face key points in face video, and jitter can be effectively reduced.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in their description are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Wherein:
FIG. 1 is a schematic flow chart of a training method of a face key point detection model in an embodiment of the present invention;
FIG. 2 is another schematic flow chart of a training method for a face keypoint detection model according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating a method of label smoothing according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the deduplication processing of a target frame number segment in an embodiment of the present invention;
FIG. 5 is a schematic flow chart illustrating a refinement step of step 101 shown in FIG. 1 according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a training device for a face keypoint detection model in an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a schematic flow chart of a training method for a face keypoint detection model according to an embodiment of the present invention, where the method includes:
step 101, obtaining a target face sample data set with labeled face key points, wherein the target face sample data set comprises at least one sample subset, and the sample subset comprises continuous multi-frame face video frame images in a face video;
in an embodiment of the present invention, the training method for the face keypoint detection model is mainly implemented by a training device for the face keypoint detection model, the training device is a program module and is stored in a readable storage medium of a computer device, and a processor in the computer device can call and operate the training device for the face keypoint detection model to implement the training method.
In the embodiment of the invention, a target human face sample data set is used for training a first human face key point detection model, so that the trained target human face key point detection model is used for detecting the human face key points of a human face video to be detected, the accuracy of human face key point detection is improved, and the jitter is reduced.
A target face sample data set with labeled face key points is used. The target face sample data set contains the sample data used to train the first face key point detection model, including but not limited to at least one sample subset, where one sample subset contains continuous multi-frame face video frame images from one face video. Each frame of face video frame image is labeled with the same number N of face key points, where N is a positive integer; for example, N may be 68, i.e. 68 face key points, which can be numbered in sequence from 0 to 67. It can be understood that the number of face key points the trained target face key point detection model can detect matches the number of face key points labeled in each frame of face video frame image in the sample subsets, and may, for example, be 68.
Labeling the face key points means determining, according to the numbering scheme of the face key points, the position coordinate value of each face key point in each frame of face video frame image; once the position coordinate value of a face key point is determined, that key point's labeling is complete. For example, if number 39 is preset to mark the inner corner of a person's left eye, a manual annotator finds the inner corner of the left eye and labels it 39; the position labeled 39 then represents face key point 39, and the coordinates of that position are the position coordinate value of face key point 39.
Every frame of face video frame image is labeled with face key points, and the number of labeled face key points is the same in every frame; for example, each frame may be labeled with 68 face key points. It can be understood that face key points with the same number in different face video frame images represent the same key point: if the face key point numbered 39 identifies the inner corner of the left eye, then every key point numbered 39 in a sample subset represents the inner corner of the left eye, and if its position coordinate values differ between face video frame images, the facial expression and/or the overall position of the face differs between those images.
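As a concrete illustration, one frame's labeling under this convention might be held as a mapping from key point number to position coordinate value; the data structure and coordinate values below are assumptions for illustration, not part of the invention:

```python
# Hypothetical per-frame annotation: key point number -> (x, y) coordinate value.
frame_labels = {
    39: (412.0, 305.5),   # face key point 39: inner corner of the left eye
    40: (398.2, 301.7),   # a neighbouring key point
    # ... one entry for each of the N (e.g. 68) labeled face key points
}
```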
In addition, the face key points may be labeled manually, or by other automatic labeling methods that are described in detail later in this text.
It should be understood that, in the embodiment of the present invention, the target face sample data set may only include the sample subset described above, or include the sample subset described above and other sample data, and in the case of including other sample data, the number of face keypoints already labeled in each frame or image is the same, and the labeling rule is consistent.
102, performing annotation smoothing on face key points of a face video frame image in a target sample subset to obtain a re-annotated target sample subset, wherein the target sample subset is any sample subset;
and 103, training the first face key point detection model by using the target face sample data set containing the re-labeled sample subset to obtain the trained target face key point detection model.
In the embodiment of the invention, although face key points are labeled in the continuous multi-frame face video frame images contained in the sample subsets of the target face sample data set, neither manual labeling nor other labeling methods attend to the positional relationship of the face key points between frames. The position of the same face key point across the multi-frame face video frame images is therefore not necessarily smooth, and the probability of jitter is very high. If the first face key point detection model were trained on continuous multi-frame face video frame images whose face key points are not smooth, the resulting target face key point detection model could not eliminate face key point jitter when detecting the face key points of a face video. Therefore, in order to fully use the positional relationship of the same face key point between the continuous multi-frame face video frame images of a sample subset and remove jitter from face key point detection on face video, the face key points of the continuous multi-frame face video frame images in each sample subset are label-smoothed to obtain re-labeled sample subsets.
Labeling smoothing processing updates the position coordinate values of the same face key point across the continuous multi-frame face video frame images of a sample subset, so that the updated position coordinate values form a smooth line across the frames; the jitter of the face key point within the sample subset is thereby removed.
It should be noted that the target face sample data set contains at least one sample subset, and the face key points of every sample subset need label smoothing. To keep the description clear, the concept of a target sample subset is used here: the target sample subset stands for any sample subset of the target face sample data set. The face key points of the face video frame images in the target sample subset are label-smoothed to obtain a re-labeled target sample subset, and the first face key point detection model is then trained with the target face sample data set containing the re-labeled sample subsets to obtain the trained target face key point detection model. It can be understood that at least one sample subset of the target face sample data set undergoes label smoothing.
In the embodiment of the invention, continuous multi-frame face video frame images in a face video containing labeled face key points are label-smoothed to remove jitter, and the first face key point detection model is trained with the re-labeled multi-frame face video frame images. Because the positional relationship of the same face key point across the de-jittered continuous face video frame images is used during training, the model can learn both the accuracy of each key point's position and the stability of key points across consecutive frames, which effectively improves the stability and accuracy of the trained target face key point detection model. The approach is particularly suitable for detecting face key points in face video and effectively reduces jitter.
For better understanding of the technical solution in the embodiment of the present invention, please refer to fig. 2, which is another schematic flow chart of the training method of the face keypoint detection model in the embodiment of the present invention, including:
step 201, obtaining a target face sample data set with labeled face key points, wherein the target face sample data set comprises at least one sample subset, and the sample subset comprises continuous multi-frame face video frame images in a face video;
it is understood that the content of step 201 is similar to that described in step 101 in the embodiment shown in fig. 1, and specific reference may be made to the related content described in step 101, which is not described herein again.
202, acquiring a first position data sequence of the target key point, wherein the first position data sequence comprises position coordinate values of the target key point in a plurality of frames of target face video frame images in the target sample subset, and the position coordinate values in the first position data sequence are ordered according to the frame number of the target face video frame images;
step 203, performing label smoothing processing on the first position data sequence to obtain a second position data sequence of the target key point, and updating labels of the target key points in the multi-frame target face video frame images according to the second position data sequence to obtain a re-labeled target sample subset;
and 204, training the first face key point detection model by using the target face sample data set containing the re-labeled sample subset to obtain the trained target face key point detection model.
In the embodiment of the present invention, the sample subsets of the target face sample data set need label smoothing because each sample subset contains continuous multi-frame face video frame images from a face video; label smoothing yields sample subsets with reduced or removed jitter, so that the trained target face key point detection model reduces jitter and achieves a better result when detecting face key points in face video.
Take the target sample subset as an example. It contains continuous multi-frame face video frame images, each frame labeled with the same number of face key points under the same numbering rules; for example, there may be 68 face key points numbered 0 to 67, and the same number represents the same kind of face key point in different face video frames: the key point numbered 39, say, represents the inner corner of the left eye. For a target key point in the target sample subset, a first position data sequence can be obtained, containing the position coordinate values of the target key point in the multi-frame target face video frame images of the target sample subset, sorted by the frame number of the target face video frame images. With 68 face key points, the target key point may be any one of the 68. If the face key point numbered 39 is to be label-smoothed, key point 39 is taken as the target key point and its first position data sequence is obtained; if the target sample subset contains 100 continuous frames of face video frame images, the first position data sequence of face key point 39 contains its position coordinate values in those 100 frames, ordered by their frame numbers.
In the embodiment of the invention, after the first position data sequence of the target key point is obtained, it is label-smoothed to obtain the second position data sequence of the target key point, and the labels of the target key point in the multi-frame target face video frame images of the target sample subset are updated according to the second position data sequence. Every face key point is processed in the same way as the target key point, which yields the re-labeled target sample subset.
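A minimal sketch of how such a first position data sequence could be collected is given below. The per-frame annotation structure follows the hypothetical mapping shown earlier and is an assumption; the text does not prescribe an implementation.

```python
import numpy as np

def first_position_sequence(subset_labels, kp_number):
    """Collect the position coordinate values of one target key point across
    the consecutive frames of a sample subset, ordered by frame number.

    subset_labels: list of per-frame annotations in frame-number order,
    each a dict mapping key point number -> (x, y).  Hypothetical structure.
    Returns an (M, 2) array: the first position data sequence of the key point.
    """
    return np.array([frame[kp_number] for frame in subset_labels], dtype=float)
```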
Specifically, in a feasible implementation manner, the method for labeling smoothing processing specifically refers to fig. 3, which is a schematic flow chart of the method for labeling smoothing processing in the embodiment of the present invention, and includes:
301, performing deduplication processing on position coordinate values belonging to a target frame number segment in the first position data sequence to enable one target frame number segment to correspond to one position coordinate value so as to obtain a third position data sequence, wherein the target frame number segment comprises at least two consecutive frame numbers, and the position coordinate values corresponding to the two consecutive frame numbers are the same;
step 302, performing label smoothing processing on the third position data sequence to obtain a fourth position data sequence;
step 303, using the position coordinate value corresponding to the target frame number segment in the fourth position data sequence as the position coordinate value corresponding to each of the two consecutive frame numbers to obtain the second position data sequence.
In the embodiment of the present invention, the above steps 301 to 303 describe a process of performing labeling smoothing processing on the first position data sequence to obtain a second position data sequence of the target keypoint.
Each position coordinate value in the first position data sequence corresponds to a frame number, so the target frame number segments of the first position data sequence can be determined: a target frame number segment comprises at least two continuous frame numbers whose position coordinate values are identical. The position coordinate values belonging to a target frame number segment are deduplicated so that each target frame number segment corresponds to one position coordinate value, which gives the third position data sequence. It can be understood that the first position data sequence, being formed of position coordinate values, can form a curve, and the label smoothing in the embodiment of the invention can also be understood as smoothing that curve.
Further, the third position data sequence is label-smoothed to obtain the fourth position data sequence, and the position coordinate value corresponding to each target frame number segment in the fourth position data sequence is then used as the position coordinate value of each of that segment's continuous frame numbers, giving the second position data sequence.
For example, in the target sample subset, the first position data sequence of target key point i may be represented as f_i(x, y, t), where x and y are coordinate values in a standard two-dimensional coordinate system and t is the frame number. Because the face may stay still for 2 or more frames during some period of the face video corresponding to the target sample subset, the position coordinate values of target key point i may be identical over consecutive frames, and part of these position coordinate values need to be removed by deduplication. Suppose the first position data sequence contains the position coordinate values of t1 to t7, and t4, t5 and t6 are consecutive frame numbers with identical position coordinate values; t4 to t6 are then taken as a target frame number segment. Fig. 4 is a schematic diagram of the deduplication of a target frame number segment in an embodiment of the present invention: the position coordinate values of t1, t2 and t3 differ and are kept, while the identical position coordinate values of t4 to t6 correspond, after deduplication, to one position coordinate value. The resulting third position data sequence contains 5 position coordinate values: one each for t1, t2 and t3, one for t4-t6, and one for t7.
After the third position data sequence is obtained, it is label-smoothed to obtain the fourth position data sequence, which contains 5 smoothed position coordinate values corresponding to t1, t2, t3, t4-t6 and t7, namely a1, a2, a3, a4 and a5. After the smoothing, the target frame number segment is restored; specifically, the position coordinate value corresponding to the target frame number segment in the fourth position data sequence is taken as the position coordinate value of each of the segment's consecutive frame numbers to obtain the second position data sequence. Since the smoothed position coordinate value of the target frame number segment t4-t6 is a4, t4, t5 and t6 each correspond to a4, and the second position data sequence obtained is a1, a2, a3, a4, a4, a4, a5, so that every frame number has a corresponding position coordinate value.
In the embodiment of the invention, deduplicating the first position data sequence makes the label smoothing of the target key point effective and improves its result, and restoring the fourth position data sequence obtained from the label smoothing avoids omitting face video image frame data, so that the finally re-labeled target sample subset is complete.
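The deduplication of step 301 and the restoration of step 303 could be sketched as follows, under the assumption that a trajectory is an (M, 2) array as above; the actual implementation is not specified by the text.

```python
import numpy as np

def dedup_consecutive(seq):
    """Step 301: collapse each run of identical consecutive position coordinate
    values (a target frame number segment) into a single value, remembering the
    run lengths so the original frame count can be restored afterwards."""
    kept, runs = [seq[0]], [1]
    for point in seq[1:]:
        if np.allclose(point, kept[-1]):
            runs[-1] += 1            # same coordinates as the previous frame
        else:
            kept.append(point)
            runs.append(1)
    return np.asarray(kept), runs    # third position data sequence + run lengths

def restore_runs(smoothed, runs):
    """Step 303: give every frame of a target frame number segment the segment's
    smoothed position coordinate value, yielding the second position data sequence."""
    return np.asarray([p for p, n in zip(smoothed, runs) for _ in range(n)])
```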
Further, in the embodiment of the present invention, the step 302 specifically includes the following steps:
b1, placing a preset sliding window at the beginning of the third position data sequence, and determining a position coordinate set in the preset sliding window in the third position data sequence, wherein the length of the preset sliding window is K, and K is a positive odd number, wherein the placing of the preset sliding window at the beginning of the third position data sequence means that a first value in the preset sliding window is a first position coordinate value in the third position data sequence;
b2, determining the side length ratio of the external rectangle of the position coordinate set and a fitting straight line of the position coordinate set, wherein the side length ratio is the ratio of the longest edge and the shortest edge of the external rectangle;
b3, updating the ith position coordinate value in the preset sliding window according to the side length ratio and the fitted straight line, wherein i is (K+1)/2, moving the preset sliding window, and executing the step of determining the position coordinate set in the preset sliding window in the third position data sequence until the preset sliding window is placed at the end of the third position data sequence, wherein the preset sliding window being placed at the end of the third position data sequence means that the last value in the preset sliding window is the last position coordinate value in the third position data sequence;
and b4, after the preset sliding window is arranged at the end of the third position data sequence, determining the position data sequence obtained after the ith position coordinate value in the preset sliding window is updated as a fourth position data sequence.
In the embodiment of the present invention, a preset sliding window of length K is provided, where K is a positive odd number. The preset sliding window is first placed at the beginning of the third position data sequence, and the position coordinate set inside the window is determined in the third position data sequence; the window being at the beginning of the sequence means that the first value in the window is the first position coordinate value of the third position data sequence. For example, if the third position data sequence contains 100 position coordinate values sorted by frame number and K is 7, then when the preset sliding window is at the beginning of the sequence it contains the 7 position coordinate values of t1 to t7, and the position coordinate value of t1 is the first position coordinate value of the third position data sequence.
The circumscribed rectangle of the position coordinate set is then determined, specifically its side lengths. Treating the position coordinate values as values in a standard two-dimensional coordinate system, the maximum of the position coordinate set in the x coordinate direction, max(Kx), and its minimum in the x coordinate direction, min(Kx), are determined, so that the side length of the circumscribed rectangle in the x coordinate direction is

Lx = max(Kx) - min(Kx)

Similarly, the side length of the circumscribed rectangle in the y coordinate direction is

Ly = max(Ky) - min(Ky)

where Lx denotes the side length of the circumscribed rectangle in the x coordinate direction, Ly the side length in the y coordinate direction, Kx the components of the position coordinate values of the position coordinate set in the x coordinate direction, and Ky their components in the y coordinate direction.

The ratio of the longest side to the shortest side of the circumscribed rectangle is taken as the side length ratio, with the formula: max(Lx, Ly) / min(Lx, Ly).
Furthermore, a fitted straight line of the position coordinate set may be determined, with formula y = kx + b. The ith position coordinate value in the preset sliding window can then be updated according to the side length ratio of the circumscribed rectangle of the position coordinate set and the fitted straight line of the position coordinate set, where i = (K+1)/2; for example, if K is 7, the 4th position coordinate value in the preset sliding window is updated.
The above places the preset sliding window in the third position data sequence. The sliding step of the preset sliding window may be set to 1, i.e. the window moves by one position coordinate value at a time, and after each move the ith position coordinate value in the window is updated as described above, until the preset sliding window reaches the end of the third position data sequence; the window being at the end means that its last value is the last position coordinate value of the sequence. Once the ith value of that final window has been updated, the updating of the whole third position data sequence is finished and the fourth position data sequence is obtained. For example, suppose the third position data sequence contains 100 position coordinate values with frame numbers 1 to 100 and the preset sliding window has size 7, so the window always holds 7 position coordinate values. When the window is at the beginning of the sequence, the position coordinate set contains the values of t1 to t7 and the 4th value, that of t4, is updated. Moving one step, the set contains t2 to t8, with t4 now holding its updated value, and the new 4th value, that of t5, is updated; continuing in this way, t4 through t96 are updated, each move sliding the window one step over the last updated position data sequence. The window finally reaches the end of the sequence, covering t94 to t100, and its 4th value, that of t97, is updated; since t100 is the last position coordinate value of the third position data sequence, the window cannot move further, the updating ends, and the final fourth position data sequence is obtained.
It can be understood that, in the embodiment of the present invention, because the ith position coordinate value in the window is updated with i = (K+1)/2, the first (K-1)/2 and the last (K-1)/2 position coordinate values of the third position data sequence cannot be updated using the sliding window. In practical applications, since these end values have relatively little influence on the whole, they may simply be left un-updated; alternatively, after the smoothing of the target key points of a target sample subset is finished, the face video image frames whose head and tail position coordinate values were not updated may be deleted from the subset. Whether to keep or delete them can be chosen as needed and is not limited here. In addition, i = (K+1)/2 is the preferred choice of the position coordinate value to update; other choices are possible, for example i = (K+1)/2 + 1 or i = (K+1)/2 - 1, or any index other than the first and last of the K values; i can be determined as needed in practical application and is not limited here.
In an embodiment of the present invention, in step b3, the ith position coordinate value in the preset sliding window may be updated according to the side length ratio and the fitted straight line specifically as follows: when the side length ratio is smaller than a preset first threshold and the coefficient of determination of the fitted straight line is smaller than a preset second threshold, the centre of gravity of the position coordinate values contained in the position coordinate set is determined; the distance from each position coordinate value in the set to the centre of gravity is then determined, and the ith position coordinate value is updated to the position coordinate value with the minimum distance.
A coefficient of determination of the fitted straight line is to be found, which may be denoted R². The coefficient of determination is the proportion of the regression sum of squares in the total sum of squares and reflects how well the straight line fits; its value lies in [0, 1]. The closer R² is to 1, the better the fit of the straight line; the closer R² is to 0, the worse the fit.
The coefficient of determination of the fitted straight line is compared with the preset second threshold; when it is smaller than the preset second threshold, the fitted points tend toward a curve or broken line rather than a straight line, and the fit is poor.
When the side length ratio is smaller than the preset first threshold, the direction of a connecting line of K position coordinate values in the preset sliding window is likely to tend to a curve or a broken line instead of a straight line.
When the side length ratio is smaller than the preset first threshold and the coefficient of determination of the fitted straight line is smaller than the preset second threshold, jitter exists among the K position coordinate values in the preset sliding window, and the ith position coordinate value needs to be eliminated and updated so that the jitter can be removed.
The ith position coordinate value is updated as follows: the centre of gravity of the K position coordinate values in the preset sliding window is calculated, the distance from each position coordinate value to the centre of gravity is found, and the position coordinate value with the minimum distance becomes the updated ith position coordinate value. For example, with K equal to 7 and position coordinate values t1 to t7, the centre of gravity o is calculated and the distances d1 to d7 from t1 to t7 to o are found; if the minimum of the 7 distances is d2, the position coordinate value of t2 is used as the updated position coordinate value of t4.
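Putting steps b1 to b4 together with the update rule just described, one pass of the window might be sketched as below. The two thresholds are not fixed by the text, so ratio_thresh and r2_thresh are placeholder assumptions, as is the handling of degenerate windows.

```python
import numpy as np

def smooth_window_pass(seq, K=7, ratio_thresh=3.0, r2_thresh=0.5):
    """One pass of steps b1-b4 over a third position data sequence `seq`
    (an (M, 2) array).  K is the preset sliding window length (positive odd);
    ratio_thresh and r2_thresh stand in for the unspecified first and
    second thresholds."""
    out = seq.astype(float).copy()
    i = (K + 1) // 2 - 1                      # 0-based index of the i = (K+1)/2 value
    for start in range(len(out) - K + 1):     # slide with step length 1
        win = out[start:start + K]            # window over the last updated sequence
        x, y = win[:, 0], win[:, 1]
        # side lengths of the circumscribed (axis-aligned bounding) rectangle
        lx, ly = x.max() - x.min(), y.max() - y.min()
        if min(lx, ly) < 1e-9:                # degenerate window: skip (assumption)
            continue
        ratio = max(lx, ly) / min(lx, ly)
        # least-squares fitted line y = kx + b and its coefficient of determination
        k, b = np.polyfit(x, y, 1)
        ss_res = np.sum((y - (k * x + b)) ** 2)
        ss_tot = np.sum((y - y.mean()) ** 2)
        r2 = 1.0 - ss_res / max(ss_tot, 1e-9)
        if ratio < ratio_thresh and r2 < r2_thresh:
            # jitter detected: replace the ith value with the window point
            # closest to the window's centre of gravity
            centroid = win.mean(axis=0)
            nearest = np.argmin(np.linalg.norm(win - centroid, axis=1))
            out[start + i] = win[nearest]
    return out                                # fourth position data sequence
```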
In the embodiment of the invention, label smoothing effectively removes the jitter of the target key point within the target sample subset, so that a target face key point detection model trained on the target sample subset can refer to the positional relationship of the target key point across different face video image frames, improving the accuracy of face key point detection with the target face key point detection model.
It can be understood that the above label smoothing method suits the curve formed by the first position data sequence of a target key point, and in particular the small-scale kinks and shakes that appear in such a curve; it is generally applied to smooth the jitter that arises when the face moves little or is static, which may be called the first jitter scenario.
In another jitter case, when the face moves relatively fast, the curve formed by the first position data sequence of the target key point is not smooth but takes a broken-line shape; this may be called the second jitter scenario.
For the second jitter scenario, smoothing may also be performed; in this case, the label smoothing of the third position data sequence in step 302 to obtain the fourth position data sequence specifically comprises:
and carrying out labeling smoothing processing on the third position data sequence by utilizing a preset smoothing algorithm to obtain a fourth position data sequence, wherein the smoothing algorithm is a median filtering algorithm, a Gaussian filtering algorithm or a Savitzky-Golay filter. By the method, the jitter in the second jitter scene can be effectively removed.
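As an illustrative sketch, the three algorithms named above could be applied with SciPy's standard filters to the x and y components of a third position data sequence independently; the window sizes and sigma below are assumptions, as the text does not specify any parameters.

```python
import numpy as np
from scipy.signal import medfilt, savgol_filter
from scipy.ndimage import gaussian_filter1d

def filter_sequence(seq, method="savgol"):
    """Label-smooth a third position data sequence (an (M, 2) array) with one
    of the algorithms named in the text; parameters are illustrative only."""
    x, y = seq[:, 0], seq[:, 1]
    if method == "median":
        x, y = medfilt(x, kernel_size=5), medfilt(y, kernel_size=5)
    elif method == "gaussian":
        x, y = gaussian_filter1d(x, sigma=1.5), gaussian_filter1d(y, sigma=1.5)
    else:                                   # Savitzky-Golay filter
        x = savgol_filter(x, window_length=7, polyorder=2)
        y = savgol_filter(y, window_length=7, polyorder=2)
    return np.stack([x, y], axis=1)         # fourth position data sequence
```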
It can be understood that, in practical application, the jitter of the first jitter scenario may be removed first and then that of the second, or the other way round; alternatively, it may first be determined whether jitter of the first and/or second scenario exists and the corresponding removal chosen. This can be set according to actual needs and is not limited here.
In the embodiment of the present invention, the multi-frame face video frame images in the sample subsets of the target face sample data set may be pre-labeled manually and then re-labeled by label smoothing, or they may be pre-labeled automatically, without manual labeling. Specifically, referring to fig. 5, a schematic flow chart of the refinement of step 101 of fig. 1 in an embodiment of the present invention comprises:
501, obtaining an initial face sample data set, wherein the initial face sample data set comprises a plurality of common face sample images with face key points marked, and the common face sample images are single-person single images which are discontinuous in time;
step 502, training a first face key point detection model by using an initial face sample data set to obtain a second face key point detection model;
step 503, inputting the face video into the second face key point detection model, so as to perform face key point labeling on the continuous multi-frame face video frame images in the face video, obtain a sample subset of the face video, and obtain a target face sample data set.
In the embodiment of the invention, an initial face sample data set is obtained; it comprises a plurality of ordinary face sample images with labeled face key points, each an individual, temporally discontinuous single-person image. The initial face sample data set is first used to train the first face key point detection model, yielding the second face key point detection model. It can be understood that the second face key point detection model can detect face key points and can therefore be used to label them: the face video is input into the second face key point detection model to label the face key points of its continuous multi-frame face video frame images, giving a sample subset for that face video, and the sample subsets form the target face sample data set.
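A sketch of this automatic pre-labeling (step 503) is shown below. It assumes OpenCV for video decoding and treats the second face key point detection model as a callable mapping a frame to an (N, 2) key point array; that interface is an assumption, not given by the text.

```python
import cv2  # assumes OpenCV is available for video decoding

def pre_label_video(video_path, second_model):
    """Run the second (image-trained) face key point detector over every frame
    of a face video to build one pre-labeled sample subset."""
    cap = cv2.VideoCapture(video_path)
    subset = []
    while True:
        ok, frame = cap.read()
        if not ok:                           # end of the face video
            break
        subset.append(second_model(frame))   # per-frame face key point labels
    cap.release()
    return subset
```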
Further, after the multi-frame continuous video frame images in the sample subsets have been labeled by the second face key point detection model, the first face key point detection model may, in step 103, be trained with the target face sample data set containing the re-labeled sample subsets, or with the initial face sample data set together with the re-labeled target face sample data set, to obtain the trained target face key point detection model and complete the model training.
Or, in another feasible implementation, the re-labeled target face sample data set, or the initial face sample data set together with the re-labeled target face sample data set, may be used to fine-tune the second face key point detection model to obtain the target face key point detection model.
The fine-tuning process generally involves two types of operation. One modifies the model's output (e.g. the number or kind of classification types, or the type or parameters of the loss function). The other initialises the network's parameters during training not randomly, but with the parameters of a model already trained on a large data set. The reason is that the parameters of a model trained on a large data set already contain a large number of useful convolution filters, so rather than initialising all of the model's parameters from scratch, the already-trained parameters are used as the starting point of training; this not only saves a large amount of training time but also improves performance. Fine-tuning the second face key point detection model obtained from the initial face sample data therefore effectively saves training time and improves the performance of the resulting target face key point detection model. Moreover, because at least the re-labeled multi-frame face video frame images are used for fine-tuning, the fine-tuned target face key point detection model can refer to the positional relationship between face key points across the multi-frame face video frame images, so that when it detects face key points it yields key points with little or no jitter and a good detection result.
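A minimal fine-tuning setup in this spirit might look as follows; PyTorch is an assumption, as are the `backbone` and `head` attribute names. The already-trained parameters of the second model serve as the starting point, and the output head is trained with a larger learning rate than the feature-extracting backbone.

```python
import torch

def finetune_optimizer(second_model, lr_backbone=1e-5, lr_head=1e-3):
    """Reuse the trained parameters of the second face key point detection
    model as initial values and fine-tune: the output head gets a larger
    learning rate than the backbone.  `backbone` and `head` are assumed
    attribute names for illustration."""
    return torch.optim.Adam([
        {"params": second_model.backbone.parameters(), "lr": lr_backbone},
        {"params": second_model.head.parameters(), "lr": lr_head},
    ])
```

Training would then proceed as in step 103, with the re-labeled sample subsets (optionally together with the initial face sample data set) as training data.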
In the embodiment of the invention, continuous multi-frame face video frame images in a face video containing labeled face key points are label-smoothed to remove jitter, and the first face key point detection model is trained with the re-labeled multi-frame face video frame images, so that the positional relationship of the same face key point across the de-jittered continuous face video frame images is used during training. This effectively improves the stability and accuracy of the trained target face key point detection model; the approach is particularly suitable for detecting face key points in face video and effectively reduces jitter.
It can be understood that the trained target face key point detection model is preferentially applied to detecting face key points in face video. Specifically, a face video to be detected can be input into the target face key point detection model, which outputs the face key points contained in each frame of face video frame image of the video; the output key points are smooth, with little jitter, and the detection effect is good.
Please refer to fig. 6, which is a schematic structural diagram of a training apparatus for a face keypoint detection model according to an embodiment of the present invention, including:
an obtaining module 601, configured to obtain a target face sample data set with labeled face key points, where the target face sample data set includes at least one sample subset, and the sample subset includes multiple continuous frames of face video frame images in a face video;
a smoothing module 602, configured to perform labeling smoothing processing on face key points of a face video frame image in a target sample subset to obtain a target sample subset that is re-labeled, where the target sample subset is any sample subset;
the training module 603 is configured to train the first face keypoint detection model by using the target face sample data set including the re-labeled sample subset, so as to obtain a trained target face keypoint detection model.
In the embodiment of the present invention, the content described in the apparatus embodiment shown in fig. 6 is similar to the content described in the foregoing method embodiment, and may specifically refer to the related content in the foregoing method embodiment, which is not described herein again.
In the embodiment of the invention, continuous multi-frame face video frame images in a face video containing labeled face key points are label-smoothed to remove jitter, and the first face key point detection model is trained with the re-labeled multi-frame face video frame images, so that the positional relationship of the same face key point across the de-jittered continuous face video frame images is used during training. This effectively improves the stability and accuracy of the trained target face key point detection model; the approach is particularly suitable for detecting face key points in face video and effectively reduces jitter.
In one embodiment, a computer-readable storage medium is proposed, in which a computer program is stored which, when executed by a processor, causes the processor to carry out the steps of:
acquiring a target face sample data set with marked face key points, wherein the target face sample data set comprises at least one sample subset, and the sample subset comprises continuous multi-frame face video frame images in a face video;
performing annotation smoothing processing on face key points of a face video frame image in a target sample subset to obtain a re-annotated target sample subset, wherein the target sample subset is any one sample subset;
and training the first face key point detection model by using the target face sample data set containing the re-labeled sample subset to obtain the trained target face key point detection model.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a non-volatile computer-readable storage medium, and can include the processes of the embodiments of the methods described above when the program is executed. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory, among others. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDRSDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), direct bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM).
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the present application. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A training method for a face key point detection model is characterized by comprising the following steps:
acquiring a target face sample data set with marked face key points, wherein the target face sample data set comprises at least one sample subset, and the sample subset comprises continuous multi-frame face video frame images in a face video;
performing annotation smoothing processing on face key points of a face video frame image in a target sample subset to obtain a re-annotated target sample subset, wherein the target sample subset is any one sample subset;
and training the first face key point detection model by using the target face sample data set containing the re-labeled sample subset to obtain the trained target face key point detection model.
2. The method according to claim 1, wherein the performing annotation smoothing on the face key points of the face video frame image in the target sample subset to obtain a re-annotated target sample subset comprises:
acquiring a first position data sequence of a target key point, wherein the first position data sequence comprises position coordinate values of the target key point in multi-frame target face video frame images in the target sample subset, the position coordinate values in the first position data sequence are sorted according to the frame number of the target face video frame images, and the target key point is any face key point;
and performing annotation smoothing processing on the first position data sequence to obtain a second position data sequence of the target key point, and updating the annotation of the target key point in the multi-frame target face video frame image according to the second position data sequence to obtain a re-annotated target sample subset.
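For claim 2, the first position data sequence is simply one key point's trajectory read out in frame order. A minimal sketch, assuming a (frames × key points × 2) label array as in the pipeline sketch above:

```python
import numpy as np

def first_position_sequence(labels, point_index):
    """Return one key point's (x, y) coordinates over consecutive
    frames, ordered by frame number (the first position data sequence).

    labels -- array of shape (num_frames, num_keypoints, 2)
    """
    return labels[:, point_index, :]  # shape (num_frames, 2)
```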
3. The method according to claim 2, wherein the performing labeling smoothing processing on the first position data sequence to obtain a second position data sequence of the target keypoint comprises:
deduplicating position coordinate values belonging to a target frame number segment in the first position data sequence, so that each target frame number segment corresponds to one position coordinate value, to obtain a third position data sequence, wherein a target frame number segment comprises at least two consecutive frame numbers whose corresponding position coordinate values are identical;
performing label smoothing processing on the third position data sequence to obtain a fourth position data sequence;
and taking the position coordinate value corresponding to the target frame number segment in the fourth position data sequence as the position coordinate value of each of the consecutive frame numbers in that segment, to obtain the second position data sequence.
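Claim 3's deduplication collapses each run of identical consecutive coordinates to a single value before smoothing, then expands the smoothed values back over the original frames. A minimal sketch, assuming NumPy arrays and a caller-supplied smoothing function:

```python
import numpy as np

def smooth_with_dedup(seq, smooth_fn):
    """Claim-3 style smoothing: deduplicate runs, smooth, expand back.

    seq       -- array of shape (n, 2), one coordinate per frame number
    smooth_fn -- smoothing applied to the deduplicated sequence
    """
    # Mark the first element of every run of identical consecutive values.
    keep = np.ones(len(seq), dtype=bool)
    keep[1:] = np.any(seq[1:] != seq[:-1], axis=1)
    third = seq[keep]                  # the third position data sequence
    fourth = smooth_fn(third)          # the fourth position data sequence
    # Map each frame number back to its run's (now smoothed) coordinate.
    run_id = np.cumsum(keep) - 1
    return fourth[run_id]              # the second position data sequence
```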
4. The method of claim 3, wherein performing label smoothing processing on the third position data sequence to obtain a fourth position data sequence comprises:
placing a preset sliding window at the head of the third position data sequence and determining the position coordinate set within the preset sliding window, wherein the length of the preset sliding window is K, K is a positive odd number, and placing the preset sliding window at the head of the third position data sequence means that the first value in the preset sliding window is the first position coordinate value in the third position data sequence;
determining the side length ratio of the circumscribed rectangle of the position coordinate set and a straight line fitted to the position coordinate set, wherein the side length ratio is the ratio of the longest side to the shortest side of the circumscribed rectangle;
updating the i-th position coordinate value in the preset sliding window according to the side length ratio and the fitted straight line, wherein i is any value in [1, K]; then moving the preset sliding window and repeating the step of determining the position coordinate set within the preset sliding window, until the preset sliding window reaches the tail of the third position data sequence, the preset sliding window being at the tail meaning that the last value in the preset sliding window is the last position coordinate value in the third position data sequence;
and after the preset sliding window reaches the tail of the third position data sequence, determining the position data sequence obtained after updating the i-th position coordinate values in the preset sliding window as the fourth position data sequence.
5. The method of claim 4, wherein the updating the i-th position coordinate value within the preset sliding window according to the side length ratio and the fitted straight line comprises:
when the side length ratio is smaller than a preset first threshold and the coefficient of determination of the fitted straight line is smaller than a preset second threshold, determining the center of gravity of the position coordinate values contained in the position coordinate set;
and determining the distance from each position coordinate value contained in the position coordinate set to the center of gravity, and updating the i-th position coordinate value to the position coordinate value with the minimum distance.
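Claims 4 and 5 together describe the window update rule. The sketch below is one reading of it; the value of K, both thresholds, and fixing i to the window's middle are assumptions made for illustration, not values from the patent.

```python
import numpy as np

def window_smooth(third, K=5, ratio_thresh=1.5, r2_thresh=0.5):
    """Sketch of the claims 4-5 sliding-window update.

    third -- array of shape (n, 2); requires n >= K and K odd.
    """
    seq = third.astype(float).copy()
    i = K // 2  # update the middle value at each window position
    for start in range(len(seq) - K + 1):
        win = seq[start:start + K]
        # Side length ratio of the axis-aligned circumscribed rectangle.
        extent = win.max(axis=0) - win.min(axis=0)
        ratio = extent.max() / max(extent.min(), 1e-9)
        # Coefficient of determination (R^2) of the least-squares line.
        a, b = np.polyfit(win[:, 0], win[:, 1], 1)
        residual = win[:, 1] - (a * win[:, 0] + b)
        ss_tot = max(((win[:, 1] - win[:, 1].mean()) ** 2).sum(), 1e-9)
        r2 = 1.0 - (residual ** 2).sum() / ss_tot
        if ratio < ratio_thresh and r2 < r2_thresh:
            # Low ratio and low R^2: the points cluster with no clear
            # direction of motion, so treat the spread as jitter and
            # snap the i-th value to the point nearest the centroid.
            center = win.mean(axis=0)
            nearest = np.argmin(np.linalg.norm(win - center, axis=1))
            seq[start + i] = win[nearest]
    return seq  # the fourth position data sequence
```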
6. The method of claim 3, wherein performing label smoothing processing on the third position data sequence to obtain a fourth position data sequence comprises:
and performing labeling smoothing processing on the third position data sequence by using a preset smoothing algorithm to obtain a fourth position data sequence, wherein the smoothing algorithm is a median filtering algorithm, a Gaussian filtering algorithm or a Savitzky-Golay filter.
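A minimal sketch of claim 6 using standard SciPy filters; the kernel size, sigma, window length, and polynomial order below are illustrative choices, not taken from the patent.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d
from scipy.signal import medfilt, savgol_filter

def smooth_third_sequence(third, method="savgol"):
    """Apply one of the claimed smoothing algorithms to an (n, 2)
    coordinate sequence, per axis; n must exceed the window sizes used."""
    third = np.asarray(third, dtype=float)
    if method == "median":
        return np.stack([medfilt(third[:, d], kernel_size=5)
                         for d in range(2)], axis=1)
    if method == "gaussian":
        return gaussian_filter1d(third, sigma=1.0, axis=0)
    # Savitzky-Golay: least-squares polynomial fit over a sliding window.
    return savgol_filter(third, window_length=5, polyorder=2, axis=0)
```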
7. The method according to any one of claims 1 to 6, wherein the obtaining the target face sample data set with labeled face key points comprises:
acquiring an initial face sample data set, wherein the initial face sample data set comprises a plurality of common face sample images with labeled face key points, each common face sample image being a single image of a single person that is temporally discontinuous from the others;
training the first face key point detection model by using the initial face sample data set to obtain a second face key point detection model;
inputting the face video into the second face key point detection model to perform face key point labeling on continuous multi-frame face video frame images in the face video, so as to obtain the sample subset of the face video and obtain the target face sample data set.
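Claim 7 describes a bootstrap: pretrain on still images, then let the resulting second model label the video frames. A hypothetical sketch, assuming a fit/predict model interface (an assumption made for illustration only):

```python
def build_target_dataset(first_model, still_images, still_labels, video_frames):
    # Train on temporally discontinuous single-person still images first.
    second_model = first_model.fit(still_images, still_labels)
    # The second model then labels every consecutive frame of the face
    # video; these predictions form a sample subset of the target set.
    subset = [(frame, second_model.predict(frame)) for frame in video_frames]
    return second_model, subset
```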
8. The method of claim 7, wherein training the first face key point detection model by using the target face sample data set containing the re-labeled sample subset to obtain the trained target face key point detection model comprises:
fine-tuning the second face key point detection model by using the re-labeled target face sample data set, or by using both the initial face sample data set and the re-labeled target face sample data set, to obtain the target face key point detection model.
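Claim 8 fine-tunes the second model rather than retraining from scratch. A minimal sketch of the two claimed options, with the model interface again assumed:

```python
def finetune_to_target(second_model, relabeled_set, initial_set=None):
    """Continue training from the second model's weights, either on the
    re-labeled video set alone or mixed with the initial still-image set."""
    data = relabeled_set if initial_set is None else initial_set + relabeled_set
    return second_model.fit(data)  # the target face key point detection model
```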
9. A training device for a face key point detection model, the device comprising:
the system comprises an acquisition module, a processing module and a display module, wherein the acquisition module is used for acquiring a target face sample data set with labeled face key points, the target face sample data set comprises at least one sample subset, and the sample subset comprises continuous multi-frame face video frame images in a face video;
the smoothing module is used for performing annotation smoothing processing on the face key points of the face video frame images in a target sample subset to obtain a re-labeled target sample subset, wherein the target sample subset is any one of the sample subsets;
and the training module is used for training the first face key point detection model by using the target face sample data set containing the re-labeled sample subset to obtain the trained target face key point detection model.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, causes the processor to carry out the steps of the method according to any one of claims 1 to 8.
CN202010794471.XA 2020-08-07 2020-08-07 Training method and device for human face key point detection model and storage medium Active CN112101105B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010794471.XA CN112101105B (en) 2020-08-07 2020-08-07 Training method and device for human face key point detection model and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010794471.XA CN112101105B (en) 2020-08-07 2020-08-07 Training method and device for human face key point detection model and storage medium

Publications (2)

Publication Number Publication Date
CN112101105A true CN112101105A (en) 2020-12-18
CN112101105B CN112101105B (en) 2024-04-09

Family

ID=73754402

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010794471.XA Active CN112101105B (en) 2020-08-07 2020-08-07 Training method and device for human face key point detection model and storage medium

Country Status (1)

Country Link
CN (1) CN112101105B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020037898A1 (en) * 2018-08-23 2020-02-27 平安科技(深圳)有限公司 Face feature point detection method and apparatus, computer device, and storage medium
CN109359575A (en) * 2018-09-30 2019-02-19 腾讯科技(深圳)有限公司 Method for detecting human face, method for processing business, device, terminal and medium
CN109508678A (en) * 2018-11-16 2019-03-22 广州市百果园信息技术有限公司 Training method, the detection method and device of face key point of Face datection model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YAO Lisha; ZHANG Junwei; FANG Bo; ZHANG Shaolei; ZHOU Huan; ZHAO Feng: "Design and Implementation of a Facial Expression Recognition System Based on LBP and SVM", Journal of Guizhou Normal University (Natural Science Edition), no. 01 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112750139A (en) * 2021-01-18 2021-05-04 腾讯科技(深圳)有限公司 Image processing method and device, computing equipment and storage medium
CN113158982A (en) * 2021-05-17 2021-07-23 广东中卡云计算有限公司 Semi-intrusive target key point marking method
CN113822254A (en) * 2021-11-24 2021-12-21 腾讯科技(深圳)有限公司 Model training method and related device
CN113822254B (en) * 2021-11-24 2022-02-25 腾讯科技(深圳)有限公司 Model training method and related device

Also Published As

Publication number Publication date
CN112101105B (en) 2024-04-09

Similar Documents

Publication Publication Date Title
CN112101105A (en) Training method and device for face key point detection model and storage medium
CN108009465B (en) Face recognition method and device
CN105144239B (en) Image processing apparatus, image processing method
CN109035246B (en) Face image selection method and device
US11403874B2 (en) Virtual avatar generation method and apparatus for generating virtual avatar including user selected face property, and storage medium
US20160283780A1 (en) Positioning feature points of human face edge
CN110874594A (en) Human body surface damage detection method based on semantic segmentation network and related equipment
CN107633237B (en) Image background segmentation method, device, equipment and medium
CN109492576B (en) Image recognition method and device and electronic equipment
JP2015069495A (en) Person recognition apparatus, person recognition method, and person recognition program and recording medium therefor
CN113689324B (en) Automatic portrait object adding and deleting method and device based on two classification labels
CN111968134A (en) Object segmentation method and device, computer readable storage medium and computer equipment
CN109726195A (en) A kind of data enhancement methods and device
CN111368632A (en) Signature identification method and device
CN112200056A (en) Face living body detection method and device, electronic equipment and storage medium
CN113657370B (en) Character recognition method and related equipment thereof
CN112084855B (en) Outlier elimination method for video stream based on improved RANSAC method
CN112036256A (en) Human face key point training device
CN117253110A (en) Diffusion model-based target detection model generalization capability improving method
CN112270747A (en) Face recognition method and device and electronic equipment
CN112464839A (en) Portrait segmentation method, device, robot and storage medium
WO2020244076A1 (en) Face recognition method and apparatus, and electronic device and storage medium
CN111125391B (en) Database updating method and device, electronic equipment and computer storage medium
CN111178200A (en) Identification method of instrument panel indicator lamp and computing equipment
CN115511731A (en) Noise processing method and noise processing equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant