CN113065457B - Face detection point processing method and device, computer equipment and storage medium - Google Patents

Face detection point processing method and device, computer equipment and storage medium

Info

Publication number
CN113065457B
CN113065457B (application number CN202110343050.XA)
Authority
CN
China
Prior art keywords
coordinate
variation
abscissa
ordinate
weight
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110343050.XA
Other languages
Chinese (zh)
Other versions
CN113065457A (en)
Inventor
朱耀宇
巩汝何
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Fanxing Huyu IT Co Ltd
Original Assignee
Guangzhou Fanxing Huyu IT Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Fanxing Huyu IT Co Ltd
Priority to CN202110343050.XA
Publication of CN113065457A
Application granted
Publication of CN113065457B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G06V40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the application discloses a face detection point processing method and apparatus, a computer device, and a storage medium, belonging to the field of computer technology. The method comprises the following steps: acquiring a coordinate variation based on a first coordinate and a second coordinate of the same face detection point; in response to the coordinate variation belonging to a reference threshold range, determining a first weight corresponding to the first coordinate and a second weight corresponding to the second coordinate based on the coordinate variation and a detection error; and weighting the first coordinate and the second coordinate based on the first weight and the second weight to obtain a third coordinate of the face detection point in the second video frame. By smoothing the first coordinate and the second coordinate according to the weights assigned to the coordinates of the face detection point in the first and second video frames, the method corrects the second coordinate and improves the coordinate accuracy of the face detection point.

Description

Face detection point processing method and device, computer equipment and storage medium
Technical Field
The embodiment of the application relates to the field of computer technology, and in particular to a face detection point processing method and apparatus, a computer device, and a storage medium.
Background
With the development of computer technology, face detection has become increasingly widespread. However, the positions of face detection points produced by face detection techniques contain errors, making the detected face detection points inaccurate.
Disclosure of Invention
The embodiment of the application provides a face detection point processing method and apparatus, a computer device, and a storage medium, which improve the coordinate accuracy of face detection points. The technical scheme is as follows:
in one aspect, a face detection point processing method is provided, and the method includes:
acquiring a coordinate variation based on a first coordinate and a second coordinate of the same face detection point, where the first coordinate is the coordinate of the face detection point in a first video frame, the second coordinate is the coordinate of the face detection point in a second video frame, and the first video frame is the video frame preceding the second video frame;
determining, in response to the coordinate variation belonging to a reference threshold range, a first weight corresponding to the first coordinate and a second weight corresponding to the second coordinate based on the coordinate variation and a detection error, where the detection error represents the error between the coordinates of the same face detection point in two adjacent video frames when the face is in a static state; and
weighting the first coordinate and the second coordinate based on the first weight and the second weight to obtain a third coordinate of the face detection point in the second video frame.
In one possible implementation, the first coordinate includes a first abscissa and a first ordinate, the second coordinate includes a second abscissa and a second ordinate, and the coordinate variation includes an abscissa variation and an ordinate variation; the acquiring of the coordinate variation based on the first coordinate and the second coordinate of the same face detection point includes:
determining an absolute value of a difference between the first abscissa and the second abscissa as the abscissa variation amount;
and determining the absolute value of the difference between the first ordinate and the second ordinate as the ordinate variation.
In one possible implementation, before the acquiring of the coordinate variation based on the first coordinate and the second coordinate of the same face detection point, the method further includes:
acquiring the size of the first video frame or the second video frame, wherein the size comprises a horizontal length and a vertical length;
the first coordinate includes a first abscissa and a first ordinate, the second coordinate includes a second abscissa and a second ordinate, and the acquiring of the coordinate variation based on the first coordinate and the second coordinate of the same face detection point includes:
acquiring a first difference between the first abscissa and the second abscissa and a second difference between the first ordinate and the second ordinate;
and adjusting the first difference and the second difference, taking the adjusted first difference as the abscissa variation and the adjusted second difference as the ordinate variation, so that the ratio between the abscissa variation and the ordinate variation is consistent with the ratio between the horizontal length and the vertical length.
In one possible implementation, the adjusting of the first difference and the second difference, taking the adjusted first difference as the abscissa variation and the adjusted second difference as the ordinate variation so that the ratio between the abscissa variation and the ordinate variation matches the ratio between the horizontal length and the vertical length, includes:
Determining a product of the first difference and the horizontal length as the abscissa variation;
determining a product of the second difference and the vertical length as the ordinate variation.
In one possible implementation, the acquiring of the coordinate variation based on the first coordinate and the second coordinate of the same face detection point includes:
acquiring an absolute value of a difference value between the first coordinate and the second coordinate, and taking the absolute value as the coordinate variation;
The determining, in response to the coordinate variation belonging to a reference threshold range, a first weight corresponding to the first coordinate and a second weight corresponding to the second coordinate based on the coordinate variation and a detection error includes:
normalizing the coordinate variation in response to the coordinate variation belonging to the reference threshold range, to obtain a processed coordinate variation;
And determining the first weight and the second weight based on the processed coordinate variation and the detection error.
In one possible implementation, the coordinate variation includes an abscissa variation and an ordinate variation, the reference threshold range includes an abscissa threshold range and an ordinate threshold range, and the normalizing of the coordinate variation in response to the coordinate variation belonging to the reference threshold range, to obtain a processed coordinate variation, includes:
normalizing the abscissa variation in response to the abscissa variation belonging to the abscissa threshold range, to obtain a processed abscissa variation; or
normalizing the ordinate variation in response to the ordinate variation belonging to the ordinate threshold range, to obtain a processed ordinate variation.
In one possible implementation, the coordinate variation includes an abscissa variation and an ordinate variation, and the determining of the first weight corresponding to the first coordinate and the second weight corresponding to the second coordinate based on the coordinate variation and the detection error includes:
Determining a first abscissa weight corresponding to the first coordinate and a second abscissa weight corresponding to the second coordinate based on the abscissa variation and the detection error, wherein the sum of the first abscissa weight and the second abscissa weight is 1;
and determining a first ordinate weight corresponding to the first coordinate and a second ordinate weight corresponding to the second coordinate based on the ordinate variation and the detection error, wherein the sum of the first ordinate weight and the second ordinate weight is 1.
In one possible implementation, the determining of the first abscissa weight corresponding to the first coordinate and the second abscissa weight corresponding to the second coordinate based on the abscissa variation and the detection error includes:
determining the sum of the product of the abscissa variation and the detection parameter and the detection error as the first abscissa weight, wherein the detection parameter is the difference value between 1 and the detection error;
determining a difference of 1 and the first abscissa weight as the second abscissa weight;
the determining, based on the ordinate variation and the detection error, a first ordinate weight corresponding to the first coordinate and a second ordinate weight corresponding to the second coordinate includes:
Determining the sum of the product of the ordinate variation and the detection parameter and the detection error as the first ordinate weight;
And determining a difference between 1 and the first ordinate weight as the second ordinate weight.
In one possible implementation, the third coordinate includes a third abscissa and a third ordinate, and the weighting of the first coordinate and the second coordinate based on the first weight and the second weight to obtain the third coordinate of the face detection point in the second video frame includes:
Weighting the first abscissa of the first coordinates and the second abscissa of the second coordinates based on the first abscissa weight and the second abscissa weight to obtain the third abscissa;
And carrying out weighting processing on a first ordinate of the first coordinates and a second ordinate of the second coordinates based on the first ordinate weight and the second ordinate weight to obtain the third ordinate.
In one possible implementation, before the acquiring of the coordinate variation based on the first coordinate and the second coordinate of the same face detection point, the method further includes:
And respectively carrying out face detection on the first video frame and the second video frame to obtain the first coordinate and the second coordinate.
In another aspect, a face detection point processing apparatus is provided, the apparatus including:
a variation obtaining module, configured to obtain a coordinate variation based on a first coordinate and a second coordinate of the same face detection point, where the first coordinate is the coordinate of the face detection point in a first video frame, the second coordinate is the coordinate of the face detection point in a second video frame, and the first video frame is the video frame preceding the second video frame;
a weight determining module, configured to determine, in response to the coordinate variation belonging to a reference threshold range, a first weight corresponding to the first coordinate and a second weight corresponding to the second coordinate based on the coordinate variation and a detection error, where the detection error represents the error between the coordinates of the same face detection point in two adjacent video frames when the face is in a static state; and
a coordinate determining module, configured to weight the first coordinate and the second coordinate based on the first weight and the second weight to obtain a third coordinate of the face detection point in the second video frame.
In one possible implementation, the first coordinate includes a first abscissa and a first ordinate, the second coordinate includes a second abscissa and a second ordinate, the coordinate variation includes an abscissa variation and an ordinate variation, and the variation obtaining module includes:
A first acquisition unit configured to determine an absolute value of a difference between the first abscissa and the second abscissa as the abscissa variation amount;
And a second acquisition unit configured to determine an absolute value of a difference between the first ordinate and the second ordinate as the ordinate variation.
In one possible implementation, the apparatus further includes:
A size acquisition module, configured to acquire a size of the first video frame or the second video frame, where the size includes a horizontal length and a vertical length;
the first coordinate includes a first abscissa and a first ordinate, the second coordinate includes a second abscissa and a second ordinate, and the variation obtaining module includes:
A difference value obtaining unit configured to obtain a first difference value between the first abscissa and the second abscissa, and a second difference value between the first ordinate and the second ordinate;
And the difference value adjusting unit is used for adjusting the first difference value and the second difference value, taking the adjusted first difference value as the abscissa variation and taking the adjusted second difference value as the ordinate variation so as to enable the ratio between the abscissa variation and the ordinate variation to be consistent with the ratio between the horizontal length and the vertical length.
In one possible implementation, the difference adjustment unit is configured to:
Determining a product of the first difference and the horizontal length as the abscissa variation;
determining a product of the second difference and the vertical length as the ordinate variation.
In one possible implementation, the variation obtaining module is configured to obtain the absolute value of the difference between the first coordinate and the second coordinate and take the absolute value as the coordinate variation;
The weight determination module includes:
a normalization unit, configured to normalize the coordinate variation in response to the coordinate variation belonging to the reference threshold range, to obtain a processed coordinate variation;
and a weight determining unit configured to determine the first weight and the second weight based on the processed coordinate variation and the detection error.
In one possible implementation, the coordinate variation includes an abscissa variation and an ordinate variation, the reference threshold range includes an abscissa threshold range and an ordinate threshold range, and the normalization unit is configured to:
normalize the abscissa variation in response to the abscissa variation belonging to the abscissa threshold range, to obtain a processed abscissa variation; or
normalize the ordinate variation in response to the ordinate variation belonging to the ordinate threshold range, to obtain a processed ordinate variation.
In one possible implementation, the coordinate variation includes an abscissa variation and an ordinate variation, and the weight determining module includes:
The weight determining unit is used for determining a first abscissa weight corresponding to the first coordinate and a second abscissa weight corresponding to the second coordinate based on the abscissa variation and the detection error, and the sum of the first abscissa weight and the second abscissa weight is 1;
the weight determining unit is further configured to determine, based on the ordinate variation and the detection error, a first ordinate weight corresponding to the first coordinate and a second ordinate weight corresponding to the second coordinate, where a sum of the first ordinate weight and the second ordinate weight is 1.
In one possible implementation, the weight determining unit is configured to:
determining the sum of the product of the abscissa variation and the detection parameter and the detection error as the first abscissa weight, wherein the detection parameter is the difference value between 1 and the detection error;
determining a difference of 1 and the first abscissa weight as the second abscissa weight;
The weight determining unit is further configured to:
Determining the sum of the product of the ordinate variation and the detection parameter and the detection error as the first ordinate weight;
And determining a difference between 1 and the first ordinate weight as the second ordinate weight.
In one possible implementation, the third coordinate includes a third abscissa and a third ordinate, and the coordinate determining module is configured to:
Weighting the first abscissa of the first coordinates and the second abscissa of the second coordinates based on the first abscissa weight and the second abscissa weight to obtain the third abscissa;
And carrying out weighting processing on a first ordinate of the first coordinates and a second ordinate of the second coordinates based on the first ordinate weight and the second ordinate weight to obtain the third ordinate.
In one possible implementation, the apparatus further includes:
and the coordinate detection module is used for respectively carrying out face detection on the first video frame and the second video frame to obtain the first coordinate and the second coordinate.
In another aspect, a computer device is provided, including a processor and a memory, where at least one computer program is stored in the memory, and the at least one computer program is loaded and executed by the processor to implement the operations performed in the face detection point processing method of the above aspect.
In another aspect, a computer-readable storage medium is provided, in which at least one computer program is stored, the at least one computer program being loaded and executed by a processor to implement the operations performed in the face detection point processing method of the above aspect.
In another aspect, a computer program product or a computer program is provided, comprising computer program code stored in a computer-readable storage medium, the computer program code being loaded and executed by a processor to implement the operations performed in the face detection point processing method of the above aspect.
The technical solutions provided by the embodiments of the application have at least the following beneficial effects:
According to the method and apparatus, the computer device, and the storage medium provided by the embodiments of the application, the coordinates of the same face detection point in the first video frame and the second video frame are obtained, and the first coordinate and the second coordinate are smoothed according to the obtained weights, thereby correcting the second coordinate. Compared with the second coordinate, the corrected coordinate represents the position of the face detection point in the second video frame more accurately, which improves the coordinate accuracy of the face detection point and avoids inaccurate coordinates caused by jitter of the face detection point.
Drawings
To describe the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and a person skilled in the art may obtain other drawings from them without inventive effort.
Fig. 1 is a flowchart of a face detection point processing method provided in an embodiment of the present application;
Fig. 2 is a flowchart of another face detection point processing method according to an embodiment of the present application;
fig. 3 is a schematic coordinate diagram of a face detection point according to an embodiment of the present application;
fig. 4 is a flowchart of another face detection point processing method according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a face detection point processing device according to an embodiment of the present application;
Fig. 6 is a schematic structural diagram of another face detection point processing apparatus according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of a terminal according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the following detailed description of the embodiments of the present application will be given with reference to the accompanying drawings.
It is to be understood that the terms "first," "second," and the like, as used herein, may be used to describe various concepts, but are not limited by these terms unless otherwise specified. These terms are only used to distinguish one concept from another. For example, a first video frame may be referred to as a second video frame and a second video frame may be referred to as a first video frame without departing from the scope of the present application.
The terms "at least one", "a plurality", "each", "any" and the like as used herein, at least one includes one, two or more, a plurality includes two or more, each means each of the corresponding plurality, and any one means any of the plurality. For example, the plurality of video frames includes 3 video frames, and each video frame refers to each of the 3 video frames, and any one of the 3 video frames may be the first, the second, or the third.
In order to facilitate understanding of the embodiments of the present application, keywords related to the embodiments of the present application are explained:
face shake: the face jitter refers to unstable coordinate data of a face detection point, namely, the detected coordinates of the face detection point are not completely consistent with the actual coordinates of the face detection point. For example, the same detection point of a face in the same video frame or image is detected at different times, and coordinates of the two obtained detection points of the face are different; or for two adjacent video frames, when the faces in the two video frames are in a static state, the coordinates of the same face detection points in the faces are different. The face position changes suddenly, and the face detection is affected, so that the face shake is more obvious.
The method provided by the embodiment of the application can be applied to various scenes.
For example, consider the scenario of adding a virtual element to a face. When a user live-streams or shoots a video, a virtual element can be added to the face. Taking adding a virtual element to the eye region as an example: a plurality of eye detection points corresponding to the eye region are detected, the eye region is determined from those detection points, and the virtual element is then added in the eye region. If the coordinates of the detected points are inaccurate, the position of the added virtual element is offset.
The face detection point processing method provided by the embodiment of the application can be executed by a terminal, by a server, or by both; the following description takes the terminal as an example.
Fig. 1 is a flowchart of a face detection point processing method provided in an embodiment of the present application. The execution main body of the embodiment of the application is a terminal. Referring to fig. 1, the method comprises the steps of:
101. The terminal obtains the coordinate variation based on the first coordinate and the second coordinate of the same face detection point.
The terminal adopts a face detection technology to detect a first coordinate and a second coordinate of the same face detection point in different video frames. The first coordinates are coordinates of the face detection point in a first video frame, the second coordinates are coordinates of the face detection point in a second video frame, and the first video frame is a previous frame of the second video frame.
The coordinate change amount represents a change amount of the face detection point from the first coordinate to the second coordinate. Where the coordinates include an abscissa and an ordinate, the coordinate variation includes an abscissa variation and an ordinate variation.
102. In response to the coordinate variation belonging to the reference threshold range, the terminal determines a first weight corresponding to the first coordinate and a second weight corresponding to the second coordinate based on the coordinate variation and the detection error.
Here, the coordinate variation belonging to the reference threshold range indicates that the detected second coordinate is inaccurate and needs to be corrected. The detection error represents the error between the coordinates of the same face detection point in two adjacent video frames when the face is in a static state. The first weight is the weight of the first coordinate when correcting the second coordinate, the second weight is the weight of the second coordinate when correcting the second coordinate, and the sum of the first weight and the second weight is 1.
103. The terminal weights the first coordinate and the second coordinate based on the first weight and the second weight to obtain a third coordinate of the face detection point in the second video frame.
The terminal obtains the third coordinate by weighting the first coordinate and the second coordinate; the third coordinate is the second coordinate after correction.
According to the method provided by the embodiment of the application, the coordinates of the same face detection point in the first video frame and the second video frame are obtained, and the first coordinate and the second coordinate are smoothed according to the obtained weights, thereby correcting the second coordinate. Compared with the second coordinate, the corrected coordinate represents the position of the face detection point in the second video frame more accurately, which improves the coordinate accuracy of the face detection point and avoids inaccurate coordinates caused by jitter of the face detection point.
Fig. 2 is a flowchart of a face detection point processing method according to an embodiment of the present application. The execution main body of the embodiment of the application is a terminal. Referring to fig. 2, the method includes the steps of:
201. The terminal performs face detection on the first video frame and the second video frame respectively, obtaining a first coordinate of a face detection point in the first video frame and a second coordinate of the same face detection point in the second video frame.
The first video frame is the frame preceding the second video frame, and the two are adjacent video frames in some video. The video may be a live video, a video currently being shot, a video uploaded by another device and obtained by the terminal from a server, or another type of video; the embodiment of the application does not limit the type of video.
The first video frame and the second video frame contain the same face. For any face detection point in the face, the terminal performs face detection on the first video frame to obtain the first coordinate of the detection point, and then performs face detection on the second video frame to obtain its second coordinate. The face detection point may be an eye corner detection point, a forehead detection point, a chin detection point, a nose tip detection point, or another detection point.
In one possible implementation, when the coordinates detected by the terminal are two-dimensional, the first coordinate and the second coordinate each consist of an abscissa and an ordinate.
In addition, the embodiment of the application does not limit the face detection method; for example, a method based on facial features, a neural-network-based method, or another detection method may be used.
202. The terminal determines a coordinate variation amount based on the first coordinate and the second coordinate.
The coordinate variation is used for representing the variation of the face detection point from the first video frame to the second video frame, and the coordinate variation is not smaller than 0.
In one possible implementation, the terminal obtains the absolute value of the difference between the first coordinate and the second coordinate and takes the absolute value as the coordinate variation, so that the determined coordinate variation is not less than 0.
In the case where the first coordinate includes a first abscissa and a first ordinate, and the second coordinate includes a second abscissa and a second ordinate, the coordinate variation amount includes an abscissa variation amount and an ordinate variation amount. The terminal determines the absolute value of the difference value between the first abscissa and the second abscissa as the abscissa variation; the absolute value of the difference between the first ordinate and the second ordinate is determined as the ordinate variation. The horizontal coordinate variation is used for representing the horizontal moving distance of the face detection point, and the vertical coordinate variation is used for representing the vertical moving distance of the face detection point.
For example, if the first coordinate is (x1, y1) and the second coordinate is (x2, y2), the abscissa variation is Δx = |x1 - x2| and the ordinate variation is Δy = |y1 - y2|.
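Expressed in code, this step might look like the following minimal sketch; the language, function name, and variable names are illustrative and not part of the patent:

```python
# A minimal sketch of step 202 under the plain absolute-difference reading;
# function and variable names are illustrative, not from the patent.

def coordinate_variation(first, second):
    """first: (x1, y1), the point's coordinate in the first video frame;
    second: (x2, y2), its coordinate in the second video frame."""
    x1, y1 = first
    x2, y2 = second
    dx = abs(x1 - x2)  # abscissa variation: horizontal movement distance
    dy = abs(y1 - y2)  # ordinate variation: vertical movement distance
    return dx, dy
```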
In one possible implementation, because the horizontal and vertical lengths of a video frame differ (for example, a horizontal length of 256 and a vertical length of 512), moving by one coordinate unit horizontally and vertically produces different effects, with the horizontal movement being more noticeable. To handle the abscissa and the ordinate in the same way, so that the processed coordinates exhibit the same movement effect, the terminal acquires the size of the first or second video frame, the size including a horizontal length and a vertical length; acquires a first difference between the first abscissa and the second abscissa and a second difference between the first ordinate and the second ordinate; and adjusts the two differences, taking the adjusted first difference as the abscissa variation and the adjusted second difference as the ordinate variation, so that the ratio between the abscissa variation and the ordinate variation is consistent with the ratio between the horizontal length and the vertical length. That is, the abscissa variation and the ordinate variation are scaled in proportion to the ratio between the horizontal length and the vertical length.
In one possible implementation, the terminal determines the product of the first difference and the horizontal length as the abscissa variation, and the product of the second difference and the vertical length as the ordinate variation. For example, if the horizontal length is w and the vertical length is h, the abscissa variation is Δx = |x1 - x2| * w and the ordinate variation is Δy = |y1 - y2| * h.
In another possible implementation, the terminal obtains the ratio between the horizontal length and the vertical length and adjusts the first difference and the second difference based on that ratio to obtain the abscissa variation and the ordinate variation. For example, if the ratio between the horizontal length and the vertical length is 1:2, the product of the first difference and 1 is determined as the abscissa variation, and the product of the second difference and 2 is determined as the ordinate variation.
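The two size-based adjustments can be sketched as follows, assuming the frame's horizontal length w and vertical length h are known integer pixel counts; again the function names are illustrative:

```python
# Sketches of the two size-based adjustments described above; w and h are
# assumed to be integer pixel counts, and the function names are illustrative.

from math import gcd

def scaled_variation(first, second, w, h):
    """Variant 1: multiply each axis difference by the full axis length."""
    x1, y1 = first
    x2, y2 = second
    return abs(x1 - x2) * w, abs(y1 - y2) * h

def ratio_variation(first, second, w, h):
    """Variant 2: multiply by the reduced aspect ratio, e.g. 256:512 -> 1:2."""
    x1, y1 = first
    x2, y2 = second
    g = gcd(w, h)
    return abs(x1 - x2) * (w // g), abs(y1 - y2) * (h // g)
```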
After determining the coordinate variation, the terminal determines whether the coordinate variation belongs to the reference threshold range. If so, step 203 is performed; if not, the detected second coordinate is accurate and does not need to be corrected.
The minimum value of the reference threshold range represents the change observed between two adjacent video frames when the face detection point is static; it is 0, 0.01, or another small value. The maximum value of the range represents the largest distance a face detection point can move between two adjacent video frames, that is, the distance a face detection point moves between two adjacent frames while face detection remains valid. For example, if the fastest a face moves between two adjacent video frames is 100, the maximum value is 100.
In one possible implementation, since the coordinate variation includes an abscissa variation and an ordinate variation, the reference threshold range includes an abscissa threshold range and an ordinate threshold range, and belonging to the reference threshold range means that at least one of the following holds: the abscissa variation belongs to the abscissa threshold range, or the ordinate variation belongs to the ordinate threshold range. The abscissa threshold range and the ordinate threshold range may be the same or different; the embodiment of the application does not limit this.
In addition, the reference threshold ranges corresponding to videos of different sizes may be the same or different; the embodiment of the application does not limit this. If they differ, then after obtaining the coordinate variation the terminal determines the reference threshold range corresponding to the size of the video frame according to the correspondence between sizes and reference threshold ranges, and determines whether the coordinate variation belongs to that range.
203. In response to the coordinate variation belonging to the reference threshold range, the terminal normalizes the coordinate variation to obtain a processed coordinate variation.
Displayed video frames differ in size because videos have different sizes or because they are shown on different display devices. So that video frames of different sizes can be smoothed in the same way, the coordinate variation is normalized, and the subsequent smoothing is performed on the processed coordinate variation.
In one possible implementation, the terminal normalizes the coordinate variation according to the reference threshold range, so that the processed coordinate variation has the value range [0, 1].
For example, the terminal normalizes the coordinate variation with a normalization function f(delta, min, max) to obtain the processed coordinate variation. The maximum value max and minimum value min of the function are tied to the reference threshold range: min is the minimum of the range and max is its maximum.
In one possible implementation, the reference threshold range includes an abscissa threshold range and an ordinate threshold range, and the coordinate variation includes an abscissa variation and an ordinate variation. In this case, whether each variation requires correction is determined against its own threshold range: the terminal normalizes the abscissa variation in response to the abscissa variation belonging to the abscissa threshold range, obtaining a processed abscissa variation; or the terminal normalizes the ordinate variation in response to the ordinate variation belonging to the ordinate threshold range, obtaining a processed ordinate variation.
For example, the normalization function is f1(x, min1, max1) for the abscissa and f2(y, min2, max2) for the ordinate, where min1 and max1 are the minimum and maximum of the abscissa threshold range, and min2 and max2 are the minimum and maximum of the ordinate threshold range.
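The patent does not define f itself; a clamped min-max normalization that maps the threshold range onto [0, 1], as step 203 requires, is one consistent reading and is assumed in this sketch:

```python
# The patent names a normalization function f(delta, min, max) with output
# in [0, 1] but does not define it; a clamped min-max normalization is one
# consistent reading and is assumed here (max > min is required).

def normalize(delta, lo, hi):
    """Map delta from [lo, hi] onto [0, 1], clamping out-of-range values."""
    t = (delta - lo) / (hi - lo)
    return min(max(t, 0.0), 1.0)
```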
In one possible implementation, if only the abscissa variation belongs to the abscissa threshold range, the abscissa variation is normalized and step 204 is then performed, while the ordinate variation is left unprocessed; likewise, if only the ordinate variation belongs to the ordinate threshold range, the ordinate variation is normalized and step 204 is then performed, while the abscissa variation is left unprocessed.
204. The terminal determines a first weight corresponding to the first coordinate and a second weight corresponding to the second coordinate based on the processed coordinate variation and the detection error.
The detection error represents the error between the coordinates of the same face detection point in two adjacent video frames when the face is in a static state. For example, with the face static in two adjacent video frames, a face detection technique detects the coordinates (x1, y1) of an eye corner detection point in the first video frame and its coordinates (x2, y2) in the second video frame; the difference between the two coordinates divided by the time interval between the two frames gives the detection error. That is, the error in the static case is expressed as the rate of change from the previous video frame to the next.
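A possible offline calibration of the detection error under this description, with the caveat that treating the coordinate difference as a Euclidean distance is an assumption of the sketch:

```python
# One possible offline calibration of the detection error alpha, following
# the description above: detect the same point in two adjacent frames of a
# static face and divide the coordinate difference by the frame interval.
# Using the Euclidean distance as the "difference" is an assumption; the
# patent does not fix the distance measure.

from math import hypot

def estimate_detection_error(p1, p2, dt):
    """p1, p2: detected (x, y) of the same point in two adjacent frames of
    a static face; dt: time between the frames."""
    (x1, y1), (x2, y2) = p1, p2
    return hypot(x1 - x2, y1 - y2) / dt
```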
The first weight is used for representing the weight occupied by the first coordinate when the second coordinate is corrected, the second weight is used for representing the weight occupied by the second coordinate when the second coordinate is corrected, and the sum of the first weight and the second weight is 1. For example, the first weight is 0.1 and the second weight is 0.9.
In one possible implementation, the terminal determines a first abscissa weight corresponding to the first coordinate and a second abscissa weight corresponding to the second coordinate based on the abscissa variation and the detection error, and determines a first ordinate weight corresponding to the first coordinate and a second ordinate weight corresponding to the second coordinate based on the ordinate variation and the detection error. The sum of the first abscissa weight and the second abscissa weight is 1, and the sum of the first ordinate weight and the second ordinate weight is 1.
In one possible implementation, for the abscissa weights, the terminal determines the sum of the detection error and the product of the abscissa variation and the detection parameter as the first abscissa weight, and determines the difference between 1 and the first abscissa weight as the second abscissa weight. For the ordinate weights, the terminal determines the sum of the detection error and the product of the ordinate variation and the detection parameter as the first ordinate weight, and determines the difference between 1 and the first ordinate weight as the second ordinate weight. The detection parameter is the difference between 1 and the detection error.
For example, the first abscissa weight and the first ordinate weight are determined using the following formulas:
px = f(ΔX, min1, max1) * (1 - α) + α;
py = f(ΔY, min2, max2) * (1 - α) + α;
where px is the first abscissa weight, py is the first ordinate weight, α is the detection error, and 1 - α is the detection parameter.
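In code, with the normalize sketch above standing in for f, the weight computation might read as follows; note that px and py, though called the first abscissa and first ordinate weights, are applied to the second (current-frame) coordinate in the formulas of step 205 below:

```python
# Sketch of the weight formulas in step 204. normalize() is the clamped
# min-max sketch above; alpha is the detection error. Per the smoothing
# formulas of step 205, px and py weight the current-frame coordinate.

def smoothing_weights(dx, dy, alpha, x_range, y_range):
    """dx, dy: abscissa and ordinate variations; x_range, y_range:
    (min, max) of the abscissa and ordinate threshold ranges."""
    px = normalize(dx, *x_range) * (1 - alpha) + alpha
    py = normalize(dy, *y_range) * (1 - alpha) + alpha
    return px, py
```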
205. The terminal weights the first coordinate and the second coordinate based on the first weight and the second weight to obtain a third coordinate of the face detection point in the second video frame.
The terminal performs a weighted summation of the first coordinate and the second coordinate based on the first weight and the second weight to obtain the third coordinate, which is the second coordinate after correction.
In one possible implementation, the third coordinate is obtained by adjusting the abscissa and the ordinate separately. The terminal weights the first abscissa of the first coordinate and the second abscissa of the second coordinate based on the first abscissa weight and the second abscissa weight to obtain the third abscissa, and weights the first ordinate of the first coordinate and the second ordinate of the second coordinate based on the first ordinate weight and the second ordinate weight to obtain the third ordinate.
For example, if the first coordinate is (x1, y1) and the second coordinate is (x2, y2), the third abscissa and the third ordinate are determined using the following formulas:
x3 = x2 * px + x1 * (1 - px);
y3 = y2 * py + y1 * (1 - py);
where x3 is the third abscissa, y3 is the third ordinate, px is the first abscissa weight, and py is the first ordinate weight.
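A direct transcription of these formulas as a sketch:

```python
# A direct transcription of the smoothing formulas in step 205; px and py
# are the weights from step 204, applied to the current-frame coordinate.

def smooth_point(first, second, px, py):
    """first: (x1, y1) from the previous frame; second: (x2, y2) from the
    current frame. Returns the corrected third coordinate (x3, y3)."""
    x1, y1 = first
    x2, y2 = second
    x3 = x2 * px + x1 * (1 - px)  # px near 1 keeps the new detection
    y3 = y2 * py + y1 * (1 - py)  # py near alpha pulls back toward (x1, y1)
    return x3, y3
```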
For example, referring to the schematic diagram in fig. 3, the first video frame 301 is followed by the second video frame 302. The first coordinate (solid dot) of the eye corner detection point in the first video frame 301 lies at the eye corner, while the second coordinate (solid dot) of the eye corner detection point in the second video frame 302 lies below the eye corner, so the detected second coordinate contains an error. The second coordinate is therefore corrected using the above embodiment to obtain the corrected third coordinate (open dot).
In one possible implementation, when one of the abscissa variation and the ordinate variation does not belong to its threshold range, only the coordinate whose variation belongs to its threshold range is adjusted; the other coordinate does not need to be adjusted. For example, when the abscissa variation belongs to the abscissa threshold range but the ordinate variation does not belong to the ordinate threshold range, the terminal weights the first abscissa of the first coordinate and the second abscissa of the second coordinate based on the first abscissa weight and the second abscissa weight to obtain the third abscissa, and the third ordinate is the same as the second ordinate.
It should be noted that the above embodiment is described with a single face detection point as an example. In another embodiment, all of the detection points in the face may be smoothed in the above manner to obtain more accurate coordinates for the face detection points.
Another point to note is that the above embodiment takes the terminal as the execution subject; in another embodiment, the execution subject is a server, or a terminal and a server together. In one possible implementation in which the terminal and the server interact, the terminal performs face detection on the first video frame and the second video frame to obtain the first coordinate and the second coordinate and sends them to the server; the server obtains the coordinate variation based on the first coordinate and the second coordinate, determines, in response to the coordinate variation belonging to the reference threshold range, the first weight corresponding to the first coordinate and the second weight corresponding to the second coordinate based on the coordinate variation and the detection error, weights the first coordinate and the second coordinate based on the first weight and the second weight to obtain the third coordinate, and sends the third coordinate to the terminal.
According to the method provided by the embodiment of the application, the coordinates of the same face detection point in the first video frame and the second video frame are obtained, and the first coordinate and the second coordinate are smoothed according to the obtained weights, thereby correcting the second coordinate. Compared with the second coordinate, the corrected coordinate represents the position of the face detection point in the second video frame more accurately, which improves the coordinate accuracy of the face detection point and avoids inaccurate coordinates caused by jitter of the face detection point.
Before the coordinates of the face detection points are smoothed, the coordinate variation is normalized. This unifies the coordinate variations of video frames of different sizes, so that the coordinates in video frames of different sizes can be corrected with the same smoothing procedure.
Referring to fig. 4, the detailed procedure for applying the face detection point processing method of the embodiment shown in fig. 2 to the scenario of adding a virtual element is as follows:
401. The terminal determines at least one face detection point corresponding to the virtual element based on the virtual element to be added.
For example, if the virtual element to be added is a pair of glasses, the face detection points corresponding to the glasses are the two inner eye corner detection points and the two outer eye corner detection points.
402. The terminal performs face detection on the first video frame, determines the first coordinate of each face detection point, and adds the virtual element to the first video frame based on the first coordinate of each face detection point.
For example, the position at which to add the glasses is determined from the detected eye corner detection points, and the glasses are added at the position corresponding to the eyes in the face.
403. The terminal performs face detection on the second video frame and determines the second coordinate of each of the at least one face detection point.
404. The terminal obtains the coordinate variation of each face detection point based on the first coordinate and the second coordinate of at least one face detection point.
405. For each face detection point, in response to the coordinate variation of the face detection point belonging to the reference threshold range, the terminal determines a first weight corresponding to the first coordinate and a second weight corresponding to the second coordinate based on the coordinate variation and the detection error.
406. For each face detection point, the terminal weights the first coordinate and the second coordinate based on the first weight and the second weight to obtain a third coordinate.
407. The terminal adds the virtual element to the second video frame according to the third coordinate of each face detection point.
If the glasses were added directly according to the second coordinates once the second coordinate of each face detection point had been determined, the inaccuracy of the second coordinates would leave the position of the glasses not fully aligned with the eyes, giving a poor display effect. Adding the glasses according to the corrected third coordinates places them at the position corresponding to the eyes, improving the display effect.
Correcting the coordinates when adding the virtual element improves how closely the virtual element follows the face.
In the above embodiment, the implementation of steps 404-406 is the same as in the embodiment shown in fig. 2 and is not repeated here.
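Combining the sketches above, steps 404-406 for one pair of adjacent frames might look like the following; the threshold ranges and alpha in the usage example are hypothetical calibration values:

```python
# End-to-end sketch of steps 404-406 for one pair of adjacent frames,
# combining the helpers above; normalize() is the min-max sketch given
# earlier, and the ranges/alpha are hypothetical calibration values.

def smooth_detection_points(prev_pts, cur_pts, x_range, y_range, alpha):
    """prev_pts, cur_pts: lists of (x, y) for the same detection points in
    two adjacent frames. Returns corrected coordinates for the current frame."""
    corrected = []
    for (x1, y1), (x2, y2) in zip(prev_pts, cur_pts):
        dx, dy = abs(x1 - x2), abs(y1 - y2)
        px = py = 1.0  # variation outside its range: keep the detection
        if x_range[0] <= dx <= x_range[1]:
            px = normalize(dx, *x_range) * (1 - alpha) + alpha
        if y_range[0] <= dy <= y_range[1]:
            py = normalize(dy, *y_range) * (1 - alpha) + alpha
        corrected.append((x2 * px + x1 * (1 - px),
                          y2 * py + y1 * (1 - py)))
    return corrected

# Hypothetical usage: smooth all eye-corner points between two frames.
# corrected = smooth_detection_points(pts_t0, pts_t1, (0.0, 100.0),
#                                     (0.0, 100.0), alpha=0.05)
```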
Of course, the face detection point processing method provided by the embodiment of the application can also be applied to other scenes, for example, face detection scenes and the like.
Fig. 5 is a schematic structural diagram of a face detection point processing device according to an embodiment of the present application. Referring to fig. 5, the apparatus includes:
The variation obtaining module 501 is configured to obtain a coordinate variation based on a first coordinate and a second coordinate of the same face detection point, where the first coordinate is the coordinate of the face detection point in a first video frame, the second coordinate is the coordinate of the face detection point in a second video frame, and the first video frame is the video frame preceding the second video frame;
The weight determining module 502 is configured to determine, based on the coordinate variation and a detection error, a first weight corresponding to the first coordinate and a second weight corresponding to the second coordinate in response to the coordinate variation belonging to the reference threshold range, where the detection error is used to represent an error between coordinates of the same face detection point in the face in two adjacent video frames when the face is in a static state;
the coordinate determining module 503 is configured to perform a weighting process on the first coordinate and the second coordinate based on the first weight and the second weight, so as to obtain a third coordinate of the face detection point in the second video frame.
According to the apparatus provided by the embodiment of the application, the coordinates of the same face detection point in the first video frame and the second video frame are obtained, and the first coordinate and the second coordinate are smoothed according to the obtained weights, thereby correcting the second coordinate. Compared with the second coordinate, the corrected coordinate represents the position of the face detection point in the second video frame more accurately, which improves the coordinate accuracy of the face detection point and avoids inaccurate coordinates caused by jitter of the face detection point.
In one possible implementation, the first coordinate includes a first abscissa and a first ordinate, the second coordinate includes a second abscissa and a second ordinate, the coordinate variation includes an abscissa variation and an ordinate variation, and referring to fig. 6, the variation obtaining module 501 includes:
a first acquisition unit 511 for determining an absolute value of a difference between the first abscissa and the second abscissa as an abscissa variation amount;
a second acquisition unit 521 for determining an absolute value of a difference between the first ordinate and the second ordinate as an ordinate variation.
In one possible implementation, referring to fig. 6, the apparatus further includes:
a size obtaining module 504, configured to obtain a size of the first video frame or the second video frame, where the size includes a horizontal length and a vertical length;
The first coordinate includes a first abscissa and a first ordinate, the second coordinate includes a second abscissa and a second ordinate, and the variation obtaining module 501 includes:
a difference value obtaining unit 531 for obtaining a first difference value between the first abscissa and the second abscissa, and a second difference value between the first ordinate and the second ordinate;
And a difference adjustment unit 541 configured to adjust the first difference and the second difference, and take the adjusted first difference as an abscissa variation and the adjusted second difference as an ordinate variation, so that a ratio between the abscissa variation and the ordinate variation matches a ratio between the horizontal length and the vertical length.
In one possible implementation, referring to fig. 6, the difference adjustment unit 541 is configured to:
Determining the product of the first difference value and the horizontal length as an abscissa variation;
the product of the second difference and the vertical length is determined as the ordinate variation.
In one possible implementation, referring to fig. 6, a variation obtaining module 501 is configured to obtain an absolute value of a difference between the first coordinate and the second coordinate, and take the absolute value as a coordinate variation;
The weight determination module 502 includes:
A normalizing unit 512, configured to normalize the coordinate variation in response to the coordinate variation belonging to the reference threshold range, to obtain a processed coordinate variation;
the weight determining unit 522 is configured to determine the first weight and the second weight based on the processed coordinate variation and the detection error.
In one possible implementation, the coordinate variation includes an abscissa variation and an ordinate variation, the reference threshold range includes an abscissa threshold range and an ordinate threshold range, and referring to fig. 6, the normalization unit 512 is configured to:
in response to the abscissa variation belonging to the abscissa threshold range, normalize the abscissa variation to obtain the processed abscissa variation; or
in response to the ordinate variation belonging to the ordinate threshold range, normalize the ordinate variation to obtain the processed ordinate variation.
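The embodiment does not spell out the normalization formula; min-max scaling over the reference threshold range is one plausible reading, sketched below (illustrative names; the behavior outside the range is our assumption):

```python
def normalize_variation(variation, threshold_range):
    """Map a variation lying inside the reference threshold range to
    [0, 1] by min-max scaling (one possible reading of unit 512).

    threshold_range: (lower, upper) with upper > lower. Returns None
    when the variation falls outside the range, in which case no
    smoothing weight is derived from it.
    """
    lower, upper = threshold_range
    if lower <= variation <= upper:
        return (variation - lower) / (upper - lower)
    return None
```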
In one possible implementation, the coordinate variation includes an abscissa variation and an ordinate variation, referring to fig. 6, the weight determining module 502 includes:
a weight determining unit 522, configured to determine, based on the abscissa variation and the detection error, a first abscissa weight corresponding to the first coordinate and a second abscissa weight corresponding to the second coordinate, where the sum of the first abscissa weight and the second abscissa weight is 1;
The weight determining unit 522 is further configured to determine, based on the ordinate variation and the detection error, a first ordinate weight corresponding to the first coordinate and a second ordinate weight corresponding to the second coordinate, where a sum of the first ordinate weight and the second ordinate weight is 1.
In one possible implementation, referring to fig. 6, the weight determining unit 522 is configured to:
determine the first abscissa weight as the sum of the detection error and the product of the abscissa variation and the detection parameter, where the detection parameter is the difference between 1 and the detection error;
determine the difference between 1 and the first abscissa weight as the second abscissa weight;
The weight determining unit 522 is further configured to:
determine the first ordinate weight as the sum of the detection error and the product of the ordinate variation and the detection parameter;
determine the difference between 1 and the first ordinate weight as the second ordinate weight.
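The weight formulas of unit 522 translate directly into code; a sketch, assuming the variation has already been normalized to [0, 1] as above:

```python
def axis_weights(normalized_variation, detection_error):
    """Weights for one axis per unit 522:
    detection_parameter = 1 - detection_error
    first_weight  = normalized_variation * detection_parameter + detection_error
    second_weight = 1 - first_weight  (the two weights sum to 1)
    """
    detection_parameter = 1.0 - detection_error
    first_weight = normalized_variation * detection_parameter + detection_error
    second_weight = 1.0 - first_weight
    return first_weight, second_weight
```

Under this formula, a variation of 0 leaves a weight of 1 minus the detection error on the second (current) coordinate, so a nearly static point is barely altered, while variations near the top of the reference range shift weight toward the first coordinate from the previous frame, which is what damps jitter.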
In one possible implementation, the third coordinates include a third abscissa and a third ordinate, and the coordinate determining module 503 is configured to:
weight the first abscissa of the first coordinate and the second abscissa of the second coordinate based on the first abscissa weight and the second abscissa weight to obtain the third abscissa;
weight the first ordinate of the first coordinate and the second ordinate of the second coordinate based on the first ordinate weight and the second ordinate weight to obtain the third ordinate.
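A sketch of the final weighted combination performed by the coordinate determining module 503 (illustrative names):

```python
def third_coordinate(first_coord, second_coord, x_weights, y_weights):
    """Weighted combination producing the corrected third coordinate."""
    first_x_weight, second_x_weight = x_weights
    first_y_weight, second_y_weight = y_weights
    third_x = first_x_weight * first_coord[0] + second_x_weight * second_coord[0]
    third_y = first_y_weight * first_coord[1] + second_y_weight * second_coord[1]
    return third_x, third_y
```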
In one possible implementation, referring to fig. 6, the apparatus further includes:
a coordinate detection module 505, configured to perform face detection on the first video frame and the second video frame respectively, to obtain the first coordinate and the second coordinate.
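Putting the sketches above together, a hypothetical end-to-end pass over one detection point; all numeric values (coordinates, threshold range, detection error) are invented for illustration:

```python
first = (0.412, 0.530)    # point in the first (previous) video frame
second = (0.415, 0.528)   # point in the second (current) video frame
detection_error = 0.02    # measured with the face in a static state

dx, dy = coordinate_variation(first, second)
nx = normalize_variation(dx, (0.0, 0.01))
ny = normalize_variation(dy, (0.0, 0.01))

if nx is not None and ny is not None:
    corrected = third_coordinate(first, second,
                                 axis_weights(nx, detection_error),
                                 axis_weights(ny, detection_error))
else:
    corrected = second  # variation outside the reference range: keep the detection
```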
Any combination of the above optional solutions may be adopted to form an optional embodiment of the present application, and details are not described herein again.
It should be noted that the face detection point processing device provided in the above embodiment is illustrated only by way of the division of the above functional modules when processing face detection points. In practical applications, the above functions may be allocated to different functional modules as required; that is, the internal structure of the computer device may be divided into different functional modules to complete all or part of the functions described above. In addition, the face detection point processing device provided in the above embodiment and the face detection point processing method embodiments belong to the same concept; for the specific implementation process, refer to the method embodiments, which are not repeated here.
The embodiment of the application also provides a computer device, which comprises a processor and a memory, wherein at least one computer program is stored in the memory, and the at least one computer program is loaded and executed by the processor to realize the operations executed in the face detection point processing method of the embodiment.
Optionally, the computer device is provided as a terminal. Fig. 7 is a schematic structural diagram of a terminal 700 according to an embodiment of the present application. The terminal 700 may be a portable mobile terminal such as a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. The terminal 700 may also be referred to by other names such as user equipment, portable terminal, laptop terminal, or desktop terminal.
The terminal 700 includes: a processor 701 and a memory 702.
Processor 701 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and the like. The processor 701 may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), or PLA (Programmable Logic Array). The processor 701 may also include a main processor and a coprocessor: the main processor, also called a CPU (Central Processing Unit), is a processor for processing data in an awake state; a coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 701 may be integrated with a GPU (Graphics Processing Unit) for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 701 may also include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
Memory 702 may include one or more computer-readable storage media, which may be non-transitory. The memory 702 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in the memory 702 is used to store at least one computer program for execution by the processor 701 to implement the face detection point processing method provided by the method embodiment of the present application.
In some embodiments, the terminal 700 may further optionally include: a peripheral interface 703 and at least one peripheral. The processor 701, the memory 702, and the peripheral interface 703 may be connected by a bus or signal lines. The individual peripheral devices may be connected to the peripheral device interface 703 via buses, signal lines or a circuit board. Specifically, the peripheral device includes: at least one of radio frequency circuitry 704, a display 705, a camera assembly 706, audio circuitry 707, a positioning assembly 708, and a power supply 709.
The peripheral interface 703 may be used to connect at least one I/O (Input/Output) related peripheral device to the processor 701 and the memory 702. In some embodiments, the processor 701, the memory 702, and the peripheral interface 703 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 701, the memory 702, and the peripheral interface 703 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 704 is configured to receive and transmit RF (Radio Frequency) signals, also referred to as electromagnetic signals. The radio frequency circuit 704 communicates with a communication network and other communication devices via electromagnetic signals. The radio frequency circuit 704 converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 704 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 704 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocol includes, but is not limited to: the world wide web, metropolitan area networks, intranets, mobile communication networks of various generations (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 704 may further include NFC (Near Field Communication) related circuits, which is not limited by the present application.
The display screen 705 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display 705 is a touch display, the display 705 also has the ability to collect touch signals on or above the surface of the display 705. The touch signal may be input to the processor 701 as a control signal for processing. At this time, the display 705 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display 705, disposed on the front panel of the terminal 700; in other embodiments, there may be at least two displays 705, respectively disposed on different surfaces of the terminal 700 or in a folded design; in still other embodiments, the display 705 may be a flexible display disposed on a curved surface or a folded surface of the terminal 700. The display 705 may even be arranged in a non-rectangular irregular pattern, that is, an irregularly shaped screen. The display 705 may be made of LCD (Liquid Crystal Display), OLED (Organic Light-Emitting Diode), or other materials.
The camera assembly 706 is used to capture images or video. Optionally, the camera assembly 706 includes a front camera and a rear camera. The front camera is arranged on the front panel of the terminal, and the rear camera is arranged on the back of the terminal. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth camera can be fused to realize a background blurring function, and the main camera and the wide-angle camera can be fused to realize panoramic shooting, virtual reality (VR) shooting, or other fused shooting functions. In some embodiments, the camera assembly 706 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash, and can be used for light compensation under different color temperatures.
The audio circuit 707 may include a microphone and a speaker. The microphone is used for collecting sound waves of users and environments, converting the sound waves into electric signals, and inputting the electric signals to the processor 701 for processing, or inputting the electric signals to the radio frequency circuit 704 for voice communication. For the purpose of stereo acquisition or noise reduction, a plurality of microphones may be respectively disposed at different portions of the terminal 700. The microphone may also be an array microphone or an omni-directional pickup microphone. The speaker is used to convert electrical signals from the processor 701 or the radio frequency circuit 704 into sound waves. The speaker may be a conventional thin film speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, not only the electric signal can be converted into a sound wave audible to humans, but also the electric signal can be converted into a sound wave inaudible to humans for ranging and other purposes. In some embodiments, the audio circuit 707 may also include a headphone jack.
The positioning component 708 is used to locate the current geographic location of the terminal 700 for navigation or LBS (Location Based Service). The positioning component 708 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
A power supply 709 is used to power the various components in the terminal 700. The power supply 709 may be an alternating current, a direct current, a disposable battery, or a rechargeable battery. When the power supply 709 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. The wired rechargeable battery is a battery charged through a wired line, and the wireless rechargeable battery is a battery charged through a wireless coil. The rechargeable battery may also be used to support fast charge technology.
In some embodiments, the terminal 700 further includes one or more sensors 710. The one or more sensors 710 include, but are not limited to: acceleration sensor 711, gyroscope sensor 712, pressure sensor 713, fingerprint sensor 714, optical sensor 715, and proximity sensor 716.
The acceleration sensor 711 can detect the magnitudes of accelerations on three coordinate axes of the coordinate system established with the terminal 700. For example, the acceleration sensor 711 may be used to detect the components of the gravitational acceleration in three coordinate axes. The processor 701 may control the display screen 705 to display the user interface in a horizontal view or a vertical view according to the gravitational acceleration signal acquired by the acceleration sensor 711. The acceleration sensor 711 may also be used for the acquisition of motion data of a game or a user.
The gyro sensor 712 may detect the body direction and rotation angle of the terminal 700, and may cooperate with the acceleration sensor 711 to collect the user's 3D actions on the terminal 700. Based on the data collected by the gyro sensor 712, the processor 701 may implement the following functions: motion sensing (for example, changing the UI according to a tilting operation by the user), image stabilization during shooting, game control, and inertial navigation.
The pressure sensor 713 may be disposed at a side frame of the terminal 700 and/or at a lower layer of the display screen 705. When the pressure sensor 713 is disposed at a side frame of the terminal 700, a grip signal of the user to the terminal 700 may be detected, and the processor 701 performs left-right hand recognition or quick operation according to the grip signal collected by the pressure sensor 713. When the pressure sensor 713 is disposed at the lower layer of the display screen 705, the processor 701 controls the operability control on the UI interface according to the pressure operation of the user on the display screen 705. The operability controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.
The fingerprint sensor 714 is used to collect a user's fingerprint, and the processor 701 identifies the user's identity based on the fingerprint collected by the fingerprint sensor 714, or the fingerprint sensor 714 identifies the user's identity based on the collected fingerprint. Upon recognizing that the user's identity is a trusted identity, the processor 701 authorizes the user to perform relevant sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, changing settings, and the like. The fingerprint sensor 714 may be provided on the front, back, or side of the terminal 700. When a physical key or vendor Logo is provided on the terminal 700, the fingerprint sensor 714 may be integrated with the physical key or vendor Logo.
The optical sensor 715 is used to collect the ambient light intensity. In one embodiment, the processor 701 may control the display brightness of the display screen 705 based on the ambient light intensity collected by the optical sensor 715. Specifically, when the intensity of the ambient light is high, the display brightness of the display screen 705 is turned up; when the ambient light intensity is low, the display brightness of the display screen 705 is turned down. In another embodiment, the processor 701 may also dynamically adjust the shooting parameters of the camera assembly 706 based on the ambient light intensity collected by the optical sensor 715.
A proximity sensor 716, also referred to as a distance sensor, is provided on the front panel of the terminal 700. The proximity sensor 716 is used to collect the distance between the user and the front of the terminal 700. In one embodiment, when the proximity sensor 716 detects that the distance between the user and the front face of the terminal 700 gradually decreases, the processor 701 controls the display 705 to switch from the bright screen state to the off screen state; when the proximity sensor 716 detects that the distance between the user and the front surface of the terminal 700 gradually increases, the processor 701 controls the display screen 705 to switch from the off-screen state to the on-screen state.
Those skilled in the art will appreciate that the structure shown in fig. 7 is not limiting of the terminal 700 and may include more or fewer components than shown, or may combine certain components, or may employ a different arrangement of components.
Optionally, the computer device is provided as a server. Fig. 8 is a schematic structural diagram of a server according to an embodiment of the present application. The server 800 may vary greatly depending on configuration or performance, and may include one or more processors (Central Processing Units, CPUs) 801 and one or more memories 802, where at least one computer program is stored in the memories 802, and the at least one computer program is loaded and executed by the processors 801 to implement the methods provided in the above method embodiments. Of course, the server may also have a wired or wireless network interface, a keyboard, an input/output interface, and other components for implementing the functions of the device, which are not described herein.
The embodiment of the application also provides a computer readable storage medium, in which at least one computer program is stored, and the at least one computer program is loaded and executed by a processor, so as to implement the operations performed in the face detection point processing method of the above embodiment.
Embodiments of the present application also provide a computer program product or computer program comprising computer program code stored in a computer readable storage medium. The computer program code is loaded and executed by a processor to realize the operations performed in the face detection point processing method of the above embodiment.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program for instructing relevant hardware, where the program may be stored in a computer readable storage medium, and the above storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The foregoing is merely an alternative embodiment of the present application and is not intended to limit the embodiment of the present application, and any modifications, equivalent substitutions, improvements, etc. made within the spirit and principle of the embodiment of the present application should be included in the protection scope of the present application.

Claims (13)

1. A face detection point processing method, the method comprising:
Acquiring a coordinate variation based on a first coordinate and a second coordinate of the same face detection point, wherein the first coordinate is a coordinate of the face detection point in a first video frame, the second coordinate is a coordinate of the face detection point in a second video frame, and the first video frame is a previous video frame of the second video frame;
Determining a first weight corresponding to the first coordinate and a second weight corresponding to the second coordinate based on the coordinate variation and a detection error in response to the coordinate variation belonging to a reference threshold range, wherein the detection error represents an error between coordinates of the same face detection point in two adjacent video frames when the face is in a static state, and the detection error is a value obtained by dividing a difference between the first coordinate and the second coordinate by a time length of the interval between the first video frame and the second video frame;
and weighting the first coordinate and the second coordinate based on the first weight and the second weight to obtain a third coordinate of the face detection point in the second video frame.
2. The method of claim 1, wherein the first coordinate comprises a first abscissa and a first ordinate, the second coordinate comprises a second abscissa and a second ordinate, the coordinate variation comprises an abscissa variation and an ordinate variation, and the obtaining the coordinate variation based on the first coordinate and the second coordinate of the same face detection point comprises:
determining an absolute value of a difference between the first abscissa and the second abscissa as the abscissa variation amount;
and determining the absolute value of the difference between the first ordinate and the second ordinate as the ordinate variation.
3. The method according to claim 1, wherein before the acquiring the coordinate variation amount based on the first coordinate and the second coordinate of the same face detection point, the method further comprises:
acquiring the size of the first video frame or the second video frame, wherein the size comprises a horizontal length and a vertical length;
The first coordinates include a first abscissa and a first ordinate, the second coordinates include a second abscissa and a second ordinate, and the acquiring the coordinate variation based on the first coordinates and the second coordinates of the same face detection point includes:
acquiring a first difference between the first abscissa and the second abscissa and a second difference between the first ordinate and the second ordinate;
and adjusting the first difference value and the second difference value, taking the adjusted first difference value as an abscissa variation, and taking the adjusted second difference value as an ordinate variation, so that the ratio between the abscissa variation and the ordinate variation is consistent with the ratio between the horizontal length and the vertical length.
4. A method according to claim 3, wherein said adjusting the first difference and the second difference to take the adjusted first difference as the abscissa change and the adjusted second difference as the ordinate change so that the ratio between the abscissa change and the ordinate change coincides with the ratio between the horizontal length and the vertical length comprises:
Determining a product of the first difference and the horizontal length as the abscissa variation;
determining a product of the second difference and the vertical length as the ordinate variation.
5. The method according to claim 1, wherein the acquiring the coordinate variation amount based on the first coordinate and the second coordinate of the same face detection point includes:
acquiring an absolute value of a difference value between the first coordinate and the second coordinate, and taking the absolute value as the coordinate variation;
The determining, in response to the coordinate variation belonging to a reference threshold range, a first weight corresponding to the first coordinate and a second weight corresponding to the second coordinate based on the coordinate variation and a detection error includes:
in response to the coordinate variation belonging to the reference threshold range, normalizing the coordinate variation to obtain the processed coordinate variation;
And determining the first weight and the second weight based on the processed coordinate variation and the detection error.
6. The method of claim 5, wherein the coordinate variation includes an abscissa variation and an ordinate variation, the reference threshold range includes an abscissa threshold range and an ordinate threshold range, and the normalizing the coordinate variation in response to the coordinate variation belonging to the reference threshold range includes:
In response to the abscissa variation belonging to the abscissa threshold range, normalizing the abscissa variation to obtain the processed abscissa variation; or
in response to the ordinate variation belonging to the ordinate threshold range, normalizing the ordinate variation to obtain the processed ordinate variation.
7. The method of claim 1, wherein the amount of change in coordinates includes an amount of change in abscissa and an amount of change in ordinate, wherein the determining a first weight corresponding to the first coordinate and a second weight corresponding to the second coordinate based on the amount of change in coordinates and a detection error comprises:
Determining a first abscissa weight corresponding to the first coordinate and a second abscissa weight corresponding to the second coordinate based on the abscissa variation and the detection error, wherein the sum of the first abscissa weight and the second abscissa weight is 1;
and determining a first ordinate weight corresponding to the first coordinate and a second ordinate weight corresponding to the second coordinate based on the ordinate variation and the detection error, wherein the sum of the first ordinate weight and the second ordinate weight is 1.
8. The method of claim 7, wherein the determining a first abscissa weight corresponding to the first coordinate and a second abscissa weight corresponding to the second coordinate based on the abscissa variation and the detection error comprises:
determining the sum of the product of the abscissa variation and the detection parameter and the detection error as the first abscissa weight, wherein the detection parameter is the difference between 1 and the detection error;
determining the difference between 1 and the first abscissa weight as the second abscissa weight;
the determining, based on the ordinate variation and the detection error, a first ordinate weight corresponding to the first coordinate and a second ordinate weight corresponding to the second coordinate includes:
Determining the sum of the product of the ordinate variation and the detection parameter and the detection error as the first ordinate weight;
And determining a difference between 1 and the first ordinate weight as the second ordinate weight.
9. The method of claim 8, wherein the third coordinates include a third abscissa and a third ordinate, wherein the weighting the first coordinates and the second coordinates based on the first weights and the second weights to obtain the third coordinates of the face detection point in the second video frame includes:
Weighting the first abscissa of the first coordinates and the second abscissa of the second coordinates based on the first abscissa weight and the second abscissa weight to obtain the third abscissa;
And carrying out weighting processing on a first ordinate of the first coordinates and a second ordinate of the second coordinates based on the first ordinate weight and the second ordinate weight to obtain the third ordinate.
10. The method according to any one of claims 1 to 9, wherein before the acquiring the coordinate variation amount based on the first coordinate and the second coordinate of the same face detection point, the method further comprises:
performing face detection on the first video frame and the second video frame respectively, to obtain the first coordinate and the second coordinate.
11. A face detection point processing apparatus, the apparatus comprising:
a variation obtaining module, configured to obtain a coordinate variation based on a first coordinate and a second coordinate of the same face detection point, wherein the first coordinate is the coordinate of the face detection point in a first video frame, the second coordinate is the coordinate of the face detection point in a second video frame, and the first video frame is the video frame previous to the second video frame;
a weight determining module, configured to determine, in response to the coordinate variation belonging to a reference threshold range, a first weight corresponding to the first coordinate and a second weight corresponding to the second coordinate based on the coordinate variation and a detection error, wherein the detection error represents an error between coordinates of the same face detection point in two adjacent video frames when the face is in a static state, and the detection error is a value obtained by dividing a difference between the first coordinate and the second coordinate by a time length of the interval between the first video frame and the second video frame; and
a coordinate determining module, configured to perform weighting processing on the first coordinate and the second coordinate based on the first weight and the second weight, to obtain a third coordinate of the face detection point in the second video frame.
12. A computer device comprising a processor and a memory, wherein the memory has stored therein at least one program code that is loaded and executed by the processor to implement the operations performed in the face detection point processing method of any of claims 1 to 10.
13. A computer readable storage medium having stored therein at least one program code loaded and executed by a processor to implement the operations performed in the face detection point processing method of any one of claims 1 to 10.
CN202110343050.XA 2021-03-30 2021-03-30 Face detection point processing method and device, computer equipment and storage medium Active CN113065457B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110343050.XA CN113065457B (en) 2021-03-30 2021-03-30 Face detection point processing method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110343050.XA CN113065457B (en) 2021-03-30 2021-03-30 Face detection point processing method and device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113065457A CN113065457A (en) 2021-07-02
CN113065457B true CN113065457B (en) 2024-05-17

Family

ID=76565166

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110343050.XA Active CN113065457B (en) 2021-03-30 2021-03-30 Face detection point processing method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113065457B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109063593A (en) * 2018-07-13 2018-12-21 北京智芯原动科技有限公司 A kind of face tracking method and device
CN109308469A (en) * 2018-09-21 2019-02-05 北京字节跳动网络技术有限公司 Method and apparatus for generating information
CN110544272A (en) * 2019-09-06 2019-12-06 腾讯科技(深圳)有限公司 face tracking method and device, computer equipment and storage medium
CN111784658A (en) * 2020-06-29 2020-10-16 厦门市美亚柏科信息股份有限公司 Quality analysis method and system for face image
CN111860440A (en) * 2020-07-31 2020-10-30 广州繁星互娱信息科技有限公司 Position adjusting method and device for human face characteristic point, terminal and storage medium
CN112052806A (en) * 2020-09-10 2020-12-08 广州繁星互娱信息科技有限公司 Image processing method, device, equipment and storage medium


Also Published As

Publication number Publication date
CN113065457A (en) 2021-07-02


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant