CN114723932A - Sight direction correction method - Google Patents

Sight direction correction method

Info

Publication number
CN114723932A
Authority
CN
China
Prior art keywords
image
sight line
pixel
face
sight
Prior art date
Legal status
Pending
Application number
CN202110011861.XA
Other languages
Chinese (zh)
Inventor
黄怡瑄
黄文聪
Current Assignee
Realtek Semiconductor Corp
Original Assignee
Realtek Semiconductor Corp
Priority date
Filing date
Publication date
Application filed by Realtek Semiconductor Corp
Priority to CN202110011861.XA
Publication of CN114723932A


Abstract

The invention provides a sight line direction correction method comprising the following steps: obtaining a first image; obtaining a first sight line feature according to the first image; judging whether the first sight line feature falls within a sight line area; if the first sight line feature does not fall within the sight line area, obtaining, according to the first image, a second image corresponding to the first image and an eye image corresponding to the second image; synthesizing a temporary storage image according to the first image and the eye image; and correcting the temporary storage image to generate a face output image.

Description

Sight direction correction method
Technical Field
The present invention relates to a visual data processing method, and more particularly, to a gaze direction correction method.
Background
Nowadays, the number of users making video calls on electronic devices has increased greatly. During a video call, a user can watch a dynamic image of the other party, which produces a sense of presence similar to face-to-face communication and a familiarity that a voice call cannot provide. However, due to hardware limitations, the lens that captures the video image on an electronic device is usually not disposed in the central area of the screen but around the screen. When using a video call, the user usually looks directly at the dynamic image displayed in the screen rather than at the lens, and when the user does not look at the lens, the other party does not easily feel the presence and intimacy. In other words, the two parties of the video call cannot make eye contact, which reduces the quality of the video call experience.
Disclosure of Invention
In some embodiments, a gaze direction correction method includes: obtaining a first image; obtaining a first sight line feature according to the first image; judging whether the first sight line feature falls within a sight line area; if the first sight line feature does not fall within the sight line area, obtaining, according to the first image, a second image corresponding to the first image and an eye image corresponding to the second image; synthesizing a temporary storage image according to the first image and the eye image; and correcting the temporary storage image to generate a face output image.
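This flow can be pictured with the minimal sketch below; every function name is a placeholder introduced here for illustration (none appears in the patent), and each callable stands in for a step elaborated later in the description.

```python
import numpy as np

def correct_gaze(frame: np.ndarray,
                 extract_gaze_feature,   # S02: frame -> first sight line feature
                 in_gaze_zone,           # S03: feature -> True if it falls in the sight line area
                 gaze_model,             # S04: frame -> (second image S2, eye image R2)
                 synthesize_temp,        # S05: (S1, R2) -> temporary storage image
                 refine):                # S06: (temp, S1, S2) -> face output image S3
    """Minimal sketch of the claimed flow (steps S01-S07); every callable is a
    placeholder for an operation detailed later in the description."""
    s1 = frame                                  # S01: obtain the first image
    feature = extract_gaze_feature(s1)          # S02: first sight line feature
    if in_gaze_zone(feature):                   # S03: already looking at the camera
        return s1                               # S07: output the first image unchanged
    s2, eye_r2 = gaze_model(s1)                 # S04: second image and its eye image
    temp = synthesize_temp(s1, eye_r2)          # S05: temporary storage image
    return refine(temp, s1, s2)                 # S06: corrected face output image
```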
In some embodiments, the step of correcting the temporary storage image to generate the face output image comprises: calculating a pixel difference value corresponding to the same pixel position between the first image and the second image to generate a difference image; and synthesizing the second image, the difference image and the eye contour image of the first image to generate the face output image.
In some embodiments, a reference face image is acquired before the first image is acquired; a pixel difference value corresponding to the same pixel position between the first image and the second image is calculated to generate a difference image; a pixel sum is calculated according to the pixel data of each pixel of the difference image; the pixel sum is compared with a pixel threshold to generate a comparison result; the second image, the difference image and the eye contour image of the first image are synthesized to generate a synthesized face image; a first weight parameter corresponding to the synthesized face image is calculated according to the comparison result; and the face output image is generated according to the first weight parameter, the reference face image and the synthesized face image.
In some embodiments, the sum of the first weight parameter and the second weight parameter is 1, and the step of generating the face output image according to the first weight parameter, the reference face image and the synthesized face image comprises: multiplying the first weight parameter by the pixel data of the synthesized face image to generate a first pixel product; multiplying the second weight parameter by the pixel data of the reference face image to generate a second pixel product; and adding the first pixel product and the second pixel product to generate pixel data of the face output image.
In some embodiments, the step of obtaining the first image comprises: obtaining a first sight line direction according to the first image; before the first sight line direction is judged, matching pixel data of the first image with pixel data of a plurality of face images contained in a sight line correction model, and correcting the first sight line feature according to the matched first image; and matching the pixel data of the second image to the pixel data of the first image after the first sight line feature is corrected.
In some embodiments, the step of obtaining the first image comprises: obtaining a first sight line direction according to the first image; before the first sight line direction is judged, matching the first sight line feature with sight line features of a plurality of human eye images contained in a sight line correction model, and correcting the first sight line feature according to the matched first image; and matching the character features of the second image to the character features of the first image after the first sight line feature is corrected.
In some embodiments, the step of correcting the first sight line feature comprises: correcting the first sight line feature into a second sight line feature according to a deep learning result, wherein the deep learning result corresponds to a plurality of learning data, and the learning data comprise a plurality of human eye images, the sight line angle to be corrected of each human eye image, and each human eye image after the sight line angle is corrected.
In some embodiments, a plurality of facial features are determined from the first image; a head facing direction is judged according to the facial features; and when the head facing direction does not face the image capturing device, the determination of whether the first sight line direction faces the position of the image capturing device is not performed.
In some embodiments, whether the eye is in a blinking state is determined according to the first sight line feature; when the eye is in the blinking state, the determination of whether the first sight line direction faces the position of the image capturing device is not performed.
In some embodiments, a plurality of facial features are determined from the first image; whether the distance between the face and the image capturing device is smaller than a preset distance value is judged according to the facial features; and when the distance is smaller than the preset distance value, the determination of whether the first sight line direction faces the position of the image capturing device is not performed.
The features, operation, and technical effects of the present invention will be described in detail with reference to the accompanying drawings and preferred embodiments.
Drawings
Fig. 1 is a block diagram illustrating an electronic device according to an embodiment of a gaze direction correction method of the present invention.
Fig. 2A is a flowchart of an embodiment of a gaze direction correction method according to the present invention.
FIG. 2B is a schematic view of the sight line area of the image capturing apparatus and the sight line feature of a target according to the present invention.
FIG. 2C is a diagram illustrating a situation where the sight line of the target does not fall within the sight line area of the image capturing apparatus according to the present invention.
FIG. 2D is a diagram illustrating selection of a second image by the gaze correction model according to the present invention.
FIG. 3 is a flowchart of another embodiment of the flow of FIG. 2A.
Fig. 4 is a schematic diagram of a difference image after the first eye image and the second eye image pass through the image mask according to the invention.
FIG. 5 is a flowchart of another embodiment of the flow of FIG. 2A.
Fig. 6 is a schematic diagram illustrating the composition of the first image and the second image.
Description of the symbols
1: electronic device
11: image capturing device
12: processing circuit
2: electronic device with a detachable cover
3: sight correction model
S1: the first image
S2: second image
S3: face output image
S01-S07: flow of steps
S051 to S052: flow of steps
S061 to S064: flow of steps
T: target
R: eye image
Z: region of sight
r: radius of the pipe
R1: eye image
R2: eye image
410: image shielding
420: difference image
Detailed Description
Fig. 1 is a block diagram of an electronic device to which a gaze direction correction method according to an embodiment of the present invention is applied, and fig. 2A is a flowchart of a gaze direction correction method according to an embodiment of the present invention. Referring to fig. 1 and fig. 2A together, the electronic device 1 includes an image capturing apparatus 11 and a processing circuit 12, wherein the image capturing apparatus 11 is coupled to the processing circuit 12. When the target face is within the image capturing range of the image capturing apparatus 11, a first image S1 of the target is captured by the image capturing apparatus 11 (step S01), and the image capturing apparatus 11 sends the first image S1 to the processing circuit 12.
Referring to fig. 2B and fig. 2C, the processing circuit 12 calculates a first sight line feature according to the eye image R1 of the first image S1 (step S02). The processing circuit 12 may perform facial feature recognition on the target T in the first image S1 and define an eye image R (as shown in fig. 2B) from the feature positions in the human face. The processing circuit 12 determines whether the first sight line feature falls within the sight line zone Z (shown in fig. 2C) (step S03). The sight line zone Z may be the center of the image capturing apparatus 11, or the space surrounding the image capturing apparatus 11. Fig. 2B shows the sight line feature of the target T falling within the sight line zone Z, and fig. 2C shows the sight line feature of the target T not falling within the sight line zone Z. The processing circuit 12 calculates a feature vector of the sight line, hereinafter referred to as the sight line feature, based on the eyeball of the target T (the non-white portions such as the pupil and iris) and the image capturing apparatus 11.
In a video conferencing scenario, the image capturing apparatus 11 is usually mounted above, below or elsewhere around the screen. Therefore, the processing circuit 12 can set the space within a radius r of the image capturing apparatus 11 as the sight line zone Z; please refer to the dashed-line range around the image capturing apparatus 11 in fig. 2B. Fig. 2B is a schematic view illustrating the sight line zone Z of the image capturing apparatus and the sight line feature of the target T according to the present invention. The left side of fig. 2B shows a situation where the target T and the image capturing apparatus 11 are at a specific relative position (for illustration purposes only, and not limited thereto). The right side of fig. 2B shows the digital image of the target T captured by the image capturing apparatus 11 in this situation, wherein the gray dashed box in the digital image represents the eye image (which may correspond to the sight line feature). Since the target T is looking at the image capturing apparatus 11 in fig. 2B, the sight line of the target T falls within the sight line zone Z.
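As an illustration of this zone test, the sketch below treats the sight line zone Z as the set of points within distance r of the image capturing apparatus; the coordinate convention and the sample values are assumptions for demonstration only.

```python
import numpy as np

def gaze_in_zone(gaze_point: np.ndarray, camera_pos: np.ndarray, r: float) -> bool:
    """True if the point the target is looking at lies within radius r of the
    image capturing apparatus (i.e. inside the sight line zone Z). Both points
    are illustrative 3-D coordinates expressed in the same frame."""
    return float(np.linalg.norm(gaze_point - camera_pos)) <= r

# Example: camera at the origin, gaze landing 2 cm away, zone radius r = 5 cm
print(gaze_in_zone(np.array([0.02, 0.0, 0.0]), np.zeros(3), 0.05))  # True
```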
When the processing circuit 12 determines that the sight line direction does not fall within the sight line zone Z (the determination result is "does not fall within"), please refer to fig. 2C. Fig. 2C is a schematic diagram illustrating a situation in which the sight line of the target T does not fall within the sight line zone Z of the image capturing apparatus 11 according to the present invention. The sight line of the target may also deviate in the horizontal direction and not fall within the sight line zone Z; this is likewise regarded as a determination result of "does not fall within". The processing circuit 12 inputs the first image S1 into the gaze correction model 3 (step S04) and obtains a second sight line feature of the target T and a second image S2.
The processing circuit 12 executes the gaze correction model 3. The gaze correction model 3 is the result of machine learning and training obtained from facial images of the target T (the process will be described in detail later). The gaze correction model 3 may include the data used to train the model and the training results. In addition to the face image and the sight line feature of the target T, the sight line correction model 3 also records face images of the target T at different rotation angles whose sight line features fall within the sight line zone Z. Fig. 2D illustrates a second image S2 selected according to the first image S1 by the gaze correction model of some embodiments. The gaze correction model 3 may select at least one second image S2. The second image S2 includes an eye image R2 of the target T (for example, the top image in fig. 2D, in which the target looks directly at the image capturing apparatus 11, may be selected; it includes an eye image R2 whose sight line feature is similar to that shown in fig. 2B). The processing circuit 12 may calculate the second sight line feature according to the second image S2 and the eye image R2.
Then, the processing circuit 12 synthesizes the eye image R2 of the second image S2 with the first image S1 according to the second sight line feature to generate a temporary storage image (step S05). To avoid unnatural or discontinuous boundaries around the eye image in the temporary storage image, and to avoid noise appearing in eye regions that do not need correction, the processing circuit 12 may modify the eye image of the temporary storage image according to the eye image R1 of the first image S1 (step S06) to generate a more natural face output image S3.
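A minimal sketch of step S05, assuming the eye region has already been located as a bounding box; the function name and the box format are illustrative, not taken from the patent.

```python
import numpy as np

def synthesize_temp_image(s1: np.ndarray, eye_r2: np.ndarray, eye_box) -> np.ndarray:
    """Sketch of step S05: replace the eye region of the first image S1 with the
    corrected eye image R2 taken from the second image S2. eye_box = (y0, y1, x0, x1)
    is assumed to come from the eye region located in step S02; the seam is left
    hard here and smoothed in step S06."""
    y0, y1, x0, x1 = eye_box
    temp = s1.copy()
    temp[y0:y1, x0:x1] = eye_r2
    return temp
```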
In some embodiments, when the electronic device 1 is in video communication with another electronic device 2 through a wired network or a wireless network, the electronic device 2 may receive the facial image of the target T (S1 or S3). If the target T is not looking at the image capturing device 11, the user of the electronic apparatus 2 can see the face output image S3 synthesized by the processing circuit 12. The direction of the line of sight of the target T in the face output image S3 is toward the image capturing device 11, which can improve the visual communication between the user of the electronic apparatus 2 and the user of the electronic apparatus 1 (i.e., the target T in this embodiment).
In some embodiments, the image capturing apparatus 11 may be an RGB image capturing apparatus, so that the first image S1, the second image S2 and the face output image S3 are RGB images. The electronic device 1 may be a notebook computer, a tablet computer or a mobile phone, and the image capturing apparatus 11 may be the camera of the notebook computer, tablet computer or mobile phone. The processing circuit 12 can determine the difference between the first image S1 and the second image S2 (which includes the second sight line feature) to obtain the pixel positions of the corrected pixels. The image capturing apparatus 11 photographs the target T in a streaming manner, in particular when the image capturing apparatus 11 supports a high frame rate, for example a camera capturing 30 frames per second. When the target T is in a video call, the face images of the target T in two consecutive frames can be considered similar (i.e., the face positions are close). Therefore, the processing circuit 12 can determine the position change and the rotation of the human face from several consecutive frames.
In some embodiments, the processing circuit 12 may perform a smoothing process on the single-frame temporary storage image generated in step S05; please refer to fig. 3. For example, the processing circuit 12 subtracts the pixel data of the first image S1 and the second image S2 at the same pixel position to find the pixels that differ and their pixel difference values, so as to generate the difference image between the first image S1 and the second image S2 (step S051).
The processing circuit 12 generates an image mask to represent the eye contour image of the eye region. The image mask 410 may be implemented with a blur filter, for example low-pass filtering or median filtering. The processing circuit 12 obtains the difference image 420 of the eyes after passing the eye images through the image mask 410. Please refer to fig. 4, which is a schematic diagram of the difference image after the first eye image and the second eye image pass through the image mask according to the present invention. In the image mask 410 of fig. 4, the eye area is shown in white and the facial skin in the non-eye area is shown in black. In the difference image 420, corrected pixels are white and uncorrected pixels are black; the gray level represents the magnitude of the correction, with larger corrections appearing whiter and smaller corrections appearing blacker. The eye images R1 and R2 are respectively passed through the image mask 410 to obtain the difference image 420.
The processing circuit 12 further synthesizes the second image S2, the difference image 420 and the eye contour image to generate a synthesized face image of the target T (step S052). In this embodiment, compared with the second image S2, the synthesized face image reduces unnaturalness and discontinuity at the eye boundary as well as noise in eye regions that do not require correction. The processing circuit 12 may directly output the synthesized face image as the face output image S3.
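One plausible reading of steps S051-S052 is sketched below: the difference image is the per-pixel absolute difference between S1 and S2, and a blurred eye mask (playing the role of the image mask 410) blends the corrected eye region of S2 back into S1. The blend formula and the Gaussian kernel size are assumptions; the patent does not fix them.

```python
import numpy as np
import cv2  # used only for the blur filter that plays the role of the image mask

def smooth_composite(s1: np.ndarray, s2: np.ndarray, eye_mask: np.ndarray):
    """Hedged sketch of steps S051-S052; eye_mask is a binary image that is 1
    inside the eye contour and 0 elsewhere."""
    s1f, s2f = s1.astype(np.float32), s2.astype(np.float32)

    # Step S051: per-pixel difference between the first and second images
    diff = np.abs(s1f - s2f)

    # Image mask 410: blur the hard eye contour so the seam fades out gradually
    soft = cv2.GaussianBlur(eye_mask.astype(np.float32), (31, 31), 0)
    if soft.ndim == 2 and s1f.ndim == 3:
        soft = soft[..., None]          # broadcast the mask over the color channels

    # Step S052: corrected pixels (S2) inside the softened eye region,
    # original pixels (S1) outside it
    synthesized = soft * s2f + (1.0 - soft) * s1f
    return synthesized.astype(s1.dtype), diff.astype(s1.dtype)
```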
In some embodiments, a smoothing process between two adjacent frames can be performed according to the difference image 420 and the synthesized face image obtained in steps S051 and S052 to generate the face output image S3; please refer to fig. 5. If only one of two adjacent frames is corrected, the eyeball appears to jump between the consecutive images. Therefore, in steps S061 to S064, while synthesizing the face output image S3, the processing circuit 12 may execute an image smoothing process and output the face output image S3 to the electronic device 2.
In the image smoothing process of this embodiment, the processing circuit 12 may sum the pixel values at the pixel positions of the difference image 420 to calculate a pixel sum Σdiff (step S061). The processing circuit 12 then compares the pixel sum Σdiff with the pixel threshold TH to generate a comparison result (step S062); for example, the processing circuit 12 may subtract the pixel threshold TH from the sum Σdiff and use the difference as the comparison result. The processing circuit 12 then calculates, according to the comparison result (e.g., according to equation 1.1), the weight parameter α_t corresponding to the synthesized face image (step S063), and the weight parameter α_t is less than 1.
Before the image capturing apparatus 11 captures the first image S1, it may capture other face images of the target T at other time points. For example, the image capturing apparatus 11 may acquire another face image of the target T at an earlier first time point and acquire the first image S1 at a later second time point; the other face image corresponding to the first time point may be used as a reference face image for the image smoothing process. The processing circuit 12 may then generate the face output image S3 according to the weight parameter α_t, the reference face image and the synthesized face image (e.g., according to equation 1.2) (step S064). In the following, the weight parameter α_t is referred to as a first weight parameter and (1 − α_t) as a second weight parameter.
In step S064, the processing circuit 12 may multiply the pixel value at each pixel position of the synthesized face image by the first weight parameter α_t to produce a first pixel product. The processing circuit 12 subtracts the first weight parameter α_t from 1 to obtain the second weight parameter (1 − α_t) and multiplies it by the pixel values at the pixel positions of the reference face image to generate a second pixel product. The processing circuit 12 sums the first pixel product and the second pixel product at each pixel position to generate the pixel value at each pixel position of the smoothed synthesized face image, and outputs the smoothed synthesized face image as the face output image S3, as shown in fig. 6. In fig. 6, the block surrounded by a gray dotted line represents noise in the second image S2 and its eye image.
Equation 1.1 computes the first weight parameter α_t of the current frame from the comparison result between the pixel sum Σdiff and the threshold TH and from the weight parameter α_{t-1} of the previous frame, keeping α_t below 1. Equation 1.2 blends the two frames:

S3(x, y) = α_t × P_t(x, y) + (1 − α_t) × P_{t−1}(x, y)  (equation 1.2)

where α_t is the proportion parameter of the current frame; α_{t-1} is the proportion parameter of the previous frame; TH is the threshold for judging the sum of the difference image; P_t(x, y) is the pixel value of the current frame (the synthesized face image) at the x, y coordinates; and P_{t−1}(x, y) is the pixel value of the previous frame (the reference face image) at the x, y coordinates. As can be seen from equation 1.1, the larger the pixel sum Σdiff, the larger the first weight parameter α_t calculated by the processing circuit 12, i.e., the smoothed face output image S3 contains more pixel data of the synthesized face image and less pixel data of the reference face image. Conversely, the smaller the pixel sum Σdiff, the smaller the first weight parameter α_t, i.e., the smoothed face output image S3 contains less pixel data of the synthesized face image and more pixel data of the reference face image. Accordingly, by weighting the synthesized face image and the reference face image, the processing circuit 12 can prevent the face output image S3 from exhibiting afterimages.
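A hedged sketch of the smoothing steps S061-S064 follows; since equation 1.1 is not reproduced in the text, the update of α_t is modelled here as a simple increment/decrement driven by the comparison with TH, which is only one possible choice consistent with the behaviour described above.

```python
import numpy as np

def temporal_smooth(synthesized: np.ndarray, reference: np.ndarray,
                    diff: np.ndarray, th: float, alpha_prev: float):
    """Hedged sketch of steps S061-S064; the alpha_t update is illustrative only."""
    pixel_sum = float(diff.sum())              # S061: pixel sum of the difference image
    comparison = pixel_sum - th                # S062: compare with the threshold TH
    step = 0.1 if comparison > 0 else -0.1     # S063: larger sum -> larger alpha_t
    alpha_t = float(np.clip(alpha_prev + step, 0.0, 0.99))   # kept below 1
    # S064 / equation 1.2: weighted blend of the synthesized and reference frames
    out = (alpha_t * synthesized.astype(np.float32)
           + (1.0 - alpha_t) * reference.astype(np.float32))
    return out.astype(synthesized.dtype), alpha_t
```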
In some embodiments, in the image smoothing process, the processing circuit 12 may also determine whether the eyeball or the face of the target T is in a rotation state according to the pixel sum Σdiff. The processing circuit 12 may determine whether the pixel sum Σdiff is larger than a threshold value that distinguishes whether the eyeball or the face is in the rotation state. When the sum Σdiff is greater than the threshold, the processing circuit 12 may determine not to perform the image smoothing process; when the sum Σdiff is less than the threshold, the processing circuit 12 determines to perform the image smoothing process.
In some embodiments, the processing circuit 12 may perform a Histogram of Oriented Gradients (HOG) algorithm to locate the face and eyes in the first image S1, so as to locate the set of sight line features of the target T in the first image S1. Furthermore, the plurality of first images S1 of the target T captured by the image capturing apparatus 11 at different times may have different color intensities and background colors. Therefore, the processing circuit 12 may perform a smoothing process on the plurality of first images S1 acquired at different times using an optical flow method, so as to locate stable face positions and eye positions.
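As an example of HOG-based localization, the sketch below uses dlib, whose frontal face detector is HOG-based; the choice of library and the 68-point landmark model file are assumptions, not named in the patent.

```python
import dlib  # dlib's frontal face detector is HOG-based; the library choice is an assumption

detector = dlib.get_frontal_face_detector()
# The 68-point landmark model file is an assumption, not something the patent names.
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def locate_eyes(gray_image):
    """Return the left- and right-eye landmark points of the first detected face,
    or None when no face is found (indices 36-41 and 42-47 in the common
    68-point layout)."""
    faces = detector(gray_image)
    if not faces:
        return None
    shape = predictor(gray_image, faces[0])
    points = [(shape.part(i).x, shape.part(i).y) for i in range(68)]
    return points[36:42], points[42:48]
```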
In some embodiments, the gaze correction model 3 may correct the gaze direction of the target T according to the result of deep learning. In detail, in the learning stage, the gaze correction model 3 receives two kinds of learning data, namely input learning data and output learning data; the input learning data includes a human eye image and a gaze angle to be corrected. The human eye image may be a synthetic image or a real image, and the gaze angle may be represented by a two-dimensional vector, i.e., the gaze angle may include an X-direction angle and a Y-direction angle. The output learning data includes the corrected human eye image. The processing circuit 12 performs deep learning based on the learning data and generates a learning result. The gaze correction model 3 further generates the second image S2 in step S04 according to the learning result, wherein the second image S2 includes the corrected gaze direction. In some embodiments, the processing circuit 12 may perform the deep learning training in step S04 with a warping-based CNN model, an encoder-decoder architecture, or a GAN-based model architecture.
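For illustration, a tiny PyTorch encoder-decoder conditioned on the two-dimensional gaze angle is sketched below; the architecture, layer sizes and input resolution are arbitrary assumptions and stand in for whichever warping CNN, encoder-decoder or GAN-based model is actually trained.

```python
import torch
import torch.nn as nn

class GazeCorrectionNet(nn.Module):
    """Illustrative encoder-decoder in the spirit of the learning stage described
    above: input is an eye crop plus the gaze angle to correct (X/Y vector),
    output is the corrected eye crop. All layer sizes are assumptions."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),   # 64x64 -> 32x32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 32x32 -> 16x16
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64 + 2, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, eye: torch.Tensor, angle: torch.Tensor) -> torch.Tensor:
        z = self.encoder(eye)                                   # (B, 64, 16, 16)
        a = angle[:, :, None, None].expand(-1, -1, z.shape[2], z.shape[3])
        return self.decoder(torch.cat([z, a], dim=1))           # (B, 3, 64, 64)

# Training pairs: (eye image, angle to correct) -> corrected eye image (e.g. L1 loss)
net = GazeCorrectionNet()
pred = net(torch.rand(1, 3, 64, 64), torch.tensor([[5.0, -3.0]]))
```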
In some embodiments, referring to fig. 2A, the first image S1 and the human eye images used for deep learning are made to have the same image domain. For example, the first image S1 is made to have pixel data (e.g., contrast and chroma) and character feature information (e.g., skin color) consistent with the human eye images used for deep learning. Before determining the sight line direction in step S02, the processing circuit 12 may match the sight line feature of the first image S1 with the sight line features of the human eye images of the learning data and the learning result based on the HOG algorithm, so that the contrast, the color saturation and the above-mentioned character feature information of the first image S1 match the sight line correction model 3. The processing circuit 12 then determines the gaze direction of the target T according to the matched first image S1 (step S02), and may generate the second image S2 by deep learning using the gaze correction model 3 (step S04). Then, after the second image S2 is generated by correcting the sight line direction of the first image S1, the processing circuit 12 matches the contrast, color saturation and character feature information of the second image S2 to those of the first image S1 (step S04), thereby reducing the color deviation between the two images caused by the sight line correction model 3.
In some embodiments, if the head of the target T is not facing the image capturing apparatus 11, the processing circuit 12 may determine the head facing angle of the target T according to a plurality of facial features of the first image S1, for example a specific orientation defined by facial feature values such as the positions of the five sense organs or changes in their relative distances. The processing circuit 12 compares the head facing angle with the angle range within which the head faces the image capturing apparatus 11. If the head facing angle exceeds this angle range, indicating that the head of the target T is not facing the image capturing apparatus 11, the processing circuit 12 outputs the first image S1 without performing the sight line correction (step S07). Alternatively, the processing circuit 12 may determine whether the first image S1 includes a side face image of the target T according to the facial features of the first image S1. If the first image S1 includes a side face image of the target T, indicating that the head of the target T is not facing the image capturing apparatus 11 (i.e., the head facing angle is outside the angle range facing the image capturing apparatus 11), the processing circuit 12 outputs the first image S1 without performing the sight line correction (step S07).
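A common way to obtain such a head facing angle from facial feature points is a perspective-n-point fit, sketched below with OpenCV; the 3-D reference points, the focal-length guess and the yaw threshold are assumptions, not values from the patent.

```python
import numpy as np
import cv2

# Generic 3-D reference points (nose tip, chin, eye corners, mouth corners);
# the specific values are assumptions, not taken from the patent.
MODEL_POINTS = np.array([
    (0.0, 0.0, 0.0), (0.0, -330.0, -65.0),
    (-225.0, 170.0, -135.0), (225.0, 170.0, -135.0),
    (-150.0, -150.0, -125.0), (150.0, -150.0, -125.0)], dtype=np.float64)

def head_faces_camera(image_points: np.ndarray, frame_size, max_yaw_deg: float = 30.0) -> bool:
    """Estimate the head facing angle from six facial feature points (float64,
    shape (6, 2), same order as MODEL_POINTS) and return False when the yaw
    exceeds max_yaw_deg, i.e. when the head is not facing the camera."""
    h, w = frame_size
    f = float(w)                                       # crude focal-length guess
    camera = np.array([[f, 0, w / 2], [0, f, h / 2], [0, 0, 1]], dtype=np.float64)
    ok, rvec, _ = cv2.solvePnP(MODEL_POINTS, image_points, camera, None)
    if not ok:
        return False
    rot, _ = cv2.Rodrigues(rvec)
    yaw = np.degrees(np.arctan2(-rot[2, 0], np.sqrt(rot[0, 0]**2 + rot[1, 0]**2)))
    return abs(yaw) <= max_yaw_deg
```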
In some embodiments, if the target T is in a blinking state, the first image S1 acquired by the image capturing apparatus 11 does not contain a usable sight line direction. The processing circuit 12 determines whether the target T is in the blinking state according to the first image S1, for example according to the feature points of the eye region. The processing circuit 12 determines the distance between the uppermost feature point and the lowermost feature point of the eye region (hereinafter referred to as a first distance) and the distance between the leftmost feature point and the rightmost feature point (hereinafter referred to as a second distance), and calculates the ratio between the first distance and the second distance. The processing circuit 12 compares the ratio with a threshold value used to distinguish between the blinking state and the non-blinking state. If the ratio is smaller than the threshold value, the target T is in the blinking state, and the processing circuit 12 outputs the first image S1 without performing the sight line correction (step S07).
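The blink test can be sketched as the ratio of the vertical to the horizontal extent of the eye feature points; the threshold value below is an assumption.

```python
import numpy as np

def is_blinking(eye_points: np.ndarray, threshold: float = 0.2) -> bool:
    """Sketch of the blink test described above: eye_points is an (N, 2) array of
    (x, y) feature points of the eye region; the 0.2 threshold is illustrative."""
    xs, ys = eye_points[:, 0], eye_points[:, 1]
    first_distance = ys.max() - ys.min()      # uppermost to lowermost feature point
    second_distance = xs.max() - xs.min()     # leftmost to rightmost feature point
    return (first_distance / second_distance) < threshold
```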
In some embodiments, when the target T is too close to or too far away from the image capturing apparatus 11, correcting the gaze direction of the target T to face the image capturing apparatus 11 would make the gaze direction in the resulting face output image S3 look unnatural. Therefore, in step S04, the processing circuit 12 may estimate the distance between the image capturing apparatus 11 and the target T based on the positions of the facial feature points of the first image S1, and then determine whether the distance between the target T and the image capturing apparatus 11 is smaller than a preset distance value. When the distance is smaller than the preset distance value, the processing circuit 12 determines that the first sight line direction of the first image S1 does not need to be corrected and outputs the first image S1 without correcting it (step S07).
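One simple way to estimate the face-to-camera distance from facial feature positions is the pinhole relation between the inter-pupillary distance in pixels and its real-world size; the constants below are assumptions, not values from the patent.

```python
def estimate_face_distance(pixel_ipd: float, focal_px: float = 600.0,
                           real_ipd_m: float = 0.063) -> float:
    """Rough pinhole-camera estimate of the face-to-camera distance from the
    inter-pupillary distance measured in pixels; the focal length and the 6.3 cm
    average inter-pupillary distance are illustrative assumptions."""
    return focal_px * real_ipd_m / pixel_ipd

# Skip the correction when the face is closer than a preset distance, e.g. 0.25 m
too_close = estimate_face_distance(pixel_ipd=220.0) < 0.25
```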
In some embodiments, the processing circuit 12 may be implemented as a central processing unit (CPU), a microcontroller (MCU), or a graphics processing unit (GPU).
In summary, according to an embodiment of the gaze direction correction method of the present invention, when the lens of the electronic device is disposed around the screen rather than in the central region of the screen, the electronic device can correct the captured face image (mainly the eye image) of the user so that the gaze direction of the user appears to look toward the lens, and users of different electronic devices can have a visual communication experience during video calls. Moreover, the electronic device can perform additional image processing on the eye images in the video images, denoise the eye images, perform smoothing according to eye images acquired by the image capturing equipment at different time points, and unify and optimize the sight line correction procedure through model training with a machine learning method, so as to output video face images with better image quality.
Although the present application has been disclosed with reference to the above embodiments, these embodiments are not intended to limit the present application. Those skilled in the art may make modifications or adjustments to the technical solution of the present application based on its explicit or implicit contents without departing from the spirit and scope of the present application, and all such changes fall within the scope of patent protection sought by the present application. In other words, the scope of the present application should be determined by the claims.

Claims (10)

1. A gaze direction correction method, the method comprising:
obtaining a first image;
obtaining a first sight line feature according to the first image;
judging whether the first sight line feature falls within a sight line area;
if the first sight line feature does not fall into the sight line area, obtaining a second image corresponding to the first image and an eye image corresponding to the second image according to the first image;
synthesizing a temporary storage image according to the first image and the eye image; and
correcting the temporary storage image to generate a face output image.
2. The gaze direction correction method of claim 1, wherein the step of correcting the temporary storage image to generate the face output image comprises:
calculating a pixel difference value corresponding to the same pixel position between the first image and the second image to generate a difference image; and
synthesizing the second image, the difference image and the eye contour image of the first image to generate the face output image.
3. The gaze direction correction method of claim 1, wherein the step of correcting the temporary storage image to generate the face output image further comprises:
acquiring a reference face image before the first image is acquired;
calculating a pixel difference value corresponding to the same pixel position between the first image and the second image to generate a difference image;
calculating the pixel sum according to the pixel data of each pixel of the difference image;
comparing the pixel sum to a pixel threshold to produce a comparison result;
synthesizing the second image, the difference image and the eye contour image of the first image to generate a synthesized face image;
calculating a first weight parameter corresponding to the synthesized face image according to the comparison result; and
generating the face output image according to the first weight parameter, the reference face image and the synthesized face image.
4. The gaze direction correction method of claim 3, wherein the sum of the first and second weight parameters is 1, and the step of generating the face output image based on the first weight parameter, the reference face image and the synthesized face image comprises:
multiplying the first weight parameter by pixel data of the composite face image to produce a first pixel product;
multiplying the second weight parameter by pixel data of the reference face image to produce a second pixel product; and
summing the first pixel product and the second pixel product to generate pixel data of the face output image.
5. The gaze direction correction method of claim 1, wherein the step of obtaining the first image further comprises:
obtaining a first sight line direction according to the first image;
before the first sight line direction is judged, matching pixel data of the first image with pixel data of a plurality of face images contained in a sight line correction model, and correcting the first sight line feature according to the matched first image; and
matching the pixel data of the second image to the pixel data of the first image after the first sight line feature is corrected.
6. The gaze direction correction method of claim 1, wherein the step of obtaining the first image further comprises:
obtaining a first sight line direction according to the first image;
before the first sight line direction is judged, matching the first sight line feature with sight line features of a plurality of human eye images contained in a sight line correction model, and correcting the first sight line feature according to the matched first image; and
matching the character features of the second image to the character features of the first image after the first sight line feature is corrected.
7. The gaze direction correction method according to claim 5 or 6, characterized in that the step of correcting the first sight line feature comprises:
correcting the first sight line feature into a second sight line feature according to a deep learning result, wherein the deep learning result corresponds to a plurality of learning data, and the learning data comprise a plurality of human eye images, the sight line angle to be corrected of each human eye image, and each human eye image after the sight line angle is corrected.
8. The gaze direction correction method according to claim 5 or 6, characterized in that the correction method further comprises:
judging a plurality of facial features according to the first image;
judging a head facing direction according to the facial features; and
when the head facing direction is not facing the image capturing device, the determination of whether the first line of sight direction is facing the position of the image capturing device is not performed.
9. The gaze direction correction method of claim 5 or 6, further comprising:
judging whether the eye is in a blinking state according to the first sight line feature; and
when the eye is in the blinking state, the determination of whether the first sight line direction faces the position of the image capturing device is not performed.
10. The gaze direction correction method according to claim 5 or 6, characterized in that the correction method further comprises:
judging a plurality of facial features according to the first image;
judging, according to the facial features, whether a distance between the face and the image capturing device is smaller than a preset distance value; and
when the distance is smaller than the preset distance value, the determination of whether the first sight line direction faces the position of the image capturing device is not performed.
CN202110011861.XA 2021-01-06 2021-01-06 Sight direction correction method Pending CN114723932A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110011861.XA CN114723932A (en) 2021-01-06 2021-01-06 Sight direction correction method


Publications (1)

Publication Number Publication Date
CN114723932A (en) 2022-07-08

Family

ID=82234541


Country Status (1)

Country Link
CN (1) CN114723932A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination